Hybrid DPE Installation Guide

For hybrid deployment, Data Processing Engine (DPE) can be installed on a Linux host using the Ansible installation packages.

This method of installation is best suited for the following use cases:

  • On-premise deployments

  • Proof-of-concept test deployments

  • Platform as a service (PaaS) deployments

As outlined in Infrastructure Preparation, DPE in hybrid deployment scenarios can currently be installed only on Linux hosts.

This guide provides installation instructions for both RHEL and Ubuntu and covers the following topics:

  • Installing Ansible and its dependencies.

  • Installing the Ataccama client-side components (DPE and Fluent Bit) and dependencies (Java).

    The recommended version of Java is 11 LTS or higher.
  • Testing the connection between DPE and the Ataccama ONE PaaS.

  • Starting to work with the Ataccama ONE Platform.

Prerequisites

Before you continue with the installation, make sure the following resources have been prepared:

  • Linux host instances with a working Internet connection are set up and running.

  • You have the necessary access rights and permissions to install Ansible and its prerequisites.

  • DPE license (license.plf file, distributed by Ataccama).

    For more information about the infrastructure requirements, see Infrastructure Preparation.

If you encounter an issue during installation, refer to the guide Troubleshooting Hybrid Deployment, which describes some of the more common exceptions and possible solutions.

Download the installation package

The Ataccama ONE Hybrid Deployment package is stored on the Amazon S3 repository.

The installation structure that you create in this section is used throughout this guide, both for installing Ansible and the Ataccama client-side components.
  1. Download the Hybrid DPE Deployment Package (Linux) from our Downloads page.

  2. Unpack the downloaded installation package to the following location: ~/one/. To do this, run the following commands:

    Make sure to replace the version number in the command accordingly.
    Extract the installation package
    mkdir -p ~/one
    unzip ansible-hybrid-<version-number>.zip -d ~/one
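
Before moving on, it can help to confirm that the package was extracted where the rest of this guide expects it. The following is a minimal sketch, not part of the installation package: check_layout is a hypothetical helper, and the list of paths is an assumption assembled from the files and folders that later sections of this guide refer to.

```shell
# Report which of the paths referenced later in this guide are missing
# under the extraction root (default: ~/one).
# The list of paths is an assumption based on this guide, not a package manifest.
check_layout() {
  root=${1:-"$HOME/one"}
  missing=0
  for path in \
    ansible/ansible-example.cfg \
    ansible/requirements-pip.txt \
    ansible/collections/requirements.yml \
    ansible/roles/requirements.yml \
    ansible/verify-local.yml \
    ansible/hybrid-dpe.yml \
    ansible/inventories/example_hybrid
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path" >&2
      missing=1
    fi
  done
  # 0 when everything was found, 1 otherwise
  return "$missing"
}
```

Calling check_layout without arguments inspects ~/one. A non-zero exit status usually means the archive was unpacked to a different folder or the wrong package was downloaded.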

Install Ansible

Ansible, an open-source tool for server orchestration, is a Python application that requires specific versions of Python libraries. To make sure that there are no conflicts with other installed modules, it is necessary to create a separate virtual environment using the Python venv module. As the last step in the installation procedure, the ansible-galaxy tool is used to download a number of community-maintained Ansible roles (libraries).

If you are not familiar with Ansible, you can find more information in the official Ansible documentation.

The required version of Ansible is 2.11.
  1. Start by installing Python 3 pip tool, the Python package manager. Depending on your OS, execute the following commands:

    • Red Hat Enterprise Linux (RHEL)

      Install Python 3 pip
      $ sudo dnf makecache
      $ sudo dnf install unzip git python3-pip curl
    • Ubuntu

      Install Python 3 pip
      $ sudo apt update
      $ sudo apt install unzip git python3-pip python3-venv curl
  2. Create and activate a virtual environment for Ansible installation. This helps prevent potential version conflicts between the system modules and the modules necessary for setting up hybrid DPE. Execute the following commands:

    Set up virtual environment
    python3 -m venv ~/venv
    . ~/venv/bin/activate

    Your command prompt now starts with (venv), which indicates that all Python processes now use the modules from the virtual environment. This also means that you need to have your virtual environment active anytime you want to work with this Ansible repository.

    To start the virtual environment, use the following command:

    Start virtual environment
    . ~/venv/bin/activate
  3. Install Ansible and the remaining dependencies within your virtual environment. To do so, execute the following commands:

    Install Ansible and dependencies
    cd ~/one/ansible
    pip install --upgrade 'pip>=20.3'
    pip install wheel 'ansible<4.6'
    pip install -r requirements-pip.txt

    Once the installation is completed, the last line of the expected output is as follows:

    Successfully installed ansible-4.5.0 ansible-core-2.11.12 resolvelib-0.5.4
    Proceed to the following sections only if all previous steps have been successfully completed.
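
Because every pip and Ansible command in this guide must run inside the virtual environment, a small guard can be useful in scripts that wrap these steps. The sketch below relies on the VIRTUAL_ENV variable, which the venv activate script exports. The function name require_venv is hypothetical.

```shell
# Fail early when no Python virtual environment is active.
# VIRTUAL_ENV is exported by the venv "activate" script.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment is active; run: . ~/venv/bin/activate" >&2
    return 1
  fi
  echo "Using virtual environment: $VIRTUAL_ENV"
}
```

For example, require_venv && pip install -r requirements-pip.txt runs pip only when the environment is active.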

Configure Ansible

Ansible must be correctly configured before it can be used, which is why we provide a basic configuration file for Ansible (ansible-example.cfg). One of the key options in the configuration is log_path, which makes Ansible record all playbook runs in a single log file (in this case, ~/.ansible/ansible.log).

  1. To make sure the configuration file is always applied, copy it to your home directory using the following command. Make sure to adapt the path to the ansible-example.cfg file depending on your current working directory.

    Copy Ansible configuration file to home directory
    cp ~/one/ansible/ansible-example.cfg ~/.ansible.cfg
  2. Verify that the configuration file has been correctly set up. To do so, check the Ansible version using the following command:

    Verify Ansible version
    ansible --version

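    The expected output is as follows. The exact paths might vary depending on your environment. When Ansible picks up ~/.ansible.cfg, the config file line shows the path to that file rather than None.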

    Verify Ansible version console output
    ansible [core 2.11.12]
      config file = None
      configured module search path = ['/home/<user>/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
      ansible python module location = /home/<user>/venv/lib/python3.11/site-packages/ansible
      ansible collection location = /home/<user>/.ansible/collections:/usr/share/ansible/collections
      executable location = /home/<user>/venv/bin/ansible
      python version = 3.11.2 (main, Feb 16 2023, 02:55:59) [Clang 14.0.0 (clang-1400.0.29.202)]
      jinja version = 3.1.2
      libyaml = False

    Once you have a working Ansible installation, proceed with installing external dependency roles.
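
The version check can also be scripted. The following sketch parses the text printed by ansible --version. Both function names are hypothetical, and the parsing assumes the output layout shown in the sample above.

```shell
# Succeeds when the first line of `ansible --version` reports ansible-core 2.11.x.
is_core_2_11() {
  case "$1" in
    "ansible [core 2.11."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads `ansible --version` output on stdin and prints the value of
# the "config file" line (for example, a path or None).
config_file_of() {
  sed -n 's/^[[:space:]]*config file = //p'
}
```

A possible use is to store the output once with out=$(ansible --version), pass its first line to is_core_2_11, and pipe the whole text to config_file_of to see which configuration file Ansible loaded.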

Install dependency roles

Before running any playbook, you need to install dependency roles (libraries) from Ansible Galaxy, a repository for Ansible roles. The roles to install are defined in requirements.yml files.

The downloaded installation package contains two such files, one in one/ansible/collections and one in one/ansible/roles.

To install the roles from both files, execute the following commands:

Install dependency roles
cd ~/one/ansible
ansible-galaxy install -r collections/requirements.yml
ansible-galaxy install -r roles/requirements.yml

Verify Ansible installation

In addition to checking the Ansible version (ansible --version, described in more detail in Configure Ansible), you can use Ansible to run the true command on your machine. This verifies whether Ansible is minimally functional.

To test this, execute the following commands:

Verify Ansible installation
cd ~/one/ansible
ansible-playbook verify-local.yml

The expected output is as follows. While the warnings about the missing inventory and hosts list are expected, the play recap must not contain any failed steps.

Verify Ansible installation console output
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available.
Note that the implicit localhost does not match 'all'

PLAY [verify Ansible is usable locally] *******************************************************************************************************************************************************************

TASK [Gathering Facts gather_subset=['all'], gather_timeout=10] *******************************************************************************************************************************************
ok: [127.0.0.1]

TASK [run dummy command _raw_params=true] *****************************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [give result msg=Minimal functionality confirmed!] ***************************************************************************************************************************************************
ok: [127.0.0.1] => {
    "msg": "Minimal functionality confirmed!"
}

PLAY RECAP ************************************************************************************************************************************************************************************************
127.0.0.1                  : ok=3    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Install client-side components

Once you have installed Ansible, you can proceed with installing DPE.

Make sure your Internet connection is running for the duration of this procedure.

Build the Ansible inventory

An inventory is a list of servers that are provisioned through Ansible. Building an inventory includes defining all the necessary hosts and variables in a dedicated folder structure.

To prepare an inventory for hybrid DPE installation:

  1. Create a copy of the sample inventory (one/ansible/inventories/example_hybrid). The inventory should be stored in the same parent folder (inventories) under a different name, for example, customer.

    To do this, you can use the following command:

    Create a new inventory
    cd ~/one/ansible/inventories
    cp -r example_hybrid/. <new_inventory>
  2. Add your provisioned hosts (managed nodes) to the hosts file. This is done by modifying the hosts.yml file located in the inventory created in step 1 (one/ansible/inventories/<inventory>). The file has the following structure:

    Structure of hosts.yml file
    all:
      children:
        processing:
          hosts:
            <dpe-server-hostname>:
            <dpe-2-server-hostname>:    # optional, but at least one server in the processing group is required
              key1: value1              # for any server (host), you can optionally define host-specific variables (e.g. license)
            ...

    Replace the hostname placeholders (such as <dpe-server-hostname>) with the correct hostnames for all DPE servers that you want to work with. The processing group can have one or multiple hosts and it is also possible to add host-specific variables. For example, to specify the license that a DPE instance should use, the variable needs to be declared as follows:

    ...
            <dpe-server-hostname>:
              license: /path/to/license.plf
  3. Provide the necessary variables. Variables are declared in the vars.yml file, located in one/ansible/inventories/<inventory>/group_vars/all. This is where you define the various endpoints and secrets that are used to connect to the Ataccama ONE PaaS environment.

    1. Some variables are specific for each environment and therefore must be set. The following list contains all the variables you must update accordingly:

      • dpe_license_file: The path to the DPE license file on the Ansible controller. You can define it in the group variables (vars.yml) or, when provisioning multiple DPEs, for each host separately (hosts.yml, see step 2).

      • keycloak_url: The PaaS Keycloak authentication server endpoint: https://[customer].[env].ataccama.online/auth/.

      • dpe_token_client, dpe_admin_client: The credentials for the DPE token and admin clients.

        dpe_token_client:
          client: dpe-token-client
          secret: dpe-token-client-s3cret
        dpe_admin_client:
          client: dpe-admin-client
          secret: dpe-admin-client-s3cret
      • minio_url: The PaaS ONE Object Storage (MinIO) endpoint: https://minio.[customer].[env].ataccama.online.

      • minio: MinIO credentials (access key and secret key).

        minio:
          access_key: minio
          secret_key: minio-secret
      • dpe_jwt_key: The private key of the on-premise DPE. Provided by Ataccama together with other necessary credentials.

      • dpm: The gRPC host where DPM is available. The gRPC and HTTP port numbers should remain unchanged.

        dpm:
          host: dpm-grpc.[customer].[env].ataccama.online
          grpc_port: 443
          http_port: 8031
      • dpm_jwt_key: The public key of the DPM module. Provided by Ataccama together with other necessary credentials.

        dpm_jwt_key:
          name: dpm-prod-key
          # jwt key content
          content: {"kty":"EC","crv":"P-256","kid":"vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI","x":"Hbs53V5zC-1DjNf5RtJ1bNHlxvzM5jST7J1ADVePV9g","y":"4pVfzrF7FMHt_Xx2FgLauvLZuJqbpL9crdOxvTXWb64","alg":"ES256"}
          # jwt key fingerprint
          fp: vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI
    2. Define any additional configuration for the DPE module under the variable dpe_additional_config, depending on your requirements. The following example shows how to enable communication over bidirectional gRPC stream between DPE and DPM. For context, when this option is set, the two modules require only one open connection to communicate.

      dpe_additional_config: |
        # Additional DPM connection properties - enable bidirectional streaming via TLS and trust all certificates
        ataccama.client.connection.dpm.grpc.tls.enabled=true
        ataccama.client.connection.dpm.grpc.tls.trust-all=true
        ataccama.one.dpe.service.dpm.connection.mode=FIREWALL_FRIENDLY_REGISTRATION
        ataccama.one.dpe.label=dpe-hybrid
        # Additional DPE data sources / drivers configuration
        plugin.jdbcdatasource.ataccama.one.driver.redshift.disabled=false

      Regarding ataccama.one.dpe.label, it is important you assign a unique value to the DPE configuration label to ensure proper functioning and avoid conflicts with remote configurations. The label serves as the DPE’s unique identifier.

      The default value is dpe. For engines with different configurations, each must have a distinct label value.

      By default, the variables file is also preconfigured to download JDBC drivers for several commonly used data sources. The list of downloaded drivers is provided in the dpe_drivers variable using the following syntax:

      dpe_drivers:
        - name: redshift
          remote_url: https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/2.0.0.1/redshift-jdbc42-2.0.0.1.jar
      For a comprehensive list of all available properties for the DPE module, see DPE Configuration.
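
Since a missing mandatory variable only surfaces once the playbook runs, a quick check of vars.yml can save a round trip. The sketch below is an illustration only: missing_vars is a hypothetical helper, and it assumes the variables listed in step 3 appear as top-level keys at the start of a line. Keep in mind that dpe_license_file might legitimately be defined per host in hosts.yml instead.

```shell
# Print the name of every mandatory variable from step 3 that does not
# appear as a top-level key in the given vars.yml file.
# Assumes each key starts at column 0 and is followed by a colon.
missing_vars() {
  file=$1
  for key in dpe_license_file keycloak_url dpe_token_client dpe_admin_client \
             minio_url minio dpe_jwt_key dpm dpm_jwt_key
  do
    grep -q "^${key}:" "$file" || echo "$key"
  done
}
```

For example, missing_vars ~/one/ansible/inventories/customer/group_vars/all/vars.yml prints nothing when all keys are present. This is a plain text match, so it does not validate the values or the YAML syntax.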

Install DPE and its dependencies

After you have prepared the Ansible inventory for DPE, you can continue with installing DPE. This is done by running the Ansible playbook hybrid-dpe.yml, located in ~/one/ansible, as described in the following steps.

Ansible uses playbooks to configure and orchestrate tasks needed for deploying complex applications. In addition to installing DPE, the hybrid-dpe.yml playbook also includes installing Java and Fluent Bit, as well as a number of preinstallation checks.

To install DPE, users need to have SSH access to all the hosts with root privileges (for example, passwordless sudo).
  1. Navigate to the Ansible directory:

    cd ~/one/ansible
  2. If it is not already active, start the virtual environment (as explained in Install Ansible, step 2):

    . ~/venv/bin/activate
  3. Execute the following command:

    Install DPE
    ansible-playbook -i <path/to/inventory> -u <username> --private-key <path/to/private-key> -b hybrid-dpe.yml

    For example, if you used the paths provided in this guide and want to connect through the admin account, with the private key located in ~/.ssh/admin-private, the command should be updated as follows:

    Install DPE Example
    ansible-playbook -i inventories/customer -u admin --private-key ~/.ssh/admin-private -b hybrid-dpe.yml

    If the installation finished successfully, the expected output of the play recap is as follows. To verify that DPE can also communicate with the Ataccama ONE PaaS, make sure to go through sections Installation checks and Check DPE status as well.

    ...
    PLAY RECAP ************************************************************************************************************************************************************************************************
    dpe-1-server-hostname      : ok=86   changed=31   unreachable=0    failed=0    skipped=32   rescued=0    ignored=2
    dpe-2-server-hostname      : ok=86   changed=31   unreachable=0    failed=0    skipped=32   rescued=0    ignored=2
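
When the playbook output is saved to a file, the play recap can be checked automatically. The following is a sketch under the assumption that recap lines have the key=value layout shown above. The function name recap_ok is hypothetical.

```shell
# Read playbook output on stdin and succeed only when at least one recap
# line was found and every recap line reports unreachable=0 and failed=0.
recap_ok() {
  awk '
    /unreachable=[0-9]+/ && /failed=[0-9]+/ {
      seen = 1
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(unreachable|failed)=/) {
          split($i, kv, "=")
          if (kv[2] + 0 != 0) bad = 1
        }
      }
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}
```

One way to use it is to pipe the ansible-playbook command through tee run.log and then run recap_ok < run.log. The same check applies to the output of verify-local.yml.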

Installation checks

As mentioned in the previous section, executing the hybrid-dpe.yml playbook runs several checks. They test the connection to the Ataccama ONE PaaS before the installation starts and the availability of DPE once the installation finishes. Pre-installation connectivity checks are done for the following components:

  • Data Processing Module (DPM): dpm-grpc.[customer].[env].ataccama.online:443

  • Keycloak: https://[customer].[env].ataccama.online/auth

  • ONE Object Storage (MinIO): https://minio.[customer].[env].ataccama.online

If all the checks have been successfully completed, the expected output is as follows. The example provided here is based on an installation with two DPE nodes.

Pre-installation checks
TASK [Hybrid preinstallation checks] *************************************************************************************************************************************************

TASK [system : Check connectivity to Keycloak] ***************************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]

TASK [system : Check connectivity to MinIO] ******************************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]

TASK [system : Check connectivity to dpm grpc endpoint] ******************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]

In case any of these checks fail, the Ataccama PaaS environment cannot be reached from the DPE nodes, which typically indicates a firewall or networking issue. Before proceeding further, investigate the issue and make sure the connection is working. For more information, see Infrastructure Preparation.
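
To investigate a failed check, you can probe the same endpoints manually from a DPE node. The sketch below uses curl, which was installed together with the Ansible prerequisites. The function names are hypothetical, and the URLs follow the endpoint patterns listed above. A successful curl call only shows that a TLS and HTTP exchange completed. It is not a reimplementation of the playbook's own checks and does not prove that the service behind the endpoint is healthy.

```shell
# Print the three PaaS endpoints used by the pre-installation checks,
# built from the customer and environment names.
paas_endpoints() {
  customer=$1
  environment=$2
  echo "https://${customer}.${environment}.ataccama.online/auth"
  echo "https://minio.${customer}.${environment}.ataccama.online"
  echo "https://dpm-grpc.${customer}.${environment}.ataccama.online:443"
}

# Try to reach each endpoint; return non-zero when at least one fails.
check_endpoints() {
  failed=0
  for url in $(paas_endpoints "$1" "$2"); do
    if curl -s -o /dev/null --max-time 10 "$url"; then
      echo "reachable:   $url"
    else
      echo "unreachable: $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Run check_endpoints with your own customer and environment names as arguments. An unreachable endpoint usually points to a firewall rule, proxy, or DNS issue on the DPE node.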

A post-installation check included in the Ansible play verifies that DPE is running without issues. If that is the case, the following output is expected:

Post-installation check successful
TASK [dpe : Wait for dpe to come up (check monitoring endpoint ready)] ***************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]

In case this task fails, more information about the issue can be found in the output. The following example illustrates the error that occurs when DPE is not able to reach DPM: the expected response code from the monitoring endpoint is 200 OK, but 503 Service Unavailable was received.

Post-installation check failed
fatal: [dpe-1-server-hostname]: FAILED! => {"attempts": 30, .... "json": {"components": {"db": {"details": {"database": "H2", "validationQuery": "isValid()"}, "status": "UP"}, "diskSpace": {"details": {"exists": true, "free": 23339401216, "threshold": 10485760,
"total": 31036686336}, "status": "UP"}, "dpm": {"details": {"state": "CONNECTING"}, "status": "DOWN"}, "livenessState": {"status": "UP"}, "ping": {"status": "UP"}, "readinessState": {"status": "UP"}}, "groups": ["liveness", "readiness"], "status": "DOWN"}, "msg": "Status code was 503 and not [200]: HTTP Error 503: ",
"redirected": false, "status": 503, "transfer_encoding": "chunked", "url": "http://dpe-1-server-hostname:8034/actuator/health", "x_correlation_id": "8524f5"}
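
After fixing the underlying issue, you can poll the monitoring endpoint yourself instead of rerunning the whole playbook. The following sketch mimics the retry behavior suggested by the "attempts": 30 field in the output above. The function name wait_for_ready and the default delay of 5 seconds are assumptions, and the URL is taken from the error output.

```shell
# Poll a health URL until it answers with HTTP 200 or the attempts run out.
# Usage: wait_for_ready <url> [attempts] [delay-seconds]
wait_for_ready() {
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the HTTP status code; fall back to 000 if curl fails
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: HTTP $code" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_ready http://dpe-1-server-hostname:8034/actuator/health waits for the first DPE node. A persistent 503 response means that at least one health component, such as dpm in the example above, is still DOWN.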

Post-installation steps

Check DPE status

Hybrid DPEs send liveness and readiness checks to the Ataccama ONE PaaS DPM every few seconds. This serves as the monitoring and availability alert for PaaS users. To make sure that DPE has successfully registered with DPM and check its status, do the following:

  1. Navigate to the DPM Admin Console, available at https://dpm.[customer].[env].ataccama.online.

  2. On the Engines tab, verify that all the DPE engines that you have set up are in the READY status.

    DPM Admin Console - Engines

Get started with Ataccama ONE

After you have completed all the steps described in this guide, you are ready to start working with your data. Your data sources can be accessed through the Ataccama ONE PaaS web application, available at https://[customer].[env].ataccama.online.

To create a new data source, navigate to Knowledge Catalog > Sources and select Create. When setting up the data source connection, all the data sources configured for your hybrid deployment are included in the list.

Create source
For more detailed instructions about connecting to data sources, see Connect to a Source.