Hybrid DPE Installation Guide
In a hybrid deployment, the Data Processing Engine (DPE) can be installed on a Linux host using the Ansible installation package.
This method of installation is best suited for the following use cases:
-
On-premise deployments
-
Proof-of-concept test deployments
-
Platform as a service (PaaS) deployments
As outlined in Infrastructure Preparation, in hybrid deployment scenarios DPE can currently be installed only on Linux hosts.
This guide provides installation instructions for both RHEL and Ubuntu and covers the following topics:
-
Installing Ansible and its dependencies.
-
Installing the Ataccama client-side components (DPE and Fluent Bit) and their dependencies (Java). The recommended Java version is 11 LTS or higher.
-
Testing the connection between DPE and the Ataccama ONE PaaS.
-
Starting to work with the Ataccama ONE Platform.
Prerequisites
Before you continue with the installation, make sure the following resources have been prepared:
-
Linux host instances with a working Internet connection are set up and running.
-
You have the necessary access rights and permissions to install Ansible and its prerequisites.
-
DPE license (license.plf file, distributed by Ataccama).
For more information about the infrastructure requirements, see Infrastructure Preparation.
If you encounter an issue during installation, refer to Troubleshooting Hybrid Deployment, which describes some of the more common exceptions and their possible solutions.
Download the installation package
The Ataccama ONE Hybrid Deployment package is stored in an Amazon S3 repository.
The installation structure that you create in this section is used throughout this guide, both for installing Ansible and for installing the Ataccama client-side components.
-
Download the Hybrid DPE Deployment Package (Linux) from our Downloads page.
-
Unpack the downloaded installation package to ~/one/. To do this, run the following commands, replacing <version-number> with the version of the package you downloaded:
Extract the installation package
mkdir -p ~/one
unzip ansible-hybrid-<version-number>.zip -d ~/one
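Before moving on, you can optionally confirm that the package was unpacked where the rest of this guide expects it. The following shell function is a minimal sketch and a hypothetical helper, not part of the Ataccama package: it only checks that the files and folders referenced later in this guide exist under ~/one/ansible (or under a directory passed as the first argument). If your package version uses a different layout, adjust the list.

```shell
# Sketch (hypothetical helper): confirm that the files this guide refers to
# exist after unpacking. The default base directory matches this guide.
check_package_layout() {
  local base="${1:-$HOME/one/ansible}" missing=0 path
  for path in \
    ansible-example.cfg \
    requirements-pip.txt \
    collections/requirements.yml \
    roles/requirements.yml \
    verify-local.yml \
    hybrid-dpe.yml \
    inventories/example_hybrid
  do
    if [ ! -e "$base/$path" ]; then
      echo "missing: $base/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_package_layout ~/one/ansible lists any missing path and returns a non-zero exit status if something is not where this guide expects it.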
Install Ansible
Ansible, an open-source tool for server orchestration, is a Python application that requires specific versions of Python libraries.
To make sure that there are no conflicts with other installed modules, it is necessary to create a separate virtual environment using the Python venv module.
As the last step in the installation procedure, the ansible-galaxy tool is used to download a number of community-maintained Ansible roles (libraries).
If you are not familiar with Ansible, refer to the official Ansible documentation for more information.
The required version of Ansible is 2.11.
-
Start by installing pip, the Python 3 package manager, together with the other required system packages. Depending on your OS, execute the following commands:
-
Red Hat Enterprise Linux (RHEL)
Install Python 3 pip
$ sudo dnf makecache
$ sudo dnf install unzip git python3-pip curl
-
Ubuntu
Install Python 3 pip
$ sudo apt update
$ sudo apt install unzip git python3-pip python3-venv curl
-
Create and activate a virtual environment for the Ansible installation. This helps prevent potential version conflicts between the system modules and the modules necessary for setting up hybrid DPE. Execute the following commands:
Set up virtual environment
python3 -m venv ~/venv
. ~/venv/bin/activate
Your command prompt now starts with (venv), which indicates that Python processes started from this shell use the modules from the virtual environment. This also means that the virtual environment must be active whenever you want to work with this Ansible repository. To activate it again, for example in a new terminal session, use the following command:
Start virtual environment
. ~/venv/bin/activate
-
Install Ansible and the remaining dependencies within your virtual environment. To do so, execute the following commands:
Install Ansible and dependencies
cd ~/one/ansible
pip install --upgrade 'pip>=20.3'
pip install wheel 'ansible<4.6'
pip install -r requirements-pip.txt
Proceed to the following sections only if all previous steps have been successfully completed.
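Because every later step assumes that the virtual environment is active, a quick check can save you from a confusing failure. The following function is a minimal sketch and a hypothetical helper, not shipped with the package. It relies on the VIRTUAL_ENV variable, which the venv activate script exports, and on ~/venv being the location used in this guide. It also confirms that the ansible command resolves to the copy installed in the virtual environment rather than to a system-wide one.

```shell
# Sketch (hypothetical helper): stop early if the Ansible virtual environment
# is not active. Relies on VIRTUAL_ENV, exported by the venv activate script;
# ~/venv is the location used in this guide (override with the first argument).
require_venv() {
  local expected="${1:-$HOME/venv}"
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment is active. Run: . $expected/bin/activate" >&2
    return 1
  fi
  if [ "$VIRTUAL_ENV" != "$expected" ]; then
    echo "Active virtual environment is $VIRTUAL_ENV, expected $expected" >&2
    return 1
  fi
  # Make sure the ansible command comes from the virtual environment, not the system.
  case "$(command -v ansible 2>/dev/null)" in
    "$VIRTUAL_ENV"/bin/*) ;;
    *)
      echo "ansible is not installed in $VIRTUAL_ENV (run the pip install step)" >&2
      return 1
      ;;
  esac
  echo "Using virtual environment $VIRTUAL_ENV"
}
```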
Configure Ansible
Ansible must be correctly configured before it can be used, which is why we provide a basic configuration file for Ansible (ansible-example.cfg).
One of the key options in the configuration is log_path, which makes Ansible record all playbook runs in a single log file (in this case, ~/.ansible/ansible.log).
-
To make sure the configuration file is always applied, copy it to your home directory as ~/.ansible.cfg using the following command. If you unpacked the installation package to a location other than ~/one, adapt the path to the ansible-example.cfg file accordingly.
Copy Ansible configuration file to home directory
cp ~/one/ansible/ansible-example.cfg ~/.ansible.cfg
-
Verify that the configuration file has been correctly set up. To do so, check the Ansible version using the following command:
Verify Ansible version
ansible --version
The expected output is as follows. The exact paths might vary depending on your environment.
Verify Ansible version console output
ansible [core 2.11.5]
  config file = None
  configured module search path = ['/home/<user>/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/<user>/lib/python3.9/site-packages/ansible
  ansible collection location = /home/<user>/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/<user>/bin/ansible
  python version = 3.9.7 (default, Sep 28 2021, 18:41:28) [GCC 10.2.1 20210110]
  jinja version = 2.11.3
  libyaml = True
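To check this output in a script, you can use the following sketch, which parses two lines of ansible --version. It assumes the output format shown above (ansible [core X.Y.Z] on the first line and a config file = ... line) and checks for the 2.11 series required by this guide. Ansible reads ~/.ansible.cfg only when neither the ANSIBLE_CONFIG environment variable nor an ansible.cfg file in the current directory takes precedence, so after the copy step the config file line would normally show the path of ~/.ansible.cfg. The function only prints a note if it does not. The function names are hypothetical.

```shell
# Sketch (hypothetical helpers): parse `ansible --version` output from stdin.
# First line looks like: ansible [core 2.11.5]
ansible_core_version() {
  sed -n '1s/.*\[core \([0-9][0-9.]*\)].*/\1/p'
}

# Line looks like:   config file = /home/<user>/.ansible.cfg
ansible_config_file() {
  sed -n 's/^[[:space:]]*config file = //p'
}

# Check the ansible-core series required by this guide and report the config file in use.
check_ansible_setup() {
  local out version cfg
  out="$(ansible --version)" || return 1
  version="$(printf '%s\n' "$out" | ansible_core_version)"
  cfg="$(printf '%s\n' "$out" | ansible_config_file)"
  echo "ansible-core: ${version:-unknown}, config file: ${cfg:-unknown}"
  case "$version" in
    2.11.*) ;;
    *) echo "Expected ansible-core 2.11.x" >&2; return 1 ;;
  esac
  if [ "$cfg" != "$HOME/.ansible.cfg" ]; then
    echo "Note: ~/.ansible.cfg is not the active configuration file" >&2
  fi
}
```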
Once you have a working Ansible installation, proceed with installing external dependency roles.
Install dependency roles
Before running any playbook, you need to install dependency roles (libraries) from Ansible Galaxy, a repository for Ansible roles.
The roles to install are defined in requirements.yml files.
The downloaded installation package contains two such files, one in one/ansible/collections and one in one/ansible/roles.
To install the dependencies from both files, execute the following commands:
cd ~/one/ansible
ansible-galaxy install -r collections/requirements.yml
ansible-galaxy install -r roles/requirements.yml
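If the first requirements file fails to install, running the second command anyway can bury the original error in later output. The following function is a minimal sketch and a hypothetical helper, not part of the package: it runs the same two ansible-galaxy install commands from ~/one/ansible (or from a directory passed as the first argument) and stops at the first failure.

```shell
# Sketch (hypothetical helper): install both Galaxy requirements files and
# stop at the first failure. Runs in a subshell so the caller's directory is kept.
install_galaxy_deps() {
  local base="${1:-$HOME/one/ansible}"
  (
    cd "$base" || exit 1
    for req in collections/requirements.yml roles/requirements.yml; do
      echo "Installing dependencies from $req"
      ansible-galaxy install -r "$req" || exit 1
    done
  )
}
```

Afterwards, ansible-galaxy collection list and ansible-galaxy role list show what has been installed.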
Verify Ansible installation
In addition to checking the Ansible version (ansible --version, described in more detail in Configure Ansible), you can use Ansible to run the true command on your machine.
This verifies that Ansible is minimally functional.
To test this, execute the following commands:
cd ~/one/ansible
ansible-playbook verify-local.yml
The expected output is as follows. While the warnings about the missing inventory and hosts list are expected, the play recap must not contain any failed steps.
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [verify Ansible is usable locally] *******************************************************************************************************************************************************************
TASK [Gathering Facts gather_subset=['all'], gather_timeout=10] *******************************************************************************************************************************************
ok: [127.0.0.1]
TASK [run dummy command _raw_params=true] *****************************************************************************************************************************************************************
changed: [127.0.0.1]
TASK [give result msg=Minimal functionality confirmed!] ***************************************************************************************************************************************************
ok: [127.0.0.1] => {
"msg": "Minimal functionality confirmed!"
}
PLAY RECAP ************************************************************************************************************************************************************************************************
127.0.0.1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
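The same success criterion (no failed and no unreachable hosts in the play recap) also applies to the DPE playbook later in this guide. As a minimal sketch, the following hypothetical function reads saved ansible-playbook console output and succeeds only if every host line after PLAY RECAP reports unreachable=0 and failed=0. It assumes the default console output format shown above. The log file configured through log_path adds a prefix to each line, so save the console output instead, for example with tee.

```shell
# Sketch (hypothetical helper): succeed only if every host line in the
# PLAY RECAP reports unreachable=0 and failed=0. Reads console output on stdin.
recap_ok() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /[ \t]:[ \t]+ok=/ {
      hosts++
      if ($0 !~ /[ \t]unreachable=0[ \t]/ || $0 !~ /[ \t]failed=0[ \t]/) bad++
    }
    # Fail when no recap host lines were found or when any host had problems.
    END { exit (hosts == 0 || bad > 0) }
  '
}
```

For example, run ansible-playbook verify-local.yml | tee verify-local.out and then recap_ok < verify-local.out.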
Install client-side components
Once you have installed Ansible, you can proceed with installing DPE.
Make sure your Internet connection remains available for the duration of this procedure.
Build the Ansible inventory
An inventory is a list of servers that are provisioned through Ansible. Building an inventory includes defining all the necessary hosts and variables in a dedicated folder structure.
To prepare an inventory for hybrid DPE installation:
-
Create a copy of the sample inventory (one/ansible/inventories/example_hybrid). Store the copy in the same parent folder (inventories) under a different name, for example, customer. To do this, you can use the following command:
Create a new inventory
cd ~/one/ansible/inventories
cp -r example_hybrid/. <new_inventory>
-
Add your provisioned hosts (managed nodes) to the hosts file. To do this, modify the hosts.yml file located in the inventory created in step 1 (one/ansible/inventories/<inventory>). The file has the following structure:
Structure of hosts.yml file
all:
  children:
    processing:
      hosts:
        <dpe-server-hostname>:
        <dpe-2-server-hostname>: # optional, but at least one server in the processing group is required
          key1: value1 # for any server (host), you can optionally define host-specific variables (for example, license)
          ...
Replace the hostname placeholders (such as <dpe-server-hostname>) with the correct hostnames of all DPE servers that you want to work with. The processing group can contain one or more hosts, and you can also add host-specific variables. For example, to specify the license that a DPE instance should use, declare the variable as follows:
...
<dpe-server-hostname>:
  license: /path/to/license.plf
-
Provide the necessary variables. Variables are declared in the vars.yml file, located in one/ansible/inventories/<inventory>/group_vars/all. This is where you define the various endpoints and secrets that are used to connect to the Ataccama ONE PaaS environment.
-
Some variables are specific to each environment and therefore must be set. The following list contains all the variables you must update:
-
dpe_license_file: The path to the DPE license file on the Ansible controller. You can define it in the group variables (vars.yml) or, when provisioning multiple DPEs, for each host separately (hosts.yml, see step 2).
-
keycloak_url: The PaaS Keycloak authentication server endpoint: https://[customer].[env].ataccama.online/auth/
-
dpe_token_client, dpe_admin_client: The credentials for the DPE token and admin clients.
dpe_token_client:
  client: dpe-token-client
  secret: dpe-token-client-s3cret
dpe_admin_client:
  client: dpe-admin-client
  secret: dpe-admin-client-s3cret
-
minio_url: The PaaS ONE Object Storage (MinIO) endpoint: https://minio.[customer].[env].ataccama.online
-
minio: MinIO credentials (access key and secret key).
minio:
  access_key: minio
  secret_key: minio-secret
-
dpe_jwt_key: The private key of the on-premise DPE, provided by Ataccama together with the other necessary credentials.
-
dpm: The gRPC host where DPM is available. The gRPC and HTTP port numbers should remain unchanged.
dpm:
  host: dpm-grpc.[customer].[env].ataccama.online
  grpc_port: 443
  http_port: 8031
-
dpm_jwt_key: The public key of the DPM module, provided by Ataccama together with the other necessary credentials.
dpm_jwt_key:
  name: dpm-prod-key
  # jwt key content
  content: {"kty":"EC","crv":"P-256","kid":"vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI","x":"Hbs53V5zC-1DjNf5RtJ1bNHlxvzM5jST7J1ADVePV9g","y":"4pVfzrF7FMHt_Xx2FgLauvLZuJqbpL9crdOxvTXWb64","alg":"ES256"}
  # jwt key fingerprint
  fp: vcjAli5Xm_pvtE8ItBkd3aT_FWi_23WieMf5f-lppBI
-
Define any additional configuration for the DPE module under the dpe_additional_config variable, depending on your requirements. The following example shows how to enable communication between DPE and DPM over a bidirectional gRPC stream. When this option is set, the two modules need only one open connection to communicate.
dpe_additional_config: |
  # Additional DPM connection properties - enable bidirectional streaming via TLS and trust all certificates
  ataccama.client.connection.dpm.grpc.tls.enabled=true
  ataccama.client.connection.dpm.grpc.tls.trust-all=true
  ataccama.one.dpe.service.dpm.connection.mode=FIREWALL_FRIENDLY_REGISTRATION
  # Additional DPE data sources / drivers configuration
  plugin.jdbcdatasource.ataccama.one.driver.redshift.disabled=false
By default, the variables file is also preconfigured to download JDBC drivers for several commonly used data sources. The drivers to download are listed in the dpe_drivers variable using the following syntax:
dpe_drivers:
  - name: redshift
    remote_url: https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/2.0.0.1/redshift-jdbc42-2.0.0.1.jar
For a comprehensive list of all available properties for the DPE module, see DPE Configuration.
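Before running the playbook, you can catch the most common inventory mistakes with a quick text-based check. The following functions are a minimal sketch and hypothetical helpers, not a YAML parser. check_inventory assumes the layout described above (hosts.yml in the inventory root and vars.yml in group_vars/all), checks that each required variable from the list above appears as a top-level key, and flags leftover placeholders in the [customer], [env], and <dpe-...-hostname> forms used in this guide. driver_urls and check_driver_urls list the unquoted remote_url entries under dpe_drivers and test whether they can be fetched. This guide does not state whether the drivers are downloaded by the Ansible controller or by the DPE hosts, so run that check on whichever machine performs the download.

```shell
# Sketch (hypothetical helpers): text-based sanity checks for an inventory
# copied from example_hybrid. Not a YAML parser: it only looks for top-level
# keys and for the placeholder patterns used in this guide.
check_inventory() {
  local inv="$1" hosts vars name status=0
  hosts="$inv/hosts.yml"
  vars="$inv/group_vars/all/vars.yml"
  if [ ! -f "$hosts" ] || [ ! -f "$vars" ]; then
    echo "Expected both $hosts and $vars" >&2
    return 1
  fi
  # Environment-specific variables that must be set (see the list above).
  for name in dpe_license_file keycloak_url dpe_token_client dpe_admin_client \
              minio_url minio dpe_jwt_key dpm dpm_jwt_key; do
    if ! grep -q "^${name}:" "$vars"; then
      echo "vars.yml: missing top-level variable $name" >&2
      status=1
    fi
  done
  # Leftover placeholders are printed with their line numbers.
  if grep -nE '\[(customer|env)]' "$vars" >&2; then
    echo "vars.yml: replace the [customer]/[env] placeholders" >&2
    status=1
  fi
  if grep -nE '<dpe(-2)?-server-hostname>' "$hosts" >&2; then
    echo "hosts.yml: replace the hostname placeholders" >&2
    status=1
  fi
  return "$status"
}

# Print the remote_url entries declared under dpe_drivers (assumes unquoted values).
driver_urls() {
  sed -n 's/^[[:space:]]*remote_url:[[:space:]]*//p' "$1"
}

# Check that each driver URL can be fetched (HEAD request, following redirects).
check_driver_urls() {
  local url status=0
  for url in $(driver_urls "$1"); do
    if curl -fsSIL --max-time 20 -o /dev/null "$url"; then
      echo "OK: $url"
    else
      echo "FAILED: $url" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run check_inventory ~/one/ansible/inventories/customer, and check_driver_urls ~/one/ansible/inventories/customer/group_vars/all/vars.yml on the machine that downloads the drivers.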
Install DPE and its dependencies
After you have prepared the Ansible inventory for DPE, you can continue with installing DPE.
This is done by running the hybrid-dpe.yml Ansible playbook, located in ~/one/ansible, as described in the following steps.
Ansible uses playbooks to configure and orchestrate tasks needed for deploying complex applications.
In addition to DPE, the hybrid-dpe.yml playbook also installs Java and Fluent Bit and runs a number of pre-installation checks.
To install DPE, you need SSH access to all hosts with root privileges (for example, through passwordless sudo).
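To confirm this requirement before starting a long playbook run, you can use the following minimal sketch, a hypothetical helper that is not part of the package. It logs in to each host with the user and private key you plan to pass to ansible-playbook and runs sudo -n true, which fails instead of prompting when a password would be required. The BatchMode=yes SSH option similarly prevents SSH from asking for a password.

```shell
# Sketch (hypothetical helper): check SSH login and passwordless sudo on each DPE host.
# Usage: check_ssh_sudo <username> <private-key> <host>...
check_ssh_sudo() {
  local user="$1" key="$2" host failed=0
  if [ "$#" -lt 3 ]; then
    echo "Usage: check_ssh_sudo <username> <private-key> <host>..." >&2
    return 1
  fi
  shift 2
  if [ ! -r "$key" ]; then
    echo "Private key not readable: $key" >&2
    return 1
  fi
  for host in "$@"; do
    # BatchMode=yes: no interactive password prompts; sudo -n: fail instead of prompting.
    if ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" 'sudo -n true'; then
      echo "OK: $host"
    else
      echo "FAILED: $host (SSH login or passwordless sudo)" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example: check_ssh_sudo admin ~/.ssh/admin-private dpe-1-server-hostname dpe-2-server-hostname. Once the inventory is in place, an Ansible ad-hoc command run from ~/one/ansible performs a comparable check against the processing group: ansible processing -i inventories/customer -u admin --private-key ~/.ssh/admin-private -b -m ping.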
-
Navigate to the Ansible directory:
cd ~/one/ansible
-
If it is not already active, start the virtual environment (as explained in Install Ansible, step 2):
. ~/venv/bin/activate
-
Execute the following command:
Install DPE
ansible-playbook -i <path/to/inventory> -u <username> --private-key <path/to/private-key> -b hybrid-dpe.yml
For example, if you used the paths provided in this guide and want to connect through the admin account, with the private key located in ~/.ssh/admin-private, the command is as follows:
Install DPE Example
ansible-playbook -i inventories/customer -u admin --private-key ~/.ssh/admin-private -b hybrid-dpe.yml
If the installation finishes successfully, the play recap looks as follows. To verify that DPE can also communicate with the Ataccama ONE PaaS, go through the Installation checks and Check DPE status sections as well.
... PLAY RECAP ************************************************************************************************************************************************************************************************ dpe-1-server-hostname : ok=86 changed=31 unreachable=0 failed=0 skipped=32 rescued=0 ignored=2 dpe-2-server-hostname : ok=86 changed=31 unreachable=0 failed=0 skipped=32 rescued=0 ignored=2
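If you run the installation repeatedly, for example after fixing connectivity issues, a small wrapper can check its inputs before Ansible starts. The following function is a minimal sketch and a hypothetical helper: it expects to be run from ~/one/ansible, takes the same inventory, username, and private key as the command above, and refuses to run if the virtual environment is not active. DRY_RUN=1 is a convention of this sketch, not an Ansible option; it only prints the command that would be executed.

```shell
# Sketch (hypothetical helper): run hybrid-dpe.yml after checking its inputs.
# Usage (from ~/one/ansible): run_hybrid_dpe <inventory> <username> <private-key>
# DRY_RUN=1 is specific to this sketch: it prints the command instead of running it.
run_hybrid_dpe() {
  local inventory="$1" user="$2" key="$3"
  if [ ! -f hybrid-dpe.yml ]; then
    echo "hybrid-dpe.yml not found; run this from ~/one/ansible" >&2
    return 1
  fi
  if [ ! -f "$inventory/hosts.yml" ]; then
    echo "No hosts.yml in inventory: $inventory" >&2
    return 1
  fi
  if [ ! -r "$key" ]; then
    echo "Private key not readable: $key" >&2
    return 1
  fi
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "Activate the virtual environment first: . ~/venv/bin/activate" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo ansible-playbook -i "$inventory" -u "$user" --private-key "$key" -b hybrid-dpe.yml
    return 0
  fi
  ansible-playbook -i "$inventory" -u "$user" --private-key "$key" -b hybrid-dpe.yml
}
```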
Installation checks
As mentioned in the previous section, running the hybrid-dpe.yml playbook also triggers several checks. Their main goal is to test the connection to the Ataccama ONE PaaS before the installation starts and to verify that DPE is available once the installation finishes.
Pre-installation connectivity checks are done for the following components:
-
Data Processing Module (DPM):
dpm-grpc.[customer].[env].ataccama.online:443
-
Keycloak:
https://[customer].[env].ataccama.online/auth
-
ONE Object Storage (MinIO):
https://minio.[customer].[env].ataccama.online
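You can run a similar reachability test manually from a DPE host, for example while working with your network team on firewall rules. The following functions are a minimal sketch and hypothetical helpers, an approximation of the playbook's checks rather than a copy of them. They build the three endpoints from the [customer] and [env] parts of your PaaS hostnames and call each one with curl. The check only distinguishes network-level failures (curl exit codes 5, 6, 7, and 28: proxy or host name resolution failure, connection failure, and timeout) from any other outcome. A TLS or HTTP error still means that something answered at that address.

```shell
# Sketch (hypothetical helpers): rough reachability check for the PaaS endpoints.
# <customer> and <env> are the [customer] and [env] parts of your PaaS hostnames.
paas_endpoints() {
  local customer="$1" env="$2"
  echo "https://${customer}.${env}.ataccama.online/auth"
  echo "https://minio.${customer}.${env}.ataccama.online"
  echo "https://dpm-grpc.${customer}.${env}.ataccama.online:443"
}

# Network-level failures only: 5 = proxy not resolved, 6 = host not resolved,
# 7 = connection failed, 28 = timeout. Any other result means something answered.
check_reachable() {
  local url="$1" rc=0
  curl -s -o /dev/null --connect-timeout 10 --max-time 20 "$url" || rc=$?
  case "$rc" in
    5|6|7|28)
      echo "UNREACHABLE (curl exit code $rc): $url" >&2
      return 1
      ;;
    *)
      echo "reachable: $url"
      ;;
  esac
}

check_paas_connectivity() {
  local url status=0
  for url in $(paas_endpoints "$1" "$2"); do
    check_reachable "$url" || status=1
  done
  return "$status"
}
```

For example: check_paas_connectivity <customer> <env>.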
If all the checks have been successfully completed, the expected output is as follows. The example provided here is based on an installation with two DPE nodes.
TASK [Hybrid preinstallation checks] *************************************************************************************************************************************************
TASK [system : Check connectivity to Keycloak] ***************************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]
TASK [system : Check connectivity to MinIO] ******************************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]
TASK [system : Check connectivity to dpm grpc endpoint] ******************************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]
If any of these checks fails, the Ataccama ONE PaaS environment cannot be reached from the DPE nodes, which typically indicates a firewall or networking issue. Before proceeding further, investigate the issue and make sure the connection is working. For more information, see Infrastructure Preparation.
A post-installation check included in the Ansible play verifies that DPE is running without issues. If that is the case, the following output is expected:
TASK [dpe : Wait for dpe to come up (check monitoring endpoint ready)] ***************************************************************************************************************
ok: [dpe-1-server-hostname]
ok: [dpe-2-server-hostname]
If this task fails, the output contains more information about the issue.
The following example shows the error that occurs when DPE cannot reach DPM: the monitoring endpoint was expected to return 200 OK but returned 503 Service Unavailable.
fatal: [dpe-1-server-hostname]: FAILED! => {"attempts": 30, .... "json": {"components": {"db": {"details": {"database": "H2", "validationQuery": "isValid()"}, "status": "UP"}, "diskSpace": {"details": {"exists": true, "free": 23339401216, "threshold": 10485760, "total": 31036686336}, "status": "UP"}, "dpm": {"details": {"state": "CONNECTING"}, "status": "DOWN"}, "livenessState": {"status": "UP"}, "ping": {"status": "UP"}, "readinessState": {"status": "UP"}}, "groups": ["liveness", "readiness"], "status": "DOWN"}, "msg": "Status code was 503 and not [200]: HTTP Error 503: ", "redirected": false, "status": 503, "transfer_encoding": "chunked", "url": "http://dpe-1-server-hostname:8034/actuator/health", "x_correlation_id": "8524f5"}
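The failed task queries the DPE monitoring endpoint shown in the output above (port 8034, path /actuator/health). To inspect it yourself, for example on the DPE host after fixing the connection, you can use the following minimal sketch with hypothetical function names. It reports the HTTP status code and extracts the state of the dpm component with a simple text match, assuming the response is shaped like the JSON above. A 200 response is what the installation check expects.

```shell
# Sketch (hypothetical helpers): query the DPE monitoring endpoint.
# Extract the dpm component state (for example, CONNECTING) from health JSON on stdin.
dpe_dpm_state() {
  sed -n 's/.*"dpm"[^}]*"state":[[:space:]]*"\([A-Za-z_]*\)".*/\1/p'
}

# Usage: check_dpe_health [host]  (defaults to localhost)
check_dpe_health() {
  local host="${1:-localhost}" response code body
  # -w appends the HTTP status code on its own line after the response body.
  response="$(curl -s --max-time 10 -w '\n%{http_code}' "http://${host}:8034/actuator/health")" || {
    echo "No response from ${host}:8034" >&2
    return 1
  }
  code="$(printf '%s\n' "$response" | tail -n 1)"
  body="$(printf '%s\n' "$response" | sed '$d')"
  echo "${host}: HTTP ${code}, dpm state: $(printf '%s\n' "$body" | dpe_dpm_state)"
  [ "$code" = 200 ]
}
```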
Post-installation steps
Check DPE status
Hybrid DPEs send liveness and readiness checks to DPM in the Ataccama ONE PaaS every few seconds. These checks are used for monitoring and availability alerts for PaaS users. To make sure that DPE has successfully registered with DPM and to check its status, do the following:
-
Navigate to the DPM Admin Console, available at https://dpm.[customer].[env].ataccama.online
-
On the Engines tab, verify that all the DPE engines that you have set up are in the READY status.
Get started with Ataccama ONE
After you have completed all the steps described in this guide, you are now ready to start working with your data.
Your data sources can be accessed through the Ataccama ONE PaaS web application, available at https://[customer].[env].ataccama.online.
To create a new data source, navigate to Knowledge Catalog > Sources and select Create. When you set up the data source connection, the list includes all the data sources configured for your hybrid deployment.
For more detailed instructions about connecting to data sources, see Connect to a Source.