
Hybrid Deployment Architecture

Ataccama ONE PaaS is designed to provide secure, efficient, and reliable service in various scenarios depending on the location of customer data sources and the customer infrastructure design. Therefore, its architecture combines a standardized, highly automated core service with flexible client-side components that are easy to scale.

Thanks to this approach, the implementation process is simple while the customer data is always kept confidential and secure. Currently, Ataccama ONE PaaS supports hybrid integration with Data Processing Engine (DPE).

The following sections contain more information about possible deployment options and scenarios, integration modes (between customer and Ataccama PaaS environments and between DPE and data sources), and supported data sources.

For more information about preparing the customer infrastructure for hybrid deployment and installing DPE, see the Hybrid Deployment Guide.

Supported deployment scenarios

The following sections describe which standardized scenarios and use cases Ataccama supports within the Ataccama ONE PaaS hybrid offering based on where the customer data sources are located.

Customer premises or data centers

The following diagram shows the hybrid deployment architecture in cases where customer data is located on the premises or in data centers.

Data located on premises or in data centers
Location of customer data

Data is located on the premises (or in data centers) that are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the Amazon Web Services (AWS) cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run on the customer premises, close to the source data.

    • The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the public Internet:

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Internet as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer cloud subscription - Amazon Web Services

The following diagram shows the hybrid deployment architecture in cases where customer data is located only in Amazon Web Services cloud.

Data located in a customer cloud subscription - AWS
Location of customer data

Data is located in one or more Amazon Web Services (AWS) cloud subscriptions that are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in the customer AWS VPC, close to the source data.

    • The DPE server requires x86-based computing power (virtual server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link):

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer cloud subscription - Microsoft Azure

The following diagram shows the hybrid deployment architecture in cases where customer data is located only in Microsoft Azure cloud.

Data located in a customer cloud subscription - Azure
Location of customer data

Data is located in one or more Microsoft Azure cloud subscriptions that are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the Microsoft Azure cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Network (VNet).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in the customer Microsoft Azure VNet, close to the source data.

    • The DPE server requires x86-based computing power (virtual server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the Azure Private Link service:

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VNet as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer premises or data centers and customer cloud AWS subscription

Data located on premises and in a customer cloud subscription - AWS
Location of customer data

Data is located in one or more Amazon Web Services (AWS) cloud subscriptions as well as on premises or in a data center. Both locations are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and on the customer premises.

    • The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located on the customer premises use the existing connection between the premises and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer premises or data centers and customer cloud Azure subscription

The following diagram shows the hybrid deployment architecture in cases where customer data is located both on premises or in a data center and in Microsoft Azure cloud.

Data located on premises and in a customer cloud subscription - Azure
Location of customer data

Data is located in one or more Microsoft Azure cloud subscriptions as well as on premises or in a data center. Both locations are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the Microsoft Azure cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Network (VNet).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer Microsoft Azure VNet and on the customer premises.

    • The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the Azure Private Link service. All DPE servers located on the customer premises use the existing connection between the premises and the customer Microsoft Azure VNet to communicate with the Ataccama ONE PaaS services.

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VNet as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer cloud AWS and Azure subscriptions

The following diagram shows the hybrid deployment architecture in cases where customer data is located both in Amazon Web Services and Microsoft Azure clouds.

Data located in a customer cloud subscription - AWS and Azure
Location of customer data

Data is located in one or more Amazon Web Services (AWS) and Microsoft Azure cloud subscriptions. Both locations are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and the customer Microsoft Azure VNet.

    • The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located in the customer Microsoft Azure subscription use the existing connection between the customer Microsoft Azure VNet and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Customer premises or data centers, customer cloud AWS Subscription, and Snowflake

The following diagram shows the hybrid deployment architecture in cases where customer data is located on premises or in a data center, in Amazon Web Services cloud, and in Snowflake.

Data located in a customer cloud subscription - AWS
Location of customer data

Data is located in one or more Amazon Web Services (AWS) cloud subscriptions, on premises or in a data center, as well as in a third-party service, Snowflake. All locations are owned or controlled by the customer.

Location of Ataccama ONE PaaS systems
  • Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).

  • Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and on the customer premises.

    • The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.

    • The DPE server for working with data stored in Snowflake is located in the AWS VPC that is connected to the Snowflake systems. If Snowflake is located in another environment, the DPE server needs to be running in the same environment to remain close to the source data.

      For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
Connectivity between customer environment and Ataccama ONE PaaS

The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located on the customer premises use the existing connection between the premises and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.

  • All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).

  • All connections are secured through TLS encryption (TCP port 443).

  • Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.

  • A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.

User access to Ataccama ONE PaaS

Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.

Integrate customer and the Ataccama ONE PaaS core environments

The Ataccama ONE PaaS supports the following connection types for communication between the customer environment, where the DPE server is located, and the Ataccama ONE PaaS environment.

Private Link

The Private Link service is available for both Amazon Web Services (AWS) and Microsoft Azure environments and is recommended for cases where some DPE servers are located in one of the two supported public clouds (AWS or Microsoft Azure). The Ataccama ONE PaaS environment provides the Private Link service while the customer environment represents the endpoint. Thanks to this design, the connection can be set up only from the customer environment towards the Ataccama ONE PaaS.

IP allow-listing

This option is best suited for scenarios where DPE servers need to communicate through the Internet (that is, in cases where DPE servers are located on the customer premises or in a data center). Allow-listing is then applied on the Ataccama ONE PaaS side so that access to the customer Ataccama ONE PaaS services is only granted to servers whose public IP addresses have been previously approved. Customers define which IP addresses are allow-listed while the list itself is managed by the Ataccama Support team according to customer requests.
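The allow-list check described above can be illustrated with a minimal sketch. It is not the actual Ataccama ONE PaaS implementation: the function name `is_allowed` and the listed networks are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical allow-list. Real entries are defined by the customer and
# managed by the Ataccama Support team; these are documentation ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.10/32"),  # a single DPE server
    ipaddress.ip_network("198.51.100.0/28"),  # a small NAT pool
]


def is_allowed(source_ip, allowed=ALLOWED_NETWORKS):
    """Return True if source_ip falls inside any allow-listed network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in allowed)


if __name__ == "__main__":
    for ip in ("203.0.113.10", "198.51.100.7", "192.0.2.55"):
        print(ip, "allowed" if is_allowed(ip) else "rejected")
```

Because the check is applied to the public source address, a DPE server behind NAT must be allow-listed by the address that the NAT gateway presents to the Internet.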

Communication between DPE and DPM

DPE units communicate with the DPM module, located in PaaS, in client-server mode, with DPE acting as a client and DPM as a server. The role of DPM within Ataccama ONE is to receive user-submitted job requests and parameters for data processing and transmit them to DPE based on the engine’s capacities and capabilities. DPE then processes the data as instructed and returns the results of those operations. The communication between DPM and DPE relies on the following gRPC protocol-based stack:

DPM and DPE communication protocol stack

When the default configuration is used, DPE initiates a TCP connection with DPM in the Ataccama ONE PaaS. By default, the communication is encrypted using TLS and the TCP session is established on port 443. During an active TCP session, the modules actually communicate through gRPC.
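To check from a DPE server that an outbound TLS session on port 443 can be established, a probe along the following lines may help. This is a sketch using only the Python standard library, not an Ataccama tool. It offers HTTP/2 through ALPN because gRPC is carried over HTTP/2; whether a given endpoint negotiates ALPN is an assumption, and the function names are hypothetical.

```python
import socket
import ssl
import sys


def build_client_context():
    """TLS client context that verifies the server certificate and offers HTTP/2."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    context.set_alpn_protocols(["h2"])      # gRPC is carried over HTTP/2
    return context


def probe(host, port=443, timeout=5.0):
    """Open an outbound TLS session and report what was negotiated."""
    context = build_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return {
                "tls_version": tls.version(),
                "alpn": tls.selected_alpn_protocol(),  # None if not negotiated
                "cipher": tls.cipher()[0],
            }


if __name__ == "__main__":
    # Usage: python probe.py <hostname>
    # No connection is attempted unless a hostname is given.
    if len(sys.argv) > 1:
        print(probe(sys.argv[1]))
```

The probe only confirms reachability and certificate validity from the client side. It does not exercise the gRPC layer or the module-to-module authentication.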

DPM and DPE communication

Connection security

By default, the communication between DPE and DPM is secured on three levels:

  • Network layer (on the PaaS side): IP allow-listing, which allows connections only from the previously approved IP addresses.

  • Transport layer: TLS encryption.

    As mentioned previously, DPE starts a session as a client using the TLS protocol suite. During the TLS handshake, both sides agree on a symmetric session key that encrypts the rest of the session. On the DPM side, an X.509 certificate signed by the public certificate authority (CA) Let’s Encrypt is installed.

  • Application layer: Module-to-module authentication using JSON Web Tokens (JWT).

    This is the standard authentication mode used between all Ataccama ONE modules, in which authentication keys are transferred through JWT. The secret key is provided to the DPE configuration manually.
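As a rough illustration of token-based module-to-module authentication, the sketch below signs and verifies a JWT with a shared secret using only the Python standard library. It assumes the HS256 (HMAC-SHA256) algorithm and hypothetical claim names; the source does not state which algorithm or claims Ataccama ONE modules actually use.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, claims)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature and expiry are valid, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", float("inf")) < current:
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder, never hard-code real secrets
    token = sign_token({"sub": "dpe-1", "exp": int(time.time()) + 60}, secret)
    print(verify_token(token, secret))
```

The signature is compared with `hmac.compare_digest` to avoid timing side channels, and the token is rejected before its claims are trusted.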

DPE and DPM data flows

As described in the previous section, DPE and DPM function in client-server mode, with DPE assuming the client role and DPM the server role. DPM transfers the parameters necessary for completing data processing jobs to DPE.

After receiving the job information, DPE retrieves data from the relevant data source and starts processing. Once the job is finished, the information related to processing (runtime logs and engine processing events), as well as aggregated DQ results and attribute fingerprints are forwarded to DPM for further use.

DPE also stores data processing results in ONE Object Storage (MinIO), from where they are made available to DPM. This includes profiling and processing results, invalid data samples, lookups, and files generated through post-processing plans. Users can access this information through the web application after the data processing is complete.

For more information about the specific data flows marked in the diagram, refer to the table provided.

DPE and DPM data flows
Flow number, asset, and description:

  • Flow 1

    • DPE Job Metadata and Data: Definitions of jobs, consisting of the submitted job (the serialized message) and associated metadata, which covers the job owner, name, timestamps, and correlation identifier.

  • Flow 2

    • Raw Source Data: Data from customer sources that is processed by DPE based on a particular job definition.

  • Flow 3

    • DQ Engine Data: During DQ evaluation, DPE processes customer data and produces aggregated DQ results and, optionally, invalid samples. Invalid samples are records that did not pass DQ checks.

    • Engine Processing Events: Events describing the state of processing for the jobs handled by the engine. These events are stored in the DPE database of choice (embedded or external) and forwarded to DPM clients. In some cases, for example, when describing the cause of a job failure, processing events can contain metadata.

    • Fingerprints: Fingerprints are created in DPE during profiling, after which they are pulled by DPM as part of the profiling results and stored temporarily in ONE Object Storage. Fingerprints can contain metadata and data samples.

      NOTE: Fingerprints represent the level of similarity between catalog item attributes and are used for calculating term suggestions.

    • Runtime Logs: System and job logs. No customer metadata or data is collected here.

  • Flow 4

    • Catalog Item Profiles: Profiling results contain various data metrics and data samples. DPM and DPE temporarily store profile job results to ONE Object Storage in order to transport data between modules.

    • Invalid Samples: Records that did not pass DQ checks are stored in the drillthrough bucket in ONE Object Storage. Records can contain metadata and data samples. Storing data samples can be disabled.

    • Lookups: Lookup data files (.lkp) are stored in the lookups bucket in ONE Object Storage. Lookups can contain metadata and data samples.

    • Post-Processing Plan Exports: Exports are created by DPE based on post-processing plans executed during the post-processing phase. After this, DPM pushes them to the shared bucket in ONE Object Storage.

    • Processing Results: Processing results are created in DPE during job processing, after which they are moved to ONE Object Storage to be processed by DPM. Processing results can contain metadata and data samples. Storing data samples can be disabled.

DNS configuration

All Ataccama ONE modules, as well as all application users, rely on hostnames to communicate with the Ataccama ONE PaaS services, which is why using a DNS service is necessary to correctly resolve IP addresses. How DNS is used and configured differs slightly depending on the integration scenario implemented between the customer environment and the Ataccama ONE PaaS.

Private Link

As mentioned previously, the Private Link service is available for both Amazon Web Services (AWS) and Microsoft Azure environments. In this integration scenario, DNS is handled as follows for application users and the client-side components:

  • Application users:

    • Users access the Ataccama ONE PaaS web application through the Internet using the hostname one.[customer].[env].ataccama.online.

    • The DNS record for this hostname is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain.

  • Client-side components:

    • Client-side components access the Ataccama ONE PaaS environment using the Private Link between the customer VPC or VNet, depending on the selected cloud provider, and the Ataccama ONE PaaS VPC or VNet. The hostname used is dpm.[customer].[env].ataccama.online.

    • The DNS record for this hostname is configured and managed by the customer and resolved by the customer internal DNS service as a private domain name.

      DNS configuration - Private Link
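The hostname pattern and the expected resolution behavior can be checked with a short sketch like the one below. It assumes that, with Private Link, the dpm hostname resolves to private addresses inside the customer network, while the web application hostname resolves publicly. The function names and the sample customer and environment values are hypothetical.

```python
import ipaddress
import socket
import sys

DOMAIN = "ataccama.online"


def paas_hostnames(customer, env):
    """Build both hostnames from the pattern [service].[customer].[env].ataccama.online."""
    suffix = f"{customer}.{env}.{DOMAIN}"
    return {"web": f"one.{suffix}", "dpm": f"dpm.{suffix}"}


def resolve(hostname, port=443):
    """Resolve a hostname to a sorted list of unique IP addresses (system resolver)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if the list is non-empty and every address is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


if __name__ == "__main__":
    # Usage: python dns_check.py <customer> <env>
    # Without arguments, only the hostname pattern is printed (no DNS lookups).
    if len(sys.argv) == 3:
        names = paas_hostnames(sys.argv[1], sys.argv[2])
        addresses = resolve(names["dpm"])
        print(names["dpm"], addresses, "private" if all_private(addresses) else "public")
    else:
        print(paas_hostnames("customer", "env"))
```

If the dpm hostname resolves to a public address from the DPE server in a Private Link setup, the customer internal DNS record is likely missing or not being consulted.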

IP allow-listing

In this integration scenario, DNS is handled as follows for application users and the client-side components:

  • Application users:

    • Users access the Ataccama ONE PaaS web application through the Internet using the hostname one.[customer].[env].ataccama.online.

    • The DNS record for this hostname is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS Service in the public domain.

  • Client-side components:

    • Client-side components access the Ataccama ONE PaaS environment through the Internet, with access restricted to allow-listed public IP addresses. The hostname used is dpm.[customer].[env].ataccama.online.

    • The DNS record for the hostname can be managed based on one of the following two scenarios:

      1. The DNS record is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain. The DPE server requires direct access to Route 53 DNS service.

        DNS configuration - IP allow-listing scenario 1
      2. The DNS record is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain. The DPE server cannot directly access Route 53 DNS service and uses an internal customer DNS service instead. In this case, the customer DNS service should be synchronized with Route 53 DNS service for the domain .[customer].[env].ataccama.online.

        DNS configuration - IP allow-listing scenario 2

DPE integration with customer data sources

As described in previous sections, Ataccama ONE client-side components consist of Data Processing Engine (DPE) instances and the logging modules. DPE, which is orchestrated through Data Processing Module (DPM), connects to customer data sources for a variety of data processing tasks, such as populating the data catalog, profiling data, or evaluating data quality.

Thanks to their flexible design, the Ataccama ONE client-side components can seamlessly integrate with any internal customer infrastructure, which in turn results in optimized performance.

Best practices

DPE servers should be installed as close as possible to customer data sources in a separate subnet. Network paths to and from the data source need to provide adequate throughput depending on the amount and structure of customer data while the round-trip time (RTT) between DPE and the data source should be kept under 5 ms. For more information, see Sizing Guidelines.
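One rough way to check the 5 ms target from a DPE server is to time TCP connections to the data source. The sketch below is an approximation, not an Ataccama tool: a TCP connect takes about one network round trip, but it also includes name resolution when a hostname is used, so passing an IP address gives a cleaner figure. The function names and the use of the median are choices made for this example.

```python
import socket
import statistics
import sys
import time

RTT_BUDGET_MS = 5.0  # target from the best practices above


def connect_time_ms(host, port, timeout=2.0):
    """Time one TCP connection setup, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def sample_rtt_ms(host, port, attempts=10):
    """Collect several connect timings to smooth out outliers."""
    return [connect_time_ms(host, port) for _ in range(attempts)]


def within_budget(samples_ms, budget_ms=RTT_BUDGET_MS):
    """Compare the median sample against the RTT budget."""
    return statistics.median(samples_ms) < budget_ms


if __name__ == "__main__":
    # Usage: python rtt_check.py <data-source-host> <port>
    if len(sys.argv) == 3:
        samples = sample_rtt_ms(sys.argv[1], int(sys.argv[2]))
        print(f"median {statistics.median(samples):.2f} ms,",
              "OK" if within_budget(samples) else "above budget")
```

The median is used rather than the mean so that a single slow connection does not dominate the result.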

Supported Data Sources

DPE comes with out-of-the-box support for a number of data sources while integration with other data sources is available upon request. If different data sources are used, we recommend deploying one DPE server for each data source type.

However, for processing data sources with a large number of records, we recommend deploying two or more DPE servers for each data source type. When DPE servers are deployed as described, Data Processing Module (DPM) can distribute the related workload across all instances dedicated to a particular data source type, which contributes to better performance and scalability.
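The idea of distributing work across DPE servers dedicated to one data source type can be sketched as follows. This is an illustration only and does not describe the actual DPM scheduling logic: the `Engine` fields, the capacity setting, and the least-loaded selection rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    source_type: str   # data source type this DPE server is dedicated to
    capacity: int      # maximum concurrent jobs (hypothetical setting)
    running: int = 0   # jobs currently assigned


def pick_engine(engines, source_type):
    """Return the least-loaded engine for source_type with free capacity, or None."""
    candidates = [
        engine for engine in engines
        if engine.source_type == source_type and engine.running < engine.capacity
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda engine: engine.running / engine.capacity)


def dispatch(engines, source_type):
    """Assign one job and return the chosen engine name, or None if all are busy."""
    engine = pick_engine(engines, source_type)
    if engine is None:
        return None
    engine.running += 1
    return engine.name


if __name__ == "__main__":
    pool = [
        Engine("dpe-pg-1", "postgresql", capacity=2),
        Engine("dpe-pg-2", "postgresql", capacity=2),
        Engine("dpe-sf-1", "snowflake", capacity=1),
    ]
    for _ in range(5):
        print(dispatch(pool, "postgresql"))
```

With two servers for the same data source type, jobs alternate between them until both are full, which is the behavior that makes adding a second server worthwhile for large sources.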

The following table shows which JDBC drivers are already included or can be downloaded through the automated installation process.

The data sources that are used in the Ataccama ONE PaaS hybrid deployment are agreed upon with the customer when defining the deployment specification. If you need any additional JDBC drivers once the deployment specification has been finalized, contact the Ataccama professional services team.
Data source: JDBC driver included

  • Amazon Aurora PostgreSQL: Yes

  • Amazon Redshift: Automatically downloaded

  • Amazon S3: N/A

  • Apache Derby: Automatically downloaded

  • Azure Synapse Analytics: Yes

  • Databricks: No

  • H2: Yes

  • MariaDB: Yes

  • MSSQL: Yes

  • Oracle: Yes

  • PostgreSQL: Yes

  • Snowflake: Yes

  • Teradata: Yes

Scaling

DPE servers can be scaled as follows:

  • Scaled up or down by increasing or reducing the number of CPUs and the RAM size on the DPE server.

  • Scaled out or in by installing or removing individual DPE servers.

Scaling up and scaling out can be performed without interrupting the service provided that the virtualization technology that the customer uses supports this option. On the other hand, scaling down, whether it includes decreasing the number of CPUs and the RAM size or reducing the disk size, typically requires restarting the DPE server. This limitation stems from the operating system design and functionality.

Scaling is done manually by the customer as needed. This way, customers are given full control over performance and cost of DPE servers as well as the possibility to easily adjust processing power in response to data management and processing demands. For more information about how to measure and fine-tune the performance of DPE, see Sizing Guidelines.
