Hybrid Deployment Architecture
Ataccama ONE PaaS is designed to provide secure, efficient, and reliable service in various scenarios depending on the location of customer data sources and the customer infrastructure design. Therefore, its architecture combines a standardized, highly automated core service with flexible client-side components that are easy to scale.
Thanks to this approach, the implementation process is simple while the customer data is always kept confidential and secure. Currently, Ataccama ONE PaaS supports hybrid integration with Data Processing Engine (DPE).
The following sections contain more information about possible deployment options and scenarios, integration modes (between customer and Ataccama PaaS environments and between DPE and data sources), and supported data sources.
For more information about preparing the customer infrastructure for hybrid deployment and installing DPE, see hybrid-deployment:hybrid-deployment-guide.adoc.
Supported deployment scenarios
The following sections describe which standardized scenarios and use cases Ataccama supports within the Ataccama ONE PaaS hybrid offering based on where the customer data sources are located.
Customer premises or data centers
The following diagram shows the hybrid deployment architecture in cases where customer data is located on the premises or in data centers.
- Location of customer data
-
Data is located on the premises (or in data centers) that are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the Amazon Web Services (AWS) cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run on the customer premises, close to the source data.
-
The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the public Internet:
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Internet as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer cloud subscription - Amazon Web Services
The following diagram shows the hybrid deployment architecture in cases where customer data is located only in Amazon Web Services cloud.
- Location of customer data
-
Data is located in one or more Amazon Web Services (AWS) cloud subscriptions that are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in the customer AWS VPC, close to the source data.
-
The DPE server requires x86-based computing power (virtual server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link):
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer cloud subscription - Microsoft Azure
The following diagram shows the hybrid deployment architecture in cases where customer data is located only in Microsoft Azure cloud.
- Location of customer data
-
Data is located in one or more Microsoft Azure cloud subscriptions that are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the Microsoft Azure cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Network (VNet).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in the customer Microsoft Azure VNet, close to the source data.
-
The DPE server requires x86-based computing power (virtual server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the Azure Private Link service:
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VNet as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer premises or data centers and customer cloud AWS subscription
- Location of customer data
-
Data is located in one or more Amazon Web Services (AWS) cloud subscriptions as well as on premises or in a data center. Both locations are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and on the customer premises.
-
The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located on the customer premises use the existing connection between the premises and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer premises or data centers and customer cloud Azure subscription
The following diagram shows the hybrid deployment architecture in cases where customer data is located both on premises or in a data center and in Microsoft Azure cloud.
- Location of customer data
-
Data is located in one or more Microsoft Azure cloud subscriptions as well as on premises or in a data center. Both locations are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the Microsoft Azure cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Network (VNet).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer Microsoft Azure VNet and on the customer premises.
-
The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the Azure Private Link service. All DPE servers located on the customer premises use the existing connection between the premises and the customer Microsoft Azure VNet to communicate with the Ataccama ONE PaaS services.
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VNet as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer cloud AWS and Azure subscriptions
The following diagram shows the hybrid deployment architecture in cases where customer data is located both in Amazon Web Services and Microsoft Azure clouds.
- Location of customer data
-
Data is located in one or more Amazon Web Services (AWS) and Microsoft Azure cloud subscriptions. Both locations are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and the customer Microsoft Azure VNet.
-
The DPE server requires x86-based computing power (virtual server) and should be used only for client-side components in order to provide the expected performance and scalability options.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located in the customer Microsoft Azure subscription use the existing connection between the customer Microsoft Azure VNet and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
Customer premises or data centers, customer cloud AWS subscription, and Snowflake
The following diagram shows the hybrid deployment architecture in cases where customer data is located on premises or in a data center, in Amazon Web Services cloud, and in Snowflake.
- Location of customer data
-
Data is located in one or more Amazon Web Services (AWS) cloud subscriptions, on premises or in a data center, as well as in a third-party service, Snowflake. All locations are owned or controlled by the customer.
- Location of Ataccama ONE PaaS systems
-
-
Ataccama ONE core systems: Located in the AWS cloud in a subscription that is owned by Ataccama and provided as a service to the customer in a dedicated Virtual Private Cloud (VPC).
-
Ataccama client-side components: Consist of Data Processing Engine (DPE) and the logging tools. These run in both customer locations, close to the source data: in the customer AWS VPC and on the customer premises.
-
The DPE server requires x86-based computing power (virtual or physical server) and should be used only for client-side components in order to provide the expected performance and scalability options.
-
The DPE server for working with data stored in Snowflake is located in the AWS VPC that is connected to the Snowflake systems. If Snowflake is located in another environment, the DPE server needs to be running in the same environment to remain close to the source data.
For more information about the principles and best practices that should be followed when integrating the DPE server with the data sources, see DPE integration with customer data sources. For more information about the DPE server sizing recommendations, see Sizing Guidelines.
-
-
- Connectivity between customer environment and Ataccama ONE PaaS
-
The Ataccama ONE PaaS and the client-side components are connected through the AWS VPC endpoint services (AWS Private Link). All DPE servers located on the customer premises use the existing connection between the premises and the customer AWS VPC to communicate with the Ataccama ONE PaaS services.
-
All connections are opened from the DPE server towards the Ataccama ONE PaaS, with no open connections required in the opposite direction (that is, towards the customer environment).
-
All connections are secured through TLS encryption (TCP port 443).
-
Firewall and routing settings must allow outbound traffic from the DPE server to the Ataccama ONE PaaS VPC as well as the return of packets from the TCP session initiated from the client-side components.
-
A DNS service is required as client-side components use URLs for connections towards the Ataccama ONE PaaS. For more information, see DNS configuration.
-
- User access to Ataccama ONE PaaS
-
Users can access the Ataccama ONE web application in the PaaS environment through the Internet using the HTTPS (TCP 443) protocol.
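The scenarios described above can be condensed into a small lookup table. The following Python sketch is only an illustrative summary of this section: the `Scenario` class, the `SCENARIOS` table, and the `lookup_scenario` helper are hypothetical names and not part of any Ataccama tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    core_cloud: str    # cloud hosting the Ataccama ONE core systems
    connectivity: str  # how client-side components reach the PaaS


# Keys are the combinations of customer data locations listed in this section.
SCENARIOS = {
    frozenset({"on-premises"}): Scenario("AWS", "public Internet"),
    frozenset({"aws"}): Scenario("AWS", "AWS Private Link"),
    frozenset({"azure"}): Scenario("Azure", "Azure Private Link"),
    frozenset({"on-premises", "aws"}): Scenario("AWS", "AWS Private Link"),
    frozenset({"on-premises", "azure"}): Scenario("Azure", "Azure Private Link"),
    frozenset({"aws", "azure"}): Scenario("AWS", "AWS Private Link"),
    frozenset({"on-premises", "aws", "snowflake"}): Scenario("AWS", "AWS Private Link"),
}


def lookup_scenario(locations):
    """Return the documented scenario for a set of data locations, or None."""
    return SCENARIOS.get(frozenset(loc.lower() for loc in locations))
```

For example, `lookup_scenario(["AWS", "Azure"])` returns the entry whose connectivity is AWS Private Link, while a combination that this page does not describe returns `None`.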
Integrate customer and the Ataccama ONE PaaS core environments
The Ataccama ONE PaaS supports the following connection types when it comes to establishing communication between the client environment where the DPE server is located and the Ataccama ONE PaaS environment.
Private Link
The Private Link service is available for both Amazon Web Services (AWS) and Microsoft Azure environments and is recommended for cases where some DPE servers are located in one of the two supported public clouds (AWS or Microsoft Azure). The Ataccama ONE PaaS environment provides the Private Link service while the customer environment represents the endpoint. Thanks to this design, the connection can be set up only from the customer environment towards the Ataccama ONE PaaS.
IP allow-listing
This option is best suited for scenarios where DPE servers need to communicate through the Internet (that is, in cases where DPE servers are located on the customer premises or in a data center). Allow-listing is then applied on the Ataccama ONE PaaS side so that access to the customer Ataccama ONE PaaS services is only granted to servers whose public IP addresses have been previously approved. Customers define which IP addresses are allow-listed while the list itself is managed by the Ataccama Support team according to customer requests.
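As a rough illustration of the allow-listing principle, the following Python sketch checks whether a source address matches a list of approved addresses or ranges. It is not the mechanism used on the Ataccama ONE PaaS side, which this page does not describe. The function name `is_allowed` is hypothetical, and the addresses in the usage note come from ranges reserved for documentation.

```python
import ipaddress


def is_allowed(source_ip, allow_list):
    """Return True if source_ip matches any approved address or CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    for entry in allow_list:
        # strict=False also accepts single addresses and host-bit CIDR input.
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False
```

With an allow list of `["198.51.100.7", "203.0.113.0/24"]`, a connection from `203.0.113.10` would be accepted and one from `192.0.2.1` rejected.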
Communication between DPE and DPM
DPE units communicate with the DPM module, located in PaaS, in client-server mode, with DPE acting as a client and DPM as a server. The role of DPM within Ataccama ONE is to receive user-submitted job requests and parameters for data processing and transmit them to DPE based on the engine’s capacities and capabilities. DPE then processes the data as instructed and returns the results of those operations. The communication between DPM and DPE relies on the following gRPC protocol-based stack:
When the default configuration is used, DPE initiates a TCP connection with DPM in the Ataccama ONE PaaS. By default, the communication is encrypted using TLS and the TCP session is established on port 443. During an active TCP session, the modules actually communicate through gRPC.
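To check that a DPE host can open an outbound, TLS-protected TCP session on port 443, a probe along the following lines can be used. This is a minimal sketch using only the Python standard library: the function names are hypothetical, the TLS 1.2 minimum is an assumption rather than a documented requirement, and the probe verifies only the TCP and TLS layers, not the gRPC exchange that follows.

```python
import socket
import ssl


def build_client_context():
    """Create a client-side TLS context that verifies the server certificate."""
    context = ssl.create_default_context()  # enables chain and hostname checks
    # Assumption: TLS 1.2 as the minimum version; not stated in this guide.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def probe_tls(host, port=443, timeout=5.0):
    """Open an outbound TCP session, complete the TLS handshake,
    and return the negotiated protocol version as a string."""
    context = build_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

Calling `probe_tls` with the DPM hostname of the environment from the DPE server raises an exception if DNS resolution, the outbound firewall rule, or certificate validation fails, which helps narrow down where the connection is blocked.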
Connection security
By default, the communication between DPE and DPM is secured on three levels:
-
Network layer (on the PaaS side): IP allow-listing, which allows connections only from the previously approved IP addresses.
-
Transport layer: TLS encryption.
As mentioned previously, DPE starts a session as a client using the TLS protocol suite, with the TLS handshake establishing a symmetric session key. On the DPM side, an X.509 certificate signed by the public certificate authority (CA) Let’s Encrypt is installed.
-
Application layer: Module-to-module authentication using JSON Web Tokens (JWT).
This is the standard authentication mode used between all Ataccama ONE modules, in which authentication keys are transferred through JWT. The secret key is provided to the DPE configuration manually.
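The general idea of token-based module-to-module authentication can be sketched as follows. This example signs and verifies a JWT with a shared secret using HMAC-SHA256 (HS256) and only the Python standard library. The algorithm, the claim names, and the function names are all assumptions made for illustration. This page does not specify which signing algorithm or claims Ataccama ONE modules use.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims, secret):
    """Build a compact JWT signed with HS256 (assumed algorithm)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token, secret, now=None):
    """Check signature, algorithm, and optional expiry; return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current_time = time.time() if now is None else now
    if "exp" in claims and current_time >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

In this sketch, a token signed by one module is accepted by another only if both hold the same secret, which mirrors the requirement that the key be provided to the DPE configuration.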
DPE and DPM data flows
As described in the previous section, DPE and DPM function in client-server mode, with DPE assuming the client role and DPM the server role. DPM transfers the parameters necessary for completing data processing jobs to DPE.
After receiving the job information, DPE retrieves data from the relevant data source and starts processing. Once the job is finished, the information related to processing (runtime logs and engine processing events), as well as aggregated DQ results and attribute fingerprints are forwarded to DPM for further use.
DPE also stores data processing results in ONE Object Storage (MinIO), from where they are made available to DPM. This includes profiling and processing results, invalid data samples, lookups, and files generated through post-processing plans. Users can access this information through the web application after the data processing is complete.
For more information about the specific data flows marked in the diagram, refer to the table provided.
Flow number | Asset | Description |
---|---|---|
1 | DPE Job Metadata and Data | Definitions of jobs consisting of submitted (the serialized message) and associated metadata, which covers the job owner, name, timestamps, correlation identifier. |
2 | Raw Source Data | Data from customer sources that is processed by DPE based on a particular job definition. |
3 | DQ Engine Data | During DQ evaluation, DPE processes customer data and produces aggregated DQ results and, optionally, invalid samples. Invalid samples are records that did not pass DQ checks. |
3 | Engine Processing Events | Events describing the state of processing for the jobs handled by the engine. These events are stored in the DPE database of choice (embedded or external) and forwarded to DPM clients. In some cases, for example, when describing the cause of a job failure, processing events can contain metadata. |
3 | Fingerprints | Fingerprints are created in DPE during profiling, after which they are pulled by DPM as part of the profiling results and stored temporarily in ONE Object Storage. Fingerprints can contain metadata and data samples. NOTE: Fingerprints represent the level of similarity between catalog item attributes and are used for calculating term suggestions. |
3 | Runtime Logs | System and job logs. No customer metadata or data is collected here. |
4 | Catalog Item Profiles | Profiling results contain various data metrics and data samples. DPM and DPE temporarily store profile job results to ONE Object Storage in order to transport data between modules. |
4 | Invalid Samples | Records that did not pass DQ checks are stored in the Storing data samples can be disabled. |
4 | Lookups | Lookup data files ( |
4 | Post-Processing Plan Exports | Exports are created by DPE based on post-processing plans executed during the post-processing phase. After this, DPM pushes them to the |
4 | Processing Results | Processing results are created in DPE during job processing, after which they are moved to ONE Object Storage to be processed by DPM. Processing results can contain metadata and data samples. Storing data samples can be disabled. |
DNS configuration
All Ataccama ONE modules, as well as all application users, rely on hostnames to communicate with the Ataccama ONE PaaS services, which is why using a DNS service is necessary to correctly resolve IP addresses. How DNS is used and configured differs slightly depending on the integration scenario implemented between the customer environment and the Ataccama ONE PaaS.
Private Link
As mentioned previously, the Private Link service is available for both Amazon Web Services (AWS) and Microsoft Azure environments. In this integration scenario, DNS is handled as follows for application users and the client-side components:
-
Application users:
-
Users access the Ataccama ONE PaaS web application through the Internet using the hostname one.[customer].[env].ataccama.online.
-
The DNS record for this hostname is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain.
-
-
Client-side components:
-
Client-side components access the Ataccama ONE PaaS environment using the Private Link between the customer VPC or VNet, depending on the selected cloud provider, and the Ataccama ONE PaaS VPC or VNet. The hostname used is dpm.[customer].[env].ataccama.online.
-
The DNS record for this hostname is configured and managed by the customer and resolved by the customer internal DNS service as a private domain name.
-
IP allow-listing
In this integration scenario, DNS is handled as follows for application users and the client-side components:
-
Application users:
-
Users access the Ataccama ONE PaaS web application through the Internet using the hostname one.[customer].[env].ataccama.online.
-
The DNS record for this hostname is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain.
-
-
Client-side components:
-
Client-side components access the Ataccama ONE PaaS environment through the Internet, with access restricted to allow-listed public IP addresses. The hostname used is dpm.[customer].[env].ataccama.online.
-
The DNS record for the hostname can be managed based on one of the following two scenarios:
-
The DNS record is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain. The DPE server requires direct access to Route 53 DNS service.
-
The DNS record is configured and managed by the Ataccama Operations team and resolved by Route 53 DNS service in the public domain. The DPE server cannot directly access Route 53 DNS service and uses an internal customer DNS service instead. In this case, the customer DNS service should be synchronized with Route 53 DNS service for the domain .[customer].[env].ataccama.online.
-
-
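The hostname patterns used in both integration scenarios can be checked from a DPE server with a short script. The following Python sketch builds the two hostnames from the customer and environment identifiers and reports whether a resolved address is private or public. The function names and the example values `acme` and `prod` are hypothetical. The expectation that a Private Link setup resolves the DPM hostname to a private address is an assumption about typical endpoint configurations rather than a statement from this guide.

```python
import ipaddress
import socket

DOMAIN = "ataccama.online"


def paas_hostnames(customer, env):
    """Build the web application and DPM hostnames for one environment."""
    suffix = f"{customer}.{env}.{DOMAIN}"
    return {"web": f"one.{suffix}", "dpm": f"dpm.{suffix}"}


def address_scope(ip):
    """Classify an IP address as 'private' or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_scopes(hostname, port=443):
    """Resolve a hostname through the configured DNS service and classify
    each returned address. Requires working DNS on the calling host."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Drop any IPv6 zone identifier (text after '%') before classification.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return {address: address_scope(address) for address in addresses}
```

Running `resolve_scopes(paas_hostnames("acme", "prod")["dpm"])` on the DPE server shows which DNS answer the server actually receives, which helps confirm that the intended DNS service is being used.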
DPE integration with customer data sources
As described in previous sections, Ataccama ONE client-side components consist of Data Processing Engine (DPE) instances and the logging modules. DPE, which is orchestrated through Data Processing Module (DPM), connects to customer data sources for a variety of data processing tasks, such as populating the data catalog, profiling data, or evaluating data quality.
Thanks to their flexible design, the Ataccama ONE client-side components can seamlessly integrate with any internal customer infrastructure, which in turn results in optimized performance.
Best practices
DPE servers should be installed as close as possible to customer data sources in a separate subnet. Network paths to and from the data source need to provide adequate throughput depending on the amount and structure of customer data while the round-trip time (RTT) between DPE and the data source should be kept under 5 ms. For more information, see Sizing Guidelines.
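The round-trip time recommendation can be checked approximately by timing TCP connection setup from the DPE server to the data source. The following Python sketch takes several samples and compares their median with the 5 ms budget. TCP connect time is only a rough proxy for network RTT, and the function names and the choice of the median are assumptions made for this example.

```python
import socket
import statistics
import time


def measure_connect_rtt_ms(host, port, samples=5, timeout=2.0):
    """Time TCP connection setup to host:port and return samples in ms."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def within_rtt_budget(samples_ms, budget_ms=5.0):
    """Return True if the median sample is below the RTT budget."""
    return statistics.median(samples_ms) < budget_ms
```

A typical check passes the data source host and its listener port to `measure_connect_rtt_ms` and feeds the result to `within_rtt_budget`.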
Supported Data Sources
DPE comes with out-of-the-box support for a number of data sources while integration with other data sources is available upon request. If different data sources are used, we recommend deploying one DPE server for each data source type.
However, for processing data sources with a large number of records, we recommend deploying two or more DPE servers for each data source type. When DPE servers are deployed as described, Data Processing Module (DPM) can distribute the related workload across all instances dedicated to a particular data type, which contributes to better performance and scalability.
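The effect of dedicating several DPE servers to one data source type can be illustrated with a simple round-robin dispatcher. This Python sketch is purely hypothetical: the `DpeDispatcher` class, its methods, and the engine names are invented for illustration, and the actual DPM scheduling logic is not described on this page.

```python
from collections import defaultdict


class DpeDispatcher:
    """Toy dispatcher that rotates jobs across engines per data source type."""

    def __init__(self):
        self._engines = {}                   # source type -> list of engine names
        self._next_index = defaultdict(int)  # source type -> rotation counter

    def register(self, engine_name, source_type):
        """Dedicate an engine to one data source type."""
        self._engines.setdefault(source_type, []).append(engine_name)

    def assign(self, source_type):
        """Pick the next engine for a job on the given data source type."""
        engines = self._engines.get(source_type)
        if not engines:
            raise LookupError(f"no DPE registered for {source_type!r}")
        index = self._next_index[source_type] % len(engines)
        self._next_index[source_type] += 1
        return engines[index]
```

With two engines registered for the same data source type, consecutive jobs alternate between them, while a source type with a single engine always maps to that engine.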
The following table shows which JDBC drivers are already included or can be downloaded through the automated installation process.
The data sources that are used in the Ataccama ONE PaaS hybrid deployment are agreed upon with the customer when defining the deployment specification. If you need any additional JDBC drivers once the deployment specification has been finalized, contact the Ataccama professional services team.
Data source | JDBC driver included | Data source | JDBC driver included |
---|---|---|---|
Amazon Aurora PostgreSQL | Yes | MariaDB | Yes |
Amazon Redshift | Automatically downloaded | MSSQL | Yes |
Amazon S3 | N/A | Oracle | Yes |
Apache Derby | Automatically downloaded | PostgreSQL | Yes |
Azure Synapse Analytics | Yes | Snowflake | Yes |
Databricks | No | Teradata | Yes |
H2 | Yes | | |
Scaling
DPE servers can be scaled as follows:
-
Scaled up or down by increasing or reducing the number of CPUs and the RAM size on the DPE server.
-
Scaled out or in by installing or removing individual DPE servers.
Scaling up and scaling out can be performed without interrupting the service, provided that the virtualization technology that the customer uses supports this option. Scaling down, whether by decreasing the number of CPUs and the RAM size or by reducing the disk size, typically requires restarting the DPE server. This limitation stems from the operating system design and functionality.
Scaling is done manually by the customer as needed. This gives customers full control over the performance and cost of DPE servers and lets them adjust processing power in response to data management and processing demands. For more information about how to measure and fine-tune the performance of DPE, see Sizing Guidelines.