
Architecture Overview

Ataccama ONE Platform as a Service (PaaS) is hosted on Amazon Web Services (AWS) or Microsoft Azure, depending on your preference.

Within either cloud, you can select the region or geography in which the service is hosted, subject to minimal requirements on the services available there (see Appendix 1). Simply put, the PaaS model means you get to focus on the things that really matter to your organization.

Hardware and software maintenance, operating system patching and upgrades, networking, and security are the responsibility of Ataccama. You take care of users and their rights within the system, the data, and the surrounding processes, depending on how the system fits into your organization.

The following table provides an overview of how responsibilities are shared between you and Ataccama, depending on the selected deployment mode (the asterisk designates Ataccama’s responsibilities).

On-Premise deployment       PaaS
User identity and access    User identity and access
Data                        Data
Application usage           Application usage
Application maintenance     Application maintenance*
Guest OS                    Guest OS*
Virtualization              Virtualization*
Security                    Security*
Network                     Network*
Infrastructure              Infrastructure*
Physical                    Physical*
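The split of responsibilities in the table above can also be written down as a small lookup. The following is a minimal sketch only: the function name `responsible_party` and the mode labels `"on-premise"` and `"paas"` are illustrative assumptions, not part of any Ataccama API.

```python
# Layers of the stack, in the order they appear in the table above.
LAYERS = [
    "User identity and access",
    "Data",
    "Application usage",
    "Application maintenance",
    "Guest OS",
    "Virtualization",
    "Security",
    "Network",
    "Infrastructure",
    "Physical",
]

# Layers marked with an asterisk in the PaaS column (Ataccama's responsibility).
ATACCAMA_MANAGED_IN_PAAS = set(LAYERS[3:])


def responsible_party(layer: str, deployment: str) -> str:
    """Return "customer" or "Ataccama" for a layer and deployment mode.

    The deployment labels are hypothetical names chosen for this sketch.
    """
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    if deployment == "on-premise":
        # No layer carries an asterisk in the on-premise column.
        return "customer"
    if deployment == "paas":
        return "Ataccama" if layer in ATACCAMA_MANAGED_IN_PAAS else "customer"
    raise ValueError(f"unknown deployment mode: {deployment}")
```

For example, `responsible_party("Guest OS", "paas")` yields `"Ataccama"`, while the same layer in `"on-premise"` mode yields `"customer"`.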

In hybrid deployment, Data Processing Engine (DPE) is moved to your organization’s system, which means the module falls outside of Ataccama’s scope of responsibility. For more information, see Hybrid Deployment Architecture.

Ataccama ONE PaaS deployment options

The Ataccama ONE PaaS solution can be deployed in one of two distinct ways:

  • Pure PaaS deployment: All components of the Ataccama ONE Platform are run and operated by Ataccama and hosted on AWS or Azure cloud.

  • Hybrid PaaS deployment: All components of the Ataccama ONE Platform with the exception of Data Processing Engine (DPE) are run and operated by Ataccama and hosted on AWS or Azure cloud. In this setup, you run and operate DPE in your corporate network.

Pure PaaS deployment

The following diagram shows the pure PaaS layout, where all Ataccama components are hosted in the AWS or Azure cloud. In this case, Ataccama is fully responsible for operating, maintaining, and upgrading the Ataccama ONE Platform.

Pure PaaS deployment overview

Hybrid PaaS deployment

With the hybrid approach, your organization hosts and operates one or more Ataccama components inside your corporate network. Although at first glance this might seem counterintuitive, given that Platform as a Service implies that Ataccama runs and operates the platform on your behalf, this solution is particularly well suited for cases when one of the following conditions must be met:

  • Legal or regulatory requirements: The hybrid approach is typically used in cases when data must be restricted to a certain location or network due to legal or other requirements. As DPE processes your data and produces metadata about your data and data sources, it is the only module that needs to access your data sources.

    If your instance of ONE includes the Ataccama ONE MDM Suite, MDM and RDM contain your master and reference data respectively and might therefore be subject to the same requirements as your other data sources.

    In this case, or in case your metadata also needs to remain within your corporate network, Ataccama offers a fully self-hosted installation where you host, operate, and run the Ataccama software on (virtual) machines in your data center or with any cloud provider of your choice.

    This way, you can fully respect your company policies about data mobility while making full use of all Ataccama tools.

  • Network limitations: To produce metadata about your data sources, DPE must first read the database contents. Depending on how large the data source is and whether there are any network limitations such as insufficient bandwidth or latency, this process could become inefficient and time-consuming or run into issues.

    These issues are easily resolved if DPE is hosted inside your network, close to your data sources. In this case, only a fraction of the metadata is transmitted to Ataccama ONE PaaS instead of all available information. In addition, DPE can be configured either to use bidirectional communication with the Ataccama cloud or to poll the Ataccama cloud periodically. With polling, all communication with Ataccama ONE PaaS can only be initiated from your side.

    If you are working with a variety of data sources, with some residing in a highly restricted environment, you can also opt for a solution in which some DPE instances are set up within the restricted network while your remaining data sources are processed by DPEs installed outside that network or even within the Ataccama cloud.
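The polling mode mentioned above can be pictured with a short simulation. This is a sketch under stated assumptions: `CloudJobQueue` is a hypothetical in-memory stand-in for the cloud side, and nothing here reflects the actual DPE protocol or configuration. The point it illustrates is that the engine always makes the call and the cloud never opens a connection toward the engine.

```python
from collections import deque


class CloudJobQueue:
    """Hypothetical in-memory stand-in for the job endpoint in the cloud."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job: str) -> None:
        """Queue a job on the cloud side."""
        self._jobs.append(job)

    def fetch(self, max_jobs: int) -> list:
        """Hand out up to max_jobs queued jobs; invoked only by the engine."""
        batch = []
        while self._jobs and len(batch) < max_jobs:
            batch.append(self._jobs.popleft())
        return batch


def poll(cloud: CloudJobQueue, rounds: int, max_jobs: int = 2) -> list:
    """Run a fixed number of polling rounds and return the jobs picked up."""
    processed = []
    for _ in range(rounds):
        # The engine initiates every exchange by asking for work.
        processed.extend(cloud.fetch(max_jobs))
        # A real engine would wait for its polling interval here.
    return processed
```

Because the cloud side only answers `fetch` calls, a firewall around the engine needs to allow outbound connections only.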

The following diagram shows the hybrid PaaS layout, where DPEs are located in your network. For more detailed information about the hybrid PaaS, including architecture, network settings and communication details, data flows, DPE sizing, and installation instructions, see Hybrid Deployment Architecture and the hybrid deployment guide (hybrid-deployment-guide.adoc).

PaaS hybrid deployment overview

Ataccama ONE suites

Ataccama ONE PaaS offers three suites, which, depending on your needs, can be combined or used separately:

  • Ataccama ONE Data Governance Suite

  • Ataccama ONE Data Quality and Governance Suite

  • Ataccama ONE MDM Suite (two variants: ONE MDM and ONE RDM)

Each suite consists of a number of required and optional components, as shown in the following table:

Ataccama ONE suites

Ataccama ONE components

The following diagram shows all Ataccama ONE components as they would be deployed in the cloud of your choice:

Ataccama ONE components

MMM

The Metadata Management Module (MMM) is the core of the Ataccama ONE Platform.

It consists of two parts:

  • A web application: The entry point to the Ataccama ONE Platform for all users.

  • A backend server component: The module that all other components connect to directly or indirectly.

DPM

The Data Processing Module (DPM) is responsible for dispatching jobs to one or more DPEs. It acts as a load balancer and has an overview of the data sources that each DPE has access to.

DPE

The Data Processing Engine (DPE) connects to data sources and produces metadata describing the sources and their contents. DPE receives jobs from DPM and returns the processing results back to DPM.
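The relationship between DPM and DPE described above can be illustrated with a small dispatcher. This is a minimal sketch, not Ataccama's actual scheduling logic: the `Engine` class, the `dispatch` function, the least-loaded selection rule, and the engine and data source names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical stand-in for one DPE instance known to the dispatcher."""
    name: str
    data_sources: set
    running_jobs: int = 0


def dispatch(job_source: str, engines: list) -> Engine:
    """Pick the least-loaded engine that can reach the job's data source."""
    candidates = [e for e in engines if job_source in e.data_sources]
    if not candidates:
        raise LookupError(f"no engine has access to {job_source!r}")
    chosen = min(candidates, key=lambda e: e.running_jobs)
    chosen.running_jobs += 1
    return chosen


engines = [
    Engine("dpe-cloud", {"snowflake-sales"}),
    Engine("dpe-onprem-1", {"oracle-hr", "snowflake-sales"}, running_jobs=2),
]
print(dispatch("oracle-hr", engines).name)        # dpe-onprem-1
print(dispatch("snowflake-sales", engines).name)  # dpe-cloud
```

The first job can only go to the engine that reaches `oracle-hr`; the second goes to the engine with fewer running jobs among those that reach `snowflake-sales`.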

AI services

Microservices leveraging artificial intelligence capabilities (AI Matching, Anomaly Detection, Term Suggestions). These services work with metadata to determine, for example, what terms best describe the contents of a particular attribute in a catalog item or to detect anomalies in data.

Audit

The component for logging user interactions with the Ataccama ONE Platform.

ONE Data

The component allowing you to store, explore, cleanse, and export data. Integrates with the Data Quality module to provide insights about the data.

DQIT

The Data Quality Issue Tracker (DQIT) allows data stewards to track data issues and resolve them in a structured and efficient way.

MDM

The Master Data Management (MDM) module provides a flexible way of consolidating and authoring master and reference data, managing golden records, and providing data to other systems, processes, and users.

RDM

The Reference Data Management (RDM) module allows you to maintain valid reference data in a centrally managed solution and provide it to the whole organization for analytics, operational use cases, and data warehousing.

Self-service portal

Available for each Ataccama ONE PaaS deployment. Used for platform administration and for submitting and tracking support tickets.

Third-party components

In addition to Ataccama modules, the Ataccama ONE Platform relies on the following third-party components:

MinIO

Functions as an abstraction layer between cloud-specific storage solutions and the Ataccama ONE Platform.

Keycloak

The authentication and identity management solution of the Ataccama ONE Platform. Allows Ataccama ONE PaaS to integrate with any identity management system that supports the SAML, OAuth 2.0, or OIDC protocol, as well as the Kerberos bridge.

PostgreSQL

Used for storing internal platform data, which can be anything from simple tabular data sets to complex data structures used by various Ataccama ONE components.

MANTA

An automated data lineage platform that provides a comprehensive overview of your data flows and helps you make full use of your data governance framework. The component is an optional, separately licensed product resold by Ataccama and produced by MANTA.

MANTA is only supported in the standard (pure) Ataccama ONE PaaS deployment, where DPEs are also located in the cloud and both MANTA and DPEs have direct network access to data sources. In addition, data lineage is available only for the following sources: Amazon Aurora PostgreSQL, MS SQL, Oracle, PostgreSQL, Power BI, Tableau, Teradata, Snowflake.

Deployment strategy

In order to provide maximum isolation between customer deployments, Ataccama creates a new account (AWS) or a new subscription (Azure) for each customer. This way, your data is logically isolated from all other Ataccama PaaS customers starting from the level of the cloud provider.

This single-tenant solution does not imply dedicated physical hardware. Ataccama uses the cloud provider’s standard shared hardware, so the level of separation between Ataccama customers matches the level that cloud providers offer to their own customers.

For more information about how AWS and Microsoft Azure handle account separation and security, see Shared responsibility model and Shared responsibility in the cloud respectively.

The following diagrams show the AWS and Azure setups respectively. On both clouds, Ataccama ONE PaaS is deployed using the cloud provider’s Kubernetes service, specifically Elastic Kubernetes Service (EKS) for AWS and Azure Kubernetes Service (AKS) for Azure.

For backend storage Ataccama ONE uses PostgreSQL databases, which are in both cases hosted through the cloud provider’s relational database as a service solution.

For object storage, MinIO is used to abstract away the specifics of the cloud provider’s object storage solution.

PaaS tenant isolation in AWS
Figure 1. Deployment Strategy in AWS

PaaS tenant isolation in Azure
Figure 2. Deployment Strategy in Microsoft Azure

Interfaces

Ataccama ONE PaaS makes available a number of interfaces, as described in the following sections.

Web applications

Users can access a web interface for each primary component of Ataccama ONE, alongside an endpoint for the interface of the Ataccama PaaS solution. The following list contains the paths to each service:

  • Ataccama ONE: https://<CLIENT>.<ENV>.ataccama.online

  • DPM Admin Console: https://dpm.<CLIENT>.<ENV>.ataccama.online

  • Master Data Management (MDM): https://mdm.<CLIENT>.<ENV>.ataccama.online

  • Reference Data Management (RDM): https://rdm.<CLIENT>.<ENV>.ataccama.online

  • Data Quality Issue Tracker (DQIT): https://dqit.<CLIENT>.<ENV>.ataccama.online

  • Audit: https://audit.<CLIENT>.<ENV>.ataccama.online

  • Keycloak: https://<CLIENT>.<ENV>.ataccama.online/auth

  • GitLab cloud: https://git.ataccama.cloud

  • Kibana: https://<CLIENT>.<ENV>.ataccama.online/logs

  • MinIO: https://minio.<CLIENT>.<ENV>.ataccama.online

Cloud portal

  • Self-service portal: https://portal.<CLIENT>.<ENV>.ataccama.online
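The URL patterns listed above follow a regular scheme, which the following sketch captures. The function name `service_url`, the short service keys, and the sample values `acme` and `prod` are hypothetical placeholders; only the host and path patterns come from the lists above. GitLab is left out because its address does not depend on `<CLIENT>` or `<ENV>`.

```python
BASE_DOMAIN = "ataccama.online"

# Services reached through their own subdomain prefix.
SUBDOMAIN_SERVICES = {"dpm", "mdm", "rdm", "dqit", "audit", "minio", "portal"}

# Services reached through a path on the main host.
PATH_SERVICES = {"keycloak": "/auth", "kibana": "/logs"}


def service_url(service: str, client: str, env: str) -> str:
    """Build the URL of a service for the given client and environment."""
    host = f"{client}.{env}.{BASE_DOMAIN}"
    if service == "one":
        # The main Ataccama ONE web application lives on the bare host.
        return f"https://{host}"
    if service in PATH_SERVICES:
        return f"https://{host}{PATH_SERVICES[service]}"
    if service in SUBDOMAIN_SERVICES:
        return f"https://{service}.{host}"
    raise KeyError(f"unknown service: {service}")


print(service_url("mdm", "acme", "prod"))       # https://mdm.acme.prod.ataccama.online
print(service_url("keycloak", "acme", "prod"))  # https://acme.prod.ataccama.online/auth
```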

Backups and data recovery

The Ataccama ONE PaaS solution ensures protection against disasters and service disruption thanks to the high availability (HA) design of its components and the backup and restore services that it provides. Backup and restore services cover both the customer data and the PaaS configuration and settings. For more detailed information, see Backup and Restore Architecture.

Integration strategy

The Ataccama ONE PaaS integration strategy distinguishes two kinds of connectivity: on one hand, the connectivity between your organization’s network and users and the Ataccama ONE PaaS deployment; on the other, the connectivity between the Ataccama ONE PaaS deployment and your infrastructure, such as databases, authentication and authorization platforms, business intelligence platforms, data streams, and more.

Connectivity integrations

There are three ways to connect Ataccama ONE PaaS and your organization’s network.

IP allowlisting

Data is transferred over the public internet, with the connections secured through SSL/TLS and only specific IP addresses or ranges allowed to access the services. This option can be used for user access to Ataccama ONE PaaS and combined with other connectivity options described here for data access.

In other words, even though the user interface is available over the public internet and restricted to certain IP addresses or ranges, the access to data sources can be set up in such a way that they can be reached only through another path and not the public internet.
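The allowlisting check itself is simple to picture. The following sketch uses Python's standard `ipaddress` module; the function name `is_allowed` and the sample addresses (taken from documentation-reserved ranges) are illustrative assumptions and say nothing about how Ataccama ONE PaaS enforces its allowlist.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list) -> bool:
    """Return True if client_ip matches any address or CIDR range in allowlist."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(entry) for entry in allowlist)


# Hypothetical allowlist: one office range and one single address.
allowlist = ["203.0.113.0/24", "198.51.100.7"]
print(is_allowed("203.0.113.42", allowlist))  # True
print(is_allowed("192.0.2.1", allowlist))     # False
```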

Site-to-site virtual private network (VPN)

In this case, data is transferred over a virtual private network across the public internet. This setup is typically used to connect company offices and data centers and is considered suitable for sensitive data flows.

A site-to-site VPN connects the two networks together and requires firewalls on both sides to limit the access that each network has to the other side of the connection. This integration is the most complex to set up compared to other supported options, since you need to specify two network subnets that do not overlap with your existing subnets.

For more detailed technical architecture, see AWS Site-to-Site VPN and Azure VPN Gateway.
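Because the VPN subnets must not overlap with your existing subnets, it can help to check candidate ranges before proposing them. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `find_overlaps` and the sample ranges are assumptions for illustration.

```python
import ipaddress


def find_overlaps(proposed: list, existing: list) -> list:
    """Return (proposed, existing) pairs of CIDR ranges that overlap."""
    conflicts = []
    for candidate in proposed:
        for current in existing:
            if ipaddress.ip_network(candidate).overlaps(ipaddress.ip_network(current)):
                conflicts.append((candidate, current))
    return conflicts


# Hypothetical ranges: the first candidate collides with an existing /16.
proposed = ["10.20.0.0/24", "10.30.0.0/24"]
existing = ["10.20.0.0/16", "192.168.0.0/16"]
print(find_overlaps(proposed, existing))  # [('10.20.0.0/24', '10.20.0.0/16')]
```

An empty result means the proposed subnets are safe to use alongside the existing ones.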

Private Link

Private Link is a good option for all data types, including sensitive data. In this case, specific endpoints from one AWS or Azure network are exposed to the other AWS or Azure network respectively. This way, data remains within the cloud network, away from the public internet, while lowering latency and data transfer costs. Setting up firewalls between the networks is not necessary.

A Private Link setup is an AWS or Azure native solution that is considered by both cloud providers as the most secure way to connect individual services between two AWS or Azure networks. Although the Private Link service is also capable of establishing connection to your on-premise network, Ataccama ONE PaaS does not support this feature.

For more detailed technical architecture, see AWS PrivateLink and Azure Private Link.

Platform integrations

The Ataccama ONE Platform integrates with a wide range of data sources. It easily connects to relational and NoSQL databases, data lakes, data warehouses, cloud storage, metastores, streams, and files, while also working with major BI tools and analytical platforms.

The following are some of the data sources and identity management tools that Ataccama customers most commonly use and for which the Ataccama Platform provides out-of-the-box support.

Depending on the specific module, some data sources listed here might not be available.

Relational databases

  • Oracle Database

  • SAP S/4HANA

  • Teradata

  • Microsoft SQL Server

  • PostgreSQL

  • IBM Db2

  • MySQL

Cloud databases and data warehouses

  • Azure Synapse

  • Amazon Redshift

  • Google BigQuery

  • Snowflake

  • Databricks

  • Amazon Relational Database Service (RDS)

  • Azure Data Lake Storage

  • Hive

  • Amazon Athena

Data lake platforms

  • Hive

  • Hortonworks

  • Databricks

  • MapR

  • Cloudera

  • AWS Glue

  • Amazon EMR

  • Google Dataproc

Business intelligence and analytics tools

  • Tableau

  • Looker

  • Power BI

  • Dataiku

  • Qlik

  • ThoughtSpot

  • GoodData

File formats and cloud storages

  • Amazon S3

  • CSV (any format)

  • Microsoft Excel

  • XML

  • JSON

  • Azure Data Lake Storage

  • Apache Avro

  • Apache ORC

  • Blob storage

  • Cloud storage

  • Apache Parquet

External metadata

  • Collibra

  • MANTA

Identity management

  • OIDC

  • OAuth 2.0

  • Kerberos-based SSO

  • SAML 2.0

  • Active Directory

  • SecureAuth

  • Okta

  • Multi-factor authentication

  • Support for social logins

As explained previously, this list of data sources is not exhaustive and Ataccama can integrate with many more sources not mentioned here. If your relational database comes with a corresponding JDBC driver, Ataccama is able to connect to it.

The list also does not include any log or metric aggregation solutions, such as Splunk or Datadog, since Ataccama ONE PaaS does not integrate with them directly. However, application logs can be made available to these tools if needed. For more information, see the Logging and monitoring section of the security overview (security-overview.adoc).

Appendix 1: Ataccama ONE PaaS deployment locations

Ataccama PaaS can be deployed in a geography, region, and availability zone of your choice provided that the following services are available in those locations and that the location has sufficient hardware capacity.

AWS
  • Multiple availability zones (AZ)

  • EKS support

  • PostgreSQL RDS

  • m5a.2xlarge nodes

Azure
  • Multiple availability zones (AZ)

  • AKS support

  • Azure PostgreSQL Server

  • Dsv3-series nodes
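The requirements above amount to a checklist per cloud provider, which the following sketch encodes. The short service keys, the function name `missing_services`, and the sample input are hypothetical; the sketch does not query AWS or Azure and only compares a set of services you supply against the lists above.

```python
# Requirement keys are shorthand labels invented for this sketch.
REQUIREMENTS = {
    "aws": {"multiple-az", "eks", "rds-postgresql", "m5a.2xlarge"},
    "azure": {"multiple-az", "aks", "azure-postgresql", "dsv3-series"},
}


def missing_services(provider: str, available: set) -> list:
    """Return the required services a location lacks, sorted by name."""
    if provider not in REQUIREMENTS:
        raise KeyError(f"unknown provider: {provider}")
    return sorted(REQUIREMENTS[provider] - set(available))


# Hypothetical location that offers everything except the required node type.
print(missing_services("aws", {"multiple-az", "eks", "rds-postgresql"}))
# ['m5a.2xlarge']
```

An empty list means the location meets the service requirements. Hardware capacity still has to be confirmed separately.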

Although this is rare, a cloud provider might occasionally lack sufficient capacity for new deployments in the selected location. If this happens, Ataccama can assist you in identifying alternative deployment locations and work with the cloud provider to determine when the selected location will be available again, helping you make an informed decision on how to proceed.
