
Edge Architecture

Edge architecture is built on two core principles:

Bring your own storage: All sensitive data and processing results stay in object storage you own. Edge currently uses AWS S3. For customer-managed encryption of data at rest in S3 and Aurora, see Bring Your Own Key (BYOK).

Bring your own cloud: Your organization provides the AWS account and retains full ownership of the underlying infrastructure. Processing components are deployed there: by Ataccama for Ataccama-managed deployments, by you for self-managed deployments.

Architecture overview

Edge architecture separates processing and data management across four functional areas.

[Figure: Edge architecture overview]

This structure is the same for both deployment options. The deployed components differ; see the following diagrams for details.

Ataccama-managed deployment

Components shown here are deployed by Ataccama through CloudFormation and Terraform Cloud automation.

[Figure: Edge architecture overview - Ataccama-managed deployment on AWS]

Self-managed deployment

The edge compute runs as a set of containerized workloads on AWS ECS with Fargate runners inside your virtual private cloud (VPC). Interactive work, such as testing a connection or browsing schemas, is handled by AWS Lambda functions.

All communication is outbound from your account to Ataccama; no inbound connections are required.

[Figure: Edge architecture overview - Self-managed deployment on AWS]

For the full list of AWS resources deployed by Terraform, see Appendix: AWS resources deployed by Terraform.

Edge compute

Location: Your AWS account.

Description: Runs the components that process data from the data sources. The edge compute doesn’t directly store primary data or processing results.

Management: Ataccama-managed software running on customer-owned infrastructure.

Data sources

Location: Your environment.

Description: Holds your primary data and the sensitive results produced by the edge compute. Primary data is retrieved by the edge compute on demand for processing and viewing; the only data that crosses components persistently is reference data you explicitly import.

Management: Customer-owned and managed.

Control plane

Location: Ataccama infrastructure (Ataccama ONE).

Description: Lets you browse metadata, set up DQ rules, and view processing results. The control plane has no direct access to your data: sensitive data is loaded from your storage on demand when you view or work with it in Ataccama ONE, and isn’t stored, derived, or cached.

Management: Ataccama-owned and managed.

Ataccama Cloud Portal

Location: Ataccama infrastructure.

Description: The admin and configuration console where you manage edge instances and access deployment artifacts and upgrades. How the edge compute is run depends on the deployment option:

  • Ataccama-managed deployments: The Cloud Portal runs the automation that provisions, upgrades, and configures the edge compute on your behalf.

  • Self-managed deployments: You run the deployment and upgrade steps using artifacts distributed by Ataccama.

Management: Ataccama-owned and managed.

Security and communication

All communication between components uses TLS 1.3. Sensitive data and processing results are encrypted at the application level both in transit and at rest.
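As an illustration of what a TLS 1.3 requirement means on the client side, the following minimal sketch builds a Python client context that refuses to negotiate anything older than TLS 1.3. It uses only the standard `ssl` module. The function name `make_tls13_client_context` is an assumption made for this example, and the sketch does not describe how Edge components are actually configured.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Return a client context that only accepts TLS 1.3 (illustrative helper)."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Reject any handshake that would fall back to TLS 1.2 or older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


context = make_tls13_client_context()
```

A context built this way can be passed to standard library clients, for example `http.client.HTTPSConnection(host, context=context)`. A handshake with a server that offers only older protocol versions then fails instead of downgrading.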

Key communication patterns are as follows:

  • Cloud Portal to edge compute: No direct access. Configuration is delivered through Terraform-based automation: ArgoCD and Terraform Cloud for Ataccama-managed deployments, Terraform bundles applied by your team for self-managed.

  • Control plane to edge compute: Communication goes through message queues, with pre-signed URLs used for transient storage.

  • Edge compute to Ataccama: All cross-account traffic is initiated outbound by the edge compute: pulling container images from an Ataccama-managed registry, exchanging control plane messages over SQS, and assuming roles via STS. No inbound connections to your environment are required.

  • Job execution flow:

    1. The control plane sends a job request via the message queue.

    2. The edge compute retrieves the job image from the OCI registry.

    3. The job processes data and writes results to your storage.

    4. Metadata returns to the control plane.

  • No traffic between components traverses the public internet.

  • Jobs run on demand. There are no persistent workloads when the instance is idle.

  • For high availability, the control plane spans at least two availability zones.
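The four steps of the job execution flow above can be sketched as a small in-memory simulation. This is a conceptual model only: the queues, the registry, and the storage are plain Python objects, and every name in it (`JobRequest`, `Environment`, `profile_job`, the `profiling:1.0` image tag) is hypothetical rather than part of any Ataccama or AWS API. It aims to show that results land in customer storage while only metadata travels back to the control plane.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    job_id: str
    image: str         # hypothetical image reference
    source_rows: list  # stand-in for data read from a data source


@dataclass
class Environment:
    # Hypothetical stand-ins for the message queues, registry, and storage.
    job_queue: queue.Queue = field(default_factory=queue.Queue)       # control plane -> edge
    metadata_queue: queue.Queue = field(default_factory=queue.Queue)  # edge -> control plane
    registry: dict = field(default_factory=dict)                      # image name -> job function
    customer_storage: dict = field(default_factory=dict)              # stand-in for your bucket


def control_plane_submit(env: Environment, request: JobRequest) -> None:
    # Step 1: the control plane sends a job request via the message queue.
    env.job_queue.put(request)


def edge_run_next_job(env: Environment) -> None:
    request = env.job_queue.get_nowait()
    # Step 2: the edge compute retrieves the job image from the registry.
    job = env.registry[request.image]
    # Step 3: the job processes data and writes results to customer storage.
    key = f"results/{request.job_id}"
    env.customer_storage[key] = job(request.source_rows)
    # Step 4: only metadata (no results) returns to the control plane.
    env.metadata_queue.put(
        {"job_id": request.job_id, "status": "done", "result_key": key}
    )


def profile_job(rows: list) -> dict:
    # Toy "processing": count missing values in the input.
    return {"null_count": sum(1 for row in rows if row is None)}


env = Environment()
env.registry["profiling:1.0"] = profile_job
control_plane_submit(env, JobRequest("job-1", "profiling:1.0", ["a", None, "b"]))
edge_run_next_job(env)
metadata = env.metadata_queue.get_nowait()
```

In this model the control plane learns where the result lives through `result_key` but never receives the result itself, which mirrors the separation between processing results and metadata described in this section.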

For a detailed breakdown of which data types are stored where and how each is protected, contact your Customer Success Manager.

The edge compute retrieves credentials from a provided secrets store and writes results to your storage using the IAM role you supplied during edge instance setup.
