Backup and Restore Architecture
This chapter describes how the Ataccama ONE PaaS service is designed to back up and restore the platform and its infrastructure.
Backup
The following sections provide details regarding the architecture of Ataccama ONE PaaS and customer data backup services.
Ataccama ONE PaaS backup architecture
Ataccama ONE PaaS is created in accordance with infrastructure as code principles and techniques. This means that the whole Ataccama ONE PaaS solution, including Ataccama ONE itself, is deployed based on code and application images stored and maintained in the Ataccama cloud GitLab.
Thanks to this architecture, backing up the whole infrastructure in practice corresponds to backing up the Ataccama cloud GitLab resources. The following diagram shows the backup service design of Ataccama ONE PaaS:
Ataccama ONE PaaS backup parameters are optimized to balance data security and efficiency of the backup process.
The parameters are described in the following table:
Customer environment | Backup area | Backup frequency | Backup retention | Backup location |
---|---|---|---|---|
Ataccama ONE PaaS | GitLab repository files | 1 hour | 10 days | The same cloud provider region as the original data. |
Customer data backup architecture
Customer data backup is an integral part of the Ataccama ONE PaaS service that is fully automated and preconfigured, with no customer action required. For disaster recovery purposes, backup snapshots are stored in an S3 bucket. At rest, backups are encrypted using AES-256 encryption.
The Amazon S3 service ensures durability of stored objects by replicating them across multiple facilities located in the same regulatory region.
Ataccama ONE PaaS uses two different backup techniques:
- Automated backups of customer data stored in a relational database that is hosted by the cloud provider as part of PostgreSQL managed services.
  Data snapshots are created using an internal database mechanism and kept in reserved storage used by the database managed service. Backups are always stored in the same cloud provider region as the original data (in AWS environments) or in a paired region (in Azure environments).
- Automated backups of file-based customer data and Ataccama ONE module configurations, performed by the Velero service integrated with Ataccama ONE PaaS.
  Velero receives a list of persistent volumes that need to be backed up, takes snapshots of them, and transfers them to a block storage unit dedicated to the customer. Backups are always stored in the same cloud provider region as the original data.
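As a rough illustration of the file-based technique, the following Python sketch builds a Velero `Schedule` manifest as a plain dictionary. It is not the configuration Ataccama uses: the schedule name, the `ataccama-one` namespace, and the cron expression are hypothetical, and only the hourly frequency and 10-day retention are taken from this chapter. The field names (`schedule`, `template`, `includedNamespaces`, `snapshotVolumes`, `ttl`) follow the Velero `Schedule` resource.

```python
from typing import Any, Dict


def velero_schedule(name: str, namespace: str, cron: str,
                    retention_days: int) -> Dict[str, Any]:
    """Build a Velero Schedule manifest as a plain dict.

    All argument values are illustrative assumptions, not settings
    confirmed for Ataccama ONE PaaS.
    """
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # "velero" is the default namespace of a Velero installation.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard cron expression
            "template": {
                "includedNamespaces": [namespace],
                # Take snapshots of the persistent volumes in scope.
                "snapshotVolumes": True,
                # Velero expresses retention as a duration string.
                "ttl": f"{retention_days * 24}h0m0s",
            },
        },
    }


# Hypothetical PROD-like schedule: every hour, kept for 10 days.
prod_schedule = velero_schedule("prod-hourly", "ataccama-one", "0 * * * *", 10)
print(prod_schedule["spec"]["template"]["ttl"])
```

Serializing the dictionary to YAML or JSON would give a manifest that could be applied to a cluster. A DEV or TEST variant would differ only in the cron expression, for example a daily run.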
The following diagrams provide an overview of the backup service design in Ataccama ONE PaaS deployed in an AWS or Azure environment:
Customer data backup parameters are optimized to balance data security and efficiency of the backup process as well as the amount of storage needed to hold backed up snapshots.
The parameters are described in the following table:
Customer environment | Backup area | Backup frequency | Backup retention | Backup location |
---|---|---|---|---|
PROD | PaaS relational database | 5 minutes | 10 days | The same cloud provider region as the original data. |
PROD | File system (ONE module configurations) | 1 hour | 10 days | The same cloud provider region as the original data. |
DEV, TEST | PaaS relational database | 5 minutes | 10 days | The same cloud provider region as the original data. |
DEV, TEST | File system (ONE module configurations) | 24 hours | 10 days | The same cloud provider region as the original data. |
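To make the trade-off between frequency and storage concrete, the following Python sketch estimates how many recovery points each policy keeps at any one time. It assumes, purely for illustration, that every backup interval produces exactly one recovery point and that nothing is pruned before the retention period ends. The function and variable names are hypothetical.

```python
from datetime import timedelta


def retained_recovery_points(frequency: timedelta, retention: timedelta) -> int:
    """Number of recovery points held when one is created per interval."""
    # Dividing one timedelta by another with // yields an integer.
    return retention // frequency


RETENTION = timedelta(days=10)

# Frequencies taken from the table above.
policies = {
    "database (all environments)": timedelta(minutes=5),
    "file system (PROD)": timedelta(hours=1),
    "file system (DEV, TEST)": timedelta(hours=24),
}

counts = {name: retained_recovery_points(freq, RETENTION)
          for name, freq in policies.items()}

for name, count in counts.items():
    print(f"{name}: {count} recovery points")
```

Under these assumptions, the 5-minute database policy keeps 2880 recovery points, the hourly file system policy 240, and the daily file system policy 10.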
Restore
The following sections describe the restore service architecture for Ataccama ONE PaaS and the customer data.
Ataccama ONE PaaS restore architecture
As mentioned previously, Ataccama ONE PaaS is created in accordance with infrastructure as code principles and techniques, meaning that the whole Ataccama ONE PaaS solution, including Ataccama ONE itself, is deployed based on code and application images stored and maintained in the Ataccama cloud GitLab.
Thanks to this, restoring the platform and its infrastructure corresponds in practice to a fresh installation from Ataccama cloud GitLab resources using Terraform, Helm charts, and other automation tools.
The following diagram shows the restore service design of Ataccama ONE PaaS:
Customer data restore architecture
The customer data restore process uses the same mechanisms and services as the backup service. Restore mechanisms are an integral, highly automated part of Ataccama ONE PaaS. The restore process is initiated and managed by the Ataccama Operations team following a disaster situation.
For more information about restore scenarios for disaster situations, see Disaster recovery.
Ataccama ONE PaaS uses two different restore techniques, analogous to the backup methods:
- Automated restore of customer data from relational database backups.
  An Ataccama operator initiates this process using native tools and options offered by the cloud provider as part of managed database services. Data can be recovered from a specific snapshot based on your request. In case of recovery from a disaster situation, the latest snapshot is used unless requested otherwise.
- Automated restore of customer data (originally stored on a filesystem) from backups using the Velero service integrated with Ataccama ONE PaaS.
  An Ataccama operator initiates this process using native tools and options offered by the Velero system. A snapshot is provided to Velero, which then transfers the data back to its original location. Data can be recovered from a specific snapshot based on your request. In case of recovery from a disaster situation, the latest snapshot is used unless requested otherwise.
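Both techniques share the same selection rule: use the snapshot you request, otherwise the latest one. The following Python sketch shows that rule in isolation. The `Snapshot` type, the `choose_snapshot` function, and the snapshot names are hypothetical and do not correspond to any Ataccama, Velero, or cloud provider API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    name: str
    created_at: datetime


def choose_snapshot(snapshots: Sequence[Snapshot],
                    requested_name: Optional[str] = None) -> Snapshot:
    """Return the requested snapshot, or the latest one if none is requested."""
    if not snapshots:
        raise ValueError("no snapshots available")
    if requested_name is not None:
        for snapshot in snapshots:
            if snapshot.name == requested_name:
                return snapshot
        raise LookupError(f"snapshot {requested_name!r} not found")
    # Default for disaster recovery: the most recent snapshot.
    return max(snapshots, key=lambda s: s.created_at)


available = [
    Snapshot("backup-0900", datetime(2024, 1, 1, 9, 0)),
    Snapshot("backup-1000", datetime(2024, 1, 1, 10, 0)),
    Snapshot("backup-1100", datetime(2024, 1, 1, 11, 0)),
]

latest = choose_snapshot(available)
specific = choose_snapshot(available, "backup-0900")
print(latest.name, specific.name)
```

Raising an error for an unknown snapshot name, rather than silently falling back to the latest one, keeps an explicit customer request from being replaced by a different recovery point.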
The following diagrams provide an overview of the restore service design in Ataccama ONE PaaS deployed in an AWS or Azure environment:
Disaster recovery
The disaster recovery (DR) process is activated in cooperation with the customer, based on the scenarios defined in the following sections.
Data loss or corruption: Data stored on filesystem
During usual system operation, files produced by Ataccama ONE are stored on an external, directly mounted filesystem provided as a service by the cloud provider. Corruption or loss of data can occur due to software bugs, human error, or storage service problems.
Once the Ataccama Operations team detects the issue, they inform the customer and perform a semi-automated restore, which means that files are restored to their last backed up state (unless agreed otherwise).
To ensure the consistency of the Ataccama ONE Platform dataset, files must also be restored on the filesystem used by Ataccama ONE PaaS. The specifics of this procedure, including the exact recovery point to restore, are then discussed with the customer.
The following restore service parameters apply in this scenario:
Customer environment | RTO | RPO |
---|---|---|
PROD | Best effort approach* | 1 hour (or longer, if agreed upon with the customer for a particular case) |
DEV, TEST | Best effort approach* | 1 hour (or longer, if agreed upon with the customer for a particular case) |
*Restoring only a part of the environment requires a hands-on course of action and coordination with data stewards. Therefore, RTO depends on the specific situation and its complexity.
Data loss or corruption: Data stored in database
During usual system operation, data produced by Ataccama ONE is stored in the database provided as a service by the cloud provider. Corruption or loss of data can occur due to software bugs, human error, or storage service problems.
Once the Ataccama Operations team detects the issue, they inform the customer and perform a semi-automated restore, which means that the data is restored to its last backed up state (unless agreed otherwise).
To ensure the consistency of the Ataccama ONE Platform dataset, the data stored in the database within Ataccama ONE PaaS must also be restored. The specifics of this procedure are then discussed with the customer, including the exact recovery point that will be restored.
The following restore service parameters apply in this scenario:
Customer environment | RTO | RPO |
---|---|---|
PROD | Best effort approach* | 5 minutes (or longer, if agreed upon with the customer for a particular case) |
DEV, TEST | Best effort approach* | 5 minutes (or longer, if agreed upon with the customer for a particular case) |
*Restoring only a part of the environment requires a hands-on course of action and coordination with data stewards. Therefore, RTO depends on the specific situation and its complexity.
Environment loss or corruption: Ataccama ONE PaaS
A substantial part or the whole environment of Ataccama ONE PaaS is corrupted, lost, or misconfigured to the point that it cannot be operated without restore. Corruption or loss of the environment can occur due to human error, serious connectivity problems, cloud service provider malfunctions, or software bugs.
Once the Ataccama Operations team detects the issue, they inform the customer and restore the service in the same or different cloud provider region depending on the mutual agreement. The specific location is selected based on the disaster situation and must be approved by the customer.
If the environment needs to be completely restored, a fresh installation is performed instead, and customer data and configurations are retrieved using automation tools. All customer data is restored to its latest backed up state unless agreed otherwise.
The following restore service parameters apply in this scenario:
Customer environment | RTO | RPO |
---|---|---|
PROD | 48 hours | 1 hour (or longer, if agreed upon with the customer for a particular case) |
DEV, TEST | 48 hours | 1 hour (or longer, if agreed upon with the customer for a particular case) |
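The recovery targets of the three scenarios can be summarized in a small lookup structure. The following Python sketch is only an illustration of how the RTO and RPO values from the tables in this chapter might be encoded and checked. The scenario keys, the `RecoveryTargets` type, and the `meets_rpo` function are hypothetical, and a "best effort" RTO is represented as `None`. The values are the same for PROD and for DEV and TEST environments.

```python
from datetime import timedelta
from typing import Dict, NamedTuple, Optional


class RecoveryTargets(NamedTuple):
    rto: Optional[timedelta]  # None means "best effort approach"
    rpo: timedelta


# Default targets from the tables in this chapter; a longer RPO can be
# agreed upon with the customer for a particular case.
TARGETS: Dict[str, RecoveryTargets] = {
    "filesystem": RecoveryTargets(rto=None, rpo=timedelta(hours=1)),
    "database": RecoveryTargets(rto=None, rpo=timedelta(minutes=5)),
    "environment": RecoveryTargets(rto=timedelta(hours=48),
                                   rpo=timedelta(hours=1)),
}


def meets_rpo(scenario: str, data_loss_window: timedelta) -> bool:
    """True if the time between the restored recovery point and the
    incident does not exceed the scenario's RPO."""
    return data_loss_window <= TARGETS[scenario].rpo


print(meets_rpo("database", timedelta(minutes=3)))
print(meets_rpo("database", timedelta(minutes=10)))
```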