Retention Settings
Overview
You can choose how long to store the results of monitoring projects and the results of data quality evaluation on catalog items. Limiting retention can help with maintenance or security, for example, when a company policy prevents data from being kept beyond a certain period.
The retention period can be based on a number of days or a number of runs. In other words, you can delete results older than a defined number of days, or keep only the results of a defined number of the most recent runs.
You can set retention separately for the following:
-
Monitoring projects:
-
Results of monitoring projects.
-
Catalog items:
-
Results of data quality (DQ) evaluation on catalog items.
-
Invalid samples, that is, the samples of records that have failed the data quality evaluation on the catalog item.
The retention period is set on the global level, as described in this article.
In addition, for monitoring projects, you can configure retention settings per project. For more information, see Monitoring Project Results, Reports, and Notifications, section Configure retention per monitoring project.
Where can I find retention settings?
For monitoring project retention settings, go to Global settings > Retention settings > Monitoring Project retention.
For catalog item retention settings, go to Global settings > Retention settings > Catalog Item retention.
When is data deleted?
For monitoring projects, the data is deleted when:
-
A run is older than either the global or project configuration allows.
-
There are more runs than either the global or project configuration allows.
-
A power user manually deletes the processing in the Metadata Management Module (MMM).
For catalog items, the data is deleted when:
-
A run is older than the global configuration allows.
-
There are more runs than the global configuration allows.
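The "either the global or project configuration" rule for monitoring projects can be read as: whichever limit is reached first triggers deletion. The following Python sketch illustrates that reading only. The function name effective_limit is hypothetical and not part of any ONE API, and the sketch assumes both limits are expressed in the same unit (days or runs).

```python
from typing import Optional


def effective_limit(global_limit: Optional[int],
                    project_limit: Optional[int]) -> Optional[int]:
    """Return the limit that triggers deletion first (hypothetical helper).

    None means "no limit configured" at that level. If neither level sets
    a limit, results are kept indefinitely and None is returned.
    """
    configured = [limit for limit in (global_limit, project_limit)
                  if limit is not None]
    # A run is deleted as soon as it exceeds either limit,
    # so the smaller of the configured values is the one that matters.
    return min(configured) if configured else None
```

For catalog items, only the global configuration applies, so under this reading the project-level argument would always be None.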
What data is deleted?
For monitoring projects:
-
Processing in MMM.
-
Invalid samples from ONE Object Storage.
-
DQ results from DQ storage.
-
Filter values from DQ storage.
-
Post-processing plan (PPP) exports from ONE Object Storage.
For catalog items (results of DQ evaluation):
-
Processing in MMM.
-
Invalid samples from ONE Object Storage.
-
DQ results from DQ storage.
-
Filter values from DQ storage.
-
Post-processing plan (PPP) exports from ONE Object Storage.
For catalog items (invalid samples):
-
Invalid samples from ONE Object Storage.
When is the retention policy checked?
ONE checks the configured retention policy and, if needed, removes the stored results in the following instances:
-
For monitoring projects:
-
A monitoring project run is started.
-
The global retention policy is changed and the changes are published.
-
A regular retention policy check runs. The check is scheduled using the external property plugin.monitoring-project.ataccama.one.retention-check.interval (default interval: 1 hour). For more information, see MMM Configuration.
-
For catalog items:
-
DQ evaluation for the catalog item is started (either ad hoc or scheduled).
-
The global retention policy is changed and the changes are published.
-
A regular retention policy check runs. The check is scheduled using the external property plugin.dqeval.ataccama.one.retention-check.interval (default interval: 1 hour). For more information, see MMM Configuration.
Set monitoring project retention
By default, monitoring project results are stored indefinitely.
In addition to setting the retention globally, you can configure retention individually for each monitoring project. For more information, see Monitoring Project Results, Reports, and Notifications, section Configure retention per monitoring project.
To configure the deletion of data quality results:
-
Go to Global settings > Retention settings > Monitoring project retention.
-
In Data Quality results, select Delete DQ results storage after.
-
Choose between Days and Runs, and enter the number as required:
-
If you select Runs, the number corresponds to the number of runs that are kept. For example, Delete DQ results storage after 50 runs means that only the results for the latest 50 runs are stored. This is most suitable if your aim is to reduce the amount of unnecessary data that is retained.
-
If you select Days, the number corresponds to the number of days the results are stored for. For example, Delete DQ results storage after 50 days means all results older than 50 days are removed. This is most suitable if data can only be stored for a limited amount of time for security reasons.
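To illustrate the difference between the Runs and Days options, the following Python sketch selects which runs would be removed under each setting. It is only a model of the behavior described above, not how ONE implements retention. The function runs_to_delete and its parameters are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def runs_to_delete(run_times: List[datetime],
                   now: datetime,
                   max_runs: Optional[int] = None,
                   max_days: Optional[int] = None) -> List[datetime]:
    """Return the run timestamps that fall outside the retention setting.

    Exactly one of max_runs or max_days is expected, mirroring the choice
    between Runs and Days. With neither set, nothing is deleted.
    """
    newest_first = sorted(run_times, reverse=True)
    if max_runs is not None:
        # Runs: keep only the latest max_runs runs, delete the rest.
        return newest_first[max_runs:]
    if max_days is not None:
        # Days: delete every run strictly older than max_days days.
        cutoff = now - timedelta(days=max_days)
        return [t for t in newest_first if t < cutoff]
    return []
```

With Runs, the amount of retained data stays bounded regardless of how often the project runs. With Days, the number of retained runs varies with run frequency, but no result outlives the time limit.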
Set catalog item retention
Set retention period on DQ results and data samples
By default, DQ results are stored indefinitely.
To configure the deletion of data quality results:
-
Go to Global settings > Retention settings > Catalog Item retention.
-
In Data Quality results, select Delete DQ results storage after.
-
Choose between Days and Runs, and enter the number as required:
-
If you select Runs, the number corresponds to the number of runs that are kept. For example, Delete DQ results storage after 50 runs means that only the results for the latest 50 runs are stored. This is most suitable if your aim is to reduce the amount of unnecessary data that is retained.
-
If you select Days, the number corresponds to the number of days the results are stored for. For example, Delete DQ results storage after 50 days means all results older than 50 days are removed. This is most suitable if data can only be stored for a limited amount of time for security reasons.
Set retention period on invalid data samples
Storing invalid samples is turned off by default.
Storing invalid samples must first be enabled globally. Once it is enabled, the catalog item owner can set up invalid samples individually for catalog items (see Catalog Items, section Invalid samples).
To enable storing invalid samples and configure their retention:
-
Navigate to Global settings > Retention settings > Catalog Item retention.
-
In Invalid samples, select Enable storage of invalid samples by default.
-
Select Delete invalid samples after set time period.
-
Choose between Days and Runs, and enter the number as required:
-
If you select Runs, the number corresponds to the number of runs that are kept. For example, Delete invalid samples after 50 runs means that only the invalid samples for the latest 50 runs are stored. This is most suitable if your aim is to reduce the amount of unnecessary data that is retained.
-
If you select Days, the number corresponds to the number of days the invalid samples are stored for. For example, Delete invalid samples after 50 days means all invalid samples older than 50 days are removed. This is most suitable if data can only be stored for a limited amount of time for security reasons.
-
The data owner or steward can now enable a preview of invalid samples for selected catalog items. For more information, see Catalog Items, section Invalid samples.
See all catalog items with invalid samples enabled
Once the Enable storage of invalid samples option is on, you can select Display list of Catalog items to view a list of all catalog items that have a preview of invalid samples enabled.
The option is available only to users with the Administrator global role assigned.
See also Invalid Samples.
This is particularly useful for monitoring sensitive data.