Data Remediation Plans
Overview
Using data remediation plans, users can load data, metadata, and DQ results from monitoring projects into ONE Data for data remediation.
This means, for example, if a data steward is monitoring data and getting reports with poor data quality results, they can fix the issues without leaving the ONE platform.
The plans are initialized from monitoring projects and run automatically with each run of the monitoring project: existing data is overwritten by the new data with each monitoring project run.
When writing to tables in ONE Data, you can decide whether to write all records or only invalid records.
Rules and terms applied to the catalog item in the monitoring project are copied to ONE Data.
Only the rules applied in the monitoring project are active and visible in the ONE Data table, not those associated with applied terms. To facilitate this, rules from terms are disabled by default on the ONE Data table: they are enabled only when they match the rules configured in the monitoring project. It is possible to manually reactivate the disabled rules if required.
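The rule activation described above can be pictured with a small Python sketch. This is only an illustration of the logic, not product code: the function name and the rule names are hypothetical, and rules are represented simply as strings.

```python
def rule_status(project_rules, term_rules):
    """Return a dict mapping rule name -> True (active) or False (disabled).

    Hypothetical model of the behavior described above, not an actual API.
    """
    status = {}
    # Rules applied directly in the monitoring project are always active.
    for rule in project_rules:
        status[rule] = True
    # Rules that come from applied terms stay disabled unless the same rule
    # is also configured in the monitoring project.
    for rule in term_rules:
        status.setdefault(rule, False)
    return status


status = rule_status(
    project_rules=["email_format", "not_null_id"],
    term_rules=["email_format", "phone_format"],
)
print(status)
```

In this sketch, phone_format is the only rule left disabled, which corresponds to a rule you could reactivate manually on the ONE Data table.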
Data remediation plans are located under Data Quality > Transformation plans.
Data remediation plans utilize the same plan canvas design as Data Transformation Plans. As is the case in transformation plans, you can:
-
Preview the data that is output by a selected step, and verify that the step works as you planned (use the dropdown to switch between steps). For more information, see Data preview.
-
Validate the plan as you go, by selecting Validate plan in the bottom right corner of the canvas. You are alerted to any errors, for example, incorrect expression conditions. Validation issues result in failed monitoring project runs.
Prerequisites
The data source connection of the catalog item you want to use in your remediation plan must have data export enabled. To verify this, go to Data Catalog > Sources > [your data source] > Connections. In the three dots menu ( ⋮ ) of your connection, select Show details. In Connection details, Enable exporting and loading of data should be enabled.
If it is not, open the three dots menu, select Edit, and enable the option.
Create remediation plan
The start point for data remediation plans is found in monitoring projects.
Data remediation plans do not function correctly unless initialized from monitoring projects.
-
Navigate to the required project, and select the Configuration & Results tab.
-
For the catalog item in which you want to remediate data, select the three dots menu and then Add data remediation plan.
-
Confirm the auto-generated plan name, or create a new one.
-
You are redirected to the transformation plans canvas, which is populated with two steps: a monitoring project post-processing input step that is already populated with your chosen catalog item, and a data remediation output step that you need to configure as described in Configure output step.
Click anywhere on the blue banner to minimize a step. You can then access the three dots menu. For easier editing, you can switch to Full screen mode. Zoom in or out as needed, and use the compass to recenter when editing.
Configure output step
-
In Filter records by DQ results select either All records or Invalid records:
-
All records: all records are written to ONE Data.
-
Invalid records: only invalid records are written to ONE Data.
-
In ONE Data table, select a table to write into, or create a new one:
-
To create a new table, type the name you want to use for the new table, then select Add {table name}.
If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
-
To write into an existing table, select the table from the list available.
Run plan
In the associated monitoring project (i.e. the project from which you created the remediation plan), select Run monitoring.
Every run of the plan replaces existing data in the ONE Data table with new data. The plan is run with every monitoring project run.
After the plan has run, your data table is available in ONE Data. For more information about data remediation in ONE Data, see Data Remediation with ONE Data.
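The combined effect of the Filter records by DQ results setting and the overwrite-on-every-run behavior can be sketched in Python as follows. This is a simplified model under stated assumptions, not how the platform is implemented: the table is a plain list, and each record carries a hypothetical valid flag standing in for its DQ result.

```python
def run_remediation_output(table, records, filter_mode="All records"):
    """Replace the contents of `table` with the records selected by `filter_mode`.

    `table` and the "valid" key are illustrative stand-ins, not product objects.
    """
    if filter_mode == "All records":
        selected = list(records)
    elif filter_mode == "Invalid records":
        selected = [r for r in records if not r["valid"]]
    else:
        raise ValueError(f"Unknown filter mode: {filter_mode}")
    # Existing rows are discarded and replaced, not appended to.
    table.clear()
    table.extend(selected)
    return table


one_data_table = []
first_run = [{"id": 1, "valid": True}, {"id": 2, "valid": False}]
second_run = [{"id": 3, "valid": False}]

run_remediation_output(one_data_table, first_run, "Invalid records")
after_first = list(one_data_table)   # only record 2 was written
run_remediation_output(one_data_table, second_run, "Invalid records")
print(after_first, one_data_table)   # record 2 is gone after the second run
```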
Available steps
The necessary steps for data remediation are provided when you create a data remediation plan. However, there are additional steps available which can be utilized to achieve more advanced use cases. The full list of steps available for you to use and information on how to configure them can be seen below.
To add more steps, use the plus icon in the connection, or in the bottom-left corner of the screen.
Whilst you can select more steps than those listed here, they are not compatible with remediation plans and cause validation to fail.
File output
Writes data into a text file in ONE object storage.
-
In Name, provide a custom name for the step (optional).
-
In File name, provide the name for the output file.
When writing to a text file, columns will be separated by delimiters.
Text files are only available to end-users when they are generated as part of a DQ monitoring project, i.e. as part of plans using the Monitoring project post-processing input step which are initialized from the monitoring project. You can find them in the Export tab of the relevant monitoring project.
Data remediation output
Use ONE Data to create and maintain reference data, as well as fix and remediate errors. For more information, see Data Remediation with ONE Data.
Writes data into new or existing ONE Data catalog items for data remediation.
-
Add a description to help locate the table in ONE Data (optional).
-
In Filter records by DQ results select either All records or Invalid records:
-
All records: all records are written to ONE Data.
-
Invalid records: only invalid records are written to ONE Data.
-
In ONE Data table, select a table to write into, or create a new one:
-
To create a new table, type the name you want to use for the new table, then select Add {table name}.
If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
-
To write into an existing table, select the table from the list available.
What is the difference between the data remediation output step and the ONE Data writer step?
The data remediation output step writes the term and rules metadata from catalog items to ONE Data, whereas the ONE Data writer step does not copy information about applied terms or rules.
Due to this, there are fewer data transformation options in data remediation plans (which use the data remediation output step) than in general transformation plans, as it is not possible to change attribute structure.
The remediation step also stores DQ results in ONE Data, meaning you can view DQ results and also filter by DQ results.
The data remediation output step is designed for one specific use case: importing issues for manual resolution.
ONE Data writer
Writes data into new or existing ONE Data catalog items.
-
In Name, provide a custom name for the step (optional).
-
In ONE Data table, select a table to write into, or create a new one:
-
To create a new table, type the name you want to use for the new table, then select Add {table name}.
If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
-
To write into an existing table, select the table from the list available.
-
Use Add attribute and add all the attributes you want to be present in your ONE Data table. Map the attributes to your input data. Attribute names can match the source data, or you can provide alternative names.
Select Write all attributes to include all attributes which are inputs to the step, without having to add them manually.
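As a rough illustration of attribute mapping in this step, the Python sketch below builds an output row either from an explicit mapping (output name to input name) or from all input attributes. The function, its parameters, and the attribute names are hypothetical and only model the behavior described above.

```python
def write_attributes(record, mapping=None, write_all=False):
    """Build the output row for one input record.

    mapping: dict of output attribute name -> input attribute name.
    write_all: if True, pass through every input attribute unchanged.
    Both parameters are illustrative, not product settings.
    """
    if write_all:
        return dict(record)
    return {out_name: record[in_name] for out_name, in_name in (mapping or {}).items()}


source_row = {"cust_id": 42, "email": "ada@example.com", "country": "UK"}

# Keep two attributes, giving one of them an alternative name.
mapped = write_attributes(source_row, mapping={"customer_id": "cust_id", "email": "email"})
# Include everything without listing attributes manually.
everything = write_attributes(source_row, write_all=True)
print(mapped, everything)
```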
Output
A generic output step which can be used to embed the plan into another.
-
In Name, provide a custom name for the step (optional).
-
Select Enforce format to limit the visible fields in the output to the attributes defined. Otherwise, the format is defined by the inputs to this step.
-
Using Add attribute, add input attribute names and data types.
Typically, these would correspond with the input endpoints of another plan.
Add attributes
Adds new attributes.
-
In Name, provide a custom name for the step (optional).
-
Provide the attribute name and data type, and define the attribute using ONE expressions. For example, to create a new attribute, full name, which concatenates two existing attributes, first name and last name, and also trims the value, you would use the expression trim(contactfirstname+' '+contactlastname).
-
Select Add Expression attribute to add additional attributes, and repeat step 2 as necessary.
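To see what the example expression trim(contactfirstname+' '+contactlastname) computes, here is a Python equivalent. It is only an analogy: it assumes trim removes leading and trailing whitespace, and it does not model how ONE expressions treat null values.

```python
def full_name(contactfirstname, contactlastname):
    # Concatenate with a single space, then strip leading/trailing whitespace,
    # mirroring trim(contactfirstname+' '+contactlastname).
    return (contactfirstname + " " + contactlastname).strip()


print(full_name(" Ada", "Lovelace "))
print(full_name("", "Smith"))
```

The second call shows why trimming matters here: with an empty first name, the concatenation alone would leave a leading space in the result.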
Condition
Splits the data flow into two streams based on the ONE expression condition.
-
In Name, provide a custom name for the step (optional).
-
In Condition, define a condition to separate the data using ONE expressions. Data that satisfies the condition will be sent to the output out_true, data that does not satisfy the condition will be sent to the output out_false. From these outputs, create connections to further steps as required.
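The routing performed by the Condition step can be modeled with the short Python sketch below. The function and the sample records are hypothetical, and a Python callable stands in for the ONE expression.

```python
def condition_step(records, condition):
    """Route each record to exactly one of two outputs."""
    outputs = {"out_true": [], "out_false": []}
    for record in records:
        key = "out_true" if condition(record) else "out_false"
        outputs[key].append(record)
    return outputs


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]
# Stand-in for a condition such as "email is not null".
routed = condition_step(rows, lambda r: r["email"] is not None)
print(routed["out_true"], routed["out_false"])
```

Every record lands in exactly one output, so the two streams together always contain the full input.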
Filter
Filters out all records which don’t pass the ONE expression condition. To use:
-
In Name, provide a custom name for the step (optional).
-
Add a condition using ONE expressions. Only data which satisfies this condition will be output to the next step of the plan.
Split
Splits the data flow into three streams based on the ONE expression condition.
-
In Name, provide a custom name for the step (optional).
-
In Condition, define a condition to split the data using ONE expressions. Data which satisfies the condition will be sent to the output out_true, data which does not satisfy the condition will be sent to the output out_false. All input data will be output to out_all. From these outputs, create connections to further steps as required.
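A minimal Python model of the three Split outputs is shown below. It is a sketch only: the function name and sample data are made up, and a Python callable replaces the ONE expression.

```python
def split_step(records, condition):
    """Produce out_true, out_false, and an unfiltered out_all stream."""
    outputs = {"out_true": [], "out_false": [], "out_all": []}
    for record in records:
        # out_all receives every record regardless of the condition.
        outputs["out_all"].append(record)
        key = "out_true" if condition(record) else "out_false"
        outputs[key].append(record)
    return outputs


rows = [{"country": "CZ"}, {"country": "US"}, {"country": "CZ"}]
streams = split_step(rows, lambda r: r["country"] == "CZ")
print({name: len(stream) for name, stream in streams.items()})
```

The out_all stream is useful when a later step needs the complete input alongside one of the filtered streams.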
Monitoring project post processing input
Inputs the data and DQ evaluation results from a monitoring project.
When using the monitoring project post processing input step, initiate the plan from the monitoring project itself to ensure correct processing: don’t add the step in the plan editor using Add data input or Add step.
-
Locate the required catalog item in the Configuration & Results tab of a monitoring project.
-
Select the three dots menu and then Add post-processing transformation.
-
Provide a name for the plan and select Confirm.
-
Your input step is ready to use.
Data preview
When remediation plans are run, the data gets stored in internal tables in ONE Data (these are not visible in the application).
Select a step to see a preview of the resultant data.
If the step has multiple outputs, you can also select which output to preview, for example, out_true or out_false.
The preview will be retained if you close the plan and re-open it. Likewise, after editing a plan, the preview is not updated unless you manually recompute it: to do this, select Recompute preview.
You can also view two different previews side-by-side, using the Show secondary preview toggle.
Preview is a separate job.
If the job fails, check the Processing center for more details.
Remediation plan preview jobs can be found by going to Processing center > Base jobs > Transformation plan preview jobs, or by searching for job Type DQC_PREVIEW.
Next steps
With ONE Data you can now view issues, filter results by DQ, edit records, and make other necessary changes before exporting the data back to the data source. For more information, see Data Remediation with ONE Data.