Data Remediation plans

Overview

Using data remediation plans, users can load data, metadata, and DQ results from monitoring projects into ONE Data for data remediation.

This means, for example, if a data steward is monitoring data and getting reports with poor data quality results, they can fix the issues without leaving the ONE platform.

The plans are initialized from monitoring projects and run automatically with each run of the monitoring project: existing data is overwritten by the new data with each monitoring project run.

When writing to tables in ONE Data, you can decide whether to write all records or only invalid records.

Rules and terms applied to the catalog item in the monitoring project are copied to ONE Data.

Only the rules applied in the monitoring project are active and visible in the ONE Data table, not those associated with applied terms. To facilitate this, rules from terms are disabled by default on the ONE Data table: they are enabled only when they match the rules configured in the monitoring project. It is possible to manually reactivate the disabled rules if required.
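The enabling logic described above can be pictured with a small Python sketch. It is only an illustrative model, not Ataccama code: the function name `rule_status` and the rule names are hypothetical, and rules are compared by name purely for simplicity.

```python
def rule_status(project_rules, term_rules):
    """Model which rules end up enabled on the ONE Data table.

    Hypothetical helper: rules are plain strings compared by name.
    """
    # Rules applied directly in the monitoring project are always active.
    status = {rule: True for rule in project_rules}
    for rule in term_rules:
        # Rules that arrive only through a term stay disabled by default;
        # a term rule that matches a project rule is already enabled above.
        status.setdefault(rule, False)
    return status


status = rule_status(
    project_rules=["email_format", "not_null"],
    term_rules=["email_format", "valid_country"],
)
print(status)
```

In this model, `valid_country` comes only from a term, so it stays disabled until someone reactivates it manually.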

Data remediation plans are located under Data Quality > Transformation plans.

Data remediation plans utilize the same plan canvas design as Data Transformation Plans. As is the case in transformation plans, you can:

  • Preview the data that is output as a result of a selected step, and verify it is functioning as you planned (use the dropdown to switch between steps). For more information, see Data preview.

  • Validate the plan as you go, by selecting Validate plan in the bottom right corner of the canvas. You are alerted to any errors, for example, incorrect expression conditions. Validation issues result in failed monitoring project runs.

Prerequisites

The data source connection of the catalog item you want to use in your remediation plan must have data export enabled. To verify this, go to Data Catalog > Sources > [your data source] > Connections. In the three dots menu ( ⋮ ) of your connection, select Show details. In Connection details, the Enable exporting and loading of data option should be turned on.

If it is not, use the three dots menu and select Edit to turn it on.

Create remediation plan

Data remediation plans must be created from monitoring projects: they do not function correctly unless initialized from one.

  1. Navigate to the required project, and select the Configuration & Results tab.

  2. For the catalog item in which you want to remediate data, select the three dots menu and then Add data remediation plan.

  3. Confirm the auto-generated plan name, or create a new one.

  4. You are redirected to the transformation plans canvas, which is populated with two steps: a monitoring project post-processing input step that is already populated with your chosen catalog item, and a data remediation output step that you need to configure as described in Configure output step.

Click anywhere on the blue banner to minimize a step. You can then access the three dots menu.

For easier editing, you can switch to Full screen mode.

Zoom in or out as needed, and use the compass to recenter when editing.

Configure output step

  1. In Filter records by DQ results select either All records or Invalid records:

    • All records: all records are written to ONE Data.

    • Invalid records: only invalid records are written to ONE Data.

  2. In ONE Data table, select a table to write into, or create a new one:

    1. To create a new table, type the name you want to use for the new table, then select Add {table name}.

      If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
    2. To write into an existing table, select the table from the list available.

Run plan

In the associated monitoring project (i.e. the project from which you created the remediation plan), select Run monitoring.

Every run of the plan replaces the existing data in the ONE Data table with new data. The plan runs with every monitoring project run.

After the plan has run, your data table is available in ONE Data. For more information about data remediation in ONE Data, see Data Remediation with ONE Data.
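To make the combined effect of the Filter records by DQ results option and the overwrite-on-every-run behavior concrete, here is a minimal Python model. It is a sketch under assumptions, not the product's implementation: the function `run_output_step`, the list standing in for a ONE Data table, and the `valid` flag on each record are all hypothetical.

```python
def run_output_step(table, records, filter_mode="All records"):
    """Overwrite `table` (a list) with the records selected by `filter_mode`."""
    if filter_mode == "Invalid records":
        # Keep only records flagged as failing DQ checks (hypothetical flag).
        selected = [r for r in records if not r["valid"]]
    elif filter_mode == "All records":
        selected = list(records)
    else:
        raise ValueError(f"Unknown filter mode: {filter_mode}")
    # Each run replaces the previous contents instead of appending to them.
    table[:] = selected
    return table


one_data_table = []
first_run = [{"id": 1, "valid": True}, {"id": 2, "valid": False}]
second_run = [{"id": 3, "valid": False}]

run_output_step(one_data_table, first_run, "Invalid records")
after_first = list(one_data_table)

run_output_step(one_data_table, second_run, "Invalid records")
after_second = list(one_data_table)
```

After the second run, the record with `id` 2 is gone: only the output of the latest monitoring run remains in the table.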

Available steps

The necessary steps for data remediation are provided when you create a data remediation plan. However, there are additional steps available which can be utilized to achieve more advanced use cases. The full list of steps available for you to use and information on how to configure them can be seen below.

To add more steps, use the plus icon in the connection, or in the bottom-left corner of the screen.

Whilst you can select more steps than those listed here, they are not compatible with remediation plans and cause validation to fail.

File output

Writes data into a text file in ONE object storage.

  1. In Name, provide a custom name for the step (optional).

  2. In File name, provide the name for the output file.

When writing to a text file, columns are separated by delimiters.

Text files are only available to end-users when they are generated as part of a DQ monitoring project, i.e. as part of plans using the Monitoring project post-processing input step which are initialized from the monitoring project. You can find them in the Export tab of the relevant monitoring project.

Data remediation output

Use ONE Data to create and maintain reference data, as well as to fix and remediate errors. For more information, see Data Remediation with ONE Data.

Writes data into new or existing ONE Data catalog items for data remediation.

  1. Add a description to help locate the table in ONE Data (optional).

  2. In Filter records by DQ results select either All records or Invalid records:

    • All records: all records are written to ONE Data.

    • Invalid records: only invalid records are written to ONE Data.

  3. In ONE Data table, select a table to write into, or create a new one:

    1. To create a new table, type the name you want to use for the new table, then select Add {table name}.

      If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
    2. To write into an existing table, select the table from the list available.

What is the difference between the data remediation output step and the ONE Data writer step?

The data remediation output step writes the term and rules metadata from catalog items to ONE Data, whereas the ONE Data writer step does not copy information about applied terms or rules.

Due to this, there are fewer data transformation options in data remediation plans (which use the data remediation output step) than in general transformation plans, as it is not possible to change attribute structure.

The remediation step also stores DQ results in ONE Data, meaning you can view DQ results and also filter by DQ results.

The data remediation output step is designed for one specific use case: importing issues for manual resolution.

ONE Data writer

Writes data into new or existing ONE Data catalog items.

  1. In Name, provide a custom name for the step (optional).

  2. In ONE Data table, select a table to write into, or create a new one:

    1. To create a new table, type the name you want to use for the new table, then select Add {table name}.

      If a catalog item with this name already exists, and you have sufficient permissions, the existing table is overwritten by the outputs of the plan.
    2. To write into an existing table, select the table from the list available.

  3. Use Add attribute and add all the attributes you want to be present in your ONE Data table. Map the attributes to your input data. Attribute names can match the source data, or you can provide alternative names.

    Select Write all attributes to include all attributes which are inputs to the step, without having to add them manually.
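
The attribute mapping in step 3 can be thought of as a rename-and-select operation. The Python sketch below models that idea only; `map_attributes` and the sample attribute names are hypothetical, and passing no mapping is used here as a stand-in for Write all attributes.

```python
def map_attributes(record, mapping=None):
    """Build the row written to the table.

    mapping: {target attribute name: input attribute name}.
    None models "Write all attributes" (every input attribute, same names).
    """
    if mapping is None:
        return dict(record)
    return {target: record[source] for target, source in mapping.items()}


row = {"contactfirstname": "Ada", "contactlastname": "Lovelace", "city": "London"}

# Rename one attribute, keep another under its source name, drop the rest.
renamed = map_attributes(row, {"first_name": "contactfirstname", "city": "city"})

# No explicit mapping: all input attributes are written unchanged.
everything = map_attributes(row)
```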

Output

A generic output step which can be used to embed the plan into another.

  1. In Name, provide a custom name for the step (optional).

  2. Select Enforce format to limit the visible fields in the output to the attributes defined. Otherwise, the format is defined by the inputs to this step.

  3. Using Add attribute, add input attribute names and data types.

    Typically, these would correspond with the input endpoints of another plan.

Add attributes

Adds new attributes.

  1. In Name, provide a custom name for the step (optional).

  2. Provide the attribute name and data type, and define the attribute using ONE expressions. For example, to create a new attribute, full name, which concatenates two existing attributes, first name and last name, and also trims the value, you would use the expression trim(contactfirstname+' '+contactlastname).

  3. Select Add Expression attribute to add additional attributes, and repeat step 2 as necessary.
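
For readers more familiar with general-purpose languages, the expression trim(contactfirstname+' '+contactlastname) from step 2 behaves roughly like the Python sketch below. This is an analogy rather than ONE expression syntax: `add_full_name` is a hypothetical helper, the new attribute is named `full_name` here, Python's `str.strip()` stands in for trim, and null handling is not modeled.

```python
def add_full_name(record):
    """Return a copy of `record` with a derived full_name attribute."""
    # Concatenate the two source attributes with a space, then trim
    # leading and trailing whitespace from the combined value.
    full_name = (record["contactfirstname"] + " " + record["contactlastname"]).strip()
    return {**record, "full_name": full_name}


row = add_full_name({"contactfirstname": " Ada", "contactlastname": "Lovelace "})
print(row["full_name"])  # Ada Lovelace
```

Note that trimming applies to the combined value, so whitespace between the two names is preserved while whitespace at either end is removed.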

Condition

Splits the data flow into two streams based on the ONE expression condition.

  1. In Name, provide a custom name for the step (optional).

  2. In Condition, define a condition to separate the data using ONE expressions. Data that satisfies the condition is sent to the output out_true; data that does not satisfy the condition is sent to the output out_false. From these outputs, create connections to further steps as required.

Filter

Filters out all records which don’t pass the ONE expression condition. To use:

  1. In Name, provide a custom name for the step (optional).

  2. Add a condition using ONE expressions. Only data which satisfies this condition will be output to the next step of the plan.

Split

Splits the data flow into three streams based on the ONE expression condition.

  1. In Name, provide a custom name for the step (optional).

  2. In Condition, define a condition to split the data using ONE expressions. Data that satisfies the condition is sent to the output out_true; data that does not satisfy the condition is sent to the output out_false. All input data is also sent to out_all. From these outputs, create connections to further steps as required.
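
The Condition, Filter, and Split steps differ mainly in which outputs they expose. The following Python sketch models the three side by side; it is illustrative only, with hypothetical function names and a Python predicate standing in for the ONE expression condition. The output names out_true, out_false, and out_all are the ones used in this page.

```python
def condition_step(records, predicate):
    """Two streams: records that satisfy the condition and records that do not."""
    return {
        "out_true": [r for r in records if predicate(r)],
        "out_false": [r for r in records if not predicate(r)],
    }


def filter_step(records, predicate):
    """One stream: only records that satisfy the condition continue."""
    return [r for r in records if predicate(r)]


def split_step(records, predicate):
    """Three streams: the two Condition outputs plus every input record."""
    outputs = condition_step(records, predicate)
    outputs["out_all"] = list(records)
    return outputs


def is_adult(record):
    # Stand-in for a ONE expression condition (hypothetical attribute).
    return record["age"] >= 18


rows = [{"id": 1, "age": 17}, {"id": 2, "age": 42}]
cond = condition_step(rows, is_adult)
kept = filter_step(rows, is_adult)
split = split_step(rows, is_adult)
```

In this model, Filter is equivalent to using only out_true of Condition, and Split is Condition with an additional pass-through output.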

Monitoring project post-processing input

Inputs the data and DQ evaluation results from a monitoring project.

When using the Monitoring project post-processing input step, initiate the plan from the monitoring project itself to ensure correct processing: don’t add the step in the plan editor using Add data input or Add step.
  1. Locate the required catalog item in the Configuration & Results tab of a monitoring project.

  2. Select the three dots menu and then Add post-processing transformation.

  3. Provide a name for the plan and select Confirm.

  4. Your input step is ready to use.

Data preview

When remediation plans are run, the data gets stored in internal tables in ONE Data (these are not visible in the application). Select a step to see a preview of the resultant data. If the step has multiple outputs, you can further select for which output you want to view the data, for example, out_true or out_false.

The preview is retained if you close the plan and reopen it. Likewise, it is not updated after you edit the plan unless you recompute it manually: to do this, select Recompute preview.
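
The retention behavior resembles a cache that is refreshed only on request. The Python sketch below models that idea under stated assumptions: `PreviewCache` is a hypothetical class, and a plan is represented as a plain function applied to sample rows, which is not how the product stores previews internally.

```python
class PreviewCache:
    """Holds the last computed preview until recompute() is called again."""

    def __init__(self):
        self._preview = None

    def recompute(self, plan, sample):
        # Models selecting Recompute preview: run the current plan on the sample.
        self._preview = [plan(row) for row in sample]
        return self._preview

    def show(self):
        # Models opening the plan: returns whatever was last computed.
        return self._preview


sample = ["Ada", "Grace"]
cache = PreviewCache()

plan = str.upper                 # original version of the plan
cache.recompute(plan, sample)

plan = str.lower                 # the plan is edited...
stale = cache.show()             # ...but the preview still reflects the old plan

fresh = cache.recompute(plan, sample)
```

Here `stale` still holds the uppercase values produced before the edit, and only the explicit recompute brings the preview in line with the edited plan.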

You can also view two different previews side-by-side, using the Show secondary preview toggle.

Preview is a separate job. If the job fails, check the Processing center for more details. Remediation plan preview jobs can be found by going to Processing center > Base jobs > Transformation plan preview jobs, or by searching for job Type DQC_PREVIEW.

Next steps

With ONE Data you can now view issues, filter results by DQ, edit records, and make other necessary changes before exporting the data back to the data source. For more information, see Data Remediation with ONE Data.
