
Tutorial: Create reference data from deduplicated catalog columns

This tutorial shows how to extract and deduplicate values from existing catalog items to create new reference data tables.

Scenario overview

You have a catalog item with data that contains duplicates and variations in key attributes. You want to create a clean reference data table with deduplicated values that can be used for standardization and validation.

What you’ll learn

  • How to create reference data directly from existing catalog items.

  • How to use data transformations to deduplicate data during reference data creation.

  • How to configure the transformation pipeline with input and reference data output.

  • How to use the Group Aggregator step for deduplication.

What you’ll need

For this tutorial, prepare:

  • Source catalog item: A catalog item containing data with duplicate or inconsistent values (for example, department names, product categories, or location codes).

  • Target attribute: Identify the specific attribute that contains values suitable for reference data.

  • Access to catalog item: Ensure you have permissions to create reference data from the catalog item.

Prerequisites

Before starting this tutorial, ensure you have:

  • Appropriate role on reference data tables:

    • Owner or Editor role to create and modify reference data tables.

    • Approver role to publish changes (if using approval workflows).

    • At minimum, the Viewer role to access table data.

  • Working knowledge of data transformation plans.

  • Sample data or catalog items to work with.

For information about roles and permissions, see User roles. For an introduction to data transformations, see Data Transformations Overview.

Step-by-step instructions

  1. Analyze the source data

    1. Open your source catalog item.

    2. Display the record preview.

    3. Identify the attribute that contains duplicate or inconsistent values.

    4. Note the variations and inconsistencies in the data.

  2. Create reference data from catalog item

    1. Go to your source catalog item in the data catalog.

    2. Select the three dots menu.

    3. Select Create reference data.

    4. Configure the data loading settings as usual.

    5. Select the option Open the dataset in data transformations to prepare it before loading.

  3. Configure deduplication steps

    1. The transformation opens with the input (your catalog item) and Reference data output already added.

    2. In the transformation canvas:

      • Add a Group Aggregator step before any delete operations.

      • In the Group Aggregator configuration:

        • Set the Group by field to the target attribute (for example, department).

        • Set the Input attribute to the same target attribute (for example, department).

        • Set the Aggregation function to ANY_VALUE.

        • Set the Output attribute to a new, descriptive name (for example, department_deduplicated).

      • (Optional) Add a Delete Attribute step after the Group Aggregator.

      • (Optional) Configure it to remove the original attribute.

  4. Validate and execute

    1. Validate the transformation plan configuration.

    2. Check the data preview to ensure deduplication works correctly.

    3. Publish the transformation plan.

    4. Run the transformation.

    5. Once the transformation completes, the output contains deduplicated values.

    6. The reference data table is automatically created from the transformation output. Search for it or go to Manage reference data > Tables to find it.

    7. Review the clean, deduplicated values in the newly created reference data table.

  5. Verify the results

    1. Check that duplicate values have been removed.

    2. Verify that the reference data table contains only unique values.

    3. Confirm the data quality and consistency of the results.
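
To make the logic of the deduplication step easier to follow, the sketch below emulates it in plain Python. It is an illustration only, not the product's implementation: the function names group_any_value and delete_attribute, the sample rows, and the choice to keep the first value seen in each group are all assumptions. It also assumes that the grouped output retains the group-by attribute alongside the new output attribute, which is why the optional delete step is shown.

```python
def group_any_value(rows, group_by, input_attr, output_attr):
    """Emulate grouping with an ANY_VALUE-style aggregation (hypothetical helper).

    Produces one output row per distinct value of `group_by`.
    """
    groups = {}
    for row in rows:
        key = row[group_by]
        # ANY_VALUE may return any member of the group;
        # this sketch simply keeps the first value encountered.
        if key not in groups:
            groups[key] = {group_by: key, output_attr: row[input_attr]}
    return list(groups.values())


def delete_attribute(rows, attr):
    """Emulate the optional Delete Attribute step (hypothetical helper)."""
    return [{k: v for k, v in row.items() if k != attr} for row in rows]


# Made-up sample data with exact duplicates and one case variation.
rows = [
    {"id": 1, "department": "Finance"},
    {"id": 2, "department": "Sales"},
    {"id": 3, "department": "Finance"},
    {"id": 4, "department": "sales"},
    {"id": 5, "department": "Sales"},
]

grouped = group_any_value(
    rows, "department", "department", "department_deduplicated"
)
result = delete_attribute(grouped, "department")

for row in result:
    print(row["department_deduplicated"])
```

In this sketch, grouping matches values exactly, so "Sales" and "sales" remain two separate entries. If your source data contains such variations, check the data preview before publishing to confirm that the output matches what you expect.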

Expected outcome

You now have a clean reference data table with deduplicated values that can be used for validation, standardization, and other data quality processes.
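
As a rough illustration of how a deduplicated list can support validation, the following sketch checks incoming values against a set of reference values. The function name split_by_reference and the sample values are hypothetical, and the sketch does not reflect how the platform itself performs validation.

```python
def split_by_reference(values, reference_values):
    """Split values into those found in the reference list and those not found."""
    reference = set(reference_values)
    valid = [v for v in values if v in reference]
    invalid = [v for v in values if v not in reference]
    return valid, invalid


# Made-up reference values and incoming records.
reference_values = ["Finance", "Sales", "Marketing"]
incoming = ["Finance", "Slaes", "Marketing", "Sales"]

valid, invalid = split_by_reference(incoming, reference_values)
print("valid:", valid)
print("invalid:", invalid)
```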

Next steps

After completing this tutorial:
