Tutorial: Create reference data from deduplicated catalog columns

This tutorial shows how to extract and deduplicate values from existing catalog items to create new reference data tables.

Scenario overview

You have a catalog item with data that contains duplicates and variations in key attributes. You want to create a clean reference data table with deduplicated values that can be used for standardization and validation.

What you’ll learn

  • How to extract reference data from existing catalog items.

  • How to deduplicate data using transformations.

  • How to create reference tables from processed data.

  • How to use the Group Aggregator step for deduplication.

What you’ll need

For this tutorial, prepare:

  • Source catalog item: A catalog item containing data with duplicate or inconsistent values (for example, department names, product categories, or location codes).

  • Target attribute: The specific attribute in that catalog item whose values are suitable for reference data.

Prerequisites

Before starting this tutorial, ensure you have:

  • Appropriate role on reference data tables:

    • Owner or Editor role to create and modify reference data tables.

    • Approver role to publish changes (if using approval workflows).

    • At minimum, the Viewer role to access table data.

  • Working knowledge of data transformation plans.

  • Sample data or catalog items to work with.

For information about roles and permissions, see User roles.

Step-by-step instructions

  1. Analyze the source data

    1. Open your source catalog item.

    2. Display the record preview.

    3. Identify the attribute that contains duplicate or inconsistent values.

    4. Note the variations and inconsistencies in the data.

  2. Set up the transformation

    1. Go to Data Transformations.

    2. Create a new standalone transformation plan for deduplication.

    3. Load data from your source catalog item.

    4. Select the target attribute for processing.

  3. Configure deduplication steps

    1. In the transformation canvas:

      • Add a Group Aggregator step before any delete operations.

      • In the Group Aggregator configuration:

        • Set Input attribute to your target attribute (for example, department).

        • Set Output attribute to a cleaned name (for example, department_deduplicated).

      • Add a Delete Attribute step after the Group Aggregator.

      • Configure it to remove the original attribute.

  4. Validate and execute

    1. Validate the transformation plan configuration.

    2. Check the data preview to ensure deduplication works correctly.

    3. Publish the transformation plan.

    4. Run the transformation.

  5. Create the reference data table

    1. Once the transformation completes, the output contains the deduplicated values of your target attribute.

    2. Go to Manage Data > Reference Data > Tables.

    3. Create a new reference data table using the transformation output.

    4. Review the clean, deduplicated values in the new table.

  6. Verify the results

    1. Check that duplicate values have been removed.

    2. Verify that the reference data table contains only unique values.

    3. Confirm the data quality and consistency of the results.
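
The deduplication and verification steps above can be pictured with a small sketch in plain Python. This is a conceptual illustration only, not the product's API: the function name deduplicate_attribute and the sample records are hypothetical, and the sketch assumes that grouping is done on exact value matches.

```python
def deduplicate_attribute(records, input_attr, output_attr):
    """Return one record per distinct value of input_attr, stored under output_attr."""
    distinct = {}
    for record in records:
        value = record.get(input_attr)
        if value is None:
            continue  # skip records that have no value for the attribute
        distinct.setdefault(value, None)  # dicts keep first-seen order
    # Emitting only output_attr mirrors removing the original attribute afterwards.
    return [{output_attr: value} for value in distinct]


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "department": "Sales"},
    {"id": 2, "department": "HR"},
    {"id": 3, "department": "Sales"},
    {"id": 4, "department": None},
]

result = deduplicate_attribute(records, "department", "department_deduplicated")
values = [row["department_deduplicated"] for row in result]

# Verification, as in step 6: every value appears exactly once.
assert len(values) == len(set(values))
print(values)  # ['Sales', 'HR']
```

Because this sketch matches values exactly, spelling or casing variants such as "Sales" and "sales" would remain separate entries. If your source data contains such variations, they would need to be normalized before grouping.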

Expected outcome

You now have a clean reference data table with deduplicated values that can be used for validation, standardization, and other data quality processes.
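
As a rough illustration of how such a table might support validation, the following Python sketch checks incoming records against a list of reference values. The function name find_invalid_values and the sample data are assumptions made for this example, not part of the product.

```python
def find_invalid_values(records, attr, reference_values):
    """Return the records whose attr value is not among the reference values."""
    allowed = set(reference_values)
    return [record for record in records if record.get(attr) not in allowed]


# Hypothetical reference values and incoming records, for illustration only.
reference_values = ["Sales", "HR"]
incoming = [
    {"id": 10, "department": "Sales"},
    {"id": 11, "department": "Slaes"},  # typo, not in the reference values
    {"id": 12, "department": "HR"},
]

invalid = find_invalid_values(incoming, "department", reference_values)
print(invalid)  # [{'id': 11, 'department': 'Slaes'}]
```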

Next steps

After completing this tutorial:
