Tutorial: Create reference data from deduplicated catalog columns
This tutorial shows how to extract and deduplicate values from existing catalog items to create new reference data tables.
Scenario overview
You have a catalog item with data that contains duplicates and variations in key attributes. You want to create a clean reference data table with deduplicated values that can be used for standardization and validation.
What you’ll learn
-
How to extract reference data from existing catalog items.
-
How to deduplicate data using transformations.
-
How to create reference tables from processed data.
-
How to use the Group Aggregator step for deduplication.
What you’ll need
For this tutorial, prepare:
-
Source catalog item: A catalog item containing data with duplicate or inconsistent values (for example, department names, product categories, or location codes).
-
Target attribute: Identify the specific attribute that contains values suitable for reference data.
Prerequisites
Before starting this tutorial, ensure you have:
-
Appropriate role on reference data tables:
-
Owner or Editor role to create and modify reference data tables.
-
Approver role to publish changes (if using approval workflows).
-
At minimum, the Viewer role to access table data.
-
-
Working knowledge of data transformation plans.
-
Sample data or catalog items to work with.
For information about roles and permissions, see User roles.
Step-by-step instructions
-
Analyze the source data
-
Open your source catalog item.
-
Display the record preview.
-
Identify the attribute that contains duplicate or inconsistent values.
-
Note the variations and inconsistencies in the data.
-
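To make the analysis concrete, the following minimal Python sketch profiles a column outside the application. It is only an illustration: the sample values and the `profile_values` helper are hypothetical and are not part of the product. It counts exact duplicates and lists values that differ only by letter case or surrounding whitespace.

```python
from collections import Counter

# Hypothetical sample of values from the target attribute (assumed for illustration).
departments = ["Sales", "sales", "Sales ", "Marketing", "HR", "HR", "Human Resources"]


def profile_values(values):
    """Count exact duplicates and find values that differ only by case or whitespace."""
    exact_counts = Counter(values)
    variants = {}
    for value in values:
        # Normalize for comparison only; the original spelling is kept in the set.
        key = value.strip().lower()
        variants.setdefault(key, set()).add(value)
    # Keep only normalized keys that appear with more than one spelling.
    inconsistent = {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}
    return exact_counts, inconsistent


exact_counts, inconsistent = profile_values(departments)
print(exact_counts.most_common())  # exact duplicates and their frequencies
print(inconsistent)                # spellings that collapse to the same normalized value
```

Values that a simple normalization cannot match (here, "HR" and "Human Resources") still need manual review.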
-
Set up the transformation
-
Go to Data Transformations.
-
Create a new standalone transformation plan for deduplication.
-
Load data from your source catalog item.
-
Select the target attribute for processing.
-
-
Configure deduplication steps
-
In the transformation canvas:
-
Add a Group Aggregator step before any delete operations.
-
In the Group Aggregator configuration:
-
Set Input attribute to your target attribute (for example, department).
-
Set Output attribute to a cleaned name (for example, department_deduplicated).
-
-
Add a Delete Attribute step after the Group Aggregator.
-
Configure it to remove the original attribute.
-
-
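Conceptually, the configuration above groups records by the input attribute and keeps one output record per distinct value. The Python sketch below illustrates that idea only; it is not the product's implementation, and the `deduplicate_by_group` helper and the sample records are assumptions made for this example.

```python
def deduplicate_by_group(records, input_attribute, output_attribute):
    """Group records by input_attribute and emit one record per distinct value."""
    groups = {}
    for record in records:
        groups.setdefault(record[input_attribute], []).append(record)
    # One output record per group. The original attribute is not carried over,
    # which mirrors the effect of removing it in a later delete step.
    return [{output_attribute: value} for value in groups]


# Hypothetical source records (assumed for illustration).
records = [
    {"id": 1, "department": "Sales"},
    {"id": 2, "department": "HR"},
    {"id": 3, "department": "Sales"},
]

deduplicated = deduplicate_by_group(records, "department", "department_deduplicated")
print(deduplicated)
```

This sketch groups on exact matches, so spellings such as "Sales" and "sales" remain separate values unless they are normalized first.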
-
Validate and execute
-
Validate the transformation plan configuration.
-
Check the data preview to ensure deduplication works correctly.
-
Publish the transformation plan.
-
Run the transformation.
-
-
Create the reference data table
-
Once the transformation completes, the output contains the deduplicated values of your target attribute.
-
Go to Manage Data > Reference Data > Tables.
-
Create a new reference data table using the transformation output.
-
Review the clean, deduplicated values.
-
-
Verify the results
-
Check that duplicate values have been removed.
-
Verify that the reference data table contains only unique values.
-
Confirm the data quality and consistency of the results.
-
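If you export both the source values and the reference table values, the verification checks can also be scripted. The following Python sketch is one possible approach under that assumption; the `verify_reference_values` helper and the sample lists are hypothetical.

```python
from collections import Counter


def verify_reference_values(source_values, reference_values):
    """Report duplicates in the reference values and mismatches against the source."""
    counts = Counter(reference_values)
    return {
        # Values that appear more than once in the reference table.
        "duplicates": sorted(value for value, count in counts.items() if count > 1),
        # Source values that did not make it into the reference table.
        "missing": sorted(set(source_values) - set(reference_values)),
        # Reference values that never occur in the source.
        "unexpected": sorted(set(reference_values) - set(source_values)),
    }


# Hypothetical exported values (assumed for illustration).
source = ["Sales", "HR", "Sales", "Marketing"]
reference = ["Sales", "HR", "HR"]

report = verify_reference_values(source, reference)
print(report)
```

An empty list under every key indicates that the reference table holds each source value exactly once.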
Expected outcome
You now have a clean reference data table with deduplicated values that can be used for validation, standardization, and other data quality processes.
Next steps
After completing this tutorial:
-
Export to database: Export to Database.
-
Validate data quality: Create Validation Rules.
-
Learn best practices: Best Practices.
-
Use published data: Work with Published Reference Data.
-
Troubleshoot issues: Troubleshooting.