
Onboard Reference Data

Situation: Reference data is scattered across Excel files, database tables, and other systems. Multiple versions exist, and it is unclear which one is current or who owns it.

What we need to achieve: Centrally managed reference data with clear ownership, quality controls, and automated distribution.

When to use this approach

  • Reference data spread across multiple files and systems

  • Unclear data ownership and governance

  • Multiple versions of the same data with no single source of truth

  • Manual processes for sharing reference data

Example approaches

Choose the approach that best matches the state of your source data.

Approach 1: Direct import from files

Best for: Clean Excel/CSV files that only need governance.

Implementation steps:
  1. Start with basics: Quick Start - Learn the fundamentals with a sample dataset.

  2. Import your data: Create Reference Data Tables - Import from files.

  3. Set up governance: Set Up Access and Governance - Assign ownership and approval workflows.

  4. Enable access: Work with Published Reference Data - Make data available platform-wide.

Expected outcome: Your scattered files become governed reference data with clear ownership and distribution.
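
Before treating a file as clean enough for a direct import, it can help to confirm that its key column has no blank or repeated values. The following is a minimal sketch of such a check using only the Python standard library. The function name `check_reference_csv`, the `code` key column, and the sample rows are assumptions made for illustration and are not part of the platform.

```python
import csv
import io
from collections import Counter


def check_reference_csv(text: str, key_column: str) -> dict:
    """Report basic problems that suggest a file needs cleaning before import.

    Hypothetical helper for illustration; not a platform API.
    """
    reader = csv.DictReader(io.StringIO(text))
    if key_column not in (reader.fieldnames or []):
        raise ValueError(f"missing key column: {key_column}")

    rows = list(reader)
    # Treat missing or whitespace-only keys as blank.
    keys = [(row[key_column] or "").strip() for row in rows]
    counts = Counter(key for key in keys if key)

    return {
        "row_count": len(rows),
        "blank_keys": sum(1 for key in keys if not key),
        "duplicate_keys": sorted(key for key, n in counts.items() if n > 1),
    }


# Assumed sample data: one repeated key (US) and one blank key.
sample = "code,name\nUS,United States\nDE,Germany\nUS,USA\n,Unknown\n"
report = check_reference_csv(sample, "code")
print(report)
```

If the report shows blank or duplicate keys, Approach 2 or Approach 3 is likely a better fit than a direct import.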


Approach 2: Catalog-based import with data preparation

Best for: Data that already exists in Data Catalog but needs quality improvements.

Implementation steps:
  1. Identify source data in Data Catalog and assess quality issues.

  2. Clean the data: Use Data Transformations to standardize formats, handle missing values, and address quality issues.

  3. Import cleaned data: Follow the Approach 1 steps for the cleaned dataset.

  4. Set up ongoing sync: Export to Database Tutorial - Maintain synchronization with source systems.

Expected outcome: Raw data from systems becomes clean, governed reference data ready for enterprise use.
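
To make the cleaning step in Approach 2 concrete, here is a rough sketch of what standardizing formats and handling missing values can mean for a single row. It is plain Python, not the platform's Data Transformations feature. The function `clean_row`, its parameters, the column names, and the `UNKNOWN` default are all assumptions chosen for illustration.

```python
def clean_row(row: dict, defaults: dict, uppercase: tuple = ()) -> dict:
    """Trim whitespace, fill missing values, and standardize casing.

    Hypothetical helper for illustration; not a platform API.
    """
    cleaned = {}
    for column, value in row.items():
        # Trim the value and collapse internal runs of whitespace.
        value = " ".join((value or "").split())
        # Replace empty values with the column default, if one is given.
        if not value:
            value = defaults.get(column, "")
        # Standardize casing for columns such as codes.
        if column in uppercase:
            value = value.upper()
        cleaned[column] = value
    return cleaned


# Assumed sample rows with inconsistent casing, spacing, and a missing name.
raw_rows = [
    {"code": " us ", "name": "United  States"},
    {"code": "de", "name": ""},
]
cleaned_rows = [
    clean_row(row, defaults={"name": "UNKNOWN"}, uppercase=("code",))
    for row in raw_rows
]
print(cleaned_rows)
```

Whether a missing value should receive a default, be rejected, or be sent back to the data owner is a governance decision, so the default used here is only a placeholder.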


Approach 3: Handle duplicates during import

Best for: Datasets with known duplicate entries.

Implementation steps:
  1. Assess duplication patterns in your source data.

  2. Set up deduplication: Deduplicate Data Tutorial - Complete workflow for removing duplicates.

  3. Import deduplicated data: Follow the Approach 1 steps for the deduplicated dataset.

Expected outcome: Clean, deduplicated reference data ready for organization-wide use.
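
As an illustration of assessing duplication patterns, the sketch below groups rows by a normalized key and separates first occurrences from later repeats. It does not reproduce the workflow in the Deduplicate Data Tutorial. The names `normalize_key` and `deduplicate`, the keep-the-first-row rule, and the sample rows are assumptions.

```python
def normalize_key(value: str) -> str:
    """Make keys comparable by ignoring case and surrounding or repeated spaces."""
    return " ".join(value.split()).casefold()


def deduplicate(rows: list, key_column: str) -> tuple:
    """Split rows into unique rows and duplicates of an earlier row.

    Assumption: the first occurrence of each key is the one to keep.
    """
    seen = {}
    duplicates = []
    for row in rows:
        key = normalize_key(row[key_column])
        if key in seen:
            duplicates.append(row)
        else:
            seen[key] = row
    return list(seen.values()), duplicates


# Assumed sample rows: "us " duplicates "US" once keys are normalized.
rows = [
    {"code": "US", "name": "United States"},
    {"code": "us ", "name": "USA"},
    {"code": "DE", "name": "Germany"},
]
unique_rows, duplicate_rows = deduplicate(rows, "code")
print(len(unique_rows), len(duplicate_rows))
```

Reviewing the list of duplicates before discarding them is worthwhile, because conflicting values such as "United States" and "USA" usually need an owner to decide which one is correct.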

Next steps

  • Monitor and maintain: Set up regular reviews of data quality and governance processes.

  • Scale your approach: Best Practices - Patterns for rolling out across your organization.

  • Improve data quality: Improve Data Quality - Use reference data to validate other datasets.

  • Explore other scenarios: Common Use Cases - Get inspiration from other real-world scenarios.
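
Using reference data to validate other datasets, as mentioned in the list above, comes down to checking that a column contains only values present in the reference table. A minimal sketch, assuming a hypothetical `find_invalid_values` helper and made-up order records:

```python
def find_invalid_values(records: list, column: str, reference_codes: set) -> list:
    """Return (row index, value) pairs whose value is not in the reference set.

    Hypothetical helper for illustration; not a platform API.
    """
    valid = set(reference_codes)
    return [
        (index, record.get(column))
        for index, record in enumerate(records)
        if record.get(column) not in valid
    ]


# Assumed sample data: country codes checked against a small reference set.
country_codes = {"US", "DE"}
orders = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "XX"},
    {"id": 3, "country": "DE"},
]
invalid = find_invalid_values(orders, "country", country_codes)
print(invalid)
```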
