Onboard Reference Data

Situation: Reference data is scattered across Excel files, database tables, and other systems. Multiple versions exist and it’s unclear which is current or who owns it.

What we need to achieve: Centrally managed reference data with clear ownership, quality controls, and automated distribution.

When to use this approach

Reference data spread across multiple files and systems
Unclear data ownership and governance
Multiple versions of the same data with no single source of truth
Manual processes for sharing reference data

Example approaches

The approaches below cover common starting points. Pick the one that best matches the state of your source data.

Approach 1: Simple file import (recommended for beginners)

Best for: Clean Excel/CSV files that just need governance.

Implementation steps:

Start with basics: Quick Start - Learn the fundamentals with a sample dataset.
Import your data: Create Reference Data Tables - Import from files.
Set up governance: Set Up Access and Governance - Assign ownership and approval workflows.
Enable access: Work with Published Reference Data - Make data available platform-wide.

Expected outcome: Your scattered files become governed reference data with clear ownership and distribution.
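Before importing, it can help to confirm that a file really is clean. The following is a minimal sketch of such a pre-import check, run outside the platform with the Python standard library only. The column names `code` and `name` and the function `check_reference_csv` are hypothetical and chosen for illustration; they are not part of the product.

```python
import csv
import io


def check_reference_csv(text, key_column):
    """Return a list of human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if key_column not in (reader.fieldnames or []):
        return [f"missing key column: {key_column}"]

    problems = []
    seen = set()
    # The header is line 1, so data starts at line 2.
    # This numbering assumes no field spans multiple lines.
    for line_no, row in enumerate(reader, start=2):
        key = (row[key_column] or "").strip()
        if not key:
            problems.append(f"line {line_no}: empty {key_column}")
        elif key in seen:
            problems.append(f"line {line_no}: repeated {key_column} {key!r}")
        seen.add(key)
    return problems


# Hypothetical sample data: one row has no key, one key appears twice.
sample = "code,name\nDE,Germany\nFR,France\n,Unknown\nDE,Deutschland\n"
for problem in check_reference_csv(sample, "code"):
    print(problem)
```

An empty result suggests the file fits this approach. Empty or repeated keys are a sign that Approach 2 or Approach 3 may be a better starting point.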

Approach 2: Catalog-based import with data preparation

Best for: Data that already exists in the Data Catalog but needs quality improvements.

Implementation steps:

Identify source data in the Data Catalog and assess quality issues.
Clean the data: Use Data Transformations to standardize formats, handle missing values, and address quality issues.
Import cleaned data: Follow the Approach 1 steps for the cleaned dataset.
Set up ongoing sync: Export to Database Tutorial - Maintain synchronization with source systems.

Expected outcome: Raw data from source systems becomes clean, governed reference data ready for enterprise use.
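To make the cleaning step in this approach concrete, here is a small sketch of what "standardize formats" and "handle missing values" can mean for a single record. It is plain Python for illustration, not the Data Transformations feature itself. The field names `code` and `name`, the `UNKNOWN` placeholder, and the rule of dropping rows without a key are all assumptions.

```python
def clean_row(row):
    """Standardize one record; return None if it cannot be repaired."""
    # Standardize the key: trim whitespace and use upper case.
    code = (row.get("code") or "").strip().upper()
    if not code:
        return None  # a row without a key cannot be referenced
    # Collapse runs of whitespace in the display name.
    name = " ".join((row.get("name") or "").split())
    # Fill a missing name with an explicit placeholder.
    return {"code": code, "name": name or "UNKNOWN"}


def clean_rows(rows):
    """Clean every row and drop the ones that could not be repaired."""
    cleaned = [clean_row(row) for row in rows]
    return [row for row in cleaned if row is not None]


# Hypothetical raw records with typical quality issues.
raw = [
    {"code": " de ", "name": "Germany"},
    {"code": "fr", "name": "  "},
    {"code": "", "name": "Orphan"},
]
print(clean_rows(raw))
```

Whether an unrepairable row should be dropped, flagged, or sent back to the data owner is a governance decision. The sketch drops it only to keep the example short.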

Approach 3: Handle duplicates during import

Best for: Datasets with known duplicate entries.

Implementation steps:

Assess duplication patterns in your source data.
Set up deduplication: Deduplicate Data Tutorial - Complete workflow for removing duplicates.
Import deduplicated data: Follow the Approach 1 steps for the deduplicated dataset.

Expected outcome: Clean, deduplicated reference data ready for organization-wide use.
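Assessing duplication patterns usually starts with grouping records that differ only in case or spacing. The sketch below shows one way to do that in plain Python, under the assumption that a single field identifies a record. The function names, the `name` field, and the keep-the-first-row rule are illustrative choices and do not describe the workflow in the Deduplicate Data Tutorial.

```python
from collections import defaultdict


def normalize(value):
    """Ignore case and extra whitespace when comparing values."""
    return " ".join(value.casefold().split())


def duplicate_groups(rows, field):
    """Group rows by normalized value; keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row[field])].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


def deduplicate(rows, field):
    """Keep the first row seen for each normalized value."""
    seen = set()
    kept = []
    for row in rows:
        key = normalize(row[field])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical records: two spellings of the same country.
rows = [
    {"id": 1, "name": "Germany"},
    {"id": 2, "name": " germany "},
    {"id": 3, "name": "France"},
]
print(duplicate_groups(rows, "name"))
print(deduplicate(rows, "name"))
```

Reviewing the groups before removing anything makes it easier to agree with the data owner on which record should survive.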

Next steps

Monitor and maintain: Set up regular reviews of data quality and governance processes.
Scale your approach: Best Practices - Patterns for rolling out across your organization.
Improve data quality: Improve Data Quality - Use reference data to validate other datasets.
Explore other scenarios: Common Use Cases - Get inspiration from other real-world scenarios.
