Onboard Reference Data
Situation: Reference data is scattered across Excel files, database tables, and systems. Multiple versions exist and it’s unclear which is current or who owns it.
What we need to achieve: Centrally managed reference data with clear ownership, quality controls, and automated distribution.
When to use this approach
- Reference data spread across multiple files and systems
- Unclear data ownership and governance
- Multiple versions of the same data with no single source of truth
- Manual processes for sharing reference data
Example approaches
The following example approaches cover common starting points. Pick the one that best matches the state of your source data.
Approach 1: Simple file import (recommended for beginners)
Best for: Clean Excel/CSV files that just need governance.
- Start with basics: Quick Start - Learn the fundamentals with a sample dataset.
- Import your data: Create Reference Data Tables - Import from files.
- Set up governance: Set Up Access and Governance - Assign ownership and approval workflows.
- Enable access: Work with Published Reference Data - Make data available platform-wide.
Expected outcome: Your scattered files become governed reference data with clear ownership and distribution.
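Before importing a file, it can help to confirm that it really is clean. The sketch below is a minimal, platform-independent pre-import check in plain Python. It is not part of the product. The function name `check_reference_csv` and the column names `code` and `name` are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import Counter


def check_reference_csv(text, key_column, required_columns):
    """Return a list of human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    # The key column is always required, even if the caller did not list it.
    required = [key_column] + [c for c in required_columns if c != key_column]
    missing = [c for c in required if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    key_counts = Counter()
    for row_no, row in enumerate(reader, start=1):  # data rows, header excluded
        key = (row[key_column] or "").strip()
        if not key:
            problems.append(f"row {row_no}: empty {key_column}")
            continue
        key_counts[key] += 1
        for column in required:
            if not (row[column] or "").strip():
                problems.append(f"row {row_no}: empty {column}")

    # Report keys that occur more than once after all rows have been read.
    for key, count in key_counts.items():
        if count > 1:
            problems.append(f"duplicate {key_column} {key!r} appears {count} times")
    return problems


# Hypothetical sample file content (assumption, for illustration only).
SAMPLE = """code,name
DE,Germany
FR,France
DE,Deutschland
,Unknown
"""

problems = check_reference_csv(SAMPLE, "code", ["code", "name"])
for problem in problems:
    print(problem)
```

For the sample above, the check reports an empty `code` in data row 4 and a duplicated `DE` key. A file that produces findings like these is probably a better fit for Approach 2 or Approach 3.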
Approach 2: Catalog-based import with data preparation
Best for: Data that already exists in the Data Catalog but needs quality improvements.
- Identify source data in Data Catalog and assess quality issues.
- Clean the data: Use Data Transformations to standardize formats, handle missing values, and address other quality issues.
- Import cleaned data: Follow the Approach 1 steps for the cleaned dataset.
- Set up ongoing sync: Export to Database Tutorial - Maintain synchronization with source systems.
Expected outcome: Raw data from systems becomes clean, governed reference data ready for enterprise use.
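To make "standardize formats" and "handle missing values" concrete, here is a minimal sketch in plain Python. It only illustrates the kind of rules involved and does not show how Data Transformations works. The missing-value tokens, the `code` column, and the function names are assumptions made for this example.

```python
# Strings treated as "no value" (assumption: adjust to your own data).
MISSING_TOKENS = {"", "n/a", "na", "null", "-"}


def clean_row(row, code_column):
    """Trim whitespace, map missing-value tokens to None, upper-case the code."""
    cleaned = {}
    for column, value in row.items():
        # Collapse repeated whitespace and strip both ends.
        text = " ".join(str(value).split()) if value is not None else ""
        cleaned[column] = None if text.lower() in MISSING_TOKENS else text
    if cleaned.get(code_column) is not None:
        cleaned[code_column] = cleaned[code_column].upper()
    return cleaned


def clean_rows(rows, code_column):
    """Clean every row and set aside rows that have no usable code."""
    kept, rejected = [], []
    for row in rows:
        cleaned = clean_row(row, code_column)
        (kept if cleaned.get(code_column) else rejected).append(cleaned)
    return kept, rejected


# Hypothetical raw rows, for illustration only.
raw = [
    {"code": " de ", "name": "Germany  "},
    {"code": "fr", "name": "N/A"},
    {"code": "null", "name": "Unknown"},
]

kept, rejected = clean_rows(raw, "code")
print(kept)
print(rejected)
```

Rows without a usable code are set aside rather than dropped silently, so the data owner can review them before the import.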
Approach 3: Handle duplicates during import
Best for: Datasets with known duplicate entries.
- Assess duplication patterns in your source data.
- Set up deduplication: Deduplicate Data Tutorial - Complete workflow for removing duplicates.
- Import deduplicated data: Follow the Approach 1 steps for the deduplicated dataset.
Expected outcome: Clean, deduplicated reference data ready for organization-wide use.
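As a rough illustration of the first two steps, the sketch below groups rows by a normalized key to expose duplication patterns, then keeps the first row per key. It is plain Python, not the workflow from the Deduplicate Data Tutorial. The keep-first rule, the `code` column, and all function names are assumptions, and a real survivorship rule may need to be more careful.

```python
from collections import defaultdict


def normalize_key(value):
    """Make keys comparable: collapse whitespace and ignore letter case."""
    return " ".join(value.split()).casefold()


def group_duplicates(rows, key_column):
    """Return only the groups of rows that share a normalized key."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_key(row[key_column])].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def deduplicate(rows, key_column):
    """Keep the first row seen for each normalized key (assumed rule)."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_key(row[key_column])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows, for illustration only.
rows = [
    {"code": "DE", "name": "Germany"},
    {"code": "de ", "name": "Deutschland"},
    {"code": "FR", "name": "France"},
]

duplicate_groups = group_duplicates(rows, "code")
unique_rows = deduplicate(rows, "code")
print(duplicate_groups)
print(unique_rows)
```

Reviewing the output of `group_duplicates` first shows whether duplicates differ only in formatting or also in content, which affects which row should survive.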
Next steps
- Monitor and maintain: Set up regular reviews of data quality and governance processes.
- Scale your approach: Best Practices - Patterns for rolling out across your organization.
- Improve data quality: Improve Data Quality - Use reference data to validate other datasets.
- Explore other scenarios: Common Use Cases - Get inspiration from other real-world scenarios.