Standardize Data Across Multiple Sources
Combine data from multiple systems that each use different formats for phone numbers, addresses, dates, and codes. Use embedded plans to build reusable standardization logic once, then apply it consistently across all your transformations.
Updates to the embedded plan flow through to every transformation that uses it, so standardized values join cleanly across sources without manual rework.
Why standardize data across sources
Critical business data lives in multiple systems — CRM, billing, support, ERP — and each system uses its own formats:
- Phone numbers formatted differently: With dashes, parentheses, country codes, or plain digits.
- Address data is inconsistent: Different capitalization, spacing, and abbreviations.
- Date fields vary: ISO format, US format, European format.
- Standardization logic is duplicated: The same cleansing is copied across 10 or more transformation plans.
- Updates are repetitive: A change in business rules means revisiting every transformation that uses the same logic.
- No single source of truth: Logic drifts over time, creating inconsistencies.
When you combine data from these sources, format inconsistencies cause failed joins, duplicate records, and misleading reports. Embedded plans solve this by letting you build standardization logic once and reference it from multiple transformation plans:
- Build standardization logic once as a reusable embedded plan.
- Reference the embedded plan from multiple standalone transformations.
- Apply consistent standardization to all source systems.
- Update logic in one place and every plan inherits the change automatically.
Think of embedded plans as functions: define them once, call them wherever needed.
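The function analogy can be made concrete with a minimal Python sketch. It does not show how embedded plans are implemented; `standardize_state`, `crm_plan`, and `billing_plan` are hypothetical names used only to illustrate one definition being reused by two callers.

```python
def standardize_state(raw: str) -> str:
    """Hypothetical shared logic: trim and uppercase a state code."""
    return raw.strip().upper()


def crm_plan(rows):
    # First caller: applies the shared logic to the CRM column.
    return [{**row, "state": standardize_state(row["state"])} for row in rows]


def billing_plan(rows):
    # Second caller: same logic, different column name.
    return [
        {**row, "billing_state": standardize_state(row["billing_state"])}
        for row in rows
    ]


crm_rows = crm_plan([{"customer_id": "CRM-002", "state": "il"}])
billing_rows = billing_plan([{"billing_state": " IL "}])
print(crm_rows[0]["state"], billing_rows[0]["billing_state"])
```

Changing the body of `standardize_state` changes the result of both callers, which mirrors how an update to an embedded plan reaches every plan that references it.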
Embedded plans vs transformation rules
| | Embedded plans | Transformation rules |
|---|---|---|
| Primary use | Reusable logic in standalone plans | Standards applied to transformation catalog items |
| Can be used in | Standalone plans, other embedded plans, TCIs | Transformation catalog items only |
| Record count | Can change (filter, aggregate, deduplicate) | Must stay the same (1 record in = 1 record out) |
| Typical examples | Phone formatting, address standardization, data enrichment | Code normalization, value cleansing, format fixes |
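The record count distinction can be illustrated with a small Python sketch. The sample rows and the `code` column are hypothetical; the point is only that value cleansing returns one record per input record, while deduplication may return fewer.

```python
rows = [
    {"code": " ab-1 "},
    {"code": "AB-1"},
    {"code": "cd-2"},
]

# Rule-like logic: every input record yields exactly one output record.
cleansed = [{"code": row["code"].strip().upper()} for row in rows]

# Plan-like logic: deduplication may return fewer records than it received.
seen = set()
deduplicated = []
for row in cleansed:
    if row["code"] not in seen:
        seen.add(row["code"])
        deduplicated.append(row)

print(len(rows), len(cleansed), len(deduplicated))
```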
Example: Unify customer data from CRM, billing, and support
Three source systems hold data about the same customers, but each uses a different format for phone numbers and addresses.
CRM system (crm_customers):
| customer_id | name | phone | address_line1 | city | state | postal_code |
|---|---|---|---|---|---|---|
| CRM-001 | Jane Smith | 555-123-4567 | 123 Main St | new york | NY | 10001 |
| CRM-002 | John Doe | (555) 234-5678 | 456 Oak Ave | CHICAGO | il | 60601 |
Billing system (billing_accounts):
| account_email | billing_phone | billing_address | billing_city | billing_state | billing_zip |
|---|---|---|---|---|---|
| | 5551234567 | 123 main street | New York | NY | 10001 |
| | 555.234.5678 | 456 Oak Avenue | Chicago | IL | 60601-1234 |
Support system (support_contacts):
| contact_email | support_phone | support_address | support_city | support_region |
|---|---|---|---|---|
| | +1 (555) 123-4567 | 123 Main Street | New York | New York |
| | +1 555 234 5678 | 456 Oak Ave. | Chicago | Illinois |
Before these sources can be joined, phone numbers and addresses must be normalized to a common format. Without standardization, the join fails — the same customer appears as three separate records.
Expected output (unified, after standardization):
| customer_id | name | | phone | address_line1 | state |
|---|---|---|---|---|---|
| CRM-001 | Jane Smith | | 5551234567 | 123 Main St | NY |
| CRM-002 | John Doe | | 5552345678 | 456 Oak Ave | IL |
Two embedded plans handle the standardization — one for phone, one for address — so each can be tested independently and reused across any source that needs it.
Step 1: Build the embedded plans
Go to Data Quality > Data transformations, select Create transformation plan > Embedded plan, and create two plans:
Standardize Phone Number
Normalizes phone to digits only.
- Input step: Needs the phone attribute you want to normalize. In this example, `phone` (string).
- Transform data step: Removes dashes, parentheses, dots, and spaces.

| Use AI assistance in the expression field. For example: "Remove all non-digit characters from a phone number, including dashes, parentheses, dots, and spaces." Country code prefixes (e.g. +1) require additional handling. Verify with sample data from your source. |
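The phone logic, including the extra handling for country code prefixes, could look like the following Python sketch. It is an illustration of the rule and not the expression syntax used in the Transform data step. The function name, the default country code `1`, and the 10-digit national length are assumptions based on the sample data.

```python
import re


def standardize_phone(raw: str, country_code: str = "1", national_length: int = 10) -> str:
    """Return digits only, dropping a leading country code when one is present."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a number that is exactly one country code longer than the
    # national length and starts with that code carries the prefix (e.g. +1).
    if len(digits) == national_length + len(country_code) and digits.startswith(country_code):
        digits = digits[len(country_code):]
    return digits


samples = [
    "555-123-4567",       # CRM
    "(555) 234-5678",     # CRM
    "555.234.5678",       # billing
    "+1 (555) 123-4567",  # support
    "+1 555 234 5678",    # support
]
for sample in samples:
    print(sample, "->", standardize_phone(sample))
```

Stripping non-digit characters alone would turn `+1 (555) 123-4567` into `15551234567`, which would not match the CRM and billing values. The prefix check is what brings all three sources to the same 10-digit form.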
Standardize Address
Normalizes city to title case, state to uppercase, and trims all fields.
- Input step: Needs the address attributes you want to normalize. In this example: `address_line1`, `city`, `state`, `postal_code` (all string).
- Transform data step: Applies the normalization logic.
| AI assistance works here too. For example: "Capitalize city, uppercase state, trim whitespace from all address fields." |
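As a rough Python equivalent of that normalization (a sketch only; the function name is hypothetical and every field is assumed to be a non-null string):

```python
def standardize_address(record: dict) -> dict:
    """Trim every field, title-case the city, uppercase the state."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["city"] = cleaned["city"].title()
    cleaned["state"] = cleaned["state"].upper()
    return cleaned


result = standardize_address({
    "address_line1": "456 Oak Ave ",
    "city": "CHICAGO",
    "state": "il",
    "postal_code": "60601",
})
print(result)
```

This covers only the three operations named above. It does not reconcile abbreviations such as `St` and `Street`, or full state names such as `Illinois`, which would need additional logic.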
For both plans:
- Configure the Output step. Choose Pass all attributes or specify a subset.
- Validate and publish before moving on.
| When you add an embedded plan step to another plan’s canvas, you must configure it before the connection ports become active: select the embedded plan and map the input attributes. |
Step 2: Create the main standalone plan and add inputs
- Go to Data Quality > Data transformations.
- Select Create transformation plan > Standalone plan.
- Name it (for example, `Unified Customer View`) and select Create.
- Add a Catalog item input step for each source: CRM, billing, and support.
Name each input step clearly — for example, CRM Input, Billing Input, Support Input — to keep the canvas readable.
Step 3: Apply embedded plans to each source
Each source uses different column names.
Before applying an embedded plan, add an Edit schema step to rename the source columns to the names the embedded plan expects (phone, city, state, etc.).
Embedded plans match by name, so mismatched names produce empty output.
For each source, the sequence is:
- Edit schema: Rename source columns to match what the embedded plan expects.
- Embedded plan: Apply phone standardization.
- Embedded plan: Apply address standardization.
| The CRM source already uses the expected column names, so its Edit schema step requires no renaming. |
All three sources now have consistent phone and address formatting.
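The renaming in the Edit schema step amounts to a per-source mapping from source column names to the names the embedded plans expect. The Python sketch below is illustrative only: `RENAMES` and `edit_schema` are hypothetical, and mapping `support_region` to `state` is an assumption about how the support source would be aligned.

```python
# Hypothetical rename maps: source column name -> name the embedded plans expect.
RENAMES = {
    "crm": {},  # CRM already uses the expected names
    "billing": {
        "billing_phone": "phone",
        "billing_address": "address_line1",
        "billing_city": "city",
        "billing_state": "state",
        "billing_zip": "postal_code",
    },
    "support": {
        "support_phone": "phone",
        "support_address": "address_line1",
        "support_city": "city",
        "support_region": "state",  # assumption: region is treated as state
    },
}


def edit_schema(record: dict, renames: dict) -> dict:
    """Rename keys found in the map; leave all other keys unchanged."""
    return {renames.get(name, name): value for name, value in record.items()}


billing_row = {"billing_phone": "555.234.5678", "billing_city": "Chicago", "billing_state": "IL"}
print(billing_row.get("phone"))  # None: the expected name is missing before renaming
print(edit_schema(billing_row, RENAMES["billing"]))
```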
Step 4: Join the standardized data
With phone numbers in a consistent format across all three sources, joins on phone become reliable. In this example, email is used as the join key:
- Add a Join step. Join CRM with billing using a left join (to keep all CRM customers).
- Add a second Join step. Join the result with support using a left join.
| Verify that the join key (email in this example) is consistent across systems before building the join. Because both joins are left joins, billing and support records whose key is mismatched or missing find no CRM match and are excluded from the result. |
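The left join behavior can be sketched in plain Python. The email addresses below are hypothetical placeholders (the sample tables do not show email values), and `left_join` is an illustrative helper, not a product function.

```python
def left_join(left, right, left_key, right_key):
    """Keep every left record; attach the first matching right record, if any."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], row)
    joined = []
    for row in left:
        match = index.get(row[left_key], {})
        # Left-side values win if both sides share a column name.
        joined.append({**match, **row})
    return joined


crm = [
    {"customer_id": "CRM-001", "email": "jane@example.com"},
    {"customer_id": "CRM-002", "email": "john@example.com"},
]
billing = [
    {"account_email": "jane@example.com", "billing_zip": "10001"},
    {"account_email": "unknown@example.com", "billing_zip": "60601-1234"},
]

result = left_join(crm, billing, "email", "account_email")
for row in result:
    print(row)
```

Both CRM customers appear in the result. `CRM-002` has no billing fields because its email has no match, and the billing record for `unknown@example.com` does not appear at all.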
Step 5: Clean up the schema and output
- Add a Delete attributes step to remove redundant source-specific columns that shouldn’t appear in the unified output.
- Add an Edit schema step to rename any remaining columns to unified names.
- Add an output step. Choose based on where the unified data needs to land:
  - Database output: Use when the unified data needs to be available for downstream querying or reporting.
  - Reference data output: Use when the unified data will serve as reference data for lookups and validations across your organization.
  - File export: Use when the data needs to be delivered to file storage (for example, Amazon S3) for partner systems or data warehouse ingestion.
- Run the plan and verify results before scheduling.
Step 6: Schedule for recurring updates
Set a schedule to keep the unified view current:
- Daily at 3:00 AM, after source systems have updated.
- Or use a cron expression for more complex timing.
See Automate Recurring Data Processing for scheduling details.
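For orientation, a daily 3:00 AM run looks like this in standard five-field cron syntax. This is an assumption about the format: the scheduler may expect a different cron variant (for example, one with a leading seconds field), so check the scheduling documentation before using it. The Python below only splits the expression into labeled fields.

```python
# Assumption: standard five-field cron syntax
# (minute, hour, day of month, month, day of week).
DAILY_AT_3AM = "0 3 * * *"

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")
schedule = dict(zip(FIELDS, DAILY_AT_3AM.split()))
print(schedule)
```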
Reuse the embedded plans in other transformations
Once published, embedded plans are available to any transformation plan in your organization. Apply them wherever the same standardization is needed:
- Vendor data standardization: Apply `Standardize Phone Number` and `Standardize Address` to vendor contact information in a separate transformation plan.
- Transaction processing: Apply `Standardize Phone Number` to customer phone fields and `Standardize Address` to shipping addresses, alongside transaction-specific filtering and aggregation.
Update standardization logic
When requirements change, open the embedded plan, update the logic, and republish. All standalone plans and TCIs using that embedded plan automatically inherit the updated logic on their next run — no need to update each plan individually.
| Test changes with sample data before publishing — all plans using the embedded plan are affected. |
Best practices for designing embedded plans
- Name descriptively: Use names that clearly describe what the plan does, such as `Standardize Phone Number`, `Parse Full Name`, or `Enrich with Currency Rates`. A clear name helps anyone reusing the plan understand what it does at a glance.
- Keep plans focused: Each embedded plan should do one thing well. Create multiple small embedded plans rather than one complex one so each is easier to test, reuse, and update independently.
- Document inputs and outputs: Use the description field to explain what attributes the plan expects and what it produces. This is especially important when the plan is shared across teams.
- Track changes in descriptions: When updating an embedded plan, note what changed in the description. Downstream plans inherit your changes automatically, so anyone reusing this plan needs to know what’s new.
- Test independently first: Validate each embedded plan with data preview before using it in other plans. Catching issues at the embedded plan level is quicker than tracking them down inside a full pipeline.
- Build a standardization library: Over time, build a collection of embedded plans for patterns your organization uses repeatedly, such as phone formats, address normalization, date parsing, and name handling. A shared library reduces duplication across all your transformations.