Standardize Data Across Multiple Sources

Combine data from multiple systems that each use different formats for phone numbers, addresses, dates, and codes. Use embedded plans to build reusable standardization logic once, then apply it consistently across all your transformations.

Updates to the embedded plan flow through to every transformation that uses it, so standardized values join cleanly across sources without manual rework.

Why standardize data across sources

Critical business data lives in multiple systems — CRM, billing, support, ERP — and each system uses its own formats:

Phone numbers are formatted differently: With dashes, parentheses, country codes, or plain digits.
Address data is inconsistent: Different capitalization, spacing, and abbreviations.
Date fields vary: ISO format, US format, European format.
Standardization logic is duplicated: The same cleansing is copied across 10 or more transformation plans.
Updates are repetitive: A change in business rules means revisiting every transformation that uses the same logic.
No single source of truth: Logic drifts over time, creating inconsistencies.

When you combine data from these sources, format inconsistencies cause failed joins, duplicate records, and misleading reports. Embedded plans solve this by letting you build standardization logic once and reference it from multiple transformation plans:

Build standardization logic once as a reusable embedded plan.
Reference the embedded plan from multiple standalone transformations.
Apply consistent standardization to all source systems.
Update logic in one place and every plan inherits the change automatically.

Think of embedded plans as functions: define them once, call them wherever needed.

Embedded plans vs transformation rules

	Embedded plans	Transformation rules
Primary use	Reusable logic in standalone plans	Standards applied to transformation catalog items
Can be used in	Standalone plans, other embedded plans, TCIs	Transformation catalog items only
Record count	Can change (filter, aggregate, deduplicate)	Must stay the same (1 record in = 1 record out)
Typical examples	Phone formatting, address standardization, data enrichment	Code normalization, value cleansing, format fixes
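
The record-count row is the main practical difference. As a rough analogy in Python (this is not the product's expression language, and both function names are hypothetical), a transformation rule behaves like a per-record map, while an embedded plan may also drop or merge records:

```python
def rule_uppercase_state(records):
    """Rule-like behaviour: one record in, one record out."""
    return [{**r, "state": r["state"].strip().upper()} for r in records]


def plan_deduplicate_by_phone(records):
    """Embedded-plan-like behaviour: the record count may shrink."""
    seen = set()
    unique = []
    for r in records:
        if r["phone"] not in seen:
            seen.add(r["phone"])
            unique.append(r)
    return unique


# Illustrative records only; they are not taken from a real system.
records = [
    {"phone": "5551234567", "state": "ny"},
    {"phone": "5551234567", "state": "NY "},
    {"phone": "5552345678", "state": "il"},
]

mapped = rule_uppercase_state(records)
deduped = plan_deduplicate_by_phone(mapped)
print(len(records), len(mapped), len(deduped))  # 3 3 2
```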

Example: Unify customer data from CRM, billing, and support

Three source systems hold data about the same customers, but each uses a different format for phone numbers and addresses.

CRM system (crm_customers):

customer_id	name	phone	address_line1	city	state	postal_code
CRM-001	Jane Smith	555-123-4567	123 Main St	new york	NY	10001
CRM-002	John Doe	(555) 234-5678	456 Oak Ave	CHICAGO	il	60601

Billing system (billing_accounts):

account_email	billing_phone	billing_address	billing_city	billing_state	billing_zip
`jane.smith@email.com`	5551234567	123 main street	New York	NY	10001
`john.doe@email.com`	555.234.5678	456 Oak Avenue	Chicago	IL	60601-1234

Support system (support_contacts):

contact_email	support_phone	support_address	support_city	support_region
`jane.smith@email.com`	+1 (555) 123-4567	123 Main Street	New York	New York
`john.doe@email.com`	+1 555 234 5678	456 Oak Ave.	Chicago	Illinois

Before these sources can be joined, phone numbers and addresses must be normalized to a common format. Without standardization, the join fails — the same customer appears as three separate records.

Expected output (unified, after standardization):

customer_id	name	email	phone	address_line1	state
CRM-001	Jane Smith	`jane.smith@email.com`	5551234567	123 Main St	NY
CRM-002	John Doe	`john.doe@email.com`	5552345678	456 Oak Ave	IL

Two embedded plans handle the standardization — one for phone, one for address — so each can be tested independently and reused across any source that needs it.

Step 1: Build the embedded plans

Go to Data quality > Data transformations, select Create transformation plan > Embedded plan, and create two plans:

Standardize Phone Number

Normalizes phone to digits only.

Input step: Needs the phone attribute you want to normalize. In this example, phone (string).
Transform data step: Removes dashes, parentheses, dots, and spaces.

Use AI assistance in the expression field. For example: "Remove all non-digit characters from a phone number, including dashes, parentheses, dots, and spaces." Country code prefixes (e.g. +1) require additional handling. Verify with sample data from your source.
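
To make the intended behaviour concrete, here is a minimal Python sketch of the same logic. It is not the product's expression syntax. The function name and the strip_us_country_code flag are hypothetical, and treating an 11-digit value that starts with 1 as a US number with a +1 prefix is an assumption that may not hold for your data:

```python
import re


def standardize_phone(raw, strip_us_country_code=True):
    """Return only the digits of a phone number, or None for empty input.

    Assumption for this example: an 11-digit result starting with 1 is a
    US number carrying a +1 country code, so the leading 1 is dropped.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # remove every non-digit character
    if strip_us_country_code and len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits or None


# Phone values from the three sample tables.
samples = [
    "555-123-4567", "(555) 234-5678",        # CRM
    "5551234567", "555.234.5678",            # billing
    "+1 (555) 123-4567", "+1 555 234 5678",  # support
]
for value in samples:
    print(value, "->", standardize_phone(value))
```

With the flag turned off, the support values keep their leading 1 and no longer match the CRM and billing values, which is why the country code needs explicit handling.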

Standardize Address

Converts city to title case and state to uppercase, and trims whitespace from all fields.

Input step: Needs the address attributes you want to normalize. In this example: address_line1, city, state, postal_code (all string).
Transform data step: Applies the normalization logic.

AI assistance works here too. For example: "Capitalize city, uppercase state, trim whitespace from all address fields."
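
The equivalent logic, sketched in Python for illustration only (the function name is hypothetical and this is not the product's expression syntax):

```python
def standardize_address(record):
    """Trim every string field, title-case the city, uppercase the state."""
    cleaned = {
        key: (value.strip() if isinstance(value, str) else value)
        for key, value in record.items()
    }
    if cleaned.get("city"):
        # capitalize() lowercases the rest of each word: "CHICAGO" -> "Chicago"
        cleaned["city"] = " ".join(word.capitalize() for word in cleaned["city"].split())
    if cleaned.get("state"):
        cleaned["state"] = cleaned["state"].upper()
    return cleaned


crm_row = {"address_line1": " 456 Oak Ave ", "city": "CHICAGO", "state": "il", "postal_code": "60601"}
print(standardize_address(crm_row))
# {'address_line1': '456 Oak Ave', 'city': 'Chicago', 'state': 'IL', 'postal_code': '60601'}
```

This sketch covers only case and whitespace. It does not reconcile street abbreviations ("St" vs "Street") or full state names ("Illinois" vs "IL"), both of which appear in the sample data and would need their own rules.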

For both plans:

Configure the Output step. Choose Pass all attributes or specify a subset.
Validate and publish before moving on.

When you add an embedded plan step to another plan’s canvas, its connection ports stay inactive until you configure it. Select the embedded plan and map the input attributes.

Step 2: Create the main standalone plan and add inputs

Go to Data quality > Data transformations
Select Create transformation plan > Standalone plan
Name it — for example, Unified Customer View — and select Create
Add a Catalog item input step for each source — CRM, billing, and support

Name each input step clearly — for example, CRM Input, Billing Input, Support Input — to keep the canvas readable.

Step 3: Apply embedded plans to each source

Each source uses different column names. Before applying an embedded plan, add an Edit schema step to rename the source columns to the names the embedded plan expects (phone, city, state, etc.). Embedded plans match by name, so mismatched names produce empty output.

For each source, the sequence is:

Edit schema: Rename source columns to match what the embedded plan expects.
Embedded plan: Apply phone standardization.
Embedded plan: Apply address standardization.

The CRM source already uses the expected column names, so its Edit schema step requires no renaming.

All three sources now have consistent phone and address formatting.
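
The rename-then-standardize sequence can be sketched in Python as follows. The rename maps and function names are hypothetical, and the simplified standardizers stand in for the two embedded plans. The rule that drops a leading 1 from 11-digit phone values is an assumption about US numbers:

```python
import re

# Hypothetical rename maps: source column -> name the embedded plans expect.
RENAMES = {
    "crm": {},  # already uses the expected names
    "billing": {"billing_phone": "phone", "billing_city": "city", "billing_state": "state"},
    "support": {"support_phone": "phone", "support_city": "city", "support_region": "state"},
}


def edit_schema(record, renames):
    """Rename keys; columns without a mapping keep their name."""
    return {renames.get(key, key): value for key, value in record.items()}


def standardize_phone(record):
    """Digits only; drops a leading 1 from 11-digit values (assumed US +1)."""
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return {**record, "phone": digits}


def standardize_city_state(record):
    """Title-case the city and uppercase the state, trimming both."""
    city = " ".join(w.capitalize() for w in (record.get("city") or "").split())
    state = (record.get("state") or "").strip().upper()
    return {**record, "city": city, "state": state}


def prepare(source, record):
    """Edit schema, then phone standardization, then address standardization."""
    renamed = edit_schema(record, RENAMES[source])
    return standardize_city_state(standardize_phone(renamed))


billing_row = {"account_email": "john.doe@email.com", "billing_phone": "555.234.5678",
               "billing_city": "Chicago", "billing_state": "IL"}
crm_row = {"customer_id": "CRM-002", "phone": "(555) 234-5678", "city": "CHICAGO", "state": "il"}

print(prepare("billing", billing_row)["phone"])  # 5552345678
print(prepare("crm", crm_row))
```

Note that the support system stores full region names such as "Illinois". Uppercasing alone would not turn that into "IL", so that source would need an additional name-to-code mapping.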

Step 4: Join the standardized data

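With phone numbers in a consistent format across all three sources, joins on phone become reliable. The sample CRM table has no email column, so in this example CRM is joined to billing on the standardized phone. The result can then be joined to support on email or phone: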

Add a Join step. Join CRM with billing using a left join (to keep all CRM customers).
Add a second Join step. Join the result with support using a left join.

Verify that the join key is consistent across systems before building the join. With a left join, CRM records that have no match are kept, but their billing or support columns are empty. Billing and support records with no matching CRM record are left out of the result.
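
To illustrate the left-join behaviour, here is a small self-contained Python sketch. The left_join helper is hypothetical and is not how the Join step is implemented. Because the sample CRM table has no email column, the sketch joins on the standardized phone; the same function works with any shared key. CRM-003 is an invented record added to show an unmatched row:

```python
def left_join(left_rows, right_rows, key):
    """Keep every left row; attach columns from the first right row with the same key."""
    right_columns = {column for row in right_rows for column in row} - {key}
    index = {}
    for row in right_rows:
        index.setdefault(row[key], row)  # first match wins
    joined = []
    for row in left_rows:
        match = index.get(row[key])
        # Unmatched left rows still appear, with None in the right-hand columns.
        extra = {column: (match.get(column) if match else None) for column in right_columns}
        joined.append({**extra, **row})
    return joined


crm = [
    {"customer_id": "CRM-001", "phone": "5551234567"},
    {"customer_id": "CRM-002", "phone": "5552345678"},
    {"customer_id": "CRM-003", "phone": "5550000000"},  # invented, has no billing match
]
billing = [
    {"email": "jane.smith@email.com", "phone": "5551234567"},
    {"email": "john.doe@email.com", "phone": "5552345678"},
]

for row in left_join(crm, billing, key="phone"):
    print(row["customer_id"], row["email"])
```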

Step 5: Clean up the schema and output

Add a Delete attributes step to remove redundant source-specific columns that shouldn’t appear in the unified output.
Add an Edit schema step to rename any remaining columns to unified names.
Add an output step. Choose based on where the unified data needs to land:
- Database output: Use when the unified data needs to be available for downstream querying or reporting.
- Reference data output: Use when the unified data will serve as reference data for lookups and validations across your organization.
- File export: Use when the data needs to be delivered to file storage (for example, Amazon S3) for partner systems or data warehouse ingestion.
Run the plan and verify results before scheduling.

Step 6: Schedule for recurring updates

Set a schedule to keep the unified view current:

Daily at 3:00 AM, after source systems have updated.
Or use a cron expression for more complex timing.
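
As an illustration, assuming the scheduler accepts standard five-field cron syntax (the cron dialect is not stated here, so check the scheduling documentation), "daily at 3:00 AM" would be written as 0 3 * * *. The Python below only labels the fields of that expression. It does not schedule anything, and the names are hypothetical:

```python
# Assumed dialect: five-field cron (minute hour day-of-month month day-of-week).
DAILY_3AM = "0 3 * * *"

FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]


def describe(expression):
    """Pair each cron field with its name; raise if the field count is wrong."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


print(describe(DAILY_3AM))
# {'minute': '0', 'hour': '3', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```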

See Automate Recurring Data Processing for scheduling details.

Reuse the embedded plans in other transformations

Once published, embedded plans are available to any transformation plan in your organization. Apply them wherever the same standardization is needed:

Vendor data standardization: Apply Standardize Phone Number and Standardize Address to vendor contact information in a separate transformation plan.
Transaction processing: Apply Standardize Phone Number to customer phone fields and Standardize Address to shipping addresses, alongside transaction-specific filtering and aggregation.

Update standardization logic

When requirements change, open the embedded plan, update the logic, and republish. All standalone plans and TCIs using that embedded plan automatically inherit the updated logic on their next run — no need to update each plan individually.

Test changes with sample data before publishing — all plans using the embedded plan are affected.
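
One way to check a change before republishing is to run the current and proposed logic over the same samples and list only the values whose output differs. The Python sketch below shows the idea outside the product. All function names are hypothetical, and the proposed change (dropping a leading 1 from 11-digit numbers) is just an example:

```python
import re


def current_logic(phone):
    """Published behaviour in this example: keep digits only."""
    return re.sub(r"\D", "", phone)


def proposed_logic(phone):
    """Candidate change: also drop a leading 1 from 11-digit numbers."""
    digits = re.sub(r"\D", "", phone)
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def changed_rows(samples, old, new):
    """Return (input, old_output, new_output) for each sample whose output changes."""
    return [(s, old(s), new(s)) for s in samples if old(s) != new(s)]


samples = ["555-123-4567", "+1 (555) 123-4567", "555.234.5678", "+1 555 234 5678"]
for raw, before, after in changed_rows(samples, current_logic, proposed_logic):
    print(f"{raw!r}: {before} -> {after}")
```

Only the two values with a +1 prefix change here, which shows what downstream plans would see after the update.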

Best practices for designing embedded plans

Name descriptively: Use names that clearly describe what the plan does, such as Standardize Phone Number, Parse Full Name, or Enrich with Currency Rates. A clear name helps anyone reusing the plan understand what it does at a glance.
Keep plans focused: Each embedded plan should do one thing well. Create multiple small embedded plans rather than one complex one so each is easier to test, reuse, and update independently.
Document inputs and outputs: Use the description field to explain what attributes the plan expects and what it produces. This is especially important when the plan is shared across teams.
Track changes in descriptions: When updating an embedded plan, note what changed in the description. Downstream plans inherit your changes automatically, so anyone reusing this plan needs to know what’s new.
Test independently first: Validate each embedded plan with data preview before using it in other plans. Catching issues at the embedded plan level is quicker than tracking them down inside a full pipeline.
Build a standardization library: Over time, build a collection of embedded plans for patterns your organization uses repeatedly, such as phone formats, address normalization, date parsing, and name handling. A shared library reduces duplication across all your transformations.
