Create Always-Current Data Views

Create virtual data views that compute results on-demand, such as de-duplicated customer lists, unified product catalogs, and rolling sales summaries. Use transformation catalog items to eliminate batch jobs and deliver data that’s always current. Results are computed when accessed rather than stored, so there’s no duplicate data, no refresh schedule to maintain, and the view always reflects the latest source data.

Why create always-current views

Batch-processed views create stale data and operational overhead:

  • Data goes stale: Nightly refresh means data is hours or a full day old.

  • Frequent jobs cost more: Hourly refreshes increase compute costs and resource contention.

  • Storage duplication: Materialized views waste storage by copying source data.

  • Synchronization is complex: Keeping views aligned with source changes requires orchestration.

  • Users don’t trust the data: They can’t tell whether they’re seeing current data or yesterday’s snapshot.

Batch jobs can’t deliver always-current data without excessive cost and complexity.

How transformation catalog items work

Transformation catalog items (TCIs) create virtual views that compute results dynamically. The general workflow:

  • Define transformation logic on the visual canvas.

  • The logic reads current source data each time the TCI is accessed.

  • Results are computed on-demand, with no batch jobs or scheduled refreshes.

  • Output stays current and reflects the latest source data every time.

  • The TCI can be used like any catalog item: profile it, monitor its quality, or export it.

TCIs act like database views: logic is defined once, results are computed fresh each time someone accesses the data.

When to use TCIs

TCIs execute transformation logic every time they are accessed. For small to medium datasets accessed at moderate frequency, this is efficient. For very large datasets or TCIs queried thousands of times per hour, the compute cost of on-demand execution might outweigh the freshness benefit.

Use TCIs when:

  • You need data to reflect source changes immediately, without waiting for a scheduled refresh.

  • Source data changes frequently and users need to see the current state.

  • Storage duplication is a concern.

  • Access frequency is moderate.

Use scheduled standalone plans instead when:

  • Source data changes infrequently (daily, weekly).

  • The TCI is queried very frequently by many concurrent users.

  • Transformation logic is very complex or processes billions of records.

  • Historical point-in-time snapshots are valuable.

Both approaches can coexist: TCIs for always-current operational views, standalone plans for historical reporting.
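
The trade-off above can be put into rough numbers. The following is a minimal sketch, not a sizing tool: the `Workload` class, its field names, and the example figures are all hypothetical, and it counts only how often the transformation logic runs, ignoring storage and read costs of a materialized output.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    accesses_per_day: int   # how often users or tools read the view (assumed figure)
    refreshes_per_day: int  # how often a scheduled plan would rebuild its output


def runs_per_day_on_demand(w: Workload) -> int:
    # On-demand: the transformation logic runs once per access.
    return w.accesses_per_day


def runs_per_day_scheduled(w: Workload) -> int:
    # Scheduled: the logic runs once per refresh, however often the output is read.
    return w.refreshes_per_day


def max_staleness_hours_scheduled(w: Workload) -> float:
    # Worst case, a reader sees data that is one full refresh interval old.
    return 24 / w.refreshes_per_day


moderate = Workload(accesses_per_day=40, refreshes_per_day=24)
heavy = Workload(accesses_per_day=50_000, refreshes_per_day=24)

for name, w in (("moderate", moderate), ("heavy", heavy)):
    print(name, runs_per_day_on_demand(w), runs_per_day_scheduled(w),
          max_staleness_hours_scheduled(w))
```

With moderate access, on-demand execution runs about as often as an hourly job and removes the staleness window. With very heavy access, the same logic would run tens of thousands of times a day, which is where a scheduled plan tends to be the better fit.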

Example: De-duplicated customer view

This example shows how to create a unified, de-duplicated customer view that combines data from multiple sources and stays always current.

CRM (crm_customers):

customer_id | name       | email                | phone        | address         | crm_created
CRM-001     | Jane Smith | jane.smith@email.com | 555-123-4567 | 123 Main St, NY | 2021-03-10
CRM-002     | John Doe   | john.doe@email.com   | 555-234-5678 | 456 Oak Ave, IL | 2020-07-15

Billing (billing_accounts):

account_email          | billing_phone | billing_address    | billing_created
jane.smith@email.com   | 5551234567    | 123 Main Street NY | 2021-03-12
john.doe@email.com     | 5552345678    | 456 Oak Avenue IL  | 2020-07-18
new.customer@email.com | 5559876543    | 789 Pine Rd, TX    | 2024-01-05

Support (support_contacts):

contact_email        | support_phone     | support_address
jane.smith@email.com | +1 (555) 123-4567 | 123 Main St
john.doe@email.com   | +1 555 234 5678   | 456 Oak Ave

Expected output — one record per unique email, with data merged from all sources:

customer_id | name       | email                  | phone      | address         | first_seen
CRM-001     | Jane Smith | jane.smith@email.com   | 5551234567 | 123 Main St, NY | 2021-03-10
CRM-002     | John Doe   | john.doe@email.com     | 5552345678 | 456 Oak Ave, IL | 2020-07-15
null        | null       | new.customer@email.com | 5559876543 | 789 Pine Rd, TX | 2024-01-05

Step 1: Create a transformation catalog item

Option A: Create from scratch

  1. Go to Data catalog.

  2. Select Create > Catalog item.

  3. Choose Transformation catalog item as the type.

  4. Enter a descriptive name that signals what the view provides, for example Unified Customer View - Deduplicated.

  5. Add a description explaining what it provides and which sources it combines.

  6. Select Create.

Option B: Convert an existing catalog item

  1. Open an existing source catalog item.

  2. From the three-dot menu, select Create transformation catalog item.

  3. Name and configure as needed.

Step 2: Open the Data Flow tab and add input sources

Open your TCI and go to the Data Flow tab. You’ll see a visual canvas — the same canvas used in standalone transformation plans. The TCI automatically includes a Data transformation CI output step; do not remove it — it must remain as the final step.

Add a Catalog item input step for each source you want to combine:

  • CRM customers table

  • Billing accounts table

  • Support contacts table

Name each input clearly — for example, CRM Customers, Billing Accounts, Support Contacts.

Step 3: Standardize data from each source

Before combining sources, normalize email and phone to a common format so joins work reliably. Add a Transform data step after each input and configure normalization for the fields each source has. At minimum, convert emails to lowercase and trim whitespace, and reduce phone numbers to digits only.

Use AI assistance in any expression field — for example: "Convert email to lowercase and trim whitespace" or "Remove all non-digit characters from a phone number, including dashes, parentheses, and spaces."

If you’ve already created a Normalize email format transformation rule, you can apply it to each source catalog item via a preparation set before joining, instead of repeating the logic in the Data Flow.
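
To make the intended normalization concrete, here is a minimal Python sketch of the same logic. It is an illustration only, not the product's expression syntax; `normalize_email` and `normalize_phone` are hypothetical helper names.

```python
import re


def normalize_email(value):
    """Trim whitespace and lowercase; empty or missing values become None."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def normalize_phone(value):
    """Keep digits only, dropping dashes, parentheses, spaces, and '+'."""
    if value is None:
        return None
    digits = re.sub(r"\D", "", value)
    return digits or None


print(normalize_email("  Jane.Smith@Email.com "))  # jane.smith@email.com
print(normalize_phone("555-123-4567"))             # 5551234567
print(normalize_phone("+1 (555) 123-4567"))        # 15551234567
```

Note that digits-only normalization keeps the leading country code from the Support value, so that phone does not equal the CRM or Billing value. This does not affect the example, which joins on email, but it matters if you ever compare or join on phone.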

Step 4: Combine data from multiple sources

Add Join steps to merge the standardized sources:

  1. Add a Join step

    • Connect standardized CRM and Billing outputs

    • Join type: Outer join — keep all records from both sources

    • Left key: email / Right key: account_email

  2. Add a second Join step

    • Connect the previous result and Support

    • Join type: Outer join

    • Left key: email / Right key: contact_email

Verify your join key first. This example uses email as the join key, which works only if the same customer uses a consistent email address across all three systems. Check for mismatches before building the join: records with mismatched or missing emails are not matched across sources, so with an outer join they remain as separate, partially populated rows instead of merging into one customer record.
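
The behavior of the first outer join can be sketched in plain Python. This is a generic illustration of full outer join semantics under the assumption that the Join step behaves like a standard outer join; `full_outer_join` is a hypothetical helper, and only a few columns of the sample data are used.

```python
def full_outer_join(left, right, left_key, right_key):
    """Full outer join of two lists of dicts; the unmatched side is filled with None."""
    left_cols = sorted({col for row in left for col in row})
    right_cols = sorted({col for row in right for col in row})

    # Index the right side by its key so each left row can find its matches.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[right_key], []).append(row)

    joined, matched = [], set()
    for lrow in left:
        key = lrow[left_key]
        matches = right_by_key.get(key, []) if key is not None else []
        if matches:
            matched.add(key)
            joined.extend({**lrow, **rrow} for rrow in matches)
        else:
            joined.append({**lrow, **dict.fromkeys(right_cols)})

    # Right rows that never matched are kept, with the left columns set to None.
    for key, rows in right_by_key.items():
        if key not in matched:
            joined.extend({**dict.fromkeys(left_cols), **rrow} for rrow in rows)
    return joined


crm = [
    {"customer_id": "CRM-001", "name": "Jane Smith", "email": "jane.smith@email.com"},
    {"customer_id": "CRM-002", "name": "John Doe", "email": "john.doe@email.com"},
]
billing = [
    {"account_email": "jane.smith@email.com", "billing_phone": "5551234567"},
    {"account_email": "john.doe@email.com", "billing_phone": "5552345678"},
    {"account_email": "new.customer@email.com", "billing_phone": "5559876543"},
]

joined = full_outer_join(crm, billing, "email", "account_email")

# After an outer join the key can sit on either side, so coalesce it into one
# column before joining again or grouping.
for row in joined:
    row["email"] = row["email"] if row["email"] is not None else row["account_email"]

for row in joined:
    print(row)
```

The billing-only customer survives the join with empty CRM fields, which is why the key is coalesced into a single email column before the next step uses it.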

Step 5: Deduplicate records

After the outer join, customers might appear multiple times. Add a Group aggregator step:

  • Group by: email

  • Aggregations: Configure each field to pick the best available value. Prefer CRM data where available, then fall back to billing and support.

Use AI assistance to generate aggregation expressions — for example: "Pick the first non-null value for name, preferring CRM over billing over support."
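
As a rough picture of what such an aggregation does, the sketch below groups rows by email and picks the first non-null value per field in CRM, Billing, Support order. It is plain Python rather than the Group aggregator's expression syntax; `FIELD_SOURCES`, `deduplicate`, and the duplicated input rows are hypothetical.

```python
from collections import defaultdict

# Source columns per output field, in priority order: CRM, then Billing, then Support.
FIELD_SOURCES = {
    "customer_id": ["customer_id"],
    "name": ["name"],
    "phone": ["phone", "billing_phone", "support_phone"],
    "address": ["address", "billing_address", "support_address"],
    "first_seen": ["crm_created", "billing_created"],
}


def first_non_null(values):
    return next((v for v in values if v is not None), None)


def deduplicate(rows, key="email"):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for email, group in groups.items():
        merged = {key: email}
        for out_field, source_cols in FIELD_SOURCES.items():
            # Outer loop over columns: source priority wins over row order.
            candidates = [row.get(col) for col in source_cols for row in group]
            merged[out_field] = first_non_null(candidates)
        result.append(merged)
    return result


rows = [
    # Two rows for the same customer (a hypothetical duplicate after joining).
    {"email": "jane.smith@email.com", "billing_phone": "5551234567",
     "billing_address": "123 Main Street NY", "billing_created": "2021-03-12"},
    {"email": "jane.smith@email.com", "customer_id": "CRM-001", "name": "Jane Smith",
     "phone": "5551234567", "address": "123 Main St, NY", "crm_created": "2021-03-10"},
    # Billing-only customer: CRM fields stay null.
    {"email": "new.customer@email.com", "billing_phone": "5559876543",
     "billing_address": "789 Pine Rd, TX", "billing_created": "2024-01-05"},
]

customers = deduplicate(rows)
for customer in customers:
    print(customer)
```

The CRM address is chosen for Jane Smith even though the billing row comes first in the input, and the billing-only customer keeps null for customer_id and name, matching the expected output shown earlier.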

Step 6: Clean up and unify the schema

  1. Add a Delete attributes step — remove temporary join columns and source-specific fields that shouldn’t appear in the output

  2. Add an Edit schema step — rename remaining attributes to clean, unified names

  3. Optionally add an Add attributes step for derived fields — for example, a data_sources attribute set to "CRM+Billing+Support"
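
The three cleanup steps amount to dropping, renaming, and adding columns. A minimal sketch in Python, assuming a single merged row as input; the `DROP` and `RENAME` sets and the `clean_schema` helper are hypothetical, and which columns you remove depends on your own flow.

```python
# Columns that only existed to support the joins (assumed names from the example sources).
DROP = {"account_email", "contact_email", "billing_phone", "support_phone"}

# Source-specific names mapped to unified output names.
RENAME = {"crm_created": "first_seen"}


def clean_schema(row):
    kept = {name: value for name, value in row.items() if name not in DROP}  # Delete attributes
    renamed = {RENAME.get(name, name): value for name, value in kept.items()}  # Edit schema
    renamed["data_sources"] = "CRM+Billing+Support"  # Add attributes
    return renamed


row = {
    "customer_id": "CRM-001",
    "email": "jane.smith@email.com",
    "account_email": "jane.smith@email.com",
    "contact_email": "jane.smith@email.com",
    "phone": "5551234567",
    "billing_phone": "5551234567",
    "crm_created": "2021-03-10",
}

clean = clean_schema(row)
print(clean)
```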

Step 7: Validate, preview, and publish

  1. Connect your final step to the Data transformation CI output step

  2. Select Validate plan and fix any errors

  3. Select Compute data preview to verify the output — check for one record per email, fields populated from the correct source, no unexpected nulls

  4. Select Publish — the TCI is now available in the catalog and computes results on-demand whenever accessed
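
If you export a preview sample, the checks from step 3 can also be scripted. The sketch below is one way to do that in Python; `check_preview` and its default field names are hypothetical, and it assumes the preview rows are available as a list of dictionaries.

```python
from collections import Counter


def check_preview(rows, key="email", required=("phone", "address")):
    """Return a list of problems: duplicate or missing keys, and null required fields."""
    problems = []

    counts = Counter(row.get(key) for row in rows)
    for value, n in counts.items():
        if value is None:
            problems.append(f"{n} record(s) with no {key}")
        elif n > 1:
            problems.append(f"{key} {value!r} appears {n} times")

    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                problems.append(f"row {i}: {field} is null")
    return problems


good = [
    {"email": "jane.smith@email.com", "phone": "5551234567", "address": "123 Main St, NY"},
    {"email": "john.doe@email.com", "phone": "5552345678", "address": "456 Oak Ave, IL"},
]
bad = [
    {"email": "jane.smith@email.com", "phone": "5551234567", "address": "123 Main St, NY"},
    {"email": "jane.smith@email.com", "phone": "5551234567", "address": None},
]

print(check_preview(good))
print(check_preview(bad))
```

Fields that are legitimately empty for some customers, such as customer_id and name for billing-only records in this example, are left out of `required` so they are not reported as unexpected nulls.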

Use your always-current view

Once published, use the TCI like any catalog item:

Profile the data

Go to the TCI and run profiling to understand data distribution and quality. Profiling executes the transformation on-demand and shows current results, not a cached snapshot.

Monitor data quality

Set up a data quality monitor on the TCI. Apply DQ rules to validate the unified view and schedule the monitor to run regularly. Each run computes fresh results from current source data.

Use in analytics and reporting

Reference the TCI in BI tools and analytics platforms. Results always reflect the latest source data, with no waiting for a nightly refresh job.

Export to downstream systems

Export the TCI to CSV or a database table when needed. Exports always contain current data. For systems that need regular pushes, schedule recurring exports using standalone plans.

How on-demand computation works

When someone accesses your TCI:

  1. The transformation logic defined in the Data Flow runs.

  2. The logic reads current data from the source catalog items.

  3. It applies the transformations, joins, aggregations, and rules.

  4. It returns the computed results. Nothing is stored.

As a result, the view always reflects current source data, no duplicate data is stored, no batch jobs need scheduling, and source changes appear on the next access.
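
The difference between a stored snapshot and an on-demand view can be shown in a few lines of Python. This is a conceptual sketch only, not how the platform is implemented; `VirtualView`, `read_source`, and `transform` are hypothetical names.

```python
class VirtualView:
    """Holds logic, not data: every call to rows() recomputes from the source."""

    def __init__(self, read_source, transform):
        self._read_source = read_source
        self._transform = transform

    def rows(self):
        return self._transform(self._read_source())


# Stand-in for a source catalog item whose data changes over time.
source = [{"email": "Jane.Smith@Email.com"}]


def read_source():
    return list(source)


def transform(rows):
    return [{"email": row["email"].strip().lower()} for row in rows]


view = VirtualView(read_source, transform)

snapshot = view.rows()  # what a batch job would have stored at this moment
source.append({"email": "new.customer@email.com"})  # the source changes afterwards

print(len(snapshot))     # the stored copy still has the old row count
print(len(view.rows()))  # the view picks up the new record on the next access
```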

Common use cases

  • Unified customer 360 views: Combine CRM, billing, support, and marketing data into a single always-current customer profile.

  • De-duplicated product catalogs: Merge product data from multiple vendors or systems, deduplicate by product code.

  • Rolling analytics views: Filter to recent time periods; results automatically roll forward each day without a refresh job.

  • Filtered data subsets: Always reflect the current set of active records without a nightly job.

  • Secure data views: Mask or remove sensitive fields before exposing data to specific audiences.
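
To illustrate the rolling analytics case, the sketch below filters rows to the last N days relative to the date of access. It is plain Python with hypothetical names (`rolling_window`, `order_date`, `amount`) and invented sample rows; in a TCI the equivalent would be a filter condition evaluated on each access.

```python
from datetime import date, timedelta


def rolling_window(rows, days, today=None, date_field="order_date"):
    """Keep rows whose date falls within the last `days` days, up to and including today."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    return [row for row in rows if cutoff < row[date_field] <= today]


orders = [
    {"order_date": date(2023, 12, 30), "amount": 120},
    {"order_date": date(2024, 1, 5), "amount": 80},
    {"order_date": date(2024, 1, 9), "amount": 45},
]

# The same logic gives different results depending on when it runs.
on_jan_10 = rolling_window(orders, days=7, today=date(2024, 1, 10))
on_jan_15 = rolling_window(orders, days=7, today=date(2024, 1, 15))

print(len(on_jan_10), len(on_jan_15))
```

Because the cutoff is derived from the access date rather than stored, the window moves forward on its own without a refresh job.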

Best practices for designing transformation catalog items

  • Filter early: TCIs run on every access. Push filtering steps as early as possible to reduce the data volume processed.

  • Use for appropriate data volumes: TCIs work well for millions of records. For very large datasets with frequent access, consider a scheduled standalone plan with a materialized output table.

  • Apply transformation rules for organization-wide standards: Use preparation sets on source catalog items for standardization that applies across your whole catalog. Keep Data Flow logic on the TCI focused on data combination.

  • Document the purpose: Add clear descriptions explaining what the TCI provides, which sources it combines, and who should use it.

  • Set up data quality monitoring: Apply DQ rules to the TCI to ensure the unified view meets quality standards.

  • Name clearly: Use names that signal the view is unified and always current: Customer 360 - Current, Active Products - Unified.

  • Consider access patterns: If the TCI is accessed very frequently by many users, evaluate whether a scheduled materialized view would be more efficient.
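
The effect of filtering early can be seen in a small Python sketch. The data and the `join_on_email` helper are hypothetical; the point is only that both orderings return the same rows while the early filter sends far fewer rows through the join.

```python
def join_on_email(customers, accounts):
    """Inner join on email, for illustration."""
    accounts_by_email = {account["email"]: account for account in accounts}
    return [
        {**customer, **accounts_by_email[customer["email"]]}
        for customer in customers
        if customer["email"] in accounts_by_email
    ]


customers = [
    {"email": f"user{i}@example.com", "active": i % 10 == 0} for i in range(1000)
]
accounts = [{"email": f"user{i}@example.com", "balance": i} for i in range(1000)]

# Filter late: every customer goes through the join, then most rows are discarded.
joined_all = join_on_email(customers, accounts)
late = [row for row in joined_all if row["active"]]

# Filter early: only active customers reach the join.
active_only = [customer for customer in customers if customer["active"]]
early = join_on_email(active_only, accounts)

print(len(joined_all), len(active_only), early == late)
```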
