Matching Performance Best Practices

This article outlines two main ways to improve the efficiency of matching computation: targeted configuration and parallelism in matching.

Configuration best practices

For a faster and more accurate matching computation, we recommend following these guidelines.

Check the configuration of the CDI example project to help you get started.

Migrating from another MDM solution

When migrating from another MDM solution to ONE MDM, avoid copying matching rules one-to-one. Doing so typically results in reduced matching performance.

Standardize and prepare data

For optimal results, start matching only after standardizing your data: convert to uppercase, remove accents, clean up spacing, trim whitespace, and so on. This removes differences that are irrelevant for the matching process and, in turn, contributes to more accurate matching results.
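As an illustration of the standardization steps listed above, here is a minimal Python sketch. The function name standardize is hypothetical and is not part of ONE MDM or ONE Desktop; in practice, you would configure the equivalent logic in your data preparation steps.

```python
import re
import unicodedata


def standardize(value: str) -> str:
    """Normalize a string before matching (illustrative helper, not a ONE API)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Collapse runs of whitespace into a single space and trim both ends.
    collapsed = re.sub(r"\s+", " ", without_accents).strip()
    # Convert to uppercase so that case differences do not affect matching.
    return collapsed.upper()


print(standardize("  José   Müller "))  # JOSE MULLER
```

After this step, values that differ only in case, accents, or spacing are reduced to the same string and can be compared directly.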

Look for and remove default values (for example, words like TEST, XNA, DEFAULT, or default dates) in matching key columns. This helps reduce the number of incorrect matching results.
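The following Python sketch shows one way to blank out such placeholder values before they reach the matching keys. The PLACEHOLDERS set and the clean_key_value function are assumptions made for this example; the actual list of default values should come from profiling your own data.

```python
from typing import Optional

# Hypothetical placeholder list; build yours from data profiling.
PLACEHOLDERS = {"TEST", "XNA", "DEFAULT", "1900-01-01"}


def clean_key_value(value: Optional[str]) -> Optional[str]:
    """Return None for empty or placeholder values, otherwise the trimmed value."""
    if value is None:
        return None
    trimmed = value.strip()
    # Treat empty strings and known placeholders as missing values.
    if not trimmed or trimmed.upper() in PLACEHOLDERS:
        return None
    return trimmed


print(clean_key_value(" test "))  # None
print(clean_key_value("Smith"))   # Smith
```

Treating placeholders as missing values keeps records from being grouped together only because they share the same default.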

Remove all columns that are not needed by the Matching step, by any logic after the step, or by the Integration Output. You can use the Alter Format step in ONE Desktop for this purpose. This reduces the amount of temporary space used as well as the number of storage input/output operations per second (IOPS).

Use partitioning and targeted grouping

Use partitioning extensively to create separate groups of records that will never be matched against each other, for example, by party type or client segmentation. Partition design significantly impacts performance, as poorly designed partitions can create bottlenecks.

Key rules should be selective enough to reduce unnecessary comparisons. Try to avoid creating large data groups based on a single key rule. Instead, combine more attributes and configure multiple key rules to create smaller, targeted groups.

Similarly, avoid using a very large set of columns for a key group. Cross-referencing matches are performed across every value in the set and can therefore be very time-consuming.
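To see why smaller, targeted groups matter, consider that records in a group of n are compared pairwise, which gives n * (n - 1) / 2 candidate pairs. The Python sketch below counts candidate pairs for a broad key and for a combined key. The sample records and the pairwise_comparisons function are hypothetical and only illustrate the arithmetic, not how the Matching step is implemented.

```python
from collections import Counter


def pairwise_comparisons(keys):
    """Count candidate pairs when records are compared only within the same key group."""
    group_sizes = Counter(keys)
    return sum(n * (n - 1) // 2 for n in group_sizes.values())


# Hypothetical records: (last name, postal code).
records = (
    [("SMITH", "10001")] * 3
    + [("SMITH", "94105")] * 3
    + [("JONES", "10001")] * 2
)

# Broad key: last name only.
broad = pairwise_comparisons(last_name for last_name, _ in records)
# Targeted key: last name combined with postal code.
targeted = pairwise_comparisons(records)

print(broad, targeted)  # 16 7
```

Because the number of pairs grows quadratically with group size, splitting one large group into several smaller ones reduces the work considerably, even on this small sample.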

Optimize matching rules

Avoid creating many matching rules to account for variations in the quality of the data you’re working with. Instead, create a strategic set of two rules:

One exact match rule: For clean, standardized data where you want precise matches.
One flexible approximation rule: For scenarios where data might be of poor quality but the rule offers acceptable approximation.
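The two-rule set can be sketched in Python as follows. This is an illustration only: the rule functions, the name and birth_date attributes, and the 0.85 similarity threshold are assumptions, and the similarity measure from the standard difflib module stands in for whatever functions you configure in the Matching step.

```python
from difflib import SequenceMatcher


def exact_rule(a, b):
    """Exact rule: all compared attributes must be identical."""
    return a["name"] == b["name"] and a["birth_date"] == b["birth_date"]


def approximate_rule(a, b, threshold=0.85):
    """Flexible rule: same birth date and a sufficiently similar name."""
    similarity = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return a["birth_date"] == b["birth_date"] and similarity >= threshold


def match(a, b):
    """Apply the more selective rule first and return the name of the rule that matched."""
    if exact_rule(a, b):
        return "exact"
    if approximate_rule(a, b):
        return "approximate"
    return None


first = {"name": "JOHN SMITH", "birth_date": "1980-01-01"}
second = {"name": "JON SMITH", "birth_date": "1980-01-01"}
print(match(first, first))   # exact
print(match(first, second))  # approximate
```

Evaluating the exact rule first means that clean records are resolved without computing the more expensive similarity.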

Order matching rules from most selective to least selective. Keep in mind that matching rules need to be reviewed and updated as data patterns evolve. This also means planning for reprocessing cycles after rule updates.

When defining matching rules, prefer using test and match functions over expressions and matching measures in the Matching step.

Parallelism in matching

Consider enabling parallelism for large data volumes. The Matching step can be configured so that partitions, key rules, and matching rules are computed in parallel.
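The reason partitions lend themselves to parallel computation is that records in different partitions are never compared, so each partition can be processed independently. The Python sketch below illustrates the idea with a thread pool. The functions, the party_type and name attributes, and the max_workers parameter are hypothetical; in the Matching step, parallelism is set through configuration rather than code.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def match_partition(records):
    """Return pairs of record ids that share the same name within one partition."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(records, 2)
        if a["name"] == b["name"]
    ]


def match_in_parallel(records, max_workers=4):
    """Split records by party type and match each partition independently."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["party_type"]].append(record)
    # Partitions share no records, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(match_partition, partitions.values()))
    return sorted(pair for pairs in results for pair in pairs)


sample = [
    {"id": 1, "party_type": "PERSON", "name": "ACME"},
    {"id": 2, "party_type": "COMPANY", "name": "ACME"},
    {"id": 3, "party_type": "COMPANY", "name": "ACME"},
    {"id": 4, "party_type": "PERSON", "name": "ACME"},
]
print(match_in_parallel(sample))  # [(1, 4), (2, 3)]
```

Records 1 and 2 share a name but are never paired, because they fall into different partitions.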

To learn how to configure the appropriate level of parallelism, see Performance Tuning.
