Term Suggestions: Behind the Scenes

Recommending business terms involves identifying attributes in other catalog items that are similar to the target attribute and proposing business terms based on the terms assigned to those similar attributes.

For more information about how to configure the Term Suggestions feature, see Term Suggestions Services Configuration.

The level of similarity between attributes is determined by the following features:

  • Metadata, which helps identify the similarity between the target attribute and other attributes based on the attribute name, the name of the catalog item it belongs to, or other metadata.

  • Data fingerprints, which encode the content of a catalog item attribute into a vector of 128 floating-point numbers (see the sketch following this list).
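
As a rough illustration of how two attributes might be compared, the following minimal Python sketch computes a distance between two 128-float fingerprints and a simple name-based metadata similarity. The distance metric, the dictionary keys, and the scoring are assumptions made for illustration only; they are not the actual implementation used by the Term Suggestions services.

    import numpy as np

    FINGERPRINT_SIZE = 128  # each attribute's content is encoded as 128 floats

    def fingerprint_distance(a: np.ndarray, b: np.ndarray) -> float:
        # Hypothetical content-similarity metric: Euclidean distance between
        # the two fingerprint vectors (smaller distance = more similar content).
        assert a.shape == (FINGERPRINT_SIZE,) and b.shape == (FINGERPRINT_SIZE,)
        return float(np.linalg.norm(a - b))

    def metadata_similarity(target: dict, candidate: dict) -> float:
        # Toy metadata comparison: overlap of name tokens between the target
        # attribute and a candidate attribute ("attribute_name" and
        # "catalog_item_name" are assumed keys, not real API fields).
        def tokens(attr: dict) -> set[str]:
            return set(attr["attribute_name"].lower().split("_")) | \
                   set(attr["catalog_item_name"].lower().split("_"))
        t, c = tokens(target), tokens(candidate)
        return len(t & c) / len(t | c) if (t | c) else 0.0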

Term Suggestions need to be recomputed after the metadata database is recovered from a snapshot. See Term Suggestions synchronization.

How is user feedback processed in Term Suggestions?

Accepting and rejecting term suggestions

After a term has been suggested, it is not automatically assigned to the attribute in question: users first need to approve or reject the suggestion. This user input plays a key role in improving the overall quality of term suggestions, since the Term Suggestions services use the feedback to learn about user preferences and the particular qualities of each term and to adjust future suggestions accordingly.

The key concepts for understanding how the process of computing term suggestions is constantly adapted are the following:

  • Distance threshold: Defines how close the fingerprints and metadata of two attributes need to be so that a term assigned to one of them is suggested to the second attribute.

  • Target precision: The share of term suggestions that users accept, which the algorithm tries to achieve. The default value is 0.9 (that is, 90% of suggestions accepted) and it can be set in the services' configuration.

  • Feature weight: Transforms the raw classifier output into a reported confidence for each feature.

  • Pairwise interactions: Capture the influence of combinations of features.
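
How these parameters combine is not spelled out here, so the sketch below only illustrates the general idea: feature weights turn raw per-feature classifier scores into a single confidence, pairwise interaction coefficients account for combinations of features, and a term is only suggested when the attributes fall within the distance threshold. All names and the combination formula are assumptions, not the services' actual logic.

    def suggestion_confidence(raw_scores: dict[str, float],
                              weights: dict[str, float],
                              interactions: dict[tuple[str, str], float]) -> float:
        # Weighted sum of raw per-feature classifier outputs ...
        confidence = sum(weights[f] * score for f, score in raw_scores.items())
        # ... plus pairwise interaction terms for feature combinations.
        for (f1, f2), coef in interactions.items():
            confidence += coef * raw_scores.get(f1, 0.0) * raw_scores.get(f2, 0.0)
        return max(0.0, min(1.0, confidence))  # clamp to a [0, 1] confidence

    def should_suggest(distance: float, distance_threshold: float) -> bool:
        # The distance threshold decides whether two attributes are close enough
        # for a term assigned to one of them to be suggested for the other.
        return distance <= distance_threshold

    # Example with two features (values are illustrative only):
    # 0.4*0.7 + 0.5*0.9 + 0.1*0.7*0.9 = 0.793
    confidence = suggestion_confidence(
        raw_scores={"metadata": 0.7, "fingerprint": 0.9},
        weights={"metadata": 0.4, "fingerprint": 0.5},
        interactions={("metadata", "fingerprint"): 0.1},
    )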

The services try to adhere to the target precision rate by adapting these parameters (distance threshold, feature weights, pairwise interactions) when choosing which terms to suggest and with what confidence. There are two scenarios:

  1. If the actual approval rate is higher than the target precision rate, that is, if the Term Suggestions services are conservatively suggesting terms only when the chance of approval is very high, the services gradually increase the distance thresholds, weights, and interactions of the features that contribute to the suggestions.

    Consequently, more terms are suggested, even with a lower level of confidence. This leads to a higher percentage of rejected terms, bringing the actual approval rate back down to the target precision rate.

  2. If the actual approval rate is lower than the configured target, the opposite happens: the distance threshold, weights, and interactions are decreased until the actual approval rate matches the target one. As a result, fewer terms are suggested, and only those with a high level of confidence.

Over time, the parameters stabilize at values where the real acceptance rate of suggested terms corresponds to the displayed confidence and its average is close to the target precision. For this reason, it is important not only to accept correct suggestions but also to reject incorrect ones.
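
The adaptation loop described above can be pictured with the following rough sketch, which nudges a single distance threshold up when the observed approval rate exceeds the target precision (so more, lower-confidence terms get suggested) and down when it falls short. The update rule, the step size, and the fact that only one parameter is adjusted are simplifications for illustration; the services also adapt feature weights and pairwise interactions in the same spirit.

    TARGET_PRECISION = 0.9  # default target precision from the services' configuration
    STEP = 0.05             # hypothetical adjustment step, for illustration only

    def adjust_threshold(distance_threshold: float, feedback: list[bool]) -> float:
        # feedback holds one entry per reviewed suggestion:
        # True = accepted by the user, False = rejected.
        if not feedback:
            return distance_threshold  # no reviews yet, nothing to learn from
        approval_rate = sum(feedback) / len(feedback)
        if approval_rate > TARGET_PRECISION:
            # Scenario 1: suggestions are too conservative, so widen the
            # threshold and suggest more (lower-confidence) terms.
            return distance_threshold * (1 + STEP)
        if approval_rate < TARGET_PRECISION:
            # Scenario 2: too many rejections, so tighten the threshold and
            # suggest only very close, high-confidence matches.
            return distance_threshold * (1 - STEP)
        return distance_threshold

For example, adjust_threshold(0.35, [True, True, False, True]) tightens the threshold slightly, because an approval rate of 0.75 is below the 0.9 target.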
