Detection and DQ Evaluation Rules

Overview

Rules in ONE fall into two categories: detection rules and data quality evaluation rules (DQ rules). DQ rules are further grouped by dimension.

Detection rules and DQ rules can both be accessed and created from Data Quality > Rules but serve very different purposes in the platform:

  • Detection rules identify the catalog item attributes to which a particular business term should be applied, based on the data or metadata. They are assigned to terms, and each term is then applied to the attributes that satisfy the rule condition. Multiple detection rules can be applied to one term. Detection rules run during profiling to identify and classify attributes and catalog items according to the rule condition.

  • DQ evaluation rules evaluate the quality of catalog items and their attributes. They are applied to terms, and through those terms to every attribute the term is assigned to, which makes large-scale data evaluation practical. DQ evaluation rules run during DQ evaluation in the data catalog or in monitoring projects.

    In monitoring projects, DQ rules are applied directly to catalog item attributes.

    The results of these rules can be seen on the Data Quality tab of a catalog item, attribute, or the term itself, or in monitoring projects.

More about detection rules

Detection rules are used for rule-based term detection during profiling and data discovery. A typical use case is assigning a term to attributes whose values match those in a certain lookup file (for example, a lookup containing a list of first names). Alternatively, you can create a condition that identifies email addresses and applies the term email.

In this scenario:

  1. A detection rule is created with the underlying logic. For example, the term can be assigned to attributes on the condition that the attribute value is in the specified lookup file.

  2. On the appropriate term (for example, First Name), the newly created detection rule is chosen after selecting Add Rules on the term Settings tab.

    In addition, the term threshold is defined, that is, the percentage of values which should satisfy the rule conditions in order for the term to be applied to the attribute.

  3. The term is automatically assigned to attributes which match the rule condition and the threshold during data discovery or evaluation.
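
The threshold logic in the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ONE implements it: the function name, the sample lookup, the case-insensitive comparison, and the choice to treat the threshold as inclusive are all assumptions made for the example.

```python
from typing import Iterable, Set


def term_applies(values: Iterable[str], lookup: Set[str], threshold_pct: float) -> bool:
    """Return True when the share of values found in the lookup meets the threshold.

    Hypothetical helper: the inclusive comparison (>=) and the
    case-insensitive matching are assumptions, not documented behavior.
    """
    values = list(values)
    if not values:
        # With no values there is nothing to match, so the term is not applied.
        return False
    matches = sum(1 for v in values if v.strip().lower() in lookup)
    return 100.0 * matches / len(values) >= threshold_pct


# Hypothetical lookup content and attribute values.
first_names = {"anna", "john", "maria"}
column = ["Anna", "John", "Maria", "n/a"]

# 3 of 4 values are in the lookup, which is 75%.
print(term_applies(column, first_names, 75.0))
print(term_applies(column, first_names, 80.0))
```

With a threshold of 75 the term would be applied to this attribute; raising the threshold to 80 would leave the attribute without the term.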

Detection rules are created and configured in Data Quality > Rules. Once they are created, most interaction with detection rules takes place in Glossary, where rules are added to terms, and in Catalog, where terms are added to your data as a result of these rules.

Although enabled from the same section, AI term detection and detection rules work independently. Detection rules are not AI-powered.

Information about enabling AI term suggestions can be found in Getting Started with Term Statistics and Settings; details about the Term Suggestions algorithm can be found in Term Suggestions: Behind the Scenes.

More about DQ evaluation rules

DQ evaluation rules are used to evaluate data based on a specific dimension. The dimension affects the rule in two ways:

  • Depending on the dimension selected, different results are available in the rule condition builder (during rule implementation).

  • Results of rules from contributing dimensions are included in the calculation of the Overall Quality metric which can be seen on the Data Quality tab for a catalog item or for the term itself, or in results and reports of monitoring projects.

    For more information about contributing dimensions, see Data Quality Dimensions.

All predefined glossary terms come with preconfigured DQ rules that evaluate one or more data quality dimensions.

Data quality rules can be applied to terms, and in turn, indirectly applied to attributes and catalog items to which those terms are mapped. You can also apply data quality rules directly to catalog item attributes manually within monitoring projects.
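
The indirect application described above (rules attached to terms, terms assigned to attributes) can be pictured as a two-step lookup. The sketch below is a simplified model, not the platform's data model: the rule names, attribute names, and the `rules_for_attribute` function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping of glossary terms to the DQ rules applied to them.
term_rules: Dict[str, List[str]] = {
    "Email": ["email_format_valid", "email_not_empty"],
    "First Name": ["first_name_in_lookup"],
}

# Hypothetical mapping of catalog item attributes to their assigned terms.
attribute_terms: Dict[str, List[str]] = {
    "customers.email": ["Email"],
    "customers.first_name": ["First Name"],
    "customers.id": [],
}


def rules_for_attribute(
    attribute: str,
    attribute_terms: Dict[str, List[str]],
    term_rules: Dict[str, List[str]],
) -> List[str]:
    """Collect the rules an attribute inherits through its terms, without duplicates."""
    rules: List[str] = []
    for term in attribute_terms.get(attribute, []):
        for rule in term_rules.get(term, []):
            if rule not in rules:
                rules.append(rule)
    return rules


print(rules_for_attribute("customers.email", attribute_terms, term_rules))
print(rules_for_attribute("customers.id", attribute_terms, term_rules))
```

An attribute with no terms inherits no rules in this model, which is why monitoring projects, where rules are applied to attributes directly, are a separate path.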

Data quality rules in ONE evaluate the quality of catalog items in accordance with defined conditions. Within the context of the Knowledge Catalog and Business Glossary sections of the platform, this occurs during profiling or DQ evaluation. Within the context of the Data Quality section of the application, this occurs during monitoring.

DQ dimensions

DQ rules are divided into categories known as DQ dimensions. The dimension affects a rule in two ways:

  • Different results are available in the rule condition builder (during rule implementation) depending on the dimension selected.

  • Results of the Validity dimension are included in the calculation of the Overall Validity metric, which can be seen on the Data Quality tab for a catalog item or for the term itself, or in the results and reports of monitoring projects.

The following dimensions are available by default:

  • Validity: By default, the possible results are Valid and Invalid, that is, the data is valid or invalid depending on whether the condition is met. You could use this dimension when creating rules to verify the usability of the data (for example, regarding data format, data content, or attribute relations).

  • Uniqueness: By default, the possible results are Unique, Not populated, and Not Unique. You could use this dimension when creating rules to verify that there are no duplicate values and only one instance appears in the dataset.

  • Completeness: By default, the possible results are Complete and Not complete. You could use this dimension when creating rules to verify that the value field is filled.

  • Accuracy: By default, the possible results are Accurate, No reference available, and Not accurate. You could use this dimension when creating rules to check whether values are accurate and reflect the true values, for example, based on reference data.
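
To make the default result labels concrete, here is a minimal Python sketch that classifies a handful of values under the Validity, Completeness, and Uniqueness dimensions. It is an illustration under stated assumptions, not the rule condition builder's logic: the function names, the simplified email pattern, and the treatment of blank strings as not populated are all hypothetical. Accuracy is left out because it depends on reference data.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical, deliberately simple email pattern used only for this illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_populated(value: Optional[str]) -> bool:
    """Assumption: None and blank strings both count as not populated."""
    return value is not None and value.strip() != ""


def completeness(value: Optional[str]) -> str:
    return "Complete" if is_populated(value) else "Not complete"


def validity(value: Optional[str]) -> str:
    if is_populated(value) and EMAIL_PATTERN.match(value.strip()):
        return "Valid"
    return "Invalid"


def uniqueness(values: List[Optional[str]]) -> List[str]:
    """Label each value by how often it occurs among the populated values."""
    counts = Counter(v for v in values if is_populated(v))
    results = []
    for v in values:
        if not is_populated(v):
            results.append("Not populated")
        elif counts[v] == 1:
            results.append("Unique")
        else:
            results.append("Not Unique")
    return results


# Hypothetical attribute values.
emails = ["a@x.com", "a@x.com", "", "bad", "b@y.org"]

for value, unique_label in zip(emails, uniqueness(emails)):
    print(repr(value), validity(value), completeness(value), unique_label)
```

Note that Uniqueness is the only one of the three that needs the whole set of values; Validity and Completeness can be decided one value at a time.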

Custom dimensions can also be created according to the instructions in Data Quality Dimensions.
