User Community Service Desk Downloads

Create DQ Rule in the Rule Library

Create rule

To create a rule:

  1. Navigate to the rule library under Data Quality > DQ Rules.

  2. Select Create.

    Select create from rule llibrary
  3. On the Create Rule screen, provide the following information:

    • Name: The name of the rule.

    • Rule type: Choose a data quality dimension for your rule. See Rule type for details.

    • Description (optional): A description of the rule and how to use it.

    • Reference of rule definition (optional): A descriptive name for the data source associated with the rule.

    • Find similar rules: Check whether the rule you are creating is similar to any other existing rule. This is essential for preventing the creation of duplicate rules.

    • Stewardship (recommended): Select the user group that should own the asset. After you select a group, the list of users assigned to the governance roles within the selected group is displayed.

  4. Select Save to create a draft of the rule.

  5. Before publishing or submitting an approval request, you need to define the rule logic consisting of at least one input attribute and condition. Continue to Define rule implementation.

Define rule implementation

Once you have created the rule, go to the Implementation tab.

Generate Rule with AI

Select this option to generate rule using AI:

  1. Define the rule and select Generate.

  2. Review the rule inputs and logic.

  3. If needed, adjust the rule definition and Re-generate.

  4. Continue to Test rule.

Generate rule using AI

Rule type

Under DQ Evaluation Rule, choose a data quality dimension for your rule: image::create-dq-rule-rule-library-select-rule-type.png[Select rule type,500]

  • Validity: Validity rules verify the usability of the data (for example, regarding data format, data content or attribute relations). The default results are Valid and Invalid.

  • Uniqueness: Use this dimension to verify that there are no duplicate values and only one instance appears in the dataset. The default results are Unique, Not populated, and Not Unique.

  • Completeness: Use this dimension to verify that the value field is filled. The default results are Complete and Not complete.

  • Accuracy: Use this dimension to check whether values are accurate and reflect the true values, for example, based on reference data. The default results are Accurate, No reference available, and Not accurate.

  • Timeliness: Use this dimension to verify whether data is available at the time it is needed. The default results are Timeliness ok, Minor delay, and Major delay.

  • Custom: If you have created any custom dimensions, they can also be selected here.

DQ dimensions are fully customizable. Validity, Uniqueness, Accuracy, Completeness, and Timeliness are available by default, but their configuration and results can be changed if required.

Inputs

Select rule inputs: Attributes (at least one is required), Parameters (optional), and Variables (optional). Once added, the inputs are available for building rule conditions in the Rule logic section.

How to name inputs
  • The input name does not affect the implementation.

  • We recommend using meaningful input names, as that helps keep the rule logic easy to understand for other users too.

  • If special characters or reserved words are used, they must be enclosed in square brackets, otherwise publishing fails.

How many inputs should you use?

For domain-based validation, you should have only one input.

Multiple inputs are required if you want to check the data from one attribute based on the values from another attribute or when you need to check the combination of values from several attributes. In this case, create as many inputs as there are attributes you will be validating. Rules with multiple inputs are used for data quality validation and monitoring in monitoring projects.

Attributes

Select the attribute data type and specify its name. Select Add attribute if you require additional fields.

Add rule inputs

(Optional) Select Add term to add terms to the attribute.

Adding terms to input attributes during implementation results in this rule being suggested by the system to other attributes that also have this term.

Parameters

Parameters are placeholders for values provided by the user. First, use parameters in conditions instead of constant values, and then provide parameter values when applying the rule. This way, you can create reusable rules that can be applied with different parameter values.

For example, you can use parameters to set up a DQ rule that checks whether the value of an age attribute falls within a certain interval, which is not known upfront. This rule will use any age interval specified by the user while applying the rule to the age attribute.

Parameter values are case-sensitive.

To add parameters:

Add parameters
  1. Select Add parameter.

  2. Provide a meaningful parameter name.

  3. Provide a data type.

Select Add parameter if you require additional fields.

Which data types can be used with parameters?

There are two additional data types that can be used with parameters:

  • List of values: Useful for validating whether the value is or is not from a predefined list. The list of values is provided by user when applying the rule.

    Used in combination with the Is from parametrized list or Is not from the parametrized list condition.

  • List of masks: Useful for validating whether masks and patterns come from a predefined list or not. This is useful especially for autogenerated rules where the masks are prefilled from profiling (an option preferred to specifying masks manually).

    This datatype is not supported in the Condition Builder yet, use ONE expression matches([Valid masks], Attribute) instead.

Variables

In the Variables section, you can apply various transformations to your input attributes. Variables store this transformed data. They can be used directly in the rule implementation logic, or to define groups for aggregation rules.

For example, you can create a variable that contains a trimmed value of a string attribute and then apply the rule conditions only on the trimmed version.

To create variables from your attributes:

Add input variables
  1. Select a name for your variable. Select Add variable if you require additional fields.

  2. Select the attributes that you want to transform. Use the dropdown to choose from the list of input attributes. The transformations will be applied in the next step.

    Alternatively, if you are an advanced user, use the dropdown to select Advanced Expression and define the attribute and the transformations using ONE expression language.

    Advanced expression in variables
  3. Add transformations using the Add transformation option. The transformations available depend on the data type of the selected attribute: a full list of types and available transformations is provided below.

How to apply multiple transformations to one attribute?

You can apply multiple transformations to a single attribute, but you can only add them in a logical sequence. For example, if you start with an attribute of data type String and apply the To integer transformation, you can’t then apply the Uppercase transformation, as it is only available for strings.

For the same reason, you can only remove the last transformation in the sequence at any given time. To remove a transformation, hover over it and select x to delete it.

List of available transformations
Expand to see all available transformations
Datatype Available transformations Description

String

To float

Changes data type from string to float.

To integer

Changes data type from string to integer.

To long

Changes data type from string to long.

Uppercase

Converts string characters to uppercase.

Lowercase

Converts string characters to lowercase.

Trim

Removes spaces from both ends of the string.

Squeeze spaces

Trims string and replaces repeated spaces with a single space.

Remove non-digits

Removes non-digit characters from string.

Remove non-letters

Removes non-letter characters from string.

Integer

To float

Changes data type from integer to float.

To date

Changes data type from integer to date.

To long

Changes data type from integer to long.

Boolean

To String

Changes data type from Boolean to string.

Date

To datetime

Changes data type from date to datetime.

To string

Changes data type from date to string.

Datetime

To date

Changes data type from datetime to date.

To string

Changes data type from datetime to string.

Long

To float

Changes data type from long to float.

To string

Changes data type from long to string.

To date

Changes data type from long to date.

To datetime

Changes data type from long to datetime.

Float

To string

Changes data type from float to string.

Floor

Rounds to the nearest integer that is less than or equal to float value.

Ceiling

Rounds to the nearest integer that is greater than or equal to float value.

Round

Rounds to nearest integer.

Rule logic

Define the rule logic by defining the rule conditions and applying them to the inputs.

  1. Select whether you are creating a standard Rule or an Aggregation rule. Aggregation rules require additional configuration.

  2. Provide the rule conditions and an explanation of the results.

    There are two ways to do this: through the Condition Builder or via Advanced Expression:

    • Select Advanced Expression to leverage ONE expressions in your rule condition.

    • Select Condition Builder to define the rule logic using the predefined options.

      Condition builder

      If you selected Advanced Expression, enter the expression in the space provided and proceed to Test rule.

      If you selected Condition Builder:

      1. Select the input attributes to use in the rule logic.

        Select inputs
      2. Select any required modifiers, for example, Trim or Round. Different modifiers are available for string and integer inputs.

        Select modifier
        Expand to see all available modifiers
        Modifier Description

        Value

        The given value of an attribute.

        Uppercase

        Uppercase version of string.

        Lowercase

        Lowercase version of string.

        Trim

        Removes any whitespaces from both sides of strings.

        Trim left

        Removes any leading whitespaces from strings.

        Trim right

        Removes any trailing whitespaces from strings.

        Squared

        Squares the value of an integer, float, or long data type attributes.

        Round

        Rounds the value of an integer, float, or long data type attributes to the nearest whole number.

        Average

        Averages the value of integer, float, or long data type attributes (available only when using aggregation rules and grouping).

        Min

        The minimum of string, integer, or long data types attributes (available only when using aggregation rules and grouping).

        Max

        The maximum of string, integer, or long data type attributes (available only when using aggregation rules and grouping).

        Sum

        The sum of of string, integer, or long data type attributes (available only when using aggregation rules and grouping).

      3. Select the required conditions, for example, matches mask or has length of, and provide the necessary requirements. The options available depend on the data type of the input attributes.

        Select condition
        Expand to see all available conditions
        Condition Description

        Is empty

        When the input is not filled.

        This checks if the field is filled, but a field is not identified as empty if it is filled with: NULL, Null, null, ., ,, -, _, N/A, n/a, and so on.

        Is not empty

        When the input is filled.

        Is the same as

        When the input is the same as the value you define.

        Is the same as attribute

        When the attribute is the same as the attribute you select.

        Is not the same as

        When the input is not the same as the value you define.

        Is not the same as attribute

        When the attribute is not the same as the attribute you select.

        Is from the following list

        When the attribute belongs to the list you define.

        Is not from the following list

        When the attribute does not belong to the list you define.

        Is from parametrized list

        When the attribute belongs to the list of parameters you define (available only when using List of values parameter data type).

        Is not from parametrized list

        When the attribute does not belong to the list of parameters you define (available only when using List of values parameter data type).

        Has length of

        Is the same as the length you define.

        Contains

        When the input contains a substring you define.

        Is from catalog item

        When input belongs to a lookup item built from specified catalog item attribute or reference data in Reference data.

        To make sure the rule always uses the latest data available, select ON DATA CHANGE in Data updates. If you select a catalog item that is not managed using Reference data, no automatic updates are available.

        Is not from catalog item

        When input doesn’t belong to a lookup item built from specified catalog item attribute or reference data in Reference data.

        Is from lookup

        When attribute input belongs to the lookup file you select.

        Is not from lookup

        When attribute input does not belong to the lookup file you select.

        Matches mask

        When input corresponds to the mask you define.

        Does not match mask

        When input does not correspond to the mask you define.

        Matches regexp

        When input corresponds to the regular expression you define.

        Does not match regexp

        When input does not correspond to the regular expression you define.

        Is true

        When Boolean input is true.

        Is false

        When Boolean input is false.

        Is unique

        When aggregated group is unique.

        Is not unique

        When aggregated group is not unique.

      4. Select whether the condition defined should produce the result that is, for example, Valid or Invalid. Options are different depending on the dimension chosen.

  3. Optionally, enter an explanation to help distinguish between multiple rules of the same dimension.

    Explanations are used as names of the conditions, and are also displayed as a results description in all kind of dashboards or reports. Use the explanation that would make sense to you when further analyzing data quality results.

    Condition result
  4. (Optional) Assign a score to results that are then used during evaluation. See Scoring records.

Scoring records

The score is used as a numeric expression of the severity of each record’s invalidity according to the rule. Scores are arbitrary and can be assigned to suit your project.

Typically, the logic follows that the higher the score, the worse the issue. For example, if a non-critical value is missing, the invalidity score could be 100. If a critical value is missing, the invalidity score could be 10000.

Assigning scores

In the example below, you have a rule in which an invalid record due to errors is considered of higher importance (score 500) than an invalid record due to a null value (score 50).

600

Test rule

Test the rule by clicking Test Rule and trying different inputs in the Test section. The results for each input are shown automatically in Message next to the test input row.

Test rule

Publish rule

Publish your changes.

Was this page useful?