User Community Service Desk Downloads
If you can't find the product or version you're looking for, visit support.ataccama.com/downloads

Aggregation Rules

Aggregation rules in ONE can be utilized for all DQ dimensions (Validity, Uniqueness, Completeness, Timeliness, and Accuracy).

To unlock grouping capabilities and create an aggregation rule:

  1. Select Aggregation rule in rule type when creating a rule.

    600
  2. The Group by section is now available.

    400

You can create aggregation rules either using the Condition Builder (No code aggregation rules) or using ONE Expressions (Advanced expression aggregation rules). Both options allow you to easily:

  1. Select input attributes.

  2. Apply transformations to these attributes to create variables.

  3. Create an aggregated group by selecting the required inputs or variables.

    Float attributes are not suitable for grouping since the DQ evaluation operation removes trailing zeroes, which might lead to imprecise grouping and affect the evaluation results.

    If you need to use a float input attribute as a grouping parameter, create a variable to round the original value to an integer, for example, using this expression: toInteger(round(ATTRIBUTE, 0)). You can then use the variable in Group by.

The main difference is the criteria available for the results. If you are using the Condition Builder, options for aggregated groups are limited to is unique and is not unique.

400

Before you start

Make sure you are familiar with the basics of creating a rule in ONE.

We recommend checking the following topics:

No code aggregation rules

  1. In Data Quality > Rules, select Create.

  2. Provide the Name of your rule.

    Optionally, add a Description, Rule definition source, and Stewardship (recommended).

  3. On the Implementation tab, define the Input Attributes and their data types, and any Variables and their transformations (if applicable).

  4. In Rule Logic select the rule dimension. Available options are Validity, Completeness, Uniqueness, Timeliness, and Accuracy.

  5. After selecting rule type, the following dropdown menu appears. Select Aggregation rule.

    500
  6. Select Change Type when prompted.

    500
    If further rule logic has already been defined, be aware that this is not saved when you change rule type.
  7. A new section named Group by appears. Group by a single selected attribute or variable, or a number of them, using Add input (and then use the plus sign (+) for each additional attribute).

    Attributes or variables will be available under the names provided for them in the Attributes and Variables sections respectively.

  8. Using the dropdown, select Condition Builder if it is not already selected.

  9. Select the newly-created group from the list of available inputs. Groups will be listed under the heading Group by results.

    600
  10. You can now define the outcome of the DQ rule based on the uniqueness of the group.

    600

Advanced expression aggregation rules

  1. In Data Quality > Rules, select Create.

  2. Provide the Name of your rule.

    Optionally, add a Description, Rule definition source, and Stewardship (recommended).

  3. On the Implementation tab, define the Input Attributes and their data types, and any Variables and their transformations (if applicable).

    Input attributes and variables must be of a data type compatible with the aggregate function you wish to use.
  4. In Rule Logic, select the rule dimension. Available options are Validity, Completeness, Uniqueness, Timeliness, and Accuracy.

  5. After selecting rule type, the following dropdown menu appears. Select Aggregation rule.

    600
  6. Select Change Type when prompted.

    500
    If further rule logic has already been defined, be aware that this is not saved when you change rule type.
  7. A new section named Group by appears. Group by a single selected attribute or variable, or a number of them, using Add input (and then use the plus sign (+) for each additional attribute).

    Attributes or variables will be available under the names provided for them in the Attributes and Variables sections respectively.

    500
    This works the same way as GROUP BY clause in SQL: the GROUP BY statement groups rows that have the same values into summary rows.

    You can create a group consisting only of a single input attribute without any transformation, in order to make aggregate functions available on the attribute.

    1. Using the dropdown, change from Condition Builder to Advanced Expression.

    2. You can now use aggregate functions in the expression field: attributes, variables, and the created group will all be available to use in the expression.

Use cases

In this section you can find step-by-step guides on how to configure aggregation rules for the following three use cases:

  • Use case 1: Does a certain combination of values in the table appear more than once (that is, is a certain combination unique or not)? This uses the Condition Builder.

  • Use case 2: Is the value of this transaction within 20% of the average transaction value (+/-10% from AVG) for this customer? This uses the avg aggregate function.

  • Use case 3: Check whether there is the correct number of transactional records per day (for example, 24 per day) by aggregating records by date and then checking the size of each group. This uses the count aggregate function.

Use case 1: Unique group

Does a certain combination of values in the table appear more than once?

  1. In Data Quality > Rules, select Create.

  2. Name your rule and add other information as required. For the purpose of this example, we will use 'Combination rule'.

  3. On the Implementation tab, add the input attributes.

    In this example, we want to find if a combination of three attributes (ATTRIBUTE 1, ATTRIBUTE 2, and ATTRIBUTE 3) appears more than once.

    500
  4. In Rule Logic, select Validity rule.

  5. Using the dropdown select Aggregation rule.

  6. Select Change type when prompted.

  7. In Group by, group the data by the three input attributes.

    500
  8. Using the dropdown, switch from Condition Builder to Advanced Expression.

  9. Select Change type when prompted.

  10. Provide the condition. For this example, we need to define that when the group is not unique, then the result is invalid.

    500
  11. Optionally, test the rule by selecting Test Rule.

Use case 2: avg aggregate function

Is the value of this transaction within 20% of the average transaction value (+/-10% from AVG) for this customer?

  1. In Data Quality > Rules, select Create.

  2. Name your rule and add other information as required. For the purpose of this example, we will use 'Average rule'.

  3. On the Implementation tab, add the input attributes.

    In this example, we want to find if the value of a transaction is within 20% of the average transaction value for that customer so our inputs are CUSTOMER and TRANSACTIONS.

  4. In Rule Logic, select Validity rule.

  5. Using the dropdown select Aggregation rule.

  6. Select Change type when prompted.

  7. In Group by, group the data by the CUSTOMER attribute.

    500
  8. Switch from Condition Builder to Advanced Expression.

  9. Set the rule logic so that anything outside the 20% range of the average is considered INVALID. This has two elements:

    • If the value is bigger than 10% of the average:

      TRANSACTIONS > (avg(TRANSACTIONS) * 1.1)
      500
    • If the value is lower than 10% of the average:

      TRANSACTIONS < (avg(TRANSACTIONS) * 0.9)
      500
  10. Optionally, test the rule by selecting Test Rule.

Use case 3: count aggregate function

  1. In Data Quality > Rules, select Create.

  2. Name your rule and add other information as required. For the purpose of this example, we will use 'Day rule'.

  3. On the Implementation tab, add the input attributes.

    In this example, we want to find if there is the correct number of records in a day so our input attribute is DATE data type.

  4. In Rule Logic, select Validity rule.

  5. Using the dropdown select Aggregation rule.

  6. Select Change type when prompted.

  7. In Group by, group the data by the DATE attribute.

  8. Select Change type when prompted.

  9. Provide the expression. For this example, we need:

    WHEN
    (count() != 24)
    THEN
    Result is INVALID. Reason: Incorrect # of RECORDS
  10. Optionally, test the rule by selecting Test Rule.

Next steps

Apply your newly-created DQ rule to a term. See Add Rules to Terms.

Was this page useful?