Aggregation Rules
Aggregation rules in ONE can be utilized for all DQ dimensions (Validity, Uniqueness, Completeness, Timeliness, and Accuracy).
To unlock grouping capabilities and create an aggregation rule:
-
Select Aggregation rule in rule type when creating a rule.
-
The Group by section is now available.
You can create aggregation rules either using the Condition Builder (No code aggregation rules) or using ONE Expressions (Advanced expression aggregation rules). Both options allow you to easily:
-
Select input attributes.
-
Apply transformations to these attributes to create variables.
-
Create an aggregated group by selecting the required inputs or variables.
The main difference is the criteria available for the results.
If you are using the Condition Builder, options for aggregated groups are limited to is unique
and is not unique
.
Before you start
Make sure you are familiar with the basics of creating a rule in ONE.
We recommend checking the following topics:
-
Data Quality - Learn how data quality works in ONE.
-
Detection and DQ Evaluation Rules - Read more about the types of rules and their role.
-
Create DQ Evaluation Rule - A tutorial for creating a DQ rule.
-
Create Detection Rule - A tutorial for creating a detection rule.
No code aggregation rules
-
In Data Quality > Rules, select Create.
-
Provide the Name of your rule.
Optionally, add a Description, Rule definition source, and Stewardship (recommended).
-
On the Implementation tab, define the Input Attributes and their data types, and any Variables and their transformations (if applicable).
-
In Rule Logic select the rule dimension. Available options are Validity, Completeness, Uniqueness, Timeliness, and Accuracy.
-
After selecting rule type, the following dropdown menu appears. Select Aggregation rule.
-
Select Change Type when prompted.
If further rule logic has already been defined, be aware that this is not saved when you change rule type. -
A new section named Group by appears. Group by a single selected attribute or variable, or a number of them, using Add input (and then use the plus sign (
+
) for each additional attribute).Attributes or variables will be available under the names provided for them in the Attributes and Variables sections respectively.
-
Using the dropdown, select Condition Builder if it is not already selected.
-
Select the newly-created group from the list of available inputs. Groups will be listed under the heading Group by results.
-
You can now define the outcome of the DQ rule based on the uniqueness of the group.
Advanced expression aggregation rules
-
In Data Quality > Rules, select Create.
-
Provide the Name of your rule.
Optionally, add a Description, Rule definition source, and Stewardship (recommended).
-
On the Implementation tab, define the Input Attributes and their data types, and any Variables and their transformations (if applicable).
Input attributes and variables must be of a data type compatible with the aggregate function you wish to use. -
In Rule Logic, select the rule dimension. Available options are Validity, Completeness, Uniqueness, Timeliness, and Accuracy.
-
After selecting rule type, the following dropdown menu appears. Select Aggregation rule.
-
Select Change Type when prompted.
If further rule logic has already been defined, be aware that this is not saved when you change rule type. -
A new section named Group by appears. Group by a single selected attribute or variable, or a number of them, using Add input (and then use the plus sign (
+
) for each additional attribute).Attributes or variables will be available under the names provided for them in the Attributes and Variables sections respectively.
This works the same way as GROUP BY
clause in SQL: theGROUP BY
statement groups rows that have the same values into summary rows.You can create a group consisting only of a single input attribute without any transformation, in order to make aggregate functions available on the attribute.
-
Using the dropdown, change from Condition Builder to Advanced Expression.
-
You can now use aggregate functions in the expression field: attributes, variables, and the created group will all be available to use in the expression.
-
Use cases
In this section you can find step-by-step guides on how to configure aggregation rules for the following three use cases:
-
Use case 1: Does a certain combination of values in the table appear more than once (that is, is a certain combination unique or not)? This uses the Condition Builder.
-
Use case 2: Is the value of this transaction within 20% of the average transaction value (+/-10% from AVG) for this customer? This uses the
avg
aggregate function. -
Use case 3: Check whether there is the correct number of transactional records per day (for example, 24 per day) by aggregating records by date and then checking the size of each group. This uses the
count
aggregate function.
Use case 1: Unique group
Does a certain combination of values in the table appear more than once?
-
In Data Quality > Rules, select Create.
-
Name your rule and add other information as required. For the purpose of this example, we will use 'Combination rule'.
-
On the Implementation tab, add the input attributes.
In this example, we want to find if a combination of three attributes (
ATTRIBUTE 1
,ATTRIBUTE 2
, andATTRIBUTE 3
) appears more than once. -
In Rule Logic, select Validity rule.
-
Using the dropdown select Aggregation rule.
-
Select Change type when prompted.
-
In Group by, group the data by the three input attributes.
-
Using the dropdown, switch from Condition Builder to Advanced Expression.
-
Select Change type when prompted.
-
Provide the condition. For this example, we need to define that when the group is not unique, then the result is invalid.
-
Optionally, test the rule by selecting Test Rule.
Use case 2: avg
aggregate function
Is the value of this transaction within 20% of the average transaction value (+/-10% from AVG) for this customer?
-
In Data Quality > Rules, select Create.
-
Name your rule and add other information as required. For the purpose of this example, we will use 'Average rule'.
-
On the Implementation tab, add the input attributes.
In this example, we want to find if the value of a transaction is within 20% of the average transaction value for that customer so our inputs are
CUSTOMER
andTRANSACTIONS
. -
In Rule Logic, select Validity rule.
-
Using the dropdown select Aggregation rule.
-
Select Change type when prompted.
-
In Group by, group the data by the
CUSTOMER
attribute. -
Switch from Condition Builder to Advanced Expression.
-
Set the rule logic so that anything outside the 20% range of the average is considered
INVALID
. This has two elements:-
If the value is bigger than 10% of the average:
TRANSACTIONS > (avg(TRANSACTIONS) * 1.1)
-
If the value is lower than 10% of the average:
TRANSACTIONS < (avg(TRANSACTIONS) * 0.9)
-
-
Optionally, test the rule by selecting Test Rule.
Use case 3: count
aggregate function
-
In Data Quality > Rules, select Create.
-
Name your rule and add other information as required. For the purpose of this example, we will use 'Day rule'.
-
On the Implementation tab, add the input attributes.
In this example, we want to find if there is the correct number of records in a day so our input attribute is
DATE
data type. -
In Rule Logic, select Validity rule.
-
Using the dropdown select Aggregation rule.
-
Select Change type when prompted.
-
In Group by, group the data by the
DATE
attribute. -
Select Change type when prompted.
-
Set the rule logic to check for the correct number of transactional records per day (for example, 24 per day). In this example, a count of 24 records is valid, and everything else can be considered invalid, so the expression would be as follows:
(count() != 24)
-
Optionally, test the rule by selecting Test Rule.
Next steps
Apply your newly-created DQ rule to a term. See Add Rules to Terms.
Was this page useful?