Evaluate Data Quality in Snowflake
After deploying your DQ firewalls as Snowflake user-defined functions (UDFs) via DQ Gates, you can evaluate data quality directly within your Snowflake environment.
This guide shows how to use the UDFs to test individual values, evaluate tables, and integrate DQ evaluation into your data pipelines, and how to interpret the DQ evaluation results these UDFs return.
Test firewalls in Snowflake
Test different input values for the firewall attributes and see if your firewalls (UDFs) are functioning correctly.
You can test your firewalls directly in Snowflake by running a query in a Snowflake worksheet:
SELECT MY_DB.MY_SCHEMA.atc_dq_email_validation('name@company.com');
Replace the values:

- `MY_DB.MY_SCHEMA`: Database and schema where the UDF is located.
- `atc_dq_email_validation`: UDF name.
- `name@company.com`: Actual data value that you want to test.
Run firewalls in Snowflake
Evaluate data quality across tables by applying your UDFs to multiple rows simultaneously.
To evaluate the data in a table, use the following query as a template. The query passes each row’s column values to the UDF for evaluation:
SELECT
    orderID,
    orderDate,
    MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate)
        AS validation_result,
    validation_result:passed AS validation_passed
FROM MY_DB.demo_data.orders;
Replace the following values:

- `MY_DB.MY_SCHEMA`: Database and schema where the UDF is located.
- `atc_validate_orders_year_2008`: UDF name.
- `orderID`, `orderDate`: List of columns that contain values to evaluate. These must match the expected arguments of the UDF.
- `MY_DB.demo_data.orders`: Name of the table containing the data to evaluate.
- `validation_result:passed`: (Optional) Extracts the Boolean passed or failed result for easier filtering.
For large tables, consider evaluating a sample or batch of rows first, as applying a UDF to every row can be resource-intensive.
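To make the shape of this result set concrete, here is a minimal Python sketch. The function below is a hypothetical stand-in for the deployed Snowflake UDF (the real evaluation logic lives in your DQ firewall); it only illustrates that each row yields a structured validation object plus the extracted Boolean flag:

```python
# Hypothetical stand-in for the deployed UDF: receives one row's column
# values and returns a structured validation object.
def atc_validate_orders_year_2008(order_id, order_date):
    passed = order_date.startswith("2008")
    return {
        "passed": passed,
        "rule_instance_results": [
            {
                "overall_result": "PASSED" if passed else "FAILED",
                "rule_instance_id": "Order year is 2008",
            }
        ],
        "score": 1000 if passed else 0,
    }

rows = [
    (123456, "2008-01-12"),
    (123457, "2010-05-14"),
]

# Equivalent of: SELECT orderID, orderDate, udf(...) AS validation_result,
#                       validation_result:passed AS validation_passed
results = [
    (order_id, order_date, result, result["passed"])
    for order_id, order_date in rows
    for result in [atc_validate_orders_year_2008(order_id, order_date)]
]
print(results[0][3], results[1][3])  # True False
```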
How to find UDF argument details
To find the UDF argument names and order, use the following query as a template:
SELECT comment
FROM <my_database>.INFORMATION_SCHEMA.FUNCTIONS
WHERE function_name = 'ATC_DQ_EMAIL_VALIDATION';
The comment contains deployment metadata, including the argument names and their order, for example:

deployed_at: 2025-08-28 09:09:00;
firewall_id: B5MSbQjSJr;
firewall_filename: email_validation.zip;
firewall_download_time: 2025-08-28 09:08:05;
arguments: #1: surname - STRING, #2: email - STRING;
returns: OBJECT<passed: boolean, score: int, rule_instance_results: OBJECT>;
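Because the comment is a semicolon-delimited list of `key: value` pairs, the argument list can also be recovered programmatically. A minimal Python sketch, using the comment string from the example above:

```python
# UDF comment copied from the example output above.
comment = (
    "deployed_at: 2025-08-28 09:09:00;"
    "firewall_id: B5MSbQjSJr;"
    "firewall_filename: email_validation.zip;"
    "arguments: #1: surname - STRING, #2: email - STRING;"
)

# Split entries on ';' and take only the text before the FIRST ':' as the
# key, since values (timestamps, argument lists) contain ':' themselves.
metadata = {}
for entry in comment.split(";"):
    entry = entry.strip()
    if not entry:
        continue
    key, _, value = entry.partition(":")
    metadata[key.strip()] = value.strip()

print(metadata["arguments"])  # #1: surname - STRING, #2: email - STRING
```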
Results
Data quality queries return a result set containing the original table columns and a validation_result
column with structured data quality results for each row.
Validation result

Each validation_result contains the following information:

- `passed`: Indicates whether the record passed DQ evaluation. Either `true` or `false`.

  The result is `passed` only if the record passes all DQ rules applied to it. The exception is rules that belong to a dimension with Overall contribution set to `false`: results of such rules do not count towards the overall result, so even if they fail, the overall result can still be `PASSED`.

- `rule_instance_results`: Detailed results for each rule instance.
  - `dimension_result`: Rule dimension result. The result depends on the dimension to which the rule belongs.
  - `explanation`: Rule explanation, as defined in the rule implementation.
  - `overall_result`: Rule result (`PASSED` or `FAILED`).
  - `rule_instance_id`: Rule label, as defined when applying the rule to the attribute.

- `score`: Shows the combined score for all `rule_instance_results`.
{
"passed": false,
"rule_instance_results": [
{
"dimension_result": "INVALID",
"explanation": "FUTURE_DATE",
"overall_result": "FAILED",
"rule_instance_id": "Date is not in future"
},
{
"dimension_result": "VALID",
"explanation": "OTHER",
"overall_result": "PASSED",
"rule_instance_id": "Date is in fiscal season range"
}
],
"score": 2000
}
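The aggregation behind the overall `passed` flag can be sketched in Python. This is an illustration, not the product's implementation: the `contributes` flag is a hypothetical annotation standing in for the dimension's Overall contribution setting in Ataccama ONE:

```python
# Each rule result is paired with whether its dimension contributes to
# the overall outcome (Overall contribution setting, hypothetical field).
rule_results = [
    {"overall_result": "FAILED", "contributes": False},  # ignored overall
    {"overall_result": "PASSED", "contributes": True},
]

# The record passes only if every *contributing* rule passed; failing
# rules in non-contributing dimensions do not affect the outcome.
passed = all(
    r["overall_result"] == "PASSED"
    for r in rule_results
    if r["contributes"]
)
print(passed)  # True: the only failing rule does not contribute
```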
Example with table results
SELECT
    orderID,
    orderDate,
    MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate)
        AS validation_result,
    validation_result:passed AS validation_passed
FROM MY_DB.demo_data.orders;

This query returns results in the following format:

orderID | orderDate | validation_result | validation_passed
---|---|---|---
123456 | 12-01-2008 | structured object | true
123457 | 14-05-2010 | structured object | false

The validation_passed column provides direct access to the Boolean result for filtering and aggregation operations.
Access the validation results
The UDF returns structured data containing multiple pieces of information. You can use JSON path notation in Snowflake to access specific parts of the result.
The way you access different parts of the validation results depends on the structure of the data you want to extract:

- Simple fields like `passed` and `score` can be accessed directly using JSON path notation.
- Array elements from `rule_instance_results` require `LATERAL FLATTEN` to expand and access individual array items.
passed
Extract just the Boolean passed or failed result without the full validation details:
SELECT MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed FROM raw_customers;
score
Extract the overall quality score from the validation result:
SELECT MY_DB.MY_SCHEMA.atc_dq_continent_validation('Europe'):score;
rule_instance_results
Use LATERAL FLATTEN
to expand the array elements:
SELECT
t.orderID,
t.validation_result,
ARRAY_AGG(f.value:explanation::STRING) AS explanations_array
FROM (
SELECT
orderID,
MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate) AS validation_result
FROM MY_DB.demo_data.orders
) t,
LATERAL FLATTEN(INPUT => t.validation_result:rule_instance_results) f
GROUP BY t.orderID, t.validation_result;
This example extracts all rule explanations into an array:
orderID | validation_result | explanations_array
---|---|---
1001 | { "passed": false, "rule_instance_results": [ { "dimension_result": "INVALID", "explanation": "FUTURE_DATE", "overall_result": "FAILED", "rule_instance_id": "Date is not in future" }, { "dimension_result": "VALID", "explanation": "OTHER", "overall_result": "PASSED", "rule_instance_id": "Date is in fiscal season range" } ], "score": 2000 } | ["FUTURE_DATE", "OTHER"]
You can similarly access other fields:

- `f.value:dimension_result::STRING`: Rule dimension result.
- `f.value:overall_result::STRING`: Individual rule result (PASSED/FAILED).
- `f.value:rule_instance_id::STRING`: Rule instance identifier.
Integrate firewalls into production pipelines
The following examples show how to integrate Ataccama ONE data quality UDFs into production data pipelines. The UDFs serve as automated quality checkpoints throughout the pipeline, enforcing data quality standards from ingestion to consumption.
The examples show how to:

- Filter out invalid data before downstream processing.
- Flag data quality issues while preserving all data.
- Implement quality gates or circuit breakers that stop pipelines if thresholds aren’t met.
- Create DQ dashboards and alerts for monitoring and reporting.
Filter out invalid data
Use this pattern to prevent low-quality data from entering downstream systems, mainly during early-stage ingestion or staging.
To exclude records that fail data quality checks, apply the condition :passed = TRUE
to the UDF output.
-- Insert only valid records
INSERT INTO clean_customers
SELECT firstname, lastname, email, phone
FROM raw_customers
WHERE MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed = TRUE;
-- Auto-ingest with validation
CREATE PIPE customer_validation_pipe AS
COPY INTO validated_customers
FROM (
SELECT
$1:firstname::STRING as firstname,
$1:lastname::STRING as lastname,
MY_DB.MY_SCHEMA.atc_dq_customers($1:firstname::STRING, $1:lastname::STRING) as quality_result
FROM @my_stage
WHERE MY_DB.MY_SCHEMA.atc_dq_customers($1:firstname::STRING, $1:lastname::STRING):passed = TRUE
);
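The filtering pattern reduces to: evaluate each record and keep only those whose result has `passed = true`. A Python sketch of the same logic, with a hypothetical stand-in for the UDF:

```python
def atc_dq_customers(firstname, lastname):
    # Hypothetical stand-in for the UDF: both name fields must be non-empty.
    return {"passed": bool(firstname) and bool(lastname)}

raw_customers = [
    {"firstname": "Ada", "lastname": "Lovelace"},
    {"firstname": "", "lastname": "Turing"},  # fails the check
]

# Equivalent of: WHERE udf(firstname, lastname):passed = TRUE
clean_customers = [
    c for c in raw_customers
    if atc_dq_customers(c["firstname"], c["lastname"])["passed"]
]
print(len(clean_customers))  # 1
```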
Flag data quality issues
Create quality-flagged datasets by adding quality check columns to your tables, without filtering any data.
This pattern is useful when you need to preserve full data lineage or handle quality issues downstream, for example, by quarantining, alerting, or dashboarding.
CREATE OR REPLACE TABLE customers_with_quality AS
SELECT
*,
MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname) as quality_check,
MY_DB.MY_SCHEMA.atc_dq_email(email) as email_quality
FROM raw_customers;
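In contrast to filtering, flagging keeps every record and attaches the structured result as extra columns. A minimal Python sketch of this pattern (the validation function is again a hypothetical stand-in for the UDF):

```python
def atc_dq_email(email):
    # Hypothetical stand-in for the email-validation UDF.
    return {"passed": "@" in email}

raw_customers = [
    {"firstname": "Ada", "email": "ada@example.com"},
    {"firstname": "Bob", "email": "not-an-email"},
]

# Equivalent of: SELECT *, udf(email) AS email_quality -- no rows dropped.
customers_with_quality = [
    {**c, "email_quality": atc_dq_email(c["email"])}
    for c in raw_customers
]
print([c["email_quality"]["passed"] for c in customers_with_quality])
```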
Implement quality gates or circuit breakers
Stop pipeline execution if quality thresholds are not met.
To do this, calculate quality pass rates and implement conditional logic that halts the pipeline when thresholds aren’t reached, including automated alerting to notify stakeholders.
This approach helps enforce SLAs and prevents downstream impact when data quality declines.
-- Check data quality before proceeding
SET quality_pass_rate = (
SELECT
COUNT_IF(MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) * 100.0 / COUNT(*)
FROM raw_customers
);
-- Conditional processing based on quality (inside a Snowflake Scripting block).
-- SYSTEM$SEND_EMAIL requires a notification integration as its first argument;
-- replace my_notification_integration with the name of yours.
IF ($quality_pass_rate < 95) THEN
  CALL SYSTEM$SEND_EMAIL('my_notification_integration', 'dq-alerts@company.com', 'Data Quality Alert', 'Quality below threshold');
  RETURN 'PIPELINE_FAILED';
END IF;
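The circuit-breaker decision itself is simple: compute the share of passing records and halt (and alert) when it falls below a threshold. A Python sketch of that logic, with an illustrative 95% threshold and the alerting step reduced to a comment:

```python
def quality_gate(results, threshold=95.0):
    """Return the action a pipeline would take for the given pass/fail flags."""
    pass_rate = 100.0 * sum(results) / len(results)
    if pass_rate < threshold:
        # In Snowflake, this is where the alert email would be sent.
        return "PIPELINE_FAILED", pass_rate
    return "PIPELINE_CONTINUES", pass_rate

# 94 of 100 records pass -> below the 95% threshold.
action, rate = quality_gate([True] * 94 + [False] * 6)
print(action, rate)  # PIPELINE_FAILED 94.0
```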
Monitoring and alerting
Create quality monitoring views. The views aggregate quality metrics over time, providing the foundation for dashboards and monitoring systems that track data quality trends.
CREATE OR REPLACE VIEW daily_quality_report AS
SELECT
DATE(created_at) as report_date,
COUNT(*) as total_records,
COUNT_IF(MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) as valid_records,
COUNT_IF(NOT MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) as invalid_records
FROM customers
GROUP BY DATE(created_at);
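The view's aggregation, grouping records by day and counting valid versus invalid results, can be sketched as follows (the sample records and their pass flags are illustrative):

```python
from collections import defaultdict

# (created_at date, passed) pairs, as the UDF would evaluate each record.
records = [
    ("2025-08-28", True),
    ("2025-08-28", False),
    ("2025-08-29", True),
]

# Equivalent of GROUP BY DATE(created_at) with COUNT_IF on the passed flag.
report = defaultdict(lambda: {"total": 0, "valid": 0, "invalid": 0})
for day, passed in records:
    report[day]["total"] += 1
    report[day]["valid" if passed else "invalid"] += 1

print(dict(report))
```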
Integrate into testing frameworks
Catch quality issues early in the development process by integrating UDFs into testing frameworks like dbt. This helps maintain quality before deployment.
{% test validate_customers(model, first_name_column, last_name_column) %}
select *
from {{ model }}
where MY_DB.MY_SCHEMA.atc_dq_customers({{ first_name_column }}, {{ last_name_column }}):passed = FALSE
{% endtest %}
How to update firewalls
When you change the configuration of DQ Firewalls in Ataccama ONE, these changes must be deployed to Snowflake to ensure they are applied in the Snowflake environment.
This process requires the same administrator permissions needed for DQ Gates installation.
For instructions, see Synchronize Firewalls in Snowflake with ONE.