Evaluate Data Quality in Snowflake

After deploying your DQ firewalls as Snowflake user-defined functions (UDFs) via DQ Gates, you can evaluate data quality directly within your Snowflake environment.

This guide shows how to use UDFs to test individual values, evaluate tables, and integrate DQ evaluation into your data pipelines, and explains how to interpret the DQ evaluation results returned by these UDFs.

Test firewalls in Snowflake

Test different input values for the firewall attributes and see if your firewalls (UDFs) are functioning correctly.

You can test your firewalls directly in Snowflake by running a query in a Snowflake worksheet:

SELECT MY_DB.MY_SCHEMA.atc_dq_email_validation('name@company.com');

Replace the values:

  • MY_DB.MY_SCHEMA: Database and schema where the UDF is located.

  • atc_dq_email_validation: UDF name.

  • name@company.com: Actual data value that you want to test.

Run firewalls in Snowflake

Evaluate data quality across tables by applying your UDFs to multiple rows simultaneously.

To evaluate the data in a table, use the following query as a template. The query passes each row’s column values to the UDF for evaluation:

SELECT
  orderID,
  orderDate,
  MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate)
  AS validation_result,
  validation_result:passed as validation_passed
FROM MY_DB.demo_data.orders;

Replace the following values:

  • MY_DB.MY_SCHEMA: Database and schema where the UDF is located.

  • atc_validate_orders_year_2008: UDF name.

  • orderID, orderDate: List of columns that contain values to evaluate. These must match the expected arguments for the UDF.

  • MY_DB.demo_data.orders: Name of the table containing the data to evaluate.

  • validation_result:passed: (Optional) Extracts the Boolean passed or failed result for easier filtering.

For large tables, consider using a LIMIT or WHERE clause to test on a small subset first. This helps you validate the UDF behavior and avoid long execution times or unnecessary resource usage.
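For example, a quick sanity check on a small sample might look like this (a sketch using the same placeholder names as the template above):

```sql
-- Evaluate only the first 100 rows to verify UDF behavior cheaply
SELECT
  orderID,
  orderDate,
  MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate) AS validation_result
FROM MY_DB.demo_data.orders
LIMIT 100;
```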

How to find UDF argument details

To find the UDF argument names and order, use the following query as a template:

SELECT comment
FROM <my_database>.INFORMATION_SCHEMA.FUNCTIONS
WHERE function_name = 'ATC_DQ_EMAIL_VALIDATION';
Sample response
deployed_at: 2025-08-28 09:09:00;
firewall_id: B5MSbQjSJr;
firewall_filename: email_validation.zip;
firewall_download_time: 2025-08-28 09:08:05;
arguments: #1: surname - STRING, #2: email - STRING;
returns: OBJECT<passed: boolean, score: int, rule_instance_results: OBJECT>;

Results

Data quality queries return a result set containing the original table columns and a validation_result column with structured data quality results for each row.

Validation result

Each validation_result contains the following information:

  • passed: Indicates whether the record passed DQ evaluation. Either true or false.

    The result is passed only if the record passes all DQ rules applied to it. The exception is rules belonging to a dimension that has Overall contribution set to false: results of such rules do not count toward the overall result, so even if those rules fail, the overall result can still be passed.

  • rule_instance_results: Detailed results for each rule instance.

    • dimension_result: Rule dimension result. The result depends on the dimension to which the rule belongs.

    • explanation: Rule explanation, as defined in the rule implementation.

    • overall_result: Rule result (PASSED or FAILED).

    • rule_instance_id: Rule instance identifier, as defined when applying the rule to the attribute.

  • score: Combined score across all rule_instance_results.

Sample response
{
  "passed": false,
  "rule_instance_results": [
    {
      "dimension_result": "INVALID",
      "explanation": "FUTURE_DATE",
      "overall_result": "FAILED",
      "rule_instance_id": "Date is not in future"
    },
    {
      "dimension_result": "VALID",
      "explanation": "OTHER",
      "overall_result": "PASSED",
      "rule_instance_id": "Date is in fiscal season range"
    }
  ],
  "score": 2000
}

Example with table results

Example query
SELECT
  orderID,
  orderDate,
  MY_DB.MY_SCHEMA.atc_dq_orders_year_2008(orderID, orderDate)
  AS validation_result,
  validation_result:passed as validation_passed
FROM MY_DB.demo_data.orders;

This query returns results in the following format:

orderID  orderDate   validation_result  validation_passed
123456   12-01-2008  structured object  true
123457   14-05-2010  structured object  false

The validation_passed column provides direct access to the Boolean result for filtering and aggregation operations.
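For example, to list only the failing records, you can wrap the query and filter on the flag (a sketch using the same placeholder names):

```sql
-- List records that failed DQ evaluation
SELECT orderID, orderDate
FROM (
  SELECT
    orderID,
    orderDate,
    MY_DB.MY_SCHEMA.atc_dq_orders_year_2008(orderID, orderDate):passed AS validation_passed
  FROM MY_DB.demo_data.orders
)
WHERE validation_passed = FALSE;
```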

Access the validation results

The UDF returns structured data containing multiple pieces of information. You can use JSON path notation in Snowflake to access specific parts of the result.

The way you access different parts of the validation results depends on the structure of the data you want to extract:

  • Simple fields like passed and score can be accessed directly using JSON path notation.

  • Array elements from rule_instance_results require LATERAL FLATTEN to expand and access individual array items.

passed

Extract just the Boolean passed or failed result without the full validation details:

SELECT MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed
FROM raw_customers;

score

Extract the overall quality score from the validation result:

SELECT MY_DB.MY_SCHEMA.atc_dq_continent_validation('Europe'):score;

rule_instance_results

Use LATERAL FLATTEN to expand the array elements:

SELECT
  t.orderID,
  t.validation_result,
  ARRAY_AGG(f.value:explanation::STRING) AS explanations_array
FROM (
  SELECT
    orderID,
    MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate) AS validation_result
  FROM MY_DB.demo_data.orders
) t,
LATERAL FLATTEN(INPUT => t.validation_result:rule_instance_results) f
GROUP BY t.orderID, t.validation_result;

This example extracts all rule explanations into an array:

orderID: 1001

validation_result:
{
  "passed": false,
  "rule_instance_results": [
    {
      "dimension_result": "INVALID",
      "explanation": "FUTURE_DATE",
      "overall_result": "FAILED",
      "rule_instance_id": "Date is not in future"
    },
    {
      "dimension_result": "VALID",
      "explanation": "OTHER",
      "overall_result": "PASSED",
      "rule_instance_id": "Date is in fiscal season range"
    }
  ],
  "score": 2000
}

explanations_array: ["FUTURE_DATE", "OTHER"]

You can similarly access other fields like:

  • f.value:dimension_result::STRING: Rule dimension result.

  • f.value:overall_result::STRING: Individual rule result (PASSED/FAILED).

  • f.value:rule_instance_id::STRING: Rule instance identifier.
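Building on the LATERAL FLATTEN query above, these fields can be combined to report only the failed rule instances per record (a sketch with the same placeholder names):

```sql
-- One row per failed rule instance
SELECT
  t.orderID,
  f.value:rule_instance_id::STRING AS failed_rule,
  f.value:explanation::STRING AS explanation
FROM (
  SELECT
    orderID,
    MY_DB.MY_SCHEMA.atc_validate_orders_year_2008(orderID, orderDate) AS validation_result
  FROM MY_DB.demo_data.orders
) t,
LATERAL FLATTEN(INPUT => t.validation_result:rule_instance_results) f
WHERE f.value:overall_result::STRING = 'FAILED';
```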

Integrate firewalls into production pipelines

The following examples show how to integrate Ataccama ONE data quality UDFs into production data pipelines. These UDFs serve as automated quality checkpoints throughout the pipeline, enforcing data quality standards from ingestion to consumption.

The examples show how to:

  • Filter out invalid data.

  • Flag data quality issues.

  • Implement quality gates or circuit breakers.

  • Monitor and alert on data quality.

  • Integrate firewalls into testing frameworks.

Filter out invalid data

Use this pattern to prevent low-quality data from entering downstream systems, mainly during early-stage ingestion or staging.

To exclude records that fail data quality checks, apply the condition :passed = TRUE to the UDF output.

ETL/ELT pipeline example
-- Insert only valid records
INSERT INTO clean_customers
SELECT firstname, lastname, email, phone
FROM raw_customers
WHERE MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed = TRUE;
Streaming data validation example using Snowpipe with UDFs
-- Auto-ingest with validation
CREATE PIPE customer_validation_pipe AS
COPY INTO validated_customers
FROM (
  SELECT
    $1:firstname::STRING as firstname,
    $1:lastname::STRING as lastname,
    MY_DB.MY_SCHEMA.atc_dq_customers($1:firstname::STRING, $1:lastname::STRING) as quality_result
  FROM @my_stage
  WHERE MY_DB.MY_SCHEMA.atc_dq_customers($1:firstname::STRING, $1:lastname::STRING):passed = TRUE
);

Flag data quality issues

Create quality-flagged datasets by adding quality check columns to your tables, without filtering any data.

This pattern is useful when you need to preserve full data lineage or handle quality issues downstream, for example, by quarantining, alerting, or dashboarding.

ETL/ELT pipeline integration example
CREATE OR REPLACE TABLE customers_with_quality AS
SELECT
  *,
  MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname) as quality_check,
  MY_DB.MY_SCHEMA.atc_dq_email(email) as email_quality
FROM raw_customers;

Implement quality gates or circuit breakers

Stop pipeline execution if quality thresholds are not met.

To do this, calculate quality pass rates and implement conditional logic that halts the pipeline when thresholds aren’t reached, including automated alerting to notify stakeholders.

This approach helps enforce SLAs and prevents downstream impact when data quality declines.

Example
-- Check data quality before proceeding
SET quality_pass_rate = (
  SELECT
    COUNT_IF(MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) * 100.0 / COUNT(*)
  FROM raw_customers
);

-- Conditional processing based on quality
-- (IF requires a Snowflake Scripting block; 'my_email_integration' is a
-- placeholder for your notification integration)
BEGIN
  IF ($quality_pass_rate < 95) THEN
    CALL SYSTEM$SEND_EMAIL('my_email_integration', 'dq-alerts@company.com', 'Data Quality Alert', 'Quality below threshold');
    RETURN 'PIPELINE_FAILED';
  END IF;
END;

Monitoring and alerting

Create quality monitoring views. The views aggregate quality metrics over time, providing the foundation for dashboards and monitoring systems that track data quality trends.

Example
CREATE OR REPLACE VIEW daily_quality_report AS
SELECT
  DATE(created_at) as report_date,
  COUNT(*) as total_records,
  COUNT_IF(MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) as valid_records,
  COUNT_IF(NOT MY_DB.MY_SCHEMA.atc_dq_customers(firstname, lastname):passed) as invalid_records
FROM customers
GROUP BY DATE(created_at);

Integrate into testing frameworks

Catch quality issues early in the development process by integrating UDFs into testing frameworks like dbt. This helps maintain quality before deployment.

In dbt tests
{% test validate_customers(model, first_name_column, last_name_column) %}
select *
from {{ model }}
where MY_DB.MY_SCHEMA.atc_dq_customers({{ first_name_column }}, {{ last_name_column }}):passed = FALSE
{% endtest %}

Update firewalls

When you change the configuration of DQ firewalls in Ataccama ONE, you must redeploy them to Snowflake for the changes to take effect.

This process requires the same administrator permissions needed for DQ Gates installation.
