User Community Service Desk Downloads
If you can't find the product or version you're looking for, visit support.ataccama.com/downloads

Ataccama Data Quality Application for Snowflake

This application is developed independently of ONE. Standard versioning does not apply.

Ataccama Data Quality Application is a Native Snowflake Application available from the Snowflake Marketplace.

The application lets you apply a library of frequently used data quality (DQ) rules that can be called as SQL user-defined functions (UDFs) in Snowflake. These predefined rules can cover a wide variety of DQ use cases and save time and effort required to write similar rules manually.

The pre-compiled rule logic is executed as Python code inside a container running in the Snowflake container services. All operations are executed within Snowflake.

Install and start the application

To install and start the application, do the following:

  1. Install the application from the Snowflake Marketplace.

  2. Grant the application appropriate permissions in the application UI.

  3. Start the application: Call the following function or run it from the RULE_LIBRARY tab:

CALL ataccama_data_quality.app_public.start_app();
If you changed the default application name (ataccama_data_quality) during installation, update all available code samples accordingly.

Application can take a couple of minutes to start when used for the first time.

To check the application starting status, you can call following procedure:

CALL ataccama_data_quality.app_public.service_status();

The procedure returns UNKNOWN, PENDING, and READY statuses as application spins up the container pool and deploys service applications.

After the procedure returns a READY status, it usually takes between one and two minutes for the remaining networking changes to be applied.

You only have to start the application once. It should always be available in your account after the initial setup.

After you install and start the application in your account, switch to the application owner role (the role you were using when you installed the app) whenever you want to use the application.

Call Ataccama data quality UDFs

The following UDFs are available:

  • dq.validate(string, DQ_Rule_ID)

    • string: Provide a string to validate using the selected DQ rule.

    • DQ_Rule_ID: Define which DQ rule will be used to validate the provided string.

    • Depending on the result, the function can return a generic PASSED or FAILED response.

    • If the rule does not exist, the function responds with an error.

    • The following example is a validation of a string with the value of america validated using the Continent DQ rule:

      SELECT ataccama_data_quality.dq.validate('america','Continent');
  • dq.validate_exp(string, DQ_Rule_ID)

    • string: Provide a string to validate using the selected DQ rule.

    • DQ_Rule_ID: Define which DQ rule logic will be used to validate the provided string.

    • Depending on the result, the function can return a VALID or a more granular explanation of why the validation failed.

    • If the rule does not exist, the function responds with an error.

    • The following example is a validation of string with a value of america validated using the Continent DQ Rule:

      SELECT ataccama_data_quality.dq.validate_exp('america','Continent');
  • dq.get_firewall_definitions()

    • The function loads names and IDs of all available DQ rules.

    • The following example returns all available DQ rule names and IDs:

      SELECT ataccama_data_quality.dq.get_firewall_definitions();

Performance tuning options

The application runs inside a compute pool within your Snowflake account, leveraging Snowflake container services. By default, installation spins up a compute pool with three nodes and three services. The used Node type is: CPU_X64_XS.

In some scenarios, you might want to re-configure the compute pool either for better performance or cost-saving purposes. The application provides you with an application role with required access for this purpose: SETTINGS_APP_ADMIN. This role can manage changes to the compute pool and the service instance through the basic UI in the Streamlit application (RULE_LIBRARY tab) Settings section, or in the Snowlake SQL interface.

Assign the application role

You can assign the application role in the Snowflake user interface, or through SQL:

GRANT APPLICATION ROLE ATACCAMA.SETTINGS_APP_ADMIN TO ROLE <YOUR_ROLE>;

Work with compute resources

When re-configuring compute pool resources, the recommended node to service instance ratio is 1:1 for the default CPU_X64_XS node type (for example, four nodes = four service instances). Therefore changing both the service instance settings and node settings is recommended.

To view details of the current application compute pool, use the following SQL query:

DESCRIBE COMPUTE POOL <app_name>_compute_pool;

The compute pool name should follow the pattern described in this document, but in case you are not sure, you can get the compute pool name by calling one of the helper functions:

CALL <app_name>.get_compute_pool_name();

To change the compute pool settings, such as the number of nodes, use the built-in Streamlit app (RULE_LIBRARY tab), or the following SQL query:

ALTER COMPUTE POOL <compute_pool_name> set MIN_NODES = <number> MAX_NODES = <number>;

To suspend or resume compute pool, use the built-in Streamlit app (RULE_LIBRARY tab), or the following SQL queries:

ALTER COMPUTE POOL <compute_pool_name> SUSPEND
ALTER COMPUTE POOL <compute_pool_name> RESUME
If a DQ rule function is called on a suspended compute pool, the pool automatically resumes operations (this usually takes a couple of minutes).

To list the currently running service instances, use the built-in Streamlit app (RULE_LIBRARY tab), or following the SQL query:

SHOW SERVICE INSTANCES IN SERVICE <app_name>.dq.ataccama_firewall_service

To change the number of service instances, use the built-in Streamlit app (RULE_LIBRARY tab), or following SQL query:

CALL <app_name>.app_public.alter_service(<number>,<number>)

Available DQ rules

To browse all available DQ rules, related information, and usage examples, go to the RULE_LIBRARY tab within the app.

Was this page useful?