
Snowflake Lineage Scanner

The Snowflake lineage scanner extracts metadata through online (JDBC) extraction.

What Snowflake statements and syntax are supported?

The following types of statements are supported:

  • CREATE TABLE AS SELECT

  • CREATE VIEW

  • CREATE MATERIALIZED VIEW

  • INSERT

  • MERGE

  • SELECT

  • UPDATE

The scanner also supports the following Snowflake objects for Design Time Lineage:

  • Views

  • Materialized views

Lineage for procedures (SQL, Java, or Python) is also supported. If lineage was captured from a procedure, the procedure name is available in the lineage. Currently, the procedure name is displayed as a comment in the SQL code preview window.

Lineage from procedures and other Snowflake objects is supported only indirectly, based on the queries captured in Snowflake's QUERY_HISTORY view.
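
For illustration, here are two hypothetical statements of the supported types listed above (all table and column names are made up) that would each contribute lineage once they appear in the query history:

-- CREATE TABLE AS SELECT: lineage from SRC_ORDERS to DM_ORDERS
CREATE TABLE DM_ORDERS AS
SELECT o.ORDER_ID, o.CUSTOMER_ID, o.AMOUNT
FROM SRC_ORDERS o;

-- MERGE: lineage from STG_CUSTOMERS to DIM_CUSTOMERS
MERGE INTO DIM_CUSTOMERS d
USING STG_CUSTOMERS s
  ON d.CUSTOMER_ID = s.CUSTOMER_ID
WHEN MATCHED THEN UPDATE SET d.FULL_NAME = s.FULL_NAME
WHEN NOT MATCHED THEN INSERT (CUSTOMER_ID, FULL_NAME) VALUES (s.CUSTOMER_ID, s.FULL_NAME);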

Limitations

Currently, the following statement types that could be relevant for lineage are not supported:

  • SWAP tables

  • CLONE table

Statements containing the following SQL constructs are not included in the lineage diagram (in Snowflake Enterprise Edition, they are supported without expression details):

  • VALUES clause

  • MATCH_RECOGNIZE

  • Column rename

  • Rare table functions (such as IDENTIFIER)

In addition, the following features are currently not supported:

  • Lineage from Snowflake shares

  • Lineage from/to stage (for example, the COPY command for loading files from S3 storage into a Snowflake table)

  • Streams (supported in Snowflake Enterprise Edition, without expression details)

  • Dynamic tables (supported in Snowflake Enterprise Edition, without expression details)

Prerequisites

The lineage scanner uses metadata from views located in the ACCOUNT_USAGE schema. For details, see the official Snowflake documentation: Account Usage.

Permissions and security

Granting IMPORTED PRIVILEGES on the shared SNOWFLAKE database provides access to all required views in the ACCOUNT_USAGE schema. This grant doesn't provide access to any of your data.

View name      | Database role     | Used in lineage toolkit | Enterprise Edition only | Purpose                     | Latency    | Reference
TABLES         | OBJECT_VIEWER     | Yes                     | No                      | Database metadata           | 45 minutes | Tables
COLUMNS        | OBJECT_VIEWER     | Yes                     | No                      | Database metadata           | 90 minutes | Columns
VIEWS          | OBJECT_VIEWER     | Yes                     | No                      | Database metadata           | 90 minutes | Views
QUERY_HISTORY  | GOVERNANCE_VIEWER | Yes                     | No                      | SQL history                 | 90 minutes | Query History
SESSIONS       | SECURITY_VIEWER   | Yes, partially          | No                      | Application name extraction | 3 hours    | Sessions
ACCESS_HISTORY | GOVERNANCE_VIEWER | Yes                     | Yes                     | Lineage information         | 3 hours    | Access History

To enable other roles to access the database and schemas and query the views, a user with the ACCOUNTADMIN role must run the following command:

USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE ata_mde_role;
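
As a quick sanity check (a sketch, assuming the ata_mde_role role from the command above), you can verify that the role can query the ACCOUNT_USAGE views before running the scanner:

USE ROLE ata_mde_role;
-- These should return row counts instead of a privilege error.
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES;
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;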

Scanner configuration

Online (JDBC) extraction connects to Snowflake and extracts the necessary metadata and query log from the database. Currently, only username and password authentication is supported.

All fields marked with an asterisk (*) are mandatory.

Property Description

name*

Unique name for the scanner job.

sourceType*

Specifies the source type to be scanned. Must contain SNOWFLAKE.

description*

A human-readable description of the scan.

oneConnections

List of Ataccama ONE connection names for future automatic pairing.

inputDataCatalogFilePath

Path to a catalog file for initialization. Otherwise, an empty database is used.

Used only for troubleshooting.

accountIdentifier*

Snowflake account identifier. Displayed as the connection name on the Import Lineage screen in ONE.

See the official Snowflake documentation: Account name in your organization.

generateApplicationImpactLineage

Generates application impact lineage for each application that executed SELECT statements in the defined period. Can be used to see Snowflake objects referenced by unknown applications (such as reverse ETL tools, BI tools without lineage support, machine learning tools).

Possible values: true, false.

createUsageStatisticsReports

Generates a CSV file for each accessed column. See Usage statistics export.

Possible values: true, false.

jdbcUrl*

Full JDBC connection string. If account and all related connection attributes are configured, leave it empty.

account*

Refers to the part preceding 'snowflakecomputing.com' in the URL of your Snowflake connection. For instance, if your access URL is https://myorg-identifier.us-west-1.snowflakecomputing.com/, the corresponding account name would be myorg-identifier.us-west-1. If jdbcUrl is set, leave this empty.

username*

Snowflake username.

password*

Snowflake password. Can be encrypted.

warehouse

Snowflake warehouse. If not provided, the default warehouse assigned to the username is used.

role

Snowflake role. If not provided, the default role assigned to the username is used.

lineageFromPastNDays*

Lineage history in days. For example, a value of 10 means that the lineage contains only statements executed during the last 10 days, with a lag of three hours (lineage from the last three hours is not included due to the latency of the QUERY_HISTORY and SESSIONS views, see Account Usage).

At first execution, set it to a reasonably small value (1-2 days). This helps you estimate the lineage calculation duration for a longer period (see the sample query after this property list).

The maximum value is currently limited to 100 days.

includeDatabases

Restricts the lineage extraction to a specified list of databases. All source database objects (TABLE, VIEW) and the target database objects of an SQL query must be within the included databases list to have lineage from a query.

includeDatabases and excludeDatabases settings are mutually exclusive. Configure only one of them, or neither.

excludeDatabases

List of excluded databases. If one of the source database objects (TABLE, VIEW) or the target database objects of an SQL query is within the excluded databases list, the lineage for the query will not be included.

includeDatabases and excludeDatabases settings are mutually exclusive. Configure only one of them, or neither.

includeSchemas

Restricts the lineage extraction to a specified list of schemas. All source database objects (TABLE, VIEW) and the target database objects of an SQL query must be within the included schemas list to have lineage from a query.

includeSchemas and excludeSchemas settings are mutually exclusive. Configure only one of them, or neither.

excludeSchemas

List of excluded schemas. If one of the source database objects (TABLE, VIEW) or the target database objects of an SQL query is within the excluded schemas list, the lineage for the query will not be included.

includeSchemas and excludeSchemas settings are mutually exclusive. Configure only one of them, or neither.

includeApplications

List of included applications. It can be used to limit lineage extraction to specific applications.

Example values: dbt, Talend, PythonConnector, Tableau.

excludeApplications

List of excluded applications. It can be used to exclude specific applications from lineage extraction.

Example values: dbt, Talend, PythonConnector, Tableau.

excludeQueriesFromDbt

Disables extracting queries from dbt. This is useful when you plan to use the dbt scanner to create more detailed lineage from dbt, which also includes metadata from dbt.

Default value: true.

skipTableColumnsExtraction

If set to true, data dictionary metadata is not extracted. Set it to true only for troubleshooting or testing purposes, or in specific proof-of-concept use cases.

Default value: false.

skipViewsExtraction

If set to true, view metadata is not extracted. The lineage export will contain lineage only from the views present in the query history view. That is, if you set lineageFromPastNDays to five days, only views created or recreated within the past five days will have lineage.

Default value: false.
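
Before the first run, you can get a rough idea of how many statements a given lineageFromPastNDays value covers by counting them directly in the query history (a sketch, not part of the scanner configuration; adjust the number of days as needed):

-- Approximate number of statements the scanner would need to parse for the last 2 days
SELECT COUNT(*) AS captured_statements
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -2, CURRENT_TIMESTAMP());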

Snowflake online (JDBC) extraction example configuration
{
   "scannerConfigs":[
      {
         "name":"snowflake-lineage",
         "accountIdentifier":"myorg-identifier",
         "sourceType":"SNOWFLAKE",
         "description":"Scan lineage demo export files",
         "generateApplicationImpactLineage":false,
         "createUsageStatisticsReports":false,
         "lineageFromPastNDays":5,
         "includeDatabases":[
            "STAGE_DEV",
            "BI_DEV"
         ],
         "excludeDatabases":[

         ],
         "includeSchemas":[
            ""
         ],
         "excludeSchemas":[
            "MY_SANDBOX"
         ],
         "excludeQueriesFromDbt":true,
         "excludeApplications":[
            "Talend"
         ],
         "connection":{
            "account":"myorg-identifier.us-west-1",
            "username":"ATA_LINEAGE_EXTRACTION_USER",
            "password":"@@ref:ata:[SNOWFLAKE_PASSWORD]",
            "warehouse":"SMALL_WAREHOUSE"
         }
      }
   ]
}

Application impact lineage

A table-level impact lineage is created for each combination of referenced Snowflake table and application.

An application here can be any tool querying Snowflake tables or views in the extracted period (for example, the last 10 days):

  • BI tools (such as Tableau, Power BI, Metabase)

  • Machine learning applications

  • Reverse ETL Tools

To turn on application impact lineage generation, set generateApplicationImpactLineage to true in the JSON configuration file. By default, this type of lineage is not generated in order to keep lineage extraction fast.
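
To preview which applications would get an impact lineage (a sketch based on the ACCOUNT_USAGE views listed earlier, not something the scanner requires you to run), you can join the query history with the session metadata:

-- Applications that executed statements in the last 10 days
SELECT s.CLIENT_APPLICATION_ID, COUNT(*) AS statements
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY q
JOIN SNOWFLAKE.ACCOUNT_USAGE.SESSIONS s
  ON q.SESSION_ID = s.SESSION_ID
WHERE q.START_TIME >= DATEADD('day', -10, CURRENT_TIMESTAMP())
GROUP BY s.CLIENT_APPLICATION_ID
ORDER BY statements DESC;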

Usage statistics export

Usage statistics can be exported in CSV format. For example, the export can show the most queried columns of the Ataccama Datawarehouse (maximum of three columns per table).

The CSV files can be found in the folder holding the scanner execution output files (exec_<date>_<id>/<scanner-name>). Look for ColumnUsage.csv and TableUsage.csv.
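
Independently of the export, a rough table-level access count can be pulled straight from ACCESS_HISTORY (Enterprise Edition only). This is a sketch for getting a quick feel for the most accessed objects, not the scanner's own implementation:

-- Most accessed base objects in the last 5 days
SELECT f.VALUE:"objectName"::STRING AS object_name,
       COUNT(DISTINCT ah.QUERY_ID)  AS access_count
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY ah,
     LATERAL FLATTEN(input => ah.BASE_OBJECTS_ACCESSED) f
WHERE ah.QUERY_START_TIME >= DATEADD('day', -5, CURRENT_TIMESTAMP())
GROUP BY object_name
ORDER BY access_count DESC;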

SQL DQ (anomaly) detections

In addition to usage statistics, results of the following SQL validations are exported into CSV files as well:

  • Detection results of unreferenced tables are available in the file NotUsedFromSources.csv.

  • Detection results of orphaned WITH clauses are available in the file NotReferencedWithClauses.csv (an illustrative example follows below).

The CSV files can be found in the folder holding the scanner execution output files (exec_<date>_<id>/<scanner-name>).
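
As an illustration (hypothetical table names), a statement like the following would be reported in NotReferencedWithClauses.csv because the common table expression is defined but never used:

-- CANCELED_ORDERS is declared but never referenced in the main query
WITH CANCELED_ORDERS AS (
    SELECT ORDER_ID FROM SRC_ORDERS WHERE STATUS = 'CANCELED'
)
SELECT ORDER_ID, AMOUNT
FROM SRC_ORDERS;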


Troubleshooting online (JDBC) extraction

No active warehouse selected error

If you receive the following error:

No active warehouse selected in the current session.
Select an active warehouse with the 'use warehouse' command.

Try setting the warehouse in the connection configuration section of the JSON configuration file, then rerun the command.
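
Alternatively (a sketch, assuming the user and warehouse names from the example configuration), you can assign a default warehouse to the extraction user so that every new session starts with an active warehouse:

ALTER USER ATA_LINEAGE_EXTRACTION_USER SET DEFAULT_WAREHOUSE = 'SMALL_WAREHOUSE';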

JDBC driver communication error

If you receive the following error:

JDBC driver encountered communication error.
Message: HTTP status=403.

This can indicate that the configuration parameter account contains only the Snowflake accountIdentifier (for example, myorg-identifier) instead of accountIdentifier together with the cloudRegion (for example, myorg-identifier.us-west-1). Try updating the account parameter in the JSON configuration file, then rerun the command.
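
If you are unsure about the correct value, you can check the account locator and region from a Snowflake worksheet (a sketch; the returned region name may still need to be mapped to the region ID used in your connection URL):

SELECT CURRENT_ACCOUNT() AS account_locator,
       CURRENT_REGION()  AS region;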
