Snowflake Lineage Scanner
The Snowflake lineage scanner extracts metadata through online (JDBC) based extraction.
What Snowflake statements and syntax are supported?
The following types of statements are supported:
- CREATE TABLE AS SELECT
- CREATE VIEW
- CREATE MATERIALIZED VIEW
- INSERT
- MERGE
- SELECT
- UPDATE
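For illustration, a statement like the following CREATE TABLE AS SELECT (the table and column names are hypothetical) is the kind of statement the scanner derives table- and column-level lineage from:

```sql
-- Hypothetical example: lineage would link RAW.ORDERS to REPORTING.DAILY_SALES
CREATE TABLE REPORTING.DAILY_SALES AS
SELECT ORDER_DATE, SUM(AMOUNT) AS TOTAL_AMOUNT
FROM RAW.ORDERS
GROUP BY ORDER_DATE;
```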
The scanner also supports the following Snowflake objects for Design Time Lineage:
- Views
- Materialized views
Lineage for procedures (SQL, Java, or Python) is also supported. If lineage was captured from a procedure, the procedure name is included in the lineage. Currently, the procedure name is displayed as a comment in the SQL code preview window.
Lineage from procedures and other Snowflake objects is supported only indirectly based on the captured queries in Snowflake’s QUERY_HISTORY table.
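Conceptually, the captured statements come from that view; the sketch below shows the kind of read involved (this is an illustration only, not the scanner's actual query):

```sql
-- Sketch: statements from the last 5 days, as recorded by Snowflake
SELECT QUERY_ID, QUERY_TEXT, DATABASE_NAME, SCHEMA_NAME, START_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD(DAY, -5, CURRENT_TIMESTAMP());
```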
Limitations
Currently, the following statement types that could be relevant for lineage are not supported:
- SWAP tables
- CLONE table
Statements containing the following SQL constructs are not included in the lineage diagram (in Snowflake Enterprise Edition, they are supported without expression details):
- VALUES clause
- MATCH_RECOGNIZE
- Column rename
- Rare table functions (such as IDENTIFIER)
In addition, the following features are currently not supported:
- Lineage from Snowflake shares
- Lineage from/to stage (for example, the COPY command for loading files from S3 storage into a Snowflake table)
- Streams (in Snowflake Enterprise Edition, supported without expression details)
- Dynamic tables (in Snowflake Enterprise Edition, supported without expression details)
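As an example of the first limitation, a table swap such as the following (hypothetical table names) is not reflected in the lineage diagram:

```sql
-- Not supported for lineage: SWAP exchanges the contents of two tables
ALTER TABLE SALES_STAGING SWAP WITH SALES;
```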
Prerequisites
The lineage scanner uses metadata from views located in the ACCOUNT_USAGE schema. For details, see the official Snowflake documentation: Account Usage.
Permissions and security
Granting IMPORTED PRIVILEGES on the SNOWFLAKE database gives a role access to all required views from the ACCOUNT_USAGE schema. This grant does not provide access to any customer data.
| View name | Database role | Used in lineage toolkit | Enterprise Edition only | Purpose | Latency | Reference |
|---|---|---|---|---|---|---|
| | | Yes | No | Database metadata | 45 minutes | |
| | | Yes | No | Database metadata | 90 minutes | |
| | | Yes | No | Database metadata | 90 minutes | |
| | | Yes | No | SQL history | 90 minutes | |
| | | Yes, partially | No | Application name extraction | 3 hours | |
| | | Yes | Yes | Lineage information | 3 hours | |
To enable other roles to access the database and schemas and query the views, a user with the ACCOUNTADMIN role must run the following command:
USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE ata_mde_role;
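After running the grant, you can sanity-check that the role reaches the required views, for example (ata_mde_role is the example role from the command above):

```sql
USE ROLE ata_mde_role;
-- Should return a row count rather than an authorization error
SELECT COUNT(*) FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;
```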
Scanner configuration
Online (JDBC) based extraction connects to Snowflake and extracts the necessary metadata and query log from the database. Currently, only username and password authentication is supported.
All fields marked with an asterisk (*) are mandatory.
| Property | Description |
|---|---|
| name | Unique name for the scanner job. |
| sourceType | Specifies the source type to be scanned. Must contain SNOWFLAKE. |
| description | A human-readable description of the scan. |
| | List of Ataccama ONE connection names for future automatic pairing. |
| | Path to a catalog file for initialization. Otherwise, an empty database is used. Used only for troubleshooting. |
| accountIdentifier | Snowflake account identifier. Displayed as the connection name on the Import Lineage screen in ONE. See the official Snowflake documentation: Account name in your organization. |
| generateApplicationImpactLineage | Generates application impact lineage for each application that executed queries. Possible values: true, false. |
| createUsageStatisticsReports | Generates a CSV file for each accessed column. See Usage statistics export. Possible values: true, false. |
| | Full JDBC connection string. If … |
| account | Refers to the part preceding snowflakecomputing.com in the URL of your Snowflake connection. For instance, if your access URL is myorg-identifier.us-west-1.snowflakecomputing.com, set this to myorg-identifier.us-west-1. |
| username | Snowflake username. |
| password | Snowflake password. Can be encrypted. |
| warehouse | Snowflake warehouse. If not provided, the default warehouse assigned to the user is used. |
| role | Snowflake role. If not provided, the default role assigned to the user is used. |
| lineageFromPastNDays | Lineage history in days. For example, a value of 5 extracts lineage from queries run in the past 5 days. The maximum value is currently limited to 100 days. |
| includeDatabases | Restricts the lineage extraction to a specified list of databases. |
| excludeDatabases | List of excluded databases. |
| includeSchemas | Restricts the lineage extraction to a specified list of schemas. |
| excludeSchemas | List of excluded schemas. |
| includeApplications | List of included applications. It can be used to limit lineage extraction to specific applications. |
| excludeApplications | List of excluded applications. It can be used to exclude specific applications from lineage extraction. |
| excludeQueriesFromDbt | Disables extracting queries from dbt. Possible values: true, false. |
| | If set to … Default value: … |
| | If set to … Default value: … |

Example configuration:
{
"scannerConfigs":[
{
"name":"snowflake-lineage",
"accountIdentifier":"myorg-identifier",
"sourceType":"SNOWFLAKE",
"description":"Scan lineage demo export files",
"generateApplicationImpactLineage":false,
"createUsageStatisticsReports":false,
"lineageFromPastNDays":5,
"includeDatabases":[
"STAGE_DEV",
"BI_DEV"
],
"excludeDatabases":[
],
"includeSchemas":[
""
],
"excludeSchemas":[
"MY_SANDBOX"
],
"excludeQueriesFromDbt":true,
"excludeApplications":[
"Talend"
],
"connection":{
"account":"myorg-identifier.us-west-1",
"username":"ATA_LINEAGE_EXTRACTION_USER",
"password":"@@ref:ata:[SNOWFLAKE_PASSWORD]",
"warehouse":"SMALL_WAREHOUSE"
}
}
]
}
Application impact lineage
A table-level impact lineage is created for each referenced Snowflake table and Application combination.
An application here can be any tool querying Snowflake tables or views in the extracted period (for example, last 10 days):
-
BI tools (such as Tableau, Power BI, Metabase)
-
Machine learning applications
-
Reverse ETL Tools
To turn on application impact lineage generation, set generateApplicationImpactLineage to true in the JSON configuration file.
By default, this type of lineage is not generated, which speeds up lineage extraction.
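In the configuration file, the change is a single flag (shown here as a fragment; the remaining scanner settings are unchanged):

```json
{
  "generateApplicationImpactLineage": true
}
```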
Usage statistics export
Usage statistics can be exported in CSV format. The following example shows usage statistics for the most queried columns of the Ataccama data warehouse (a maximum of three columns per table):
The CSV files can be found in the folder holding the scanner execution output files (exec_<date>_<id>/<scanner-name>). Look for ColumnUsage.csv and TableUsage.csv.
SQL DQ (anomaly) detections
In addition to usage statistics, results of the following SQL validations are exported into CSV files as well:
- Detection results of unreferenced tables are available in the file NotUsedFromSources.csv.
- Detection results of orphaned WITH clauses are available in the file NotReferencedWithClauses.csv.
The CSV files can be found in the folder holding the scanner execution output files (exec_<date>_<id>/<scanner-name>).
Troubleshooting online (JDBC) extraction
No active warehouse selected error

If you receive the following error:

No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command.

Try setting the warehouse property in the connection configuration section of the JSON configuration file, then rerun the command.

JDBC driver communication error

If you receive the following error:

JDBC driver encountered communication error. Message: HTTP status=403.

This can indicate that the configuration parameter account contains only the Snowflake accountIdentifier (for example, myorg-identifier) instead of the accountIdentifier together with the cloudRegion (for example, myorg-identifier.us-west-1). Try updating the account parameter in the JSON configuration file, then rerun the command.