BigQuery Lineage Scanner
Scanned and supported objects
|
While we strive for comprehensive lineage capture, certain dataflows and transformations might be incomplete or unavailable due to technical constraints. We continuously work to expand coverage and accuracy. |
Limitations
Some SQL constructs might not be supported.
Fallback lineage provides table-level lineage only. See Fallback table-level lineage > Limitations.
Supported connectivity
The scanner connects to the BigQuery server using the following connection method:
-
Connector type: JDBC.
-
Authentication method: Service account key.
BigQuery permissions
Google Cloud Platform (GCP) roles and permissions for metadata extraction
To enable the extraction of metadata across multiple BigQuery projects, ensure that the following permissions and roles are correctly assigned to the service account. These permissions are required for each project where metadata will be extracted.
The same service account should be used for all projects.
Before you start
Keep in mind:
-
The following roles must be assigned per project where metadata extraction is required.
-
Use the same service account across all projects to streamline the metadata extraction process.
Step 1: Assign required GCP roles to the service account
The service account must have sufficient permissions to query metadata and access query history. Assign the following roles to the service account for each project:
- BigQuery Metadata Viewer (
roles/bigquery.metadataViewer) -
-
Grants permissions to read metadata about datasets, tables, and views.
-
Needs to be set for all extracted projects.
-
Required to allow access to table, column, and view metadata.
-
- BigQuery Resource Viewer (
roles/bigquery.resourceViewer) -
-
Grants permissions to view BigQuery workloads but does not grant permission to create or modify any slot reservations or commitments.
-
Needs to be set for all extracted projects.
-
Required to allow access to the query metadata.
-
- BigQuery Job User (
roles/bigquery.jobUser) -
-
Grants permissions to run queries and list jobs.
-
Needs to be set in the project for which service account key is created.
-
Required to allow access to query history and enables running queries to extract metadata.
-
Step 2: Generate a service account key file
Once you assign the required roles to the service account, generate a service account key file for authentication:
-
Go to the Google Cloud Console.
-
Navigate to IAM & Admin > Service Accounts.
-
Select the service account that will be used for metadata extraction.
-
Select Keys > Add Key > Create New Key.
-
Choose the JSON key type and download the key file.
| Keep this JSON key file secure, as it contains credentials for the service account. Use this key file to authenticate the service account when extracting metadata. |
Step 3: Enable fallback table-level lineage (optional)
If you want to use the fallback lineage option, you need to:
-
Set
enableFallbackTableLineagetotruein the scanner configuration. -
Enable the Data Lineage API in all GCP projects where you plan to use fallback table-level lineage.
Note that BigQuery Data Lineage has a maximum retention period of 30 days and there are costs associated with enabling lineage collection. For detailed information about enabling the API, retention periods, and pricing, see the official BigQuery Data Lineage documentation. -
Assign the required
datalineage.events.getpermission to the service account.You can grant this by assigning the Data Lineage Viewer role (
roles/datalineage.viewer) to the service account for each project where fallback lineage will be used. Alternatively, create a custom role with thedatalineage.events.getpermission.
Scanner configuration
All fields marked with an asterisk (*) are mandatory.
| Property | Description |
|---|---|
|
Unique name for the scanner job. |
|
Specifies the source type to be scanned.
Must contain |
|
A human-readable description of the scan. |
|
List of project IDs to include in the scan. |
|
List of project IDs to exclude from the scan. |
|
List of regions to scan (such as |
|
If set to |
|
If set to |
|
Lineage history in days.
For example, if set to |
|
Enable fallback to Google’s Data Lineage API for table-level lineage when SQL parsing fails.
Default value: When configured, table-level lineage is provided for queries that could not be parsed. See Fallback table-level lineage > Limitations. |
|
The project ID of the project where the service account is created. |
{
"scannerConfigs": [
{
"name": "example-config",
"sourceType": "BIGQUERY",
"description": "Example configuration for the BigQuery Lineage Scanner",
"includeProjectIds":[
"project1",
"project2"
],
"excludeProjectIds":[
"project3"
],
"regionNames":[
"us",
"eu"
],
"skipTableColumnsExtraction": false,
"skipViewsExtraction": false,
"lineageFromPastNDays": 5,
"enableFallbackTableLineage": false,
"connection": {
"projectId": "example-project-id"
}
}
]
}
Fallback table-level lineage
When SQL parsing fails to resolve lineage for certain BigQuery queries, the scanner can fall back to Google’s Data Lineage API to retrieve table-level lineage. This fallback mechanism helps ensure lineage coverage even when queries contain SQL constructs that cannot be parsed.
To enable fallback lineage, set enableFallbackTableLineage to true in your scanner configuration.
See Scanner configuration for details.
How it works
The fallback lineage process goes through the following steps:
-
The scanner first attempts to extract lineage by parsing SQL queries from query history.
-
For queries that cannot be parsed or analyzed, the scanner identifies unprocessed target tables and views.
-
When fallback is configured, the scanner queries Google’s Data Lineage API to retrieve source tables for these unprocessed targets.
-
Table-level lineage relationships are then created from the API response.
The lineageFromPastNDays parameter applies to both the main lineage extraction and fallback lineage processing, determining how far back in history to look for lineage information.
|
Limitations
-
Fallback lineage provides table-level lineage only. Column-level lineage is not available through the fallback mechanism as it requires successful SQL parsing and analysis.
-
The fallback mechanism only processes target tables and views that were not successfully analyzed during the main lineage extraction phase.
Troubleshooting
Here are some common issues you might encounter while using the scanner, along with suggested solutions.
Cannot authenticate with the service account
- Problem
-
Preflight checks do not pass and authentication with the service account fails. The logs contain a similar message:
Preflight check failed: …. - Solution
-
-
Double-check that the service account key file or its content is correctly specified in the configuration.
-
If providing a file, ensure the file path can be accessed and is correctly referenced.
-
Cannot extract metadata from BigQuery
- Problem
-
The scanner cannot extract metadata from BigQuery datasets. The logs contain a similar message:
Failed to extract table columns for project <project-name> for region <region-name>. - Solution
-
-
Confirm that the service account has the necessary roles and permissions assigned in the GCP project.
-
Ensure the service account has BigQuery Metadata Viewer role assigned for the project.
-
Cannot extract lineage information
- Problem
-
Jobs extraction fails. The logs contain a similar message:
Failed to extract query jobs for project <project-name> for region <region-name>. - Solution
-
-
Check the role BigQuery Resource Viewer is assigned to the service account.
-
Check, that the
regionNamesare correctly specified in the configuration file. -
Check that there is a lineage metadata to extract in the last
lineageFromPastNDaysdays. If there isn’t any, increase the value appropriately.
-
Cannot extract metadata from specific projects
- Problem
-
Metadata extraction fails for certain BigQuery projects. The logs contain a similar message for a specific project:
Failed to extract metadata for project <project-name> for region <region-name>. - Solution
-
-
Verify that the project IDs are correctly specified in the configuration file.
-
Ensure that the service account has sufficient access to those projects.
-
Lineage is missing
- Problem
-
The lineage is missing or incomplete even though the extraction finished without any issues.
This can be caused by multiple issues. Check the
sql_analyzer_error_log.csvlog file: for the queries present in the file, there is no lineage available at the moment. - Solution
-
-
Share the
sql_analyzer_error_log.csvlog file with the Ataccama Support team for further analysis.
-
Fallback lineage processing fails
- Problem
-
Fallback lineage processing fails with permission errors or API unavailable errors. The logs contain messages such as
Permission denied. Ensure the service account has 'datalineage.events.get' permission and Data Lineage API is enabled in project <project-id>orData Lineage API is unavailable for location <location>. - Solution
-
-
Identify which GCP project is causing the error from the log message. The error is project-specific, so you need to address it for each affected project.
-
Ensure that the Data Lineage API is enabled in the specific GCP project(s) where the error occurs:
-
Go to the Google Cloud Console.
-
Navigate to APIs & Services > Library.
-
Search for "Data Lineage API" and verify it is configured for the project.
-
-
Verify that the service account has the
datalineage.events.getpermission for the affected project:-
Assign the Data Lineage Viewer role (
roles/datalineage.viewer) to the service account for each project where fallback lineage is used. -
Alternatively, create a custom role with the
datalineage.events.getpermission.
-
-
Check that the region specified in
regionNamesis supported by the Data Lineage API. -
Verify that
enableFallbackTableLineageis set totruein your scanner configuration if you intend to use this feature.
-
Was this page useful?