BigQuery Lineage Scanner
Supported connectivity
The scanner connects to the BigQuery server using the following connection method:
-
Connector type: JDBC.
-
Authentication method: Service account key.
BigQuery permissions
Google Cloud Platform (GCP) roles and permissions for metadata extraction
To enable the extraction of metadata across multiple BigQuery projects, ensure that the following permissions and roles are correctly assigned to the service account. These permissions are required for each project where metadata will be extracted.
The same service account should be used for all projects.
Before you start
Keep in mind:
-
The following roles must be assigned per project where metadata extraction is required.
-
Use the same service account across all projects to streamline the metadata extraction process.
Step 1: Assign required GCP roles to the service account
The service account must have sufficient permissions to query metadata and access query history. Assign the following roles to the service account for each project:
-
BigQuery Metadata Viewer (
roles/bigquery.metadataViewer
)-
Grants permissions to read metadata about datasets, tables, and views.
-
Needs to be set for all extracted projects.
-
Required to allow access to table, column, and view metadata.
-
-
BigQuery Resource Viewer (
roles/bigquery.resourceViewer
)-
Grants permissions to view BigQuery workloads but does not grant permission to create or modify any slot reservations or commitments.
-
Needs to be set for all extracted projects.
-
Required to allow access to the query metadata.
-
-
BigQuery Job User (
roles/bigquery.jobUser
)-
Grants permissions to run queries and list jobs.
-
Needs to be set in the project for which service account key is created.
-
Required to allow access to query history and enables running queries to extract metadata.
-
Step 2: Generate a service account key file
Once you assign the required roles to the service account, generate a service account key file for authentication:
-
Go to the Google Cloud Console.
-
Navigate to IAM & Admin > Service Accounts.
-
Select the service account that will be used for metadata extraction.
-
Select Keys > Add Key > Create New Key.
-
Choose the JSON key type and download the key file.
Keep this JSON key file secure, as it contains credentials for the service account. Use this key file to authenticate the service account when extracting metadata. |
Scanner configuration
All fields marked with an asterisk (*
) are mandatory.
Property | Description |
---|---|
|
Unique name for the scanner job. |
|
Specifies the source type to be scanned.
Must contain |
|
A human-readable description of the scan. |
|
List of project IDs to include in the scan. |
|
List of project IDs to exclude from the scan. |
|
List of regions to scan (such as |
|
If set to |
|
If set to |
|
Lineage history in days.
For example, if set to |
|
The project ID of the project where the service account is created. |
|
The contents of the service account key file used for authentication. |
|
Email of the service account used for authentication. |
{
"scannerConfigs": [
{
"name": "example-config",
"sourceType": "BIGQUERY",
"description": "Example configuration for the BigQuery Lineage Scanner",
"includeProjectIds":[
"project1",
"project2"
],
"excludeProjectIds":[
"project3"
],
"regionNames":[
"us",
"eu"
],
"skipTableColumnsExtraction": false,
"skipViewsExtraction": false,
"lineageFromPastNDays": 5,
"connection": {
"projectId": "example-project-id",
"serviceAccountJson": "@@ref:ata:[bigquery.keyfile.json]",
"serviceAccountEmail": "example-service-account@example.com"
}
}
]
}
Troubleshooting
Here are some common issues you might encounter while using the scanner, along with suggested solutions.
Cannot authenticate with the service account
- Problem
-
Preflight checks do not pass and authentication with the service account fails. The logs contain a similar message:
Preflight check failed: …
. - Solution
-
-
Double-check that the service account key file or its content is correctly specified in the configuration.
-
If providing a file, ensure the file path can be accessed and is correctly referenced.
-
Cannot extract metadata from BigQuery
- Problem
-
The scanner cannot extract metadata from BigQuery datasets. The logs contain a similar message:
Failed to extract table columns for project <project-name> for region <region-name>.
- Solution
-
-
Confirm that the service account has the necessary roles and permissions assigned in the GCP project.
-
Ensure the service account has BigQuery Metadata Viewer role assigned for the project.
-
Cannot extract lineage information
- Problem
-
Jobs extraction fails. The logs contain a similar message:
Failed to extract query jobs for project <project-name> for region <region-name>.
- Solution
-
-
Check the role BigQuery Resource Viewer is assigned to the service account.
-
Check, that the
regionNames
are correctly specified in the configuration file. -
Check that there is a lineage metadata to extract in the last
lineageFromPastNDays
days. If there isn’t any, increase the value appropriately.
-
Cannot extract metadata from specific projects
- Problem
-
Metadata extraction fails for certain BigQuery projects. The logs contain a similar message for a specific project:
Failed to extract metadata for project <project-name> for region <region-name>
. - Solution
-
-
Verify that the project IDs are correctly specified in the configuration file.
-
Ensure that the service account has sufficient access to those projects.
-
Lineage is missing
- Problem
-
The lineage is missing or incomplete even though the extraction finished without any issues.
This can be caused by multiple issues. Check the
sql_analyzer_error_log.csv
log file: for the queries present in the file, there is no lineage available at the moment. - Solution
-
-
Share the
sql_analyzer_error_log.csv
log file with the Ataccama Support team for further analysis.
-
Was this page useful?