User Community Service Desk Downloads

BigQuery Lineage Scanner

Scanned and supported objects

Supported statements types

  • CREATE TABLE AS SELECT

  • CREATE VIEW

  • INSERT

  • MERGE

  • SELECT

  • UPDATE

Supported database objects

  • Views

  • Materialized views

Limitations

Some SQL constructs might not be supported.

Supported connectivity

The scanner connects to the BigQuery server using the following connection method:

  • Connector type: JDBC.

  • Authentication method: Service account key.

BigQuery permissions

Google Cloud Platform (GCP) roles and permissions for metadata extraction

To enable the extraction of metadata across multiple BigQuery projects, ensure that the following permissions and roles are correctly assigned to the service account. These permissions are required for each project where metadata will be extracted.

The same service account should be used for all projects.

Before you start

Keep in mind:

  • The following roles must be assigned per project where metadata extraction is required.

  • Use the same service account across all projects to streamline the metadata extraction process.

Step 1: Assign required GCP roles to the service account

The service account must have sufficient permissions to query metadata and access query history. Assign the following roles to the service account for each project:

  • BigQuery Metadata Viewer (roles/bigquery.metadataViewer)

    • Grants permissions to read metadata about datasets, tables, and views.

    • Needs to be set for all extracted projects.

    • Required to allow access to table, column, and view metadata.

  • BigQuery Resource Viewer (roles/bigquery.resourceViewer)

    • Grants permissions to view BigQuery workloads but does not grant permission to create or modify any slot reservations or commitments.

    • Needs to be set for all extracted projects.

    • Required to allow access to the query metadata.

  • BigQuery Job User (roles/bigquery.jobUser)

    • Grants permissions to run queries and list jobs.

    • Needs to be set in the project for which service account key is created.

    • Required to allow access to query history and enables running queries to extract metadata.

Step 2: Generate a service account key file

Once you assign the required roles to the service account, generate a service account key file for authentication:

  1. Go to the Google Cloud Console.

  2. Navigate to IAM & Admin > Service Accounts.

  3. Select the service account that will be used for metadata extraction.

  4. Select Keys > Add Key > Create New Key.

  5. Choose the JSON key type and download the key file.

Keep this JSON key file secure, as it contains credentials for the service account. Use this key file to authenticate the service account when extracting metadata.

Scanner configuration

All fields marked with an asterisk (*) are mandatory.

Property Description

name*

Unique name for the scanner job.

sourceType*

Specifies the source type to be scanned. Must contain BIGQUERY.

description*

A human-readable description of the scan.

includeProjectIds

List of project IDs to include in the scan.

excludeProjectIds

List of project IDs to exclude from the scan.

regionNames

List of regions to scan (such as us, eu). Default value: us.

skipTableColumnsExtraction

If set to true, table column metadata is not extracted. Default value: false.

skipViewsExtraction

If set to true, view metadata is not extracted. Default value: false.

lineageFromPastNDays

Lineage history in days. For example, if set to 10, the lineage extract will contain lineage from transformations executed in the past 10 days.

connection.projectId*

The project ID of the project where the service account is created.

connection.serviceAccountJson*

The contents of the service account key file used for authentication.

connection.serviceAccountEmail*

Email of the service account used for authentication.

BigQuery scanner example configuration
{
   "scannerConfigs": [
      {
         "name": "example-config",
         "sourceType": "BIGQUERY",
         "description": "Example configuration for the BigQuery Lineage Scanner",
         "includeProjectIds":[
            "project1",
            "project2"
         ],
         "excludeProjectIds":[
            "project3"
         ],
         "regionNames":[
            "us",
            "eu"
         ],
         "skipTableColumnsExtraction": false,
         "skipViewsExtraction": false,
         "lineageFromPastNDays": 5,
         "connection": {
            "projectId": "example-project-id",
            "serviceAccountJson": "@@ref:ata:[bigquery.keyfile.json]",
            "serviceAccountEmail": "example-service-account@example.com"
         }
      }
   ]
}

Troubleshooting

Here are some common issues you might encounter while using the scanner, along with suggested solutions.

Cannot authenticate with the service account

Problem

Preflight checks do not pass and authentication with the service account fails. The logs contain a similar message: Preflight check failed: …​.

Solution
  • Double-check that the service account key file or its content is correctly specified in the configuration.

  • If providing a file, ensure the file path can be accessed and is correctly referenced.

Cannot extract metadata from BigQuery

Problem

The scanner cannot extract metadata from BigQuery datasets. The logs contain a similar message: Failed to extract table columns for project <project-name> for region <region-name>.

Solution
  • Confirm that the service account has the necessary roles and permissions assigned in the GCP project.

  • Ensure the service account has BigQuery Metadata Viewer role assigned for the project.

Cannot extract lineage information

Problem

Jobs extraction fails. The logs contain a similar message: Failed to extract query jobs for project <project-name> for region <region-name>.

Solution
  • Check the role BigQuery Resource Viewer is assigned to the service account.

  • Check, that the regionNames are correctly specified in the configuration file.

  • Check that there is a lineage metadata to extract in the last lineageFromPastNDays days. If there isn’t any, increase the value appropriately.

Cannot extract metadata from specific projects

Problem

Metadata extraction fails for certain BigQuery projects. The logs contain a similar message for a specific project: Failed to extract metadata for project <project-name> for region <region-name>.

Solution
  • Verify that the project IDs are correctly specified in the configuration file.

  • Ensure that the service account has sufficient access to those projects.

Lineage is missing

Problem

The lineage is missing or incomplete even though the extraction finished without any issues.

This can be caused by multiple issues. Check the sql_analyzer_error_log.csv log file: for the queries present in the file, there is no lineage available at the moment.

Solution
  • Share the sql_analyzer_error_log.csv log file with the Ataccama Support team for further analysis.

Was this page useful?