Databricks Connection

This article describes how to connect to Databricks for data processing and lineage extraction.

For lineage scanner configuration, see also Databricks Lineage Scanner.

Availability

Data processing & catalog	Edge processing	Lineage	Exceptions
✔	✔	✔	None

Pushdown processing

With Databricks, Ataccama ONE profiling jobs always run in pushdown mode, that is, in Databricks rather than in Ataccama’s runtime.

You can also configure pushdown processing for DQ evaluation jobs (see Enable pushdown for DQ evaluation).

To learn more about pushdown processing and when to use it, see Pushdown Processing and When to Use Pushdown for DQ Evaluation.

Prerequisites

Review how sources and connections work.
Create a source to add this connection to.
In your Databricks workspace, generate a Personal Access Token (PAT) for authentication. See Databricks personal access tokens in the Databricks documentation.

Add a connection

Go to [your source] > Connections and select Add Connection.
In Connection type, select Databricks.
Fill in the following:
- Name: A meaningful name for your connection. Used to indicate the location of catalog items.
- Description (Optional): A short description of the connection.
- JDBC: A JDBC connection string for your Databricks compute resource. See JDBC connection string format.
- Workspace URL, Host, HTTP Path: Required only for lineage extraction; otherwise, you can leave them empty. See Lineage extraction settings.
Configure pushdown settings. See Enable pushdown for DQ evaluation.

You can find the server hostname and JDBC URL in your Databricks workspace.

For SQL warehouses, go to SQL Warehouses > [your warehouse] > Connection details.

For clusters, go to Compute > [your cluster] > Configuration > Advanced options > JDBC/ODBC.

For more information, see Get connection details for a Databricks compute resource in the Databricks documentation.

JDBC connection string format

The JDBC connection string uses the following format:

jdbc:databricks://<server-hostname>:<port>/<schema>;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3

Replace the following:

<server-hostname>: The server hostname, copied from your Databricks workspace connection details (for example, adb-1234567890123456.7.azuredatabricks.net for Azure or my-workspace.cloud.databricks.com for AWS).
<port>: The port number. The default is 443.
<schema> (Optional): The default database or schema name. If omitted, the active catalog and schema defaults are used.
<http-path>: The HTTP path for your compute resource. The format differs depending on the resource type (cluster or SQL warehouse). See Get connection details for a Databricks compute resource.

The transportMode=http and ssl=1 properties are required.

The AuthMech parameter depends on the authentication method. For Personal Access Token (PAT) authentication, use AuthMech=3.
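As a minimal sketch of the format above, the following Python helper assembles the connection string from its parts. The function name and the warehouse ID in the HTTP path are hypothetical and used only for illustration. When no schema is given, the sketch drops the path segment entirely, which is an assumption about how your driver version treats an omitted schema.

```python
def build_databricks_jdbc_url(server_hostname, http_path, port=443,
                              schema=None, auth_mech=3):
    """Assemble a Databricks JDBC URL (illustrative helper, not part of ONE)."""
    if not server_hostname or not http_path:
        raise ValueError("server_hostname and http_path are required")
    # Assumption: with no schema, the "/<schema>" segment is left out entirely.
    schema_part = f"/{schema}" if schema else ""
    return (
        f"jdbc:databricks://{server_hostname}:{port}{schema_part}"
        # transportMode=http and ssl=1 are always required.
        f";transportMode=http;ssl=1"
        # AuthMech=3 corresponds to Personal Access Token authentication.
        f";httpPath={http_path};AuthMech={auth_mech}"
    )


# Hypothetical SQL warehouse HTTP path; copy the real one from Connection details.
jdbc_url = build_databricks_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",
    "/sql/1.0/warehouses/abc123def456",
    schema="default",
)
print(jdbc_url)
```

Paste the resulting string into the JDBC field of the connection. The token itself does not belong in the string; add it as credentials instead.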

Lineage extraction settings

The following properties are required if you use Databricks for lineage extraction:

Workspace URL: The URL of your Databricks workspace (for example, https://adb-1234567890123456.7.azuredatabricks.net). Corresponds to the JDBC URL <server-hostname> with the protocol prefix https://.
HTTP Path: URL path for your Databricks compute resource. Corresponds to the JDBC URL <http-path>.
Host: Databricks host domain. Corresponds to the JDBC URL <server-hostname>.
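Because all three lineage fields map to parts of the JDBC connection string, they can be derived from it. The following sketch shows one way to do that in Python. The function name and the sample HTTP path are hypothetical, and the parsing assumes the connection string follows the format described in JDBC connection string format.

```python
def lineage_fields_from_jdbc(jdbc_url):
    """Derive the lineage fields from a Databricks JDBC URL (illustrative only)."""
    prefix = "jdbc:databricks://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a Databricks JDBC URL")

    # Everything before the first ';' is host, port, and the optional schema.
    location, _, raw_properties = jdbc_url[len(prefix):].partition(";")
    host = location.split("/", 1)[0].split(":", 1)[0]

    # The remaining text is a list of key=value pairs separated by ';'.
    properties = {}
    for item in raw_properties.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            properties[key.strip().lower()] = value.strip()

    http_path = properties.get("httppath")
    if not host or not http_path:
        raise ValueError("JDBC URL must contain a hostname and httpPath")

    return {
        "Workspace URL": f"https://{host}",  # hostname with the https:// prefix
        "Host": host,
        "HTTP Path": http_path,
    }


fields = lineage_fields_from_jdbc(
    "jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default"
    ";transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123def456;AuthMech=3"
)
for name, value in fields.items():
    print(f"{name}: {value}")
```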

Enable pushdown for DQ evaluation

Pushdown settings

In Data quality evaluation, select where DQ evaluation jobs will run:

Pushdown to process data in Databricks.
Cloud to process data in Ataccama’s runtime.

To help you decide, see When to Use Pushdown for DQ Evaluation.

In Data-based term detection, select whether to skip data-based term detection or run it in ONE. Data-based term detection can only be run in Ataccama’s runtime.

Operational database settings

If you enable pushdown for DQ evaluation, you need to specify a catalog and schema in Databricks where ONE can upload and store processing functions and temporary tables. For details about what’s stored, see Why DQ evaluation requires extra storage in your source.

You need to create the operational schema in your Databricks environment before running DQ evaluation.

Databricks operational catalog: The name of the catalog where the operational schema is located.
Databricks operational schema: The name of the schema created in Databricks for storing processing functions and temporary tables.
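Since the operational schema must exist before DQ evaluation runs, you might prepare the statement that creates it ahead of time. The sketch below only renders a standard Databricks SQL CREATE SCHEMA statement as text; the catalog and schema names (main, ataccama_ops) and the helper names are hypothetical. It does not cover which privileges the operational credentials need, so check that with your Databricks administrator.

```python
def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def create_operational_schema_sql(catalog, schema):
    """Render a CREATE SCHEMA statement for the operational schema."""
    return (
        "CREATE SCHEMA IF NOT EXISTS "
        f"{quote_identifier(catalog)}.{quote_identifier(schema)}"
    )


# Hypothetical names; use the values you enter in the connection settings.
statement = create_operational_schema_sql("main", "ataccama_ops")
print(statement)
```

Run the rendered statement in a Databricks SQL editor or notebook, then enter the same catalog and schema names in the connection settings.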

Add credentials

Only Token authentication is currently supported.

If you’ve enabled pushdown for DQ evaluation, you must also mark at least one set of credentials as operational. ONE uses these credentials to manage the operational database, including uploading catalog item data to it (see Operational database settings).

In most cases, the same set of credentials can be assigned as both default and operational. However, if you prefer to use a more privileged account just for managing the operational database, add a second set of credentials and mark only those as operational.

Select Add Credentials.
Fill in the following:
- Name: A name for this set of credentials.
- Description (Optional): Explain what the credentials are used for or provide other useful information.
- Token: The personal access token generated in Databricks. See Prerequisites.
To use this set of credentials by default when connecting to the data source, select Assign as default.
To use this set of credentials for the operational database for DQ evaluation, select Assign as operational.

Always use the dedicated credential fields for authentication details such as passwords, secrets, and tokens. This ensures credentials are handled with the appropriate level of protection and reliably preserved across environments.

One set of credentials must be defined as default for each connection. Otherwise, DQ evaluation fails and previewing data in the catalog is not possible.
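The rules for default and operational credentials can be summarized as a small check. This is only an illustration of the logic: the dictionary layout and the function name are hypothetical and do not reflect how ONE stores or validates credentials.

```python
def check_credentials(credentials, pushdown_enabled):
    """Return a list of problems with a connection's credential sets."""
    problems = []
    # Every connection needs one set of credentials marked as default.
    if not any(c.get("default") for c in credentials):
        problems.append("No credentials are assigned as default.")
    # Pushdown for DQ evaluation also needs operational credentials.
    if pushdown_enabled and not any(c.get("operational") for c in credentials):
        problems.append("Pushdown is enabled but no credentials are operational.")
    return problems


# One set can be both default and operational.
shared = [{"name": "pat-main", "default": True, "operational": True}]

# Alternatively, a more privileged set is reserved for the operational database.
split = [
    {"name": "pat-read", "default": True, "operational": False},
    {"name": "pat-ops", "default": False, "operational": True},
]

print(check_credentials(shared, pushdown_enabled=True))
print(check_credentials(split, pushdown_enabled=True))
print(check_credentials([{"name": "pat-read", "default": True}], pushdown_enabled=True))
```

The first two configurations produce no problems, while the last one is flagged because pushdown is enabled without operational credentials.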

Add write credentials

If you want to export data to this source, add write credentials. Select Add Credentials and follow the instructions in Add credentials.

Next steps

Test and save your connection to complete setup.
