Databricks on AWS
This guide explains how to configure Ataccama ONE to work with AWS Databricks clusters for large-scale data processing. You’ll set up the necessary connections between your Ataccama ONE deployment, Databricks cluster, and Amazon S3 storage.
By following this guide, you’ll be able to:
- Process large datasets using Databricks compute resources
- Leverage Spark processing capabilities within Ataccama ONE
- Securely connect to your AWS Databricks workspace
- Optionally enable Unity Catalog for enhanced data governance
This configuration is suitable for hybrid deployments, self-managed environments, and Custom Ataccama Cloud deployments that need to process data at scale using AWS Databricks.
This configuration guide covers Spark integration with Databricks on AWS, which requires a Spark processing license. If you do not have a Spark processing license or prefer a simpler setup, use the standard JDBC connection instead. For JDBC connection setup, see Databricks JDBC.
Step 1: Configure access to metadata
Add the following metastore properties to /opt/ataccama/one/dpe/etc/application.properties:
# Databricks as data source configuration
plugin.metastoredatasource.ataccama.one.cluster.databricks.name={ CLUSTER_NAME }
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.databricks.client.jdbc.Driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path={ ATACCAMA_ONE_HOME }/dpe/lib/jdbc/DatabricksJDBC42.jar
plugin.metastoredatasource.ataccama.one.cluster.databricks.url={ DBR_JDBC_STRING }
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl={ WORKSPACE_URL }
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=15m
plugin.metastoredatasource.ataccama.one.cluster.databricks.profiling-sample-limit=100000
plugin.metastoredatasource.ataccama.one.cluster.databricks.full-select-query-pattern=SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.preview-query-pattern=SELECT {columns} FROM {table} LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.row-count-query-pattern=SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.sampling-query-pattern=SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-preview-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-import-metadata-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Replace all placeholder values in { } with your actual configuration values.
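For orientation, a filled-in metastore configuration could look like the following. All values here are illustrative only; copy the actual JDBC connection string from the JDBC/ODBC details of your Databricks cluster.
# Example with illustrative values (hypothetical workspace and cluster)
plugin.metastoredatasource.ataccama.one.cluster.databricks.name=analytics-cluster
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=jdbc:databricks://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcd1234
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl=https://dbc-a1b2c3d4-e5f6.cloud.databricks.com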
Additional Unity Catalog configuration
Unity Catalog provides a unified approach to governing your data and AI assets across Databricks workspaces, along with centralized access control, auditing, lineage, and data discovery capabilities. In Unity Catalog, data assets are organized in three levels, starting from the highest: Catalog > Schema > Table, allowing you to work with multiple catalogs at once.
If you are using Unity Catalog, add the following additional properties to the metastore configuration in /opt/ataccama/one/dpe/etc/application.properties:
# Unity Catalog enabled configuration (add to existing properties above)
plugin.metastoredatasource.ataccama.one.cluster.databricks.unity-catalog-enabled=true
plugin.metastoredatasource.ataccama.one.cluster.databricks.catalog-exclude-pattern=^(SAMPLES)|(samples)|(main)$
Use plugin.metastoredatasource.ataccama.one.cluster.databricks.catalog-exclude-pattern to mark catalogs as technical and prevent them from being imported to ONE.
The value should be a regular expression matching the catalogs that you don't want to import.
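For example, to exclude hypothetical catalogs named samples, main, and system, you could anchor each name explicitly:
# Illustrative exclude pattern matching the samples, main, and system catalogs
plugin.metastoredatasource.ataccama.one.cluster.databricks.catalog-exclude-pattern=^(samples|main|system)$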
We only support Unity Catalog with Dedicated access mode, not Standard access mode. For more information about what this means for your Databricks configuration, see the official Databricks documentation, article Create clusters & SQL warehouses with Unity Catalog access.
Step 2: Configure job submission to Databricks cluster
Spark processing basic settings
Add the following properties to /opt/ataccama/one/dpe/etc/application.properties:
# Setup launch script for Spark (specify .sh script for Linux, .bat script for Windows)
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.exec=bin/databricks/exec_databricks.sh
# Enable debugging if needed:
#plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.debug=true
Libraries and classpath configuration
Add the following properties to /opt/ataccama/one/dpe/etc/application.properties:
# Relative to temp/jobs/${jobId} from repository root directory
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.cpdelim=;
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.cp.runtime=../../../lib/runtime/*;../../../lib/jdbc/*;../../../lib/jdbc_ext/*;../../../lib/runtime/ext/*;../../../lib/runtime/extra/*
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.cp.ovr=../../../lib/ovr/*.jar
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.cp.databricks=../../../lib/runtime/databricks/*
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.cp.!exclude=!atc-hive-jdbc*!hive-jdbc*;!kryo-*.jar;!scala*.jar;!cif-dtdb*.jar
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.lcp.!exclude=!guava-11*
# Hadoop 3 library path
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.lcp.ext=../../../lib/runtime/hadoop3/*
General cluster settings
Add the following properties to /opt/ataccama/one/dpe/etc/application.properties:
# Cluster ID or name (uses newest cluster if not specified)
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.cluster={ CLUSTER_NAME }
# Number of concurrent job runs
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.job.max_concurrent_runs=150
# Force file copying regardless of modification time
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.fsc.force=false
Authentication configuration
Add the following properties to /opt/ataccama/one/dpe/etc/application.properties:
# Databricks platform access
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.url={ WORKSPACE_URL }
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.authType=PERSONAL_TOKEN
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.token={ DBR_TOKEN }
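For reference, the workspace URL is the base HTTPS address of your Databricks workspace and the token is a Databricks personal access token. The values below are illustrative placeholders only:
# Illustrative values only
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.url=https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.token=dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX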
Amazon Web Services configuration
Add the following properties to /opt/ataccama/one/dpe/etc/application.properties:
# S3 folder for storing libraries and files
# Mount point in DBFS:
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.point=/mnt/ataccama
# S3 mount URL:
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.url=s3a://{ BUCKET_NAME }/{ CLUSTER_NAME }/tmp/dbr
# User account for mounting filesystem
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.conf.fs.s3n.awsAccessKeyId={ ACCESS_KEY_ID }
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.conf.fs.s3n.awsSecretAccessKey={ SECRET_ACCESS_KEY }
# User account for Ataccama job configuration
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.access.key={ ACCESS_KEY_ID }
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.secret.key={ SECRET_ACCESS_KEY }
# Alternative authentication method (IAM role-based) - use in place of the four properties above:
#plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.conf.fs.s3n.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider
#plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider
# AWS-specific optimizations
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.fast.upload=true
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.fast.upload.buffer=bytebuffer
# Keystore options for Spark via JAVA_OPTS
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.env.JAVA_OPTS=-Dproperties.encryption.keystore=../../../etc/one-encryption.jceks -Dproperties.encryption.keystore.passwordFile=../../../etc/one-encryption.passwd -Dinternal.encryption.keystore=../../../etc/one-encryption.jceks -Dinternal.encryption.keystore.passwordFile=../../../etc/one-encryption.passwd
Replace all placeholder values enclosed in { } with your actual configuration values.
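To make the placeholders concrete, an access-key-based setup for a hypothetical bucket ataccama-dbr-bucket and cluster analytics-cluster could look like this (the key values below are the standard AWS documentation examples, not real credentials). If you use the IAM role-based alternative instead, omit the key properties and uncomment the credentials provider lines shown above.
# Illustrative values only
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.url=s3a://ataccama-dbr-bucket/analytics-cluster/tmp/dbr
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.conf.fs.s3n.awsAccessKeyId=AKIAIOSFODNN7EXAMPLE
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.mount.conf.fs.s3n.awsSecretAccessKey=wJalrXYUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.access.key=AKIAIOSFODNN7EXAMPLE
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.conf.fs.s3a.secret.key=wJalrXYUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY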
Driver configuration
This guide assumes the use of DatabricksJDBC42.jar. If you are using SparkJDBC42.jar instead, you need to modify the following property values in your configuration:
Using SparkJDBC42.jar
Replace these properties in /opt/ataccama/one/dpe/etc/application.properties:
# Replace this:
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=jdbc:databricks://{ DBR_JDBC_STRING }
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.databricks.client.jdbc.Driver
# With this:
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=jdbc:spark://{ DBR_JDBC_STRING }
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
Step 3: Restart DPE and test the connection
After completing the configuration, restart DPE:
systemctl restart dpe
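Assuming DPE runs as a systemd service named dpe, as in the restart command above, you can confirm the service came back up and watch the startup log:
# Check service status and follow the DPE log (assumes a systemd unit named dpe)
systemctl status dpe
journalctl -u dpe -f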
Configure the connection in ONE
After completing the configuration steps above, you’re now ready to add the Databricks connection in ONE.
Create a source
To create a Databricks source in ONE:
- Navigate to Data Catalog > Sources.
- Select Create.
- Provide the following:
  - Name: The source name.
  - Description: A description of the source.
  - Deployment (Optional): Choose the deployment type. You can add new values if needed. See Lists of Values.
  - Stewardship: The source owner and roles. For more information, see Stewardship.
Alternatively, add a connection to an existing data source. See Connect to a Source.
Add a connection
- Select Add Connection.
- In Select connection type, choose Metastore and select your configured Databricks cluster from the dropdown.
- Provide the following:
  - Name: A meaningful name for your connection. This is used to indicate the location of catalog items.
  - Description (Optional): A short description of the connection.
  - Dpe label (Optional): Assign the processing of a data source to a particular data processing engine (DPE) by entering the DPE label assigned to the engine. For more information, see DPM and DPE Configuration in DPM Admin Console.
- Select Spark enabled.
- In Additional settings, select Enable exporting and loading of data if you want to export data from this connection and use it in ONE Data or outside of ONE.
Add credentials
- Select Add Credentials.
- In Credential type, select Token credentials.
- Provide the following:
  - Select a secret management service (optional): If you want to use a secret management service to provide values for the following fields, specify which secret management service should be used. After you select the service, you can enable the Use secret management service toggle and instead provide the names under which the values are stored in your key vault. For more information, see Secret Management Service.
  - Token: The access token for Databricks. For more information, see the official Databricks documentation. Alternatively, enable Use secret management service and provide the name this value is stored under in your selected secret management service.
- If you want to use this set of credentials by default when connecting to the data source, select Set as default.
One set of credentials must be set as default for each connection. Otherwise, monitoring and DQ evaluation fail, and previewing data in the catalog is not possible.
Test the connection
To test and verify whether the data source connection has been correctly configured, select Test Connection.
If the connection is successful, continue with the following step. Otherwise, verify that your configuration is correct and that the data source is running.
- Test the profiling functionality:
  - Select Sources and then the source relating to { CLUSTER_NAME }.
  - In the Connections tab, expand the connection browser.
  - Select a table to profile.
  - Select Profile in the bottom-right corner.
Save and publish
Once you have configured your connection, save and publish your changes. If you provided all the required information, the connection is now available for other users in the application.
If your configuration is missing required fields, a list of detected errors is displayed instead. Review your configuration and resolve the issues before continuing.
Next steps
You can now browse and profile assets from your Databricks connection.
In Data Catalog > Sources, find and open the source you just configured. Switch to the Connections tab and select Document. Alternatively, opt for the Import or Discover documentation flow.
Or, to import or profile only some assets, select Browse on the Connections tab. Choose the assets you want to analyze and then select the appropriate profiling option.
Because your Databricks cluster can shut down when idle, it can take some time before it is ready to accept requests again. If you try to browse the cluster during this period, you receive a timeout error.
If you are using Unity Catalog, you can see the three catalog levels in the Source field when viewing catalog item details. Use this in the Location filter to narrow down the catalog items you're looking for.