Metastore Data Source Configuration for Databricks

Metastore Data Source is a plugin that allows you to connect to Databricks and browse it in ONE.

The following properties are provided in the dpe/etc/application.properties file.

Basic settings

Property Data type Description

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.

String

Customizes the launch properties of a specific cluster. Properties set under this prefix override any existing launch properties for that cluster.

For example, to specify the storage for a single cluster you could use the following configuration:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.url=abfss://container-name@account-name.dfs.core.windows.net/tmp

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.cluster

String

The name or the identifier of the cluster. If the property is not set, the most recently added cluster is used.

Default value: ataccama.

plugin.metastoredatasource.service.partitions.default.limit

Number

The number of stored key-value pairs used to identify partitions in a catalog item. These partition identifiers are then passed on to Metadata Management Module (MMM) and stored there. If the total number of partitions exceeds this limit, only the first n values from the list are kept, where n is the value of this property.

Default value: 10.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.name

String

The name of the cluster. If not specified, the cluster identifier is used instead.

If you use multiple Databricks clusters, this property should match the name of the cluster as specified in your Databricks workspace.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.url

String

The URL where the cluster can be accessed.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class

String

The class name of the driver, for example, com.simba.spark.jdbc.Driver. Required if multiple drivers are found in the driver classpath (property driver-class-path).

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class-path

String

The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/databricks/*.
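
As an illustration, the two driver properties might be set together as follows. This is a sketch only: the cluster identifier databricks is a placeholder, and the values are the examples given in the descriptions above.

```properties
# Hypothetical cluster identifier: databricks
# Classpath that is searched for the driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/databricks/*
# Needed only when more than one driver is found on the classpath above
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
```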

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication

String

The type of authentication. Possible values: TOKEN and INTEGRATED. For Databricks, the following authentication types are also available: AAD_CLIENT_CREDENTIAL and AAD_MANAGED_IDENTITIES.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.disabled

Boolean

Set to true to disable the data source.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-preview-query-pattern

String

Defines the query pattern used to preview data for SQL catalog items.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}. The wrapping query serves only as an optimization, so it is also safe to set the pattern to {dslQuery} alone.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-import-metadata-query-pattern

String

Defines the query pattern used to import metadata from the data source without reading the data itself.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT 0. The wrapping query serves only as an optimization.

It is also safe to set the pattern to {dslQuery} alone.
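
The following sketch shows both query patterns written out explicitly. It assumes a cluster identifier of databricks and assumes that {dslQuery} and {previewLimit} are filled in by plain text substitution. The sample query and limit in the comments are invented for illustration.

```properties
# Default patterns, set explicitly (hypothetical cluster identifier: databricks).
# Assuming plain substitution, with dslQuery = SELECT id FROM orders and previewLimit = 100,
# the preview pattern would expand to:
#   SELECT * FROM (SELECT id FROM orders) dslQuery LIMIT 100
plugin.metastoredatasource.ataccama.one.driver.databricks.dsl-query-preview-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.driver.databricks.dsl-query-import-metadata-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

# Pass-through alternative: run the query unchanged, without the LIMIT wrapper
# plugin.metastoredatasource.ataccama.one.driver.databricks.dsl-query-preview-query-pattern={dslQuery}
# plugin.metastoredatasource.ataccama.one.driver.databricks.dsl-query-import-metadata-query-pattern={dslQuery}
```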

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.schema-exclude-pattern

String

A regular expression matching the names of schemas that are excluded from the job result.

Default value: global_temp.

Make sure that the regular expression syntax is correct (see Pattern (Java Platform SE 7)). Otherwise, DPE fails to run properly.
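
For example, several schemas could be excluded at once using regular expression alternation. In this sketch, the cluster identifier databricks and the schema names other than global_temp are hypothetical. It also assumes that the pattern is matched against the whole schema name and that setting the property replaces the default, which is why global_temp is listed again.

```properties
# Hypothetical cluster identifier: databricks
# Excludes global_temp, a schema named staging, and any schema whose name starts with tmp_
plugin.metastoredatasource.ataccama.one.cluster.databricks.schema-exclude-pattern=global_temp|staging|tmp_.*
```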

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.point

String

The folder in the Databricks File System that is used as a mount point for the data source folder.

Default value: /mnt/ataccama.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.url

String

The location of the data source folder for storing libraries and files for processing. The folder is then mounted to the directory in the Databricks File System defined in the property mount.point.

Default value: s3a://…/tmp/dbr.
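
A sketch of the two mount properties used together, assuming a cluster identifier of databricks. The storage URL reuses the sample address from the launch properties example, with container-name and account-name left as placeholders.

```properties
# Hypothetical cluster identifier: databricks
# Storage location for libraries and files used during processing
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.mount.url=abfss://container-name@account-name.dfs.core.windows.net/tmp
# Directory in the Databricks File System where the location above is mounted (default value shown)
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.mount.point=/mnt/ataccama
```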

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.databricksUrl

String

The URL to the Databricks cluster.

Used to check whether the cluster is running. Required for Databricks.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.timeout

String

Specifies how long DPE keeps retrying to establish a connection when the cluster is not running.

Used for Databricks only. Optional.

Default value: 15m. Accepted units: ns (nanoseconds), us (microseconds), ms (milliseconds), s (seconds), m (minutes), h (hours), d (days).
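
For instance, to keep retrying for up to 30 minutes while a stopped cluster starts, the timeout could be raised as in this sketch. The cluster identifier databricks and the value 30m are illustrative choices, not recommendations.

```properties
# Hypothetical cluster identifier: databricks
# Value is a number followed by one of the accepted units, for example 90s, 30m, or 1h
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=30m
```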

Authentication properties for Databricks

There are several authentication methods for Databricks (property plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication).

Databricks token

To authenticate using a personal access token generated in Databricks, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=TOKEN

Integrated credentials

To authenticate using integrated credentials, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=INTEGRATED

Service principal with a secret

To authenticate using an Azure Active Directory service principal with a secret, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_CLIENT_CREDENTIAL

In addition, you need to define the following properties:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.tenantId = <tenant ID of your subscription (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientId = <service principal client ID (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientSecret = <service principal client secret>
# The Resource ID of Databricks in Azure. This value is constant ("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d") and is not Databricks cluster-specific
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Managed identities

To authenticate using Azure Active Directory Managed Identities, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_MANAGED_IDENTITIES

In addition, you need to define the following properties:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.tokenPropertyKey = Auth_AccessToken

Authentication

Property Data type Description

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.url

String

The URL of the Databricks regional endpoint, for example, https://northeurope.azuredatabricks.net/.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.authType

String

Determines the type of authentication used with Databricks.

The following authentication types are available:

  • BASIC_AUTH - Uses a username and a password to authenticate.

  • PERSONAL_TOKEN - Uses a token generated at Databricks.

  • AAD_CLIENT_CREDENTIAL - Uses Azure Active Directory Service Principal with a secret.

  • AAD_MANAGED_IDENTITY - Uses Azure Active Directory Managed Identities.

    DPE must be running on an Azure VM within the same tenant, and the managed identity must be set up for the VM in Azure AD using authentication=ActiveDirectoryManagedIdentity.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.token

String

An access token for the Databricks platform. This token is used for jobs that are executed through ONE Desktop. Otherwise, the token is provided when creating a connection to Databricks in ONE.

As of the current version, the property is optional. If you configured a metastore data source through ONE and are using the Catalog Item Reader step in ONE Desktop, the token is automatically passed to ONE Desktop without additional configuration or user action.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.user

String

The username for Databricks. The username and password are used instead of an access token.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.password

String

The password for Databricks. The username and password are used instead of an access token.
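
The authentication properties above might be combined as in the following sketch for token-based authentication. The cluster identifier databricks and the angle-bracket placeholders are assumptions, and the endpoint is the example URL from the dbr.url description.

```properties
# Hypothetical cluster identifier: databricks
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.url=https://northeurope.azuredatabricks.net/
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.authType=PERSONAL_TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.token=<personal-access-token>

# Alternative: a username and password instead of an access token
# plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.authType=BASIC_AUTH
# plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.user=<username>
# plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.password=<password>
```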

Azure Active Directory authentication

For AAD authentication types, you need to specify plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d, which is the Resource ID of Databricks in Azure.

Azure AD service principal

For authentication using an Azure AD service principal, use the following properties:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.tenantId=tenantID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientId=clientID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientSecret=clientSecret
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Azure AD managed identity

For authentication using an Azure AD managed identity (MSI), use the following properties:

plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.authType=AAD_MANAGED_IDENTITY
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.vaultUrl=https://<your_vault>.vault.azure.net/
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.clientId=<CLIENT_ID>
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.tenantId=<TENANT_ID>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

You can get the client ID of the managed identity by running the following curl command on the VM. The JSON response includes a client_id field:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?resource=https://vault.azure.net&api-version=2018-02-01' -H "Metadata: true" |jq
