Metastore Data Source Configuration for Databricks
Metastore Data Source is a plugin that allows you to connect to Databricks and browse it in ONE.
The following properties are provided in the dpe/etc/application.properties file.
Basic settings
| Property | Data type | Description |
|---|---|---|
| | String | Customizes the launch properties of a specific cluster. This property can override any existing launch properties for that cluster. For example, to specify the storage for a single cluster, you could use the following configuration: |
| | String | The name or the identifier of the cluster. If the property is not set, the most recently added cluster is used. Default value: |
| | Number | The number of stored key-value pairs that are used to identify partitions in a catalog item. These partition identifiers are then passed on to Metadata Management Module (MMM) and stored there. If the total number of partitions exceeds the value set in this parameter, the first n values from the list are kept. Default value: |
| | String | The name of the cluster. If not specified, the cluster identifier is used instead. If you use multiple Databricks clusters, this property should match the name of the cluster as specified in your Databricks workspace. |
| | String | The URL where the cluster can be accessed. |
| | String | The driver class, for example, |
| | String | The classpath of the driver, for example, |
| | String | The type of authentication. Possible values: |
| | Boolean | Disables the data source. To do so, set the property to |
| | String | Determines whether a preview of the data source is available for SQL catalog items. By default, the pattern is |
| | String | Imports the metadata from the data source without accessing the data itself. By default, the pattern is It is also safe to leave the pattern in the following way: |
| | String | A regular expression matching the names of schemas that are excluded from the job result. Default value: |
| | String | The folder in the Databricks File System that is used as a mount point for the data source folder. Default value: |
| | String | The location of the data source folder for storing libraries and files for processing. The folder is then mounted to the directory in the Databricks File System defined in the property Default value: |
| | String | The URL to the Databricks cluster. Used to check whether the cluster is running. Required for Databricks. |
| | String | Specifies how long DPE keeps retrying to establish a connection if the cluster is not running. Used for Databricks only. Optional. Default value: |
Authentication properties for Databricks
There are several authentication methods for Databricks (property plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication).
Databricks token
To authenticate using a Databricks personal access token, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=TOKEN
Integrated credentials
To authenticate using Integrated Credentials, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=INTEGRATED
Service principal with a secret
To authenticate using Azure Active Directory Service Principal, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_CLIENT_CREDENTIAL
In addition, you need to define the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.tenantId = <tenant ID of your subscription (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientId = <service principal client ID (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientSecret = <service principal client secret>
# The Resource ID of Databricks in Azure. This value is constant ("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d") and is not Databricks cluster-specific
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
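When several clusters use service principal authentication, the property block above can be generated rather than copied by hand. The following is a minimal sketch, not part of the product: the function name `service_principal_properties` and the cluster ID `databricks1` in the usage note are hypothetical, and the UUID check only reflects the "UUID format" note in the property descriptions.

```python
import uuid

# Common prefix of the cluster properties listed in this section.
PREFIX = "plugin.metastoredatasource.ataccama.one.cluster"
# Constant Resource ID of Databricks in Azure, as stated in this section.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def service_principal_properties(cluster_id, tenant_id, client_id, client_secret):
    """Return the application.properties lines for one cluster (hypothetical helper)."""
    # tenantId and clientId are documented as UUIDs; reject anything else early.
    for name, value in (("tenantId", tenant_id), ("clientId", client_id)):
        try:
            uuid.UUID(value)
        except ValueError:
            raise ValueError(f"{name} must be in UUID format, got {value!r}") from None

    base = f"{PREFIX}.{cluster_id}.aad"
    return [
        f"{base}.authType=AAD_CLIENT_CREDENTIAL",
        f"{base}.tenantId={tenant_id}",
        f"{base}.clientId={client_id}",
        f"{base}.clientSecret={client_secret}",
        f"{base}.resource={DATABRICKS_RESOURCE_ID}",
    ]
```

Calling `"\n".join(service_principal_properties("databricks1", tenant, client, secret))` yields a block that can be pasted into dpe/etc/application.properties. The secret ends up in plain text in the output, so treat the generated text with the same care as the file itself.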
Managed identities
To authenticate using Azure Active Directory Managed Identities, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_MANAGED_IDENTITIES
In addition, you need to define the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.tokenPropertyKey = Auth_AccessToken
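A quick way to catch an incomplete configuration is to check the file for the properties each `aad.authType` value requires. The sketch below is an assumption-laden illustration: the required-key lists simply mirror the two Azure AD subsections above, the helper names are hypothetical, and the parser handles only plain `key=value` lines (no line continuations, `:` separators, or escapes that the full .properties format allows).

```python
PREFIX = "plugin.metastoredatasource.ataccama.one.cluster"

# Property suffixes required per authType, taken from the lists in this section.
REQUIRED_SUFFIXES = {
    "AAD_CLIENT_CREDENTIAL": [
        "aad.tenantId",
        "aad.clientId",
        "aad.clientSecret",
        "aad.resource",
    ],
    "AAD_MANAGED_IDENTITIES": ["aad.resource", "tokenPropertyKey"],
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_properties(text, cluster_id):
    """Return the full names of required properties that are absent or empty."""
    props = parse_properties(text)
    base = f"{PREFIX}.{cluster_id}."
    auth_type = props.get(base + "aad.authType")
    if auth_type not in REQUIRED_SUFFIXES:
        # authType is missing or is not one of the two values covered here.
        return [base + "aad.authType"]
    return [
        base + suffix
        for suffix in REQUIRED_SUFFIXES[auth_type]
        if not props.get(base + suffix)
    ]
```

An empty result means every property listed for the configured `aad.authType` is present for that cluster. It does not confirm that the values themselves are valid.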
Authentication
| Property | Data type | Description |
|---|---|---|
| | String | The URL of the Databricks regional endpoint, for example, |
| | String | Determines the type of authentication used with Databricks. The following authentication types are available: |
| | String | An access token for the Databricks platform. This token is used for jobs that are executed through ONE Desktop. Otherwise, the token is provided when creating a connection to Databricks in ONE. As of the current version, the property is optional. If you configured a metastore data source through ONE and are using the Catalog Item Reader step in ONE Desktop, the token is automatically passed to ONE Desktop without additional configuration or user action. |
| | String | The username for Databricks. The username and password are used instead of an access token. |
| | String | The password for Databricks. The username and password are used instead of an access token. |
Azure Active Directory authentication
For AAD authentication types, you need to specify plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d, which is the Resource ID of Databricks in Azure.
Azure AD service principal
To authenticate using an Azure AD service principal, use the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.tenantId=tenantID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientId=clientID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientSecret=clientSecret
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
Azure AD managed identity
To authenticate using an Azure AD managed identity (MSI), use the following properties:
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.authType=AAD_MANAGED_IDENTITY
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.vaultUrl=https://<your_vault>.vault.azure.net/
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.clientId=<CLIENT_ID>
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.tenantId=<TENANT_ID>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
You can get the client ID using the following curl command:
curl 'http://169.254.169.254/metadata/identity/oauth2/token?resource=https://vault.azure.net&api-version=2018-02-01' -H "Metadata: true" | jq
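The curl command prints the whole JSON response, so the client ID still has to be picked out of it. The following sketch performs the same request from Python and returns only that value. It assumes the response carries a `client_id` field, as Azure Instance Metadata Service token responses do, and it only works when run on an Azure machine that has a managed identity assigned. The function names are hypothetical.

```python
import json
import urllib.request

# Same endpoint and query string as the curl command above.
IMDS_URL = (
    "http://169.254.169.254/metadata/identity/oauth2/token"
    "?resource=https://vault.azure.net&api-version=2018-02-01"
)


def extract_client_id(body):
    """Return the client_id field from a token response body (JSON text)."""
    return json.loads(body)["client_id"]


def fetch_client_id(timeout=5):
    """Query the metadata endpoint and return the managed identity's client ID."""
    # The "Metadata: true" header is required, mirroring the -H flag in curl.
    request = urllib.request.Request(IMDS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_client_id(response.read().decode("utf-8"))
```

The returned value is the one to place in the `dbr.aad.keyvault.clientId` property shown above.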