Metastore Data Source Configuration
Metastore Data Source is a plugin that allows you to connect to big data sources such as Cloudera, Hortonworks, AWS EMR, and Databricks and browse them in ONE.
The following properties are configured in the dpe/etc/application.properties file.
Basic settings
| Property | Data type | Description |
|---|---|---|
| | String | Customizes the launch properties of a specific cluster. This property can override any existing launch property for that cluster, for example, to specify the storage for a single cluster. |
| | String | The name or the identifier of the cluster. If the property is not set, the most recently added cluster is used. Default value: |
| | Number | The number of stored key-value pairs that are used to identify partitions in a catalog item. These partition identifiers are passed to the Metadata Management Module (MMM) and stored there. If the total number of partitions exceeds the value of this property, only the first n values from the list are kept, where n is the value of this property. Default value: |
| | String | The name of the cluster. If not specified, the cluster identifier is used instead. If you use multiple Databricks clusters, this property should match the name of the cluster as specified in your Databricks workspace. |
| | String | The URL where the cluster can be accessed. |
| | String | The class name of the JDBC driver, for example, org.apache.hive.jdbc.HiveDriver. |
| | String | The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*. |
| | String | The type of authentication. The sample configurations on this page use the values KERBEROS, SIMPLE, TOKEN, and INTEGRATED. |
| | Boolean | Used only for Kerberos authentication. If the property is not set, the option is enabled by default. |
| | String | The name of the Kerberos principal. The principal is a unique identifier that Kerberos uses to assign tickets that grant access to different services. The principal typically consists of three elements: the primary, the instance, and the realm, for example, primary/instance@REALM. |
| | String | Points to the keytab file that stores the Kerberos principal and the corresponding encrypted key that is generated from the principal password. |
| | String | The URL of the Databricks cluster. Used to check whether the cluster is running. Required for Databricks. |
| | String | Specifies how long DPE keeps retrying to establish a connection if the cluster is not running. Used for Databricks only. Optional. Default value: |
| | Boolean | Disables the data source when set to true. |
| | String | Determines whether a preview of the data source is possible for SQL catalog items. Default pattern: |
| | String | Imports the metadata from the data source without reading the data itself. Default pattern: |
| | String | A regular expression matching the names of schemas that are excluded from the job result. Default value: |
| | String | The folder in the Databricks File System that is used as a mount point for the data source folder. Default value: |
| | String | The location of the data source folder for storing libraries and files for processing. The folder is then mounted to the directory in the Databricks File System defined in the mount point property above. Default value: |
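As a rough illustration of the per-cluster launch property override described in the first row of the table, the following sketch pairs a global Spark launch property with a cluster-specific one. Both key prefixes are taken from the Azure Active Directory sections of this page, but the assumption that a launch-properties key with the same suffix overrides the global one, the cluster identifier databricks, and the placeholder <OTHER_TENANT_ID> are illustrative only.

```properties
# Global launch property applied to Spark clusters (key as used in the managed identity section)
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.tenantId=<TENANT_ID>

# Hypothetical override for the single cluster with the identifier "databricks";
# the matching suffix after "launch-properties." is an assumption, not a documented key
plugin.metastoredatasource.ataccama.one.cluster.databricks.launch-properties.dbr.aad.keyvault.tenantId=<OTHER_TENANT_ID>
```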
Authentication properties for Databricks
There are several authentication methods for Databricks (property plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication).
Databricks token
To authenticate using a personal access token generated in Databricks, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=TOKEN
Integrated credentials
To authenticate using Integrated Credentials, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=INTEGRATED
Service principal with a secret
To authenticate using Azure Active Directory Service Principal, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_CLIENT_CREDENTIAL
In addition, you need to define the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.tenantId = <tenant ID of your subscription (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientId = <service principal client ID (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientSecret = <service principal client secret>
# The Resource ID of Databricks in Azure. This value is constant ("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d") and is not Databricks cluster-specific
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
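For orientation, a filled-in version of the service principal settings might look like the sketch below. It assumes a cluster identifier of databricks, uses dummy UUIDs in place of real tenant and client IDs, and references the client secret in the keyvault:SECRET:<name> form that appears in the Databricks sample configuration on this page rather than storing it in plain text.

```properties
# Sketch only: "databricks" stands in for <clusterId>, and both UUIDs are dummy placeholders
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=00000000-0000-0000-0000-000000000000
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=11111111-1111-1111-1111-111111111111
# Secret reference copied from the sample configuration; the secret name is specific to that sample
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=keyvault:SECRET:databrickssecret
# Constant Resource ID of Databricks in Azure
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
```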
Managed identities
To authenticate using Azure Active Directory Managed Identities, set the property as follows:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_MANAGED_IDENTITIES
In addition, you need to define the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.tokenPropertyKey = Auth_AccessToken
Authentication
| Property | Data type | Description |
|---|---|---|
| | String | The URL of the Databricks regional endpoint. |
| | String | Determines the type of authentication used with Databricks. The Azure Active Directory authentication types are described later in this section. |
| | String | An access token for the Databricks platform. This token is used for jobs that are executed through ONE Desktop. Otherwise, the token is provided when creating a connection to Databricks in ONE. As of the current version, the property is optional. If you configured a metastore data source through ONE and are using the Catalog Item Reader step in ONE Desktop, the token is automatically passed to ONE Desktop without additional configuration or user action. |
| | String | The username for Databricks. The username and password are used instead of an access token. |
| | String | The password for Databricks. The username and password are used instead of an access token. |
Azure Active Directory authentication
For AAD authentication types, you need to specify the following property, whose value is the Resource ID of Databricks in Azure:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
Azure AD service principal
For authentication using an Azure AD service principal, use the following properties:
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.tenantId=tenantID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientId=clientID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientSecret=clientSecret
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
Azure AD managed identity
For authentication using an Azure AD managed identity (MSI), use the following properties:
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.authType=AAD_MANAGED_IDENTITY
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.vaultUrl=https://<your_vault>.vault.azure.net/
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.clientId=<CLIENT_ID>
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.tenantId=<TENANT_ID>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
You can get the client ID using the following curl command:
curl 'http://169.254.169.254/metadata/identity/oauth2/token?resource=https://vault.azure.net&api-version=2018-02-01' -H "Metadata: true" | jq
Using Apache Knox with Hadoop
Starting from version 13.3.2, you can use Apache Knox when connecting your Hadoop clusters in order to browse their catalog in ONE. This option is not available in earlier versions.
To use it, set plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication to SIMPLE, as in the following configuration:
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_KNOX
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=/opt/../jdbc/<hive_jdbc_drivername>.jar
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://<host>:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT 1
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Cloudera
plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class=com.cloudera.hive.jdbc41.HS2Driver
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/cloudera/*
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Hortonworks
# If the cluster name is not provided, the cluster identifier is used instead (hortonworks)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
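To make the empty Kerberos placeholders above more concrete, here is a minimal sketch with the blanks filled in. Every value is hypothetical: the host, realm, principal, and keytab path are invented for illustration, and the ;principal= segment in the URL follows the general Apache Hive JDBC convention for naming the HiveServer2 service principal, which may or may not be required in your DPE setup.

```properties
# Illustrative values only: host, realm, principal, and keytab path are hypothetical
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_PROD
# ";principal=" names the HiveServer2 service principal (Apache Hive JDBC URL convention)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
# Client principal in primary/instance@REALM form, and the keytab file that holds its key
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=dpe/dpe-host.example.com@EXAMPLE.COM
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=/etc/security/keytabs/dpe.keytab
```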
Databricks
The following authentication method is deprecated.
#--------------------------------------- MOUNT CLUSTER AS SOURCE ----------------------------------------------------------
# When working with Databricks, the name of the cluster in Databricks must match the name provided in DPE
plugin.metastoredatasource.ataccama.one.cluster.databricks.name=
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl=
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/databricks/*
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=15m
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_MANAGED_IDENTITIES
plugin.metastoredatasource.ataccama.one.cluster.databricks.tokenPropertyKey=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=keyvault:SECRET:databrickssecret
plugin.metastoredatasource.ataccama.one.cluster.databricks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
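The Databricks sample above repeats the authentication and aad.authType keys once for each alternative. With standard java.util.Properties loading, a key that appears more than once resolves to its last occurrence, so presumably only one alternative is meant to remain active. The sketch below assumes token authentication is the chosen method and comments out the others. Which line to keep depends on your setup.

```properties
# Sketch: token authentication kept active, the other alternatives commented out
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
#plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
#plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
#plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_MANAGED_IDENTITIES
```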