Metastore Data Source Configuration
Metastore Data Source is a plugin that allows you to connect to big data sources such as Cloudera, Hortonworks, AWS EMR, and Databricks and browse them in ONE.
The following properties are set in the DPE deployment, either through the Configuration Service (see the Configuration Service User Guide) or in the dpe/etc/application.properties file.
Basic settings
All property names in the table use the prefix plugin.metastoredatasource.ataccama.one.cluster.<cluster>, where <cluster> is the cluster identifier (for example, cloudera, hortonworks, or databricks), as shown in the configuration examples later on this page.

Property | Data type | Description
---|---|---
 | Number | The number of stored key-value pairs that are used to identify partitions in a catalog item. These partition identifiers are then passed on to MMM and stored there. If the total number of partitions exceeds the value set in this property, the first n values from the list are kept. Default value:
name | String | The name of the cluster. If not specified, the cluster identifier is used instead.
url | String | The URL where the cluster can be accessed.
driver-class | String | The class name of the JDBC driver, for example, org.apache.hive.jdbc.HiveDriver.
driver-class-path | String | The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*.
authentication | String | The type of authentication. Valid values: KERBEROS, SIMPLE, TOKEN, INTEGRATED.
impersonate | Boolean | If set to true, user impersonation is enabled for the connection.
kerberos.principal | String | The name of the Kerberos principal. The principal is a unique identifier that Kerberos uses to assign tickets that grant access to different services. The principal typically consists of three elements: the primary, the instance, and the realm, for example, primary/instance@REALM.
kerberos.keytab | String | Points to the keytab file that stores the Kerberos principal and the corresponding encrypted key that is generated from the principal password.
databricksUrl | String | The URL of the Databricks cluster. Used to check whether the cluster is running. Required for Databricks.
timeout | String | Specifies for how long DPE keeps retrying to establish a connection if the cluster is not running. Used for Databricks only. Optional. Default value: 15m.
disabled | Boolean | Disables the data source. To do so, set the property to true.
dsl-query-preview-query-pattern | String | Determines whether a preview of the data source is possible for SQL catalog items. Default pattern: SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}.
dsl-query-import-metadata-query-pattern | String | Imports the metadata from the data source without reading the data itself. Default pattern: SELECT * FROM ({dslQuery}) dslQuery LIMIT 0.
 | String | A regular expression matching the names of schemas that are excluded from the job result. Default value:
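For illustration, here is a minimal sketch of how a few of the basic settings combine with the prefix for a cluster with the identifier hortonworks. The name and URL values are placeholders, not defaults; only the driver class and classpath values are taken from the examples on this page.

plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=<cluster name>
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=<cluster URL>
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS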
Authentication Properties for Databricks
There are several methods you can use to authenticate to Databricks. The property plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication can be set to one of the following authentication methods:
Databricks token
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
Generate a personal access token in Databricks.
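A minimal sketch of this option, using only properties that appear in the template at the end of this page. Note that tokenPropertyKey is assumed here to reference the property under which the generated token is supplied; verify this against your deployment.

plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
# Assumption: points to the property that holds the personal access token generated in Databricks.
plugin.metastoredatasource.ataccama.one.cluster.databricks.tokenPropertyKey=<property key holding the token>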
Integrated Credentials
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
Authenticate using Integrated Credentials.
Service Principal with a Secret
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
For authentication using an Azure Active Directory service principal, you need to add the following properties:
- plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=<tenant ID of your subscription (UUID format)>
- plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=<service principal client ID (UUID format)>
- plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=<service principal client secret>
- plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
The plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource property is the Azure resource ID of Databricks. This value is constant (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d) and is not specific to your Databricks cluster.
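Putting these together, a service principal configuration might look like the following sketch. The tenant ID, client ID, and client secret values are placeholders for your own values; only the resource ID is the constant given above.

plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=<tenant ID (UUID)>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=<client ID (UUID)>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=<client secret>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d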
Using Apache Knox with Hadoop
This configuration option is available only in version 13.3.2 and later.
You can use Apache Knox when connecting your Hadoop clusters in order to browse their catalogs in ONE. To do so, set plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication to SIMPLE.
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_KNOX
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=/opt/../jdbc/<hive_jdbc_drivername>.jar
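# The connection goes through the Knox gateway over HTTPS; adjust <host>, the port (8443), and the httpPath value to match your Knox gateway configuration.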
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://<host>:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT 1
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Cloudera
plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class=com.cloudera.hive.jdbc41.HS2Driver
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/cloudera/*
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Hortonworks
# If the cluster name is not provided, the cluster identifier is used instead (hortonworks)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
Databricks
The following authentication method is deprecated.
#--------------------------------------- MOUNT CLUSTER AS SOURCE ----------------------------------------------------------
# When working with Databricks, the name of the cluster in Databricks must match the name provided in DPE
plugin.metastoredatasource.ataccama.one.cluster.databricks.name=
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl=
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/databricks/*
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=15m
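# Authentication: keep only one of the following methods (TOKEN, INTEGRATED, or one of the AAD options described above).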
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_MANAGED_IDENTITIES
plugin.metastoredatasource.ataccama.one.cluster.databricks.tokenPropertyKey=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=
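# Alternatively, the client secret can be provided as a reference to a secret stored in a key vault: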
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=keyvault:SECRET:databrickssecret
plugin.metastoredatasource.ataccama.one.cluster.databricks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0