Metastore Data Source Configuration

Metastore Data Source is a plugin that allows you to connect to big data sources such as Cloudera, Hortonworks, AWS EMR, and Databricks and browse them in ONE.

The following properties are provided in the DPE deployment in the Configuration Service or in the dpe/etc/application.properties file.

Basic settings

Property Data type Description

plugin.metastoredatasource.service.partitions.default.limit

Number

The number of stored key-value pairs that are used to identify partitions in a catalog item. These partition identifiers are then passed on to MMM and stored there. If the total number of partitions exceeds the value set in this parameter, the first n values from the list are kept. Default value: 10.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.name

String

The name of the cluster. If not specified, the cluster identifier is used instead.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.url

String

The URL where the cluster can be accessed.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class

String

The driver class of the driver, for example, com.cloudera.hive.jdbc41.HS2Driver. Required if multiple drivers are found in the driver classpath (property driver-class-path).

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class-path

String

The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/cloudera/*.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication

String

The type of authentication. Valid values: KERBEROS, SIMPLE, and TOKEN. For SIMPLE and TOKEN authentication types, no other properties are required. Use SIMPLE with Apache Knox.

The configuration option for Apache Knox is available only from version 13.3.2 and later.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.impersonate

Boolean

If set to true, processes can be started on the cluster using a superuser on behalf of another user. In that case, the keytab provided needs to match the superuser’s credentials. If the property is not set, the option is enabled by default. Used only for Kerberos authentication.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.principal

String

The name of the Kerberos principal. The principal is a unique identifier that Kerberos uses to assign tickets that grant access to different services. The principal typically consist of three elements: the primary, the instance, and the realm, for example, primary/instance@REALM.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.keytab

String

Points to the keytab file that stores the Kerberos principal and the corresponding encrypted key that is generated from the principal password.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.databricksUrl

String

The URL to the Databricks cluster. Used to check whether the cluster is running. Required for Databricks.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.timeout

String

Specifies for how long DPE continues retrying to establish a connection in case the cluster is not running. Used for Databricks only. Optional.

Default value: 15m. Accepted units: ns (nanoseconds), us (microseconds), ms (milliseconds), s (seconds), m (minutes), h (hours), d (days).

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.disabled

Boolean

Disables the data source. To do so, set the property to true.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-preview-query-pattern

String

Determines if the preview of the data source is possible for the SQL Catalog Items. By default the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}. It is done only for optimization reasons. But it is safe to leave the pattern in the following way: {dslQuery}.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-import-metadata-query-pattern

String

Imports the metadata from the data source while not reaching the data itself. By default the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT 0. It is done only for the optimization reasons. But it is safe to leave the pattern in the following way: {dslQuery}.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.schema-exclude-pattern

String

A regular expression matching the names of schemas that are excluded from the job result.

Default value: global_temp.

Make sure that correct regular expression syntax is used, see Pattern (Java Platform SE 7). Otherwise, DPE fails to run properly.

Authentication Properties for Databricks

There are several methods how you can authenticate Databricks. plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication can be set to the following authentication methods:

Databricks token

plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN

Generate the personal access token at Databricks.

Integrated Credentials

plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED

Authenticate using Integrated Credentials.

Service Principal with a Secret

`plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL `

For authentication using Azure Active Directory Service Principal you need to add the following properties:

plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId = <tenant ID of your subscription (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId = <service principal client ID (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret = <service principal client secret>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

The plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource property is a Resource ID of Databricks in Azure. This value is constant (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d) and is not Databricks cluster specific.

Using Apache Knox with Hadoop

This configuration option is available only from version 13.3.2 and later.

Starting from version 13.3.2, you can use Apache Knox when connecting your Hadoop clusters in order to browse their catalog in ONE. Use plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication set to SIMPLE.

Apache Knox metastore configuration

plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_KNOX
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=/opt/../jdbc/<hive_jdbc_drivername>.jar
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://<host>:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT 1
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Cloudera

Cloudera metastore configuration

plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class=com.cloudera.hive.jdbc41.HS2Driver
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/cloudera/*
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Hortonworks

Hortonworks metastore configuration

# If the cluster name is not provided, the cluster identifier is used instead (hortonworks)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Databricks

The following authentication method is deprecated.

Databricks metastore configuration

#--------------------------------------- MOUNT CLUSTER AS SOURCE ----------------------------------------------------------
# When working with Databricks, the name of the cluster in Databricks must match the name provided in DPE
plugin.metastoredatasource.ataccama.one.cluster.databricks.name=
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl=
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/databricks/*
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=15m
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_MANAGED_IDENTITIES
plugin.metastoredatasource.ataccama.one.cluster.databricks.tokenPropertyKey=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=keyvault:SECRET:databrickssecret
plugin.metastoredatasource.ataccama.one.cluster.databricks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Was this page useful?