
Metastore Data Source Configuration

Metastore Data Source is a plugin that allows you to connect to big data sources such as Cloudera, Hortonworks, AWS EMR, and Databricks, and browse them in ONE.

The following properties are provided in the dpe/etc/application.properties file.

Basic settings

Property Data type Description

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.

String

Used to customize the launch properties of a specific cluster. This property can override any existing launch property for the specified cluster.

For example, to specify the storage for a single cluster you could use the following configuration:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.url=abfss://container-name@account-name.dfs.core.windows.net/tmp

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.cluster

String

The name or the identifier of the cluster. If the property is not set, the most recently added cluster is used.

Default value: ataccama.

plugin.metastoredatasource.service.partitions.default.limit

Number

The number of stored key-value pairs that are used to identify partitions in a catalog item. These partition identifiers are then passed on to the Metadata Management Module (MMM) and stored there. If the total number of partitions exceeds the value set in this property, only the first n values from the list are kept, where n is the value of this property.

Default value: 10.
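For example, to keep up to 25 partition identifiers per catalog item (the value 25 is illustrative), you could set:

```properties
# Keep at most 25 partition identifiers per catalog item (illustrative value).
plugin.metastoredatasource.service.partitions.default.limit=25
```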

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.name

String

The name of the cluster. If not specified, the cluster identifier is used instead.

If you use multiple Databricks clusters, this property should match the name of the cluster as specified in your Databricks workspace.
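For instance, a setup with two Databricks clusters could look as follows; the cluster identifiers dev and prod and the cluster names are hypothetical and must be replaced with the names used in your Databricks workspace:

```properties
# Hypothetical example: two clusters whose names match the Databricks workspace.
plugin.metastoredatasource.ataccama.one.cluster.dev.name=Dev Cluster
plugin.metastoredatasource.ataccama.one.cluster.prod.name=Prod Cluster
```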

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.url

String

The URL where the cluster can be accessed.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class

String

The class name of the JDBC driver, for example, com.cloudera.hive.jdbc41.HS2Driver. Required if multiple drivers are found in the driver classpath (property driver-class-path).

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class-path

String

The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/cloudera/*.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication

String

The type of authentication. Possible values: KERBEROS, SIMPLE, and TOKEN. For SIMPLE and TOKEN authentication types, no other properties are required.

Use SIMPLE with Apache Knox.

The configuration option for Apache Knox is available only from version 13.3.2 and later.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.impersonate

Boolean

If set to true, processes can be started on the cluster using a superuser on behalf of another user. In that case, the keytab provided needs to match the superuser’s credentials.

If the property is not set, impersonation is enabled by default. Used only for Kerberos authentication.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.principal

String

The name of the Kerberos principal. The principal is a unique identifier that Kerberos uses to assign tickets that grant access to different services. The principal typically consists of three elements: the primary, the instance, and the realm, for example, primary/instance@REALM.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.keytab

String

Points to the keytab file that stores the Kerberos principal and the corresponding encrypted key that is generated from the principal password.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.databricksUrl

String

The URL to the Databricks cluster.

Used to check whether the cluster is running. Required for Databricks.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.timeout

String

Specifies how long DPE keeps retrying to establish a connection if the cluster is not running.

Used for Databricks only. Optional.

Default value: 15m. Accepted units: ns (nanoseconds), us (microseconds), ms (milliseconds), s (seconds), m (minutes), h (hours), d (days).
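For example, to extend the retry window to 30 minutes (an illustrative value):

```properties
# Retry connecting to the Databricks cluster for up to 30 minutes.
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.timeout=30m
```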

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.disabled

Boolean

If set to true, the data source is disabled.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-preview-query-pattern

String

The query pattern used to preview data for SQL catalog items.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}. The query is wrapped this way only for optimization reasons; it is also safe to set the pattern simply to {dslQuery}.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-import-metadata-query-pattern

String

The query pattern used to import metadata from the data source without reading the data itself.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT 0. The query is wrapped this way only for optimization reasons; it is also safe to set the pattern simply to {dslQuery}.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.schema-exclude-pattern

String

A regular expression matching the names of schemas that are excluded from the job result.

Default value: global_temp.

Make sure that correct regular expression syntax is used (see Pattern (Java Platform SE 7)). Otherwise, DPE fails to run properly.
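For example, to exclude the default global_temp schema together with any schema whose name starts with tmp_ (the tmp_ prefix is illustrative), you could use a pattern such as:

```properties
# Exclude global_temp and any schema starting with tmp_ (illustrative pattern).
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.schema-exclude-pattern=global_temp|tmp_.*
```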

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.point

String

The folder in the Databricks File System that is used as a mount point for the data source folder.

Default value: /mnt/ataccama.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.url

String

The location of the data source folder for storing libraries and files for processing. The folder is then mounted to the directory in the Databricks File System defined in the property mount.point.

Default value: s3a://…​/tmp/dbr.
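A sketch combining the two mount properties, reusing the illustrative ADLS Gen2 URL (placeholder container and account names) from the launch-properties example above:

```properties
# Mount an ADLS Gen2 folder to the default mount point in the Databricks File System.
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.point=/mnt/ataccama
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.mount.url=abfss://container-name@account-name.dfs.core.windows.net/tmp
```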

Authentication properties for Databricks

There are several authentication methods for Databricks (property plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication).

Databricks token

To authenticate using a personal access token generated in Databricks, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=TOKEN

Integrated credentials

To authenticate using Integrated Credentials, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication=INTEGRATED

Service principal with a secret

To authenticate using Azure Active Directory Service Principal, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_CLIENT_CREDENTIAL

In addition, you need to define the following properties:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.tenantId = <tenant ID of your subscription (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientId = <service principal client ID (UUID format)>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.clientSecret = <service principal client secret>
# The Resource ID of Databricks in Azure. This value is constant ("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d") and is not Databricks cluster-specific
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Managed identities

To authenticate using Azure Active Directory Managed Identities, set the property as follows:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.authType=AAD_MANAGED_IDENTITIES

In addition, you need to define the following properties:

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.aad.resource = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.tokenPropertyKey = Auth_AccessToken

Authentication

Property Data type Description

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.url

String

The URL of the Databricks regional endpoint, for example, https://northeurope.azuredatabricks.net/.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.authType

String

Determines the type of authentication used with Databricks.

The following authentication types are available:

  • BASIC_AUTH - Uses a username and a password to authenticate.

  • PERSONAL_TOKEN - Uses a token generated at Databricks.

  • AAD_CLIENT_CREDENTIAL - Uses Azure Active Directory Service Principal with a secret.

  • AAD_MANAGED_IDENTITY - Uses Azure Active Directory Managed Identities.

    DPE must be running on an Azure VM within the same tenant and the managed identity must be set up for the VM in Azure AD using authentication=ActiveDirectoryManagedIdentity.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.token

String

An access token for the Databricks platform. This token is used for jobs that are executed through ONE Desktop. Otherwise, the token is provided when creating a connection to Databricks in ONE.

As of the current version, the property is optional. If you configured a metastore data source through ONE and are using the Catalog Item Reader step in ONE Desktop, the token is automatically passed to ONE Desktop without additional configuration or user action.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.user

String

The username for Databricks. The username and password are used instead of an access token.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.password

String

The password for Databricks. The username and password are used instead of an access token.
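Putting the username and password properties together with the corresponding authentication type, a basic-authentication sketch (placeholder credentials) could look like this:

```properties
# Authenticate to Databricks with a username and password instead of a token.
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.authType=BASIC_AUTH
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.user=<username>
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.password=<password>
```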

Azure Active Directory authentication

For AAD authentication types you need to specify plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d, which is the Resource ID of Databricks in Azure.

Azure AD service principal

For authentication using Azure AD Service Principal, use the following properties:

Azure AD service principal
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.tenantId=tenantID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientId=clientID
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.clientSecret=clientSecret
plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.launch-properties.dbr.aad.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Azure AD managed identity

For authentication using an Azure AD managed identity (MSI), use the following properties:

Azure AD managed identity
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.authType=AAD_MANAGED_IDENTITY
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.vaultUrl=https://<your_vault>.vault.azure.net/
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.clientId=<CLIENT_ID>
plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.dbr.aad.keyvault.tenantId=<TENANT_ID>
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.keyvault.resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

You can get the client ID using the following curl command:

curl 'http://169.254.169.254/metadata/identity/oauth2/token?resource=https://vault.azure.net&api-version=2018-02-01' -H "Metadata: true" | jq

Using Apache Knox with Hadoop

This configuration option is available only from version 13.3.2 and later.

You can use Apache Knox when connecting your Hadoop clusters to browse their catalog in ONE. To do so, set plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication to SIMPLE.

Apache Knox metastore configuration
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_KNOX
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=/opt/../jdbc/<hive_jdbc_drivername>.jar
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://<host>:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT 1
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Cloudera

Cloudera metastore configuration
plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class=com.cloudera.hive.jdbc41.HS2Driver
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/cloudera/*
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Hortonworks

Hortonworks metastore configuration
# If the cluster name is not provided, the cluster identifier is used instead (hortonworks)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Databricks

The following authentication method is deprecated.
Databricks metastore configuration
#--------------------------------------- MOUNT CLUSTER AS SOURCE ----------------------------------------------------------
# When working with Databricks, the name of the cluster in Databricks must match the name provided in DPE
plugin.metastoredatasource.ataccama.one.cluster.databricks.name=
plugin.metastoredatasource.ataccama.one.cluster.databricks.url=
plugin.metastoredatasource.ataccama.one.cluster.databricks.databricksUrl=
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class=com.simba.spark.jdbc.Driver
plugin.metastoredatasource.ataccama.one.cluster.databricks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/databricks/*
plugin.metastoredatasource.ataccama.one.cluster.databricks.timeout=15m
# Set one of the following authentication options
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=TOKEN
plugin.metastoredatasource.ataccama.one.cluster.databricks.authentication=INTEGRATED
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_CLIENT_CREDENTIAL
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.authType=AAD_MANAGED_IDENTITIES
plugin.metastoredatasource.ataccama.one.cluster.databricks.tokenPropertyKey=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.tenantId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientId=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.resource=
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=
# Alternatively, reference the client secret stored in a key vault
plugin.metastoredatasource.ataccama.one.cluster.databricks.aad.clientSecret=keyvault:SECRET:databrickssecret
plugin.metastoredatasource.ataccama.one.cluster.databricks.full-select-query-pattern = SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.preview-query-pattern = SELECT {columns} FROM {table} LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.row-count-query-pattern = SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.databricks.sampling-query-pattern = SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-preview-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.databricks.dsl-query-import-metadata-query-pattern = SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
