Metastore Data Source Configuration for Hadoop

Metastore Data Source is a plugin that allows you to connect to big data sources such as Cloudera, Hortonworks, and AWS EMR and browse them in ONE.

The following properties are provided in the dpe/etc/application.properties file.

Basic settings

Property Data type Description

plugin.metastoredatasource.ataccama.one.cluster.<cluster-id>.launch-properties.

String

Prefix used to customize the launch properties of a specific cluster: append the name of the launch property you want to set. Values defined this way override any existing launch properties for that cluster.

For example, to specify the storage for a single cluster you could use the following configuration:

plugin.metastoredatasource.ataccama.one.cluster.<cluster-id>.launch-properties.mount.url=abfss://container-name@account-name.dfs.core.windows.net/tmp

plugin.metastoredatasource.service.partitions.default.limit

Number

The maximum number of key-value pairs stored to identify partitions in a catalog item. These partition identifiers are passed on to the Metadata Management Module (MMM) and stored there. If the total number of partitions exceeds this limit, only the first n values from the list are kept, where n is the configured limit.

Default value: 10.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.name

String

The name of the cluster. If not specified, the cluster identifier is used instead.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.url

String

The URL where the cluster can be accessed.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class

String

The class name of the driver, for example, com.cloudera.hive.jdbc41.HS2Driver. Required if multiple drivers are found on the driver classpath (see the driver-class-path property).

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.driver-class-path

String

The classpath of the driver, for example, ${ataccama.path.root}/lib/runtime/jdbc/cloudera/*.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.authentication

String

The type of authentication. Possible values: KERBEROS and SIMPLE. For SIMPLE authentication type, no other properties are required.

Use SIMPLE with Apache Knox.

The configuration option for Apache Knox is available in version 13.3.2 and later.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.impersonate

Boolean

If set to true, processes can be started on the cluster using a superuser on behalf of another user. In that case, the keytab provided needs to match the superuser’s credentials.

If the property is not set, impersonation is enabled by default. Applies only to Kerberos authentication.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.principal

String

The name of the Kerberos principal. The principal is a unique identifier that Kerberos uses to assign tickets that grant access to different services. The principal typically consists of three elements: the primary, the instance, and the realm, for example, primary/instance@REALM.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.kerberos.keytab

String

Points to the keytab file that stores the Kerberos principal and the corresponding encrypted key that is generated from the principal password.

plugin.metastoredatasource.ataccama.one.cluster.<clusterId>.disabled

Boolean

Set to true to disable the data source.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-preview-query-pattern

String

Determines whether data preview is possible for SQL catalog items.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}. The wrapping query is used only for optimization, so it is also safe to set the pattern to {dslQuery}.

plugin.metastoredatasource.ataccama.one.driver.<clusterId>.dsl-query-import-metadata-query-pattern

String

Imports metadata from the data source without reading the data itself.

By default, the pattern is SELECT * FROM ({dslQuery}) dslQuery LIMIT 0. The wrapping query is used only for optimization, so it is also safe to set the pattern to {dslQuery}.
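As a rough sketch of how the basic settings above fit together, the following fragment raises the partition limit and defines one cluster with SIMPLE authentication. The cluster identifier mycluster, the display name, and the limit of 20 are hypothetical values chosen for illustration, and the URL keeps its placeholders; the driver class and classpath are borrowed from the Hortonworks example on this page and may differ in your environment.

```properties
# Keep up to 20 partition identifiers per catalog item (the default is 10).
plugin.metastoredatasource.service.partitions.default.limit=20

# "mycluster" is a hypothetical cluster identifier; replace the placeholders.
plugin.metastoredatasource.ataccama.one.cluster.mycluster.name=My Cluster
plugin.metastoredatasource.ataccama.one.cluster.mycluster.url=jdbc:hive2://<host>:<port>/
plugin.metastoredatasource.ataccama.one.cluster.mycluster.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.mycluster.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
# With SIMPLE authentication, no other properties are required.
plugin.metastoredatasource.ataccama.one.cluster.mycluster.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.mycluster.disabled=false
```

If the name property were omitted here, the cluster identifier (mycluster) would be used as the name instead.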

Using Apache Knox with Hadoop

This configuration option is available in version 13.3.2 and later.

You can use Apache Knox when connecting to your Hadoop clusters to browse their catalog in ONE. To do so, set the plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication property to SIMPLE.

Apache Knox metastore configuration
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=Hortonworks_KNOX
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=/opt/../jdbc/<hive_jdbc_drivername>.jar
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=jdbc:hive2://<host>:8443/;ssl=true;transportMode=http;httpPath=gateway/default/hive
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=SIMPLE
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.full-select-query-pattern=SELECT {columns} FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.preview-query-pattern=SELECT {columns} FROM {table} LIMIT 1
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.row-count-query-pattern=SELECT COUNT(*) FROM {table}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.sampling-query-pattern=SELECT {columns} FROM {table} LIMIT {limit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT 0

Cloudera

Cloudera metastore configuration
plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class=com.cloudera.hive.jdbc41.HS2Driver
plugin.metastoredatasource.ataccama.one.cluster.cloudera.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/cloudera/*
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.cloudera.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-preview-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.cloudera.dsl-query-import-metadata-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
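The Cloudera template leaves the name, URL, principal, and keytab empty. Purely as an illustration, the Kerberos-related entries might be filled in along the following lines. The display name, the principal (in the primary/instance@REALM form), and the keytab path are all hypothetical, and the URL is left as a placeholder because its exact format depends on your driver and cluster.

```properties
# Hypothetical display name for the cluster.
plugin.metastoredatasource.ataccama.one.cluster.cloudera.name=Cloudera_Production
# Placeholder only; use the connection URL required by your driver and cluster.
plugin.metastoredatasource.ataccama.one.cluster.cloudera.url=jdbc:hive2://<host>:<port>/
plugin.metastoredatasource.ataccama.one.cluster.cloudera.authentication=KERBEROS
# With impersonation enabled, the keytab must match the superuser's credentials.
plugin.metastoredatasource.ataccama.one.cluster.cloudera.impersonate=true
# Hypothetical principal and keytab location.
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.principal=ataccama/dpe.example.com@EXAMPLE.COM
plugin.metastoredatasource.ataccama.one.cluster.cloudera.kerberos.keytab=/path/to/ataccama.keytab
```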

Hortonworks

Hortonworks metastore configuration
# If the cluster name is not provided, the cluster identifier is used instead (hortonworks)
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.name=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.url=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class=org.apache.hive.jdbc.HiveDriver
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.driver-class-path=${ataccama.path.root}/lib/runtime/jdbc/hortonworks/*
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.authentication=KERBEROS
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.impersonate=true
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.principal=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.kerberos.keytab=
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.disabled=false
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT {previewLimit}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern=SELECT * FROM ({dslQuery}) dslQuery LIMIT 0
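The DSL query patterns in the configurations above wrap {dslQuery} in an outer SELECT only for optimization. If the wrapped form does not suit your data source, a minimal variant might fall back to the bare {dslQuery} pattern, as sketched below for the hortonworks cluster identifier used on this page. This assumes the same property keys as in the Hortonworks listing; adjust the identifier to match your own cluster.

```properties
# Pass the DSL query through unchanged instead of wrapping it in SELECT ... LIMIT.
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-preview-query-pattern={dslQuery}
plugin.metastoredatasource.ataccama.one.cluster.hortonworks.dsl-query-import-metadata-query-pattern={dslQuery}
```

The trade-off is that, without the LIMIT clause, the preview and metadata import queries are no longer restricted by the pattern itself.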
