DPE Configuration
In on-premise deployments, the following properties configure Data Processing Engine (DPE) and are provided either in the Configuration Service or in the dpe/etc/application.properties file.
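As a rough illustration of the file-based option, the following Python sketch reads a Java-style properties file into a dictionary. It is a simplified reader (no line continuations or escape sequences), and the keys `example.property` and `example.port` are hypothetical placeholders, not real DPE properties.

```python
def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch ignores lines without '='
        props[key.strip()] = value.strip()
    return props


# Hypothetical content of dpe/etc/application.properties
sample = """
# comment line
example.property=some-value
example.port = 8080
"""

config = parse_properties(sample)
print(config)
```

All values come back as strings, so numeric or Boolean properties would still need to be converted by the caller.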
The following properties can also be specified for DPE:
Basic settings
Property | Data type | Description |
---|---|---|
| | Boolean | Enables debug logging mode. If set to Default value: |
| | String | Used to send information to Data Processing Manager (DPM) about the location of DPE. If DPM cannot reach DPE by the machine hostname, this property overrides that hostname. If not set, the hostname is determined by trying to resolve the DNS records for the Default value: |
| | Number | Used to send information to DPM about the location of DPE. If DPM cannot reach DPE by the gRPC port, this property overrides that port. Default value: |
| | Number | The HTTP server port for DPE. Default value: |
| | String | A comma-separated list of environments in which this instance of DPE can be used. In the current release, this only affects user restrictions when accessing the file system data sources. |
| | String | A meaningful name for DPE that is used in DPM Admin Console. This is especially useful in firewall-friendly mode as it makes it easier to identify DPE instances. We recommend using alphanumeric characters without spaces. If not set, the default value matches the hostname or the URL and the port of the DPE server. In firewall-friendly mode, the default value is |
Keycloak authentication
Property | Data type | Description |
---|---|---|
| | String | The URL of the server where Keycloak is running. Default value: |
| | String | The name of the Keycloak realm. Default value: |
| | String | The client identifier used to verify the admin user’s authorization token. Default value: |
| | String | The secret key of the client identifier for the admin account. Secret keys can be generated using Keycloak. Default value: |
| | String | The client identifier. Used to verify a user’s authorization token and to log in a user. Default value: |
| | String | The secret key of the client. Secret keys can be generated using Keycloak. Default value: |
| | String | Specifies the issuer of the JWT token. Typically, Keycloak uses the URL of the realm as the token issuer. Default value: |
| | String | The type of client token authentication. Possible values: Default value: |
| | String | Points to the keystore file used for |
| | String | The type of the keystore used for Default value: |
| | String | The password of the keystore used for |
| | String | The private key name specified in the keystore used for The default value is the client identifier. |
| | String | The password for the private key. Used if the private key is encrypted. The default value is the keystore password. |
| | String | Specifies for how long the JWT token used for authentication in Keycloak remains valid. Used for Default value: |
gRPC Server
General settings
Property | Data type | Description |
---|---|---|
| | Number | The port where the gRPC server is running. Default value: |
| | String | Limits the size of messages that the gRPC server can process. The message size needs to fit in the working memory. Default value: |
| | Number | The gRPC server request executor core pool size. Make sure this value is sufficient for the usual traffic in order to avoid creating additional connections because the current version of the gRPC library has the keep-alive for threads set to 0. Default value: |
| | Number | The gRPC server request executor max pool size. The queue length of the executor is unlimited by default, and therefore the maximum pool size is effectively ignored. However, the value should be equal to or higher than the core pool size. Default value: |
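The interplay between the core pool size, the maximum pool size, and the queue can be sketched with a small model. This assumes the executor follows the usual core/max pool policy (as in Java's `ThreadPoolExecutor`, where threads beyond the core size are created only once the queue is full); the function name and the numbers are illustrative only and are not DPE defaults.

```python
def threads_started(core_size, max_size, queue_capacity, pending_tasks):
    """Count worker threads created when `pending_tasks` arrive at once and
    none finishes. queue_capacity=None models an unbounded queue."""
    threads = min(pending_tasks, core_size)
    remaining = pending_tasks - threads
    if queue_capacity is None:
        # Everything else waits in the queue, so max_size is never reached.
        return threads
    remaining -= min(remaining, queue_capacity)
    # Only tasks that do not fit in the queue trigger threads beyond the core.
    threads += min(remaining, max_size - core_size)
    return threads


unbounded = threads_started(4, 16, None, 100)
bounded = threads_started(4, 16, 10, 100)
print(unbounded, bounded)
```

With an unbounded queue the model never grows past the core size, which is why the core pool size is the value that needs to match the expected traffic.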
Authentication
Property | Data type | Description |
---|---|---|
| | Boolean | Enables basic authentication on the gRPC Server. Default value: |
| | Boolean | Enables bearer authentication on the gRPC Server. Default value: |
| | Boolean | Enables internal JWT token authentication on the gRPC Server. Default value: |
| | Boolean | Enables mTLS authentication on the gRPC Server. Default value: |
TLS/mTLS
Property | Data type | Description |
---|---|---|
| | Boolean | Enables TLS authentication on the gRPC server. When set to This property can be set for each DPE. Default value: |
| | String | Defines whether mutual TLS authentication is enabled. Possible values: When set to Disabled by default. |
| | Boolean | Specifies whether the server allows all TLS connection attempts. Used if mTLS is enabled. |
| | String | The full path to the TLS certificate, for example, |
| | String | The full path to the private key of the certificate, for example, |
| | String | The full path to the public certificate of the root certificate authority, for example, |
| | String | The domain name of the generated server certificate. |
| | String | The full path to the keystore containing private and public key certificates that are used by the gRPC server. This property has a higher priority compared to |
| | String | The type of keystore. Possible values: |
| | String | The password for the keystore. Used if the keystore is encrypted. |
| | String | The private key name specified in the provided keystore. |
| | String | The password for the private key of the gRPC server. Used if the private key is encrypted. If the private key is not set, the password must be the same for all the items in the keystore. |
gRPC Client
General settings
Property | Data type | Description |
---|---|---|
| | String | Limits the size of messages that the gRPC client can process. Default value: |
TLS/mTLS
Property | Data type | Description |
---|---|---|
| | Boolean | Enables TLS authentication when communicating with the gRPC server. It also ensures that the communication with the server over gRPC is secure (encrypted instead of in plaintext) and guarantees the integrity of messages. Default value: |
| | Boolean | Enables mutual TLS authentication between the server and the client. Default value: |
| | Boolean | Specifies whether the gRPC client should verify the certificate of the server with which it communicates. Used if mTLS is enabled. |
| | String | The full path to the TLS certificate, for example, |
| | String | The full path to the private key of the certificate, for example, |
| | String | The full path to the public certificate of the root certificate authority, for example, |
| | String | The full path to a truststore file that contains public keys and certificates against which the client verifies the certificates from the server, for example, |
| | String | The password for the truststore. Used if the truststore is encrypted. |
| | String | The type of truststore. Possible values: |
DPM connection
Property | Data type | Description |
---|---|---|
| | Number | Defines how often DPE checks the connection to DPM. If there is no reply from DPM, DPE tries to register again with DPM. Expressed in milliseconds. Default value: |
| | String | The name of the gRPC channel that DPE uses to communicate with DPM. Default value: |
| | String | The IP address or the URL of the server where DPM is running. Default value: |
| | Number | The port where the gRPC server is running. Default value: |
| | Boolean | Enables TLS authentication when communicating with the DPM gRPC server. It also ensures that the communication with the server over gRPC is secure (encrypted instead of in plaintext) and guarantees the integrity of messages. Default value: |
| | String | Defines whether mutual TLS authentication is enabled. Possible values: When set to Disabled by default. |
| | Boolean | Specifies whether the gRPC client should verify the certificate of the server with which it communicates. Used if mTLS is enabled. |
| | String | The full path to the TLS certificate, for example, |
| | String | The full path to the private key of the certificate, for example, |
| | String | The full path to the public certificate of the root certificate authority, for example, |
| | String | The full path to a truststore file that contains public keys and certificates against which the client verifies the certificates from the server, for example, |
| | String | The password for the truststore. Used if the truststore is encrypted. |
| | String | The type of truststore. Possible values: |
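The periodic connection check described in the first row of this table (check DPM at a fixed interval in milliseconds, register again if there is no reply) might look roughly like the sketch below. The `ping` and `register` callables and the interval value are hypothetical stand-ins; the real check happens over gRPC inside DPE.

```python
import time


def connection_check_loop(ping, register, interval_ms, cycles, sleep=time.sleep):
    """Run `cycles` connection checks; re-register whenever DPM does not reply."""
    registrations = 0
    for _ in range(cycles):
        if not ping():
            register()
            registrations += 1
        sleep(interval_ms / 1000)  # the interval is expressed in milliseconds
    return registrations


# Simulated DPM replies: reachable, unreachable, reachable.
replies = iter([True, False, True])
events = []
count = connection_check_loop(
    ping=lambda: next(replies),
    register=lambda: events.append("register"),
    interval_ms=5000,          # hypothetical value, not the DPE default
    cycles=3,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(count, events)
```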
Communication mode between DPE and DPM
Starting from version 13.3.1, it is possible to enable communication over a bidirectional gRPC stream between DPE and DPM. When this firewall-friendly mode is configured, the follow-up communication from DPM to DPE, such as browsing queries or submitting jobs, does not require opening DPE’s inbound ports to the outside world.
If firewall-friendly mode cannot be configured, TLS security can be enforced for all DPE instances by setting the DPM property ataccama.one.dpm.registry.enforce-tls, or it can be enabled on selected DPE instances.
We recommend checking the desired security level if any of your DPE instances communicate with DPM over the internet.
Property | Data type | Description |
---|---|---|
| | String | Defines how DPE connects to DPM. The following options are available: Default value: |
| | String | Defines how long DPE waits for the in-process server shutdown before shutting it down forcefully. Used only if Default value: |
| | String | Sets the maximum period of time that DPE waits for a new request from DPM before it attempts to register again. Used only if Default value: |
MDM connection
Property | Data type | Description |
---|---|---|
| | Boolean | If set to Default value: |
| | Boolean | Enables TLS authentication when communicating with MDM. |
| | String | The full path to the truststore, for example, |
| | String | The password for the truststore. |
Plugins and JDBC drivers
Property | Data type | Description |
---|---|---|
| | String | The location of the plugins folder. Default value: |
| | String | Points to the folder containing the JDBC drivers used by DPE. Default value: |
| | String | The connection entity name used in the metadata model (MMD) for local file systems. |
| | String | The connection entity name used in the metadata model (MMD) for ONE MDM data source. |
| | String | The connection entity name used in the metadata model (MMD) for ONE RDM data source. |
| | String | The connection entity name used in the metadata model (MMD) for S3 data source. |
Snowflake query pushdown processing
Property | Data type | Description |
---|---|---|
| | Boolean | Enables automatic upload of user-defined functions to Snowflake. Default value: |
| | Number | Specifies how many keys are uploaded in a batch during lookup upload. Default value: |
| | Number | Specifies the maximum number of keys when uploading lookup tables. Larger lookup tables are inserted into a temporary table. Default value: |
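To make the two lookup settings more concrete, here is a minimal sketch of batching keys and of the size check that decides when a temporary table is used. The function names and the numbers are hypothetical; the sketch does not reflect how DPE actually talks to Snowflake.

```python
def key_batches(keys, batch_size):
    """Split lookup keys into consecutive batches of at most `batch_size`."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def needs_temporary_table(key_count, max_keys):
    """Lookups with more keys than the configured maximum go to a temporary table."""
    return key_count > max_keys


keys = ["k1", "k2", "k3", "k4", "k5"]
batches = list(key_batches(keys, batch_size=2))
print(batches)
print(needs_temporary_table(len(keys), max_keys=3))
```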
DataConnect plugin
The DataConnect plugin is used to query metadata in a particular data source. The retrieved information can be cached.
Property | Data type | Description |
---|---|---|
| | Number | The maximum number of threads that can be dedicated to closing the clients that have been evicted. Default value: |
| | Boolean | If set to Default value: |
| | Number | The maximum size of the DataConnect cache for all data sources. The value refers to the number of cached client instances, where each instance corresponds to a combined definition of a data source and one of its credentials. To disable caching, set the value to Default value: |
| | Number | The maximum number of browse query results that are cached for each DataSource client. Default value: |
| | String | Specifies for how long the entries are stored in the cache. If a user browses the data source again before this period expires, the cache expiration is extended for the same amount of time. Default value: |
| | String | Defines for how long the DataSource client cache stores items for browsing. Default value: |
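The expiration behavior described above (an entry's lifetime is extended each time it is accessed before it expires, and the cache holds a bounded number of entries) can be sketched as follows. The class name, the least-recently-used eviction order, and the numbers are assumptions made for illustration; the source does not describe how the DataConnect cache is implemented.

```python
import time
from collections import OrderedDict


class AccessExpiringCache:
    """Bounded cache whose entries expire `ttl_seconds` after their last access."""

    def __init__(self, ttl_seconds, max_size, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max_size = max_size
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expires_at)

    def put(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used (assumed policy)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            del self._entries[key]
            return None
        # Accessing an entry extends its expiration by the same period.
        self._entries[key] = (value, now + self._ttl)
        self._entries.move_to_end(key)
        return value


# Demo with a fake clock so the timing is deterministic.
now = [0.0]
cache = AccessExpiringCache(ttl_seconds=10, max_size=2, clock=lambda: now[0])
cache.put("source-a", "client")
now[0] = 8
first = cache.get("source-a")   # still valid, expiration moves to t=18
now[0] = 15
second = cache.get("source-a")  # valid only because of the extension
now[0] = 30
third = cache.get("source-a")   # expired
print(first, second, third)
```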
File system
Property | Data type | Description |
---|---|---|
| | String | The root path of the default mounted file system. It is possible to set up multiple file systems in the same DPE. These folders can be used for profiling and browsing data in MMM. To add another file system, replace the name of the file system ( |
| | String | Restricts which roles are allowed to work with a particular file system. To set limitations for another file system, make sure to provide the correct file system name and the environment name, for example, |
| | String | The sample file size for items loaded from local file systems. Used to detect the content type, such as Default value: |
Persistent storage
Persistent storage is intended to work as DPE’s internal storage. This means that it should not store any user data or, in general, use a database with user data.
There are two types of persistent storage: the default embedded database and a custom data source.
Using a custom data source instead of the embedded one is optional and requires further configuration.
The driver required for the custom data source must be manually added to the lib folder with other database drivers.
Property | Data type | Description |
---|---|---|
| | String | The type of data source for persistent storage. If you are using the default embedded database, set the value to Default value: |
| | String | Points to the folder storing the persisted data. Persisted data includes all application data, files for processing, and processing results. Default value: |
Custom data source
Property | Data type | Description |
---|---|---|
| | String | The JDBC driver class name for the custom data source for persistent storage, for example, |
| | String | A JDBC connection string pointing to the custom data source for persistent storage, for example, |
| | String | The username for the custom data source for persistent storage. Default value: |
| | String | The password for the custom data source for persistent storage. Default value: |
Executor
Property | Data type | Description |
---|---|---|
| | String | The classpath delimiter used for libraries for local jobs. Default value: |
| | String | The libraries needed for local jobs. Jobs are stored as Starting from 13.4.0, all JDBC libraries need to be specified here in order to use the You can specify: Default value: |
| | String | The directory that holds temporary work files for local jobs. A folder is created for each job within this directory, with the name containing the If not specified, the system environment variable After each job finishes, the corresponding work files are automatically deleted from this directory. |
| | String | References the script for customizing how local jobs are launched. For example, you can change which shell script is used to start the job. The script is located in the The default value for Linux is |
| | String | (Optional) Excludes certain Java libraries and drivers that are in the same classpath but are not needed for processing. Each library needs to be prefixed by an exclamation mark ( Example: |
| | String | Points to the location of Ataccama licenses needed for local jobs. The property should be configured only if licenses are stored outside of the standard locations, that is the home directory of the user and the folder Default value: |
| | String | Sets the The runtime is compatible with Java 8 or later. |
| | String | Configures any environment variable for running local jobs. To set a custom environment variable, provide the name of the variable instead of the placeholder |
| plugin.executor-launch-model.ataccama.one.launch-type-properties.LOCAL.job-specific-system-properties.allowed-keys plugin.executor-launch-model.ataccama.one.launch-type-properties.SPARK.job-specific-system-properties.allowed-keys | String | A comma-separated list of keys identifying the system properties that can be set when submitting a job, which makes their values specific for a particular job. These job-specific system properties are then passed to each spawned runtime JVM associated with the identified launch type ( |
| | Boolean | If set to Default value: |
| | String | Specifies the time period after which job results are deleted from DPE, starting from when the job finishes. Typically, the property serves as a backup in case job files could not be deleted after the job was completed. This option applies regardless of how the property Default value: |
| | Number | Defines how often DPM checks for recently completed jobs in DPE. Expressed in milliseconds. Default value: |
| | Number | The number of old jobs to be cleaned at once. Default value: |
| | Number | The maximum duration of each job-cleaning run. This property must be set with respect to Default value: |
| | Number | The maximum number of processes or threads that can be run in parallel. Default value: |
| | Boolean | Enables notifications over gRPC from the ONE runtime server for runtime jobs. Default value: |
| | Boolean | Enables notifications over gRPC from the ONE runtime server for data quality monitoring jobs. Default value: |
| | String | A regular expression matching the names of environment variables set for DPE that can be accessed by child processes spawned for runtime jobs. Default value: |
| | Number | The required number of evaluation threads. To set this, use The variable defines the number of models which are processed in parallel in the DQ engine. Default value: |
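Two of the settings in this table act as filters: the allowed-keys list limits which system properties a job may set, and the regular expression limits which DPE environment variables child processes can see. The sketch below illustrates both ideas. The function names, the sample keys, and the choice of matching the whole variable name are assumptions, not documented DPE behavior.

```python
import re


def allowed_job_properties(allowed_keys_csv, requested):
    """Keep only the requested system properties whose keys are on the allowed list."""
    allowed = {key.strip() for key in allowed_keys_csv.split(",") if key.strip()}
    return {key: value for key, value in requested.items() if key in allowed}


def visible_environment(pattern, environment):
    """Keep only environment variables whose names match the pattern (full match assumed)."""
    regex = re.compile(pattern)
    return {name: value for name, value in environment.items() if regex.fullmatch(name)}


# Hypothetical keys and variables, for illustration only.
job_props = allowed_job_properties(
    "example.first.key, example.second.key",
    {"example.first.key": "1", "example.other.key": "2"},
)
child_env = visible_environment(
    r"JAVA_.*|PATH",
    {"JAVA_HOME": "/opt/java", "PATH": "/usr/bin", "SECRET_TOKEN": "hidden"},
)
print(job_props)
print(child_env)
```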
Shutdown
Property | Data type | Description |
---|---|---|
| | String | The type of application shutdown. If set to Default value: |
| | String | Defines how long a shutdown phase can last. After this time expires, the application shuts down regardless of any active requests. Default value: |
| | Boolean | If set to The waiting period is defined through the property |
| | String | How long the application waits for running jobs to complete before shutting down gracefully. Once the timeout is reached, any remaining jobs are canceled. Should be shorter than the value set in Default value: |
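The job-waiting behavior described in the last row (wait for running jobs up to a timeout, then cancel whatever is still running) can be sketched as below. This is a simplified model using threads and cooperative cancellation; the function name, the job setup, and the timeout value are hypothetical, and DPE's actual shutdown logic is not shown in the source.

```python
import threading
import time


def shut_down_gracefully(jobs, wait_seconds):
    """Wait up to `wait_seconds` in total for all jobs, then cancel the rest.

    `jobs` is a list of (thread, cancel_event) pairs. Returns the names of
    the jobs that had to be canceled.
    """
    deadline = time.monotonic() + wait_seconds
    canceled = []
    for thread, cancel_event in jobs:
        thread.join(max(0.0, deadline - time.monotonic()))
        if thread.is_alive():
            cancel_event.set()  # ask the job to stop
            canceled.append(thread.name)
    for thread, _ in jobs:
        thread.join()  # jobs in this sketch always honor the cancel request
    return canceled


# Demo: one job finishes immediately, the other runs until it is canceled.
quick_cancel = threading.Event()
slow_cancel = threading.Event()
quick_job = threading.Thread(target=lambda: None, name="quick")
slow_job = threading.Thread(target=slow_cancel.wait, name="slow")
quick_job.start()
slow_job.start()

canceled = shut_down_gracefully(
    [(quick_job, quick_cancel), (slow_job, slow_cancel)], wait_seconds=0.2
)
print(canceled)
```

Keeping the job-wait timeout shorter than the overall shutdown phase timeout leaves time for the cancellation step itself to complete before the application is stopped regardless.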