DPM Configuration
In on-premise deployments, the following properties configure the Data Processing Module (DPM) and are provided in the dpm/etc/application.properties file.
In addition, the following properties can be specified for DPM as well:
Keycloak authentication
Property | Data type | Description |
---|---|---|
|
String |
A helper property used for constructing other Keycloak authentication properties. Default value: |
|
String |
The URL of the server where Keycloak is running. Default value: |
|
String |
The name of the Keycloak realm. Default value: |
|
String |
The client identifier. Used to verify a user’s authorization token and to log in a user. Default value: |
|
String |
The secret key of the client. Secret keys can be generated using Keycloak. Default value: |
|
String |
Specifies the issuer of the JWT token. Typically, Keycloak uses the URL of the realm as the token issuer. Default value: |
|
Boolean |
Enables Keycloak admin access. |
gRPC Server
General settings
Property | Data type | Description |
---|---|---|
|
Number |
The port where the gRPC server is running. Default value: |
|
String |
Limits the size of messages that the gRPC server can process. The message size needs to fit in the working memory. Default value: |
|
Number |
The gRPC server request executor core pool size. Make sure this value is sufficient for the usual traffic in order to avoid creating additional connections because the current version of the gRPC library has the keep-alive for threads set to 0. Default value: |
|
Number |
The gRPC server request executor max pool size. The queue length of the executor is unlimited by default, and therefore the maximum pool size is effectively ignored. However, the value should be equal to or higher than the core pool size. Default value: |
Authentication
Property | Data type | Description |
---|---|---|
|
Boolean |
Enables basic authentication on the gRPC Server. Default value: |
|
Boolean |
Enables bearer authentication on the gRPC Server. Default value: |
|
Boolean |
Enables internal JWT token authentication on the gRPC Server. Default value: |
|
Boolean |
Enables mTLS authentication on the gRPC Server. Default value: |
TLS/mTLS
Property | Data type | Description |
---|---|---|
|
Boolean |
Enables TLS authentication on the gRPC server. Default value: |
|
String |
Defines whether mutual TLS authentication is enabled.
Possible values: When set to Disabled by default. |
|
String |
The full path to the TLS certificate, for example, |
|
String |
The full path to the private key of the certificate, for example, |
|
String |
The full path to the public certificate of the root certificate authority, for example, |
gRPC Client
General settings
Property | Data type | Description |
---|---|---|
|
String |
Limits the size of messages that the gRPC client can process. Default value: |
|
Number |
The default gRPC client request executor core pool size (intended to be applied only to the dynamically created DPE channels).
The pools must be set so that they can accommodate all requests to a single DPE (such as job submitting and status checks). The current version of the gRPC library requires a sufficient number of threads to ensure that the status checks do not wait for a thread in the queue. If you observe DPE spontaneously disconnecting as a result of timed-out status checks, increase the pool size. Default value: |
|
Number |
The default gRPC client request executor maximum pool size (intended to be applied only to the dynamically created DPE channels). The queue length of the executor is unlimited by default, and therefore the maximum pool size is effectively ignored. However, make sure that this value is equal to or higher than the core pool size. Default value: |
TLS/mTLS
Starting from 13.5.0, each DPE can actively request TLS-secured communication with DPM.
To allow only TLS-secured connections with all DPE instances, use the property ataccama.one.dpm.registry.enforce-tls=true.
Property | Data type | Description |
---|---|---|
|
Boolean |
Enables TLS authentication when communicating with the gRPC server. It also ensures that the communication with the server over gRPC is secure (encrypted instead of in plaintext) and guarantees the integrity of messages. Default value: |
|
Boolean |
Enables mutual TLS authentication between the server and the client. Default value: |
|
String |
The full path to the TLS certificate, for example, |
|
String |
The full path to the private key of the certificate, for example, |
|
String |
The full path to the public certificate of the root certificate authority, for example, |
MMM Connection
The following properties are used to configure gRPC channels through which DPM connects to a Metadata Management Module (MMM) instance in order to retrieve information needed for executing ONE plans. This is typically data for MMM catalog items that are used in the executed plans.
Property | Data type | Description |
---|---|---|
|
String |
The hostname of the machine where MMM is running. Default value: |
|
Number |
The gRPC port where MMM is running. Default value: |
|
Boolean |
Enables the HTTP connection between DPM and MMM. Default value: |
|
Number |
The HTTP port where MMM is running. Default value: |
MDM Connection
Property | Data type | Description |
---|---|---|
|
String |
The port of the gRPC interface that is used by MDM. Default value: |
DPM settings
Property | Data type | Description | ||
---|---|---|---|---|
|
Number |
The number of threads used to check the status of DPE. Default value: |
||
|
String |
Defines how often DPM checks the status of a connected DPE. Default value: |
||
|
String |
Specifies the initial interval duration for checking the status of a disconnected DPE. DPM performs the first check immediately after it detects a disconnected DPE. If the connection remains disrupted, subsequent checks are done based on this interval. Default value: |
||
|
Number |
A multiplier used to increase the interval between subsequent checks performed on a disconnected DPE. The interval becomes longer after each unsuccessful try until the maximum interval length is reached. Default value: |
||
|
String |
Determines the maximum interval duration for checking the status of a disconnected DPE. The interval multiplier cannot increase the interval duration past this value. Default value: |
||
|
Number |
Configures after which period a gRPC request times out when checking the engine status. Expressed in milliseconds. Default value: |
||
|
String |
Sets the threshold after which a gRPC channel provider associated with a registered DPE can be safely disposed of. This helps minimize the unnecessary use of DPM resources. Default value: |
||
|
Number |
Specifies how often DPM’s internal scheduler is run. During each run, the scheduler checks when each DPE has last been seen, and, if that period exceeds the value set in Default value: |
||
|
String |
Defines after which period a DPE is automatically inactivated. Inactive DPEs are not allocated any jobs; however, they remain registered and their information is still persisted in DPM. Default value: |
||
|
String |
Defines how long a DPE must remain inactive before it can be removed through a GraphQL request (mutation Default value: |
||
|
Number |
Defines how often the scheduler automatically deletes inactive DPEs. Expressed in milliseconds. Default value: |
||
|
String |
Defines how long a DPE must remain inactive before it is automatically deleted.
Should be higher than Default value: |
||
|
String |
Defines how long a DPE must remain inactive before it can be deleted through a GraphQL request (mutation Default value: |
||
|
Number |
Determines how often Default value: |
||
|
Boolean |
This optional property enforces TLS-level security on communication between DPM and all connected DPEs.
If set to Default value: |
||
|
Number |
Specifies how many times DPM retries to place a lock on the database in order to complete its migration. If the lock is not placed by the time the count is reached, migration fails. DPM retries in 1s intervals.
To retry indefinitely, set to Default value: |
||
|
String |
A comma-separated list of connection types that are shown when configuring a new data source connection in ONE Web Application. Example: If this property is specified, only the connection types listed in the property value are available. Otherwise, all connection types are available. By default, this property is not set.
|
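The reconnection schedule described in this section for a disconnected DPE (an initial interval that is multiplied after each unsuccessful check until a maximum is reached) can be illustrated with a short Python sketch. The function name and the sample values are hypothetical and are not DPM defaults, and the exact way DPM applies the multiplier is an assumption.

```python
from datetime import timedelta


def reconnect_intervals(initial: timedelta, multiplier: float,
                        maximum: timedelta, checks: int) -> list[timedelta]:
    """Return the waiting time before each of the first `checks` follow-up checks.

    Assumption: the interval is multiplied after every unsuccessful check
    and is never allowed to exceed `maximum`.
    """
    intervals = []
    current = min(initial, maximum)
    for _ in range(checks):
        intervals.append(current)
        current = min(current * multiplier, maximum)
    return intervals


# Hypothetical values: start at 1 s, double each time, cap at 10 s.
for wait in reconnect_intervals(timedelta(seconds=1), 2, timedelta(seconds=10), 5):
    print(wait.total_seconds())
```

A sketch like this can help when choosing a multiplier and a maximum interval, because it shows how quickly the schedule reaches the cap.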
GraphQL and DPM Admin Console
Property | Data type | Description |
---|---|---|
|
Number |
The HTTP server port for GraphQL API and monitoring endpoints. This is also where the DPM Admin Console user interface is available. Default value: |
|
String |
A list of files containing data that should not be made available to non-power users. The files can still be accessed in DPM Admin Console but are not downloaded to ONE Desktop. Default value: |
|
Number |
Defines the maximum number of recent jobs listed in DPM Admin Console. Default value: |
|
String |
A comma-separated list of launch model files that are removed from the listing in DPM Admin Console, in addition to the already streamed files, such as logs. By default, job properties are not shown as they are included in the submit details and might leak sensitive information if unmasked. Default value: |
|
String |
Configures where the GraphQL servlet is exposed. Default value: |
|
String |
Limits access to DPM Admin Console based on the user role. Default value: |
|
String |
A comma-separated list specifying sensitive XML content in keys that is masked when |
|
String |
The masking strategy used to avoid displaying sensitive information. There are two strategies available:
Default value: |
|
String |
The string with which sensitive keys are replaced if the replacement masking strategy is applied. Default value: |
|
String |
A comma-separated list of sensitive keys that should be masked when |
|
Number |
Maximum number of rows returned for log output from DPE. Default value: |
|
String |
The base URL of the DPM Admin Console frontend. Default value: |
|
String |
The URL of the GraphQL endpoint that is used by DPM Admin Console for the user interface. Default value: |
Single sign-on for DPM Admin Console
Property | Data type | Description |
---|---|---|
|
String |
The name of the Keycloak realm used for SSO. Default value: |
|
String |
The base URL where Keycloak is available. Used as a prefix for other SSO URLs. Default value: |
|
String |
The URL where users are redirected to provide authentication credentials. Default value: |
|
String |
The URL used to obtain authentication tokens from Keycloak. Default value: |
|
String |
The URL used for logging users out. Default value: |
|
String |
The client identifier used for verifying user authorization tokens and for logging in. Default value: |
Plugins and JDBC drivers
Property | Data type | Description |
---|---|---|
|
String |
The location of the Default value: |
DQ Evaluation plugin
Property | Data type | Description |
---|---|---|
|
Number |
The disk space allocated to the lookup files cache. Expressed in megabytes. Default value: |
|
Number |
Configures for how long lookup files are cached. Expressed in minutes. Default value: |
|
Number |
The maximum allowed number of filter combinations. Default value: |
Persistent storage
The type of data source used for persistent storage is defined through DPE.
Any missing drivers need to be added to the <ataccama_home>/lib
folder.
Property | Data type | Description |
---|---|---|
|
String |
A JDBC connection string pointing to the persistent storage database. Default value: |
|
String |
The username for the persistent storage database. Default value: |
|
String |
The password for the persistent storage database. Default value: |
Audit
Property | Data type | Description |
---|---|---|
|
Boolean |
Enables auditing. Default value: |
|
String |
A JDBC connection string pointing to the database where audit logs are stored. Default value: |
|
String |
The username for the audit database. Default value: |
|
String |
The password for the audit database. Default value: |
DataConnect plugin
The DataConnect Plugin is used to query metadata in a particular data source. The retrieved information can be cached.
Property | Data type | Description |
---|---|---|
|
Boolean |
Enables prioritizing cached data sources when routing jobs and other queries. Default value: |
|
String |
Sets the timeout for gRPC requests on DPE. Default value: |
Profiling
The executor uses up the available resources in the following way:
- If the core limit of the thread pool has not been reached, the executor tries to use a free active thread from the pool.
- If there are no free threads left and the number of active threads corresponds to the pool core limit, the executor keeps adding tasks until the queue is filled.
- A new thread is added only when the core size and the queue size have been exhausted.
- When the maximum pool size is also reached, the executor rejects any new tasks.
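The rules above can be modeled as a small decision function. This is only an illustrative Python sketch of the described order of decisions, not the executor code used by the Profiling plugin. The function name, parameters, and return labels are hypothetical.

```python
def place_task(active_threads: int, queued_tasks: int,
               core_size: int, max_size: int, queue_capacity: int) -> str:
    """Decide what happens to a newly submitted task (simplified model)."""
    if active_threads < core_size:
        # The core limit has not been reached yet.
        return "run on core thread"
    if queued_tasks < queue_capacity:
        # Core threads are busy, so the task waits in the queue.
        return "queue"
    if active_threads < max_size:
        # Core size and queue are exhausted: grow the pool.
        return "start extra thread"
    # The maximum pool size has been reached as well.
    return "reject"


# Hypothetical pool: 2 core threads, 4 threads at most, queue of 3 tasks.
print(place_task(active_threads=2, queued_tasks=3,
                 core_size=2, max_size=4, queue_capacity=3))
```

In this model, a very large queue capacity means the third branch is practically never reached, which is why the maximum pool size has little effect when the queue is unbounded.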
Property | Data type | Description | ||
---|---|---|---|---|
|
Number |
The maximum number of threads in the thread pool used for executing asynchronous methods in the Profiling plugin. Default value: |
||
|
Number |
The number of threads that the thread pool must contain at all times, including both active and idle threads. This refers to the thread pool for executing asynchronous methods in the Profiling plugin. Default value: |
||
|
Number |
The number of computation threads that the executor queue can hold when executing asynchronous methods in the Profiling plugin. Default value: |
||
|
Boolean |
If set to Default value: |
||
|
Boolean |
If enabled, this property adds a
Default value: |
||
|
Number |
This property can be used to adjust the record batch size. Default value: |
||
|
Boolean |
This property can speed up the reading of data from the source. However, it might also increase disk space usage. Default value: |
Data source connection throttling
Starting from version 14.1.0, we recommend defining the connection throttling rules primarily through DPM Admin Console. For instructions, see DPM and DPE Configuration in DPM Admin Console. The properties described in this section now serve only to provide the initial values, not as the primary method for defining the connection throttling rules.
In order to ensure the maximum number of connections is dedicated to the ONE platform and at the same time catalog browsing is never blocked by DPE jobs, you can now limit connections separately for DPE jobs and data-connect queries.
Use all three plugin.jdbc-datasource.ataccama.one.dpm.resource-allocation.connections properties to enable this function and specify the maximum number of connections used for DPE jobs and for data-connect queries.
All connected DPE instances also need the plugin.jdbcdatasource.ataccama.one.connections.maximum-size property set to 0 to disable caching of data source clients.
plugin.jdbc-datasource.ataccama.one.dpm.resource-allocation.connections.<name>.pattern=jdbc:oracle://<host:port>/.*
plugin.jdbc-datasource.ataccama.one.dpm.resource-allocation.connections.<name>.max-connections-jobs=4
plugin.jdbc-datasource.ataccama.one.dpm.resource-allocation.connections.<name>.max-connections-data-connect=20
JDBC patterns are logged by DPM if a configuration error occurs. Therefore, we strongly discourage including sensitive information, such as sensitive parts of connection strings, in JDBC matching patterns.
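To check which connection strings a throttling pattern would cover before putting it into the configuration, a regular expression can be tried out locally. The following Python sketch is only an approximation: the rule name, host, and connection strings are hypothetical, and it assumes that a pattern must match the whole connection string and that the first matching rule applies, neither of which is confirmed here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThrottlingRule:
    name: str                          # the <name> part of the property key
    pattern: str                       # regular expression for connection strings
    max_connections_jobs: int
    max_connections_data_connect: int


def find_rule(connection_string: str,
              rules: list[ThrottlingRule]) -> Optional[ThrottlingRule]:
    """Return the first rule whose pattern matches the whole connection string."""
    for rule in rules:
        if re.fullmatch(rule.pattern, connection_string):
            return rule
    return None


# Hypothetical rule mirroring the example properties above.
rules = [ThrottlingRule("oracle_prod", r"jdbc:oracle://db\.example\.com:1521/.*", 4, 20)]

match = find_rule("jdbc:oracle://db.example.com:1521/orcl", rules)
print(match.name if match else "no throttling rule applies")
```

Keeping the pattern limited to the non-sensitive prefix of the connection string, as in this sketch, also follows the advice about not placing sensitive values into patterns.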
Property | Data type | Description |
---|---|---|
|
String |
A list of regular expressions matching the data source connection strings. Example: |
|
Number |
Sets the maximum number of connections that can be used for DPE jobs.
Unused slots are automatically allocated for If set to |
|
Number |
Set the number of connections that can be used for If set to |
|
Number |
(Optional) This property helps DPM to estimate the number of connections allocated per each data source when importing metadata.
Make sure the number set here does not exceed the sum of values set in |
Resource allocation of runtime steps
You can further fine-tune DPM resource allocation by using two advanced global runtime properties as follows:
Property | Data type | Description |
---|---|---|
|
Number |
Setting this property helps DPM estimate the number of connections needed for execution of one instance of the specified type of ONE Runtime step. Non-threaded applies to processes that are not affected by the parallelism setting and always run in a single thread. |
|
Number |
Setting this property helps DPM estimate the number of connections needed for execution of one instance of the specified type of ONE Runtime step. Threaded applies to processes affected by the parallelism setting. Such steps can run in up to the number of threads set in parallelism. |
The name of the step corresponds to the class name of the step that is displayed when opening a .plan file in an XML editor (for example, com.ataccama.dqc.tasks.io.text.write.TextFileWriter or com.ataccama.dqc.tasks.io.json.call.JsonCall).
Make sure to replace the dots (.) in class names with dashes (-) in property names.
For example, the previously mentioned class names should be com-ataccama-dqc-tasks-io-text-write-TextFileWriter and com-ataccama-dqc-tasks-io-json-call-JsonCall respectively.
Make sure that the result of the equation non-threaded + (parallelism * threaded) does not exceed the total number of your DPE job slots.
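The naming rule and the job slot check described above can be sketched in Python as follows. The function names and the sample numbers are hypothetical, and the snippet only restates the rule and the equation from this section.

```python
def step_property_segment(class_name: str) -> str:
    """Convert a step class name into the form used in property names."""
    return class_name.replace(".", "-")


def estimated_connections(non_threaded: int, threaded: int, parallelism: int) -> int:
    """Evaluate non-threaded + (parallelism * threaded)."""
    return non_threaded + parallelism * threaded


def fits_job_slots(non_threaded: int, threaded: int,
                   parallelism: int, job_slots: int) -> bool:
    """Check that the estimate does not exceed the total number of DPE job slots."""
    return estimated_connections(non_threaded, threaded, parallelism) <= job_slots


print(step_property_segment("com.ataccama.dqc.tasks.io.text.write.TextFileWriter"))
# Hypothetical values: 1 non-threaded, 2 threaded, parallelism 4, 10 job slots.
print(fits_job_slots(non_threaded=1, threaded=2, parallelism=4, job_slots=10))
```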
Executor
Starting from 13.4.0, the job-processing executor is used for job preprocessing, with the job-processing.max-pool-size property defining the job processing maximum parallelism.
The queue capacity needs to be slightly higher than the maximum pool size.
The new job-submitting and job-killing executors are used for the following purposes:
- job-submitting: Used for submitting jobs, which was previously done serially and included fetching the whole database of the queued jobs. Jobs are now submitted in parallel (property job-submitting.max-pool-size) and only the jobs that are waiting to be submitted are fetched from the database. The queue capacity should be slightly higher than the maximum pool size.
- job-killing: A dedicated thread pool for cancelling jobs with a limited thread capacity (10000 by default). This queue is not persisted in the database.
The default thread pool sizes do not guarantee the best possible performance. To achieve it, monitor the system performance under the given settings and adjust the configuration accordingly.
Property | Data type | Description |
---|---|---|
|
Number |
The maximum number of threads in the thread pool used for pre-processing jobs and submitting them to the main queue in the Executor plugin. Default value: |
|
Number |
The minimum number of threads used for pre-processing jobs and submitting them to the main queue in the Executor plugin. Default value: |
|
Number |
The maximum number of jobs in the pre-processing queue. Subsequent jobs are retrieved from the database as the queue empties. The value should be slightly higher than Default value: |
|
Number |
The maximum number of threads in the thread pool used for post-processing jobs. Default value: |
|
Number |
The minimum number of threads used for post-processing jobs. Default value: |
|
Number |
The maximum number of jobs in the post-processing queue. The value should be slightly higher than job-postprocessing.max-pool-size. Default value: |
|
Number |
The maximum number of threads in the thread pool used for submitting jobs in the Executor plugin. Default value: |
|
Number |
The number of threads that the thread pool used for submitting jobs must contain at all times, including both active and idle threads. Default value: |
|
Number |
The maximum number of jobs in the queue waiting to be submitted. The value should be slightly higher than Default value: |
|
Number |
The maximum number of threads in the thread pool used for canceling jobs in the Executor plugin. Default value: |
|
Number |
The number of threads that the thread pool used for canceling jobs must contain at all times, including both active and idle threads. Default value: |
|
Number |
The maximum number of jobs scheduled to be canceled. Default value: |
|
Number |
The maximum number of threads in the thread pool used for event processing in the Executor plugin. Default value: |
|
Number |
The number of threads that the thread pool must contain at all times, including both active and idle threads. This refers to the thread pool used for event processing in the Executor plugin. Default value: |
|
Number |
The maximum number of computation threads that the executor queue dedicated to event processing in the Executor plugin can hold. If the property is not set, the capacity is practically unlimited. Default value: |
|
Number |
The maximum number of events to be processed in a single thread. Default value: |
|
Number |
The maximum number of threads in the thread pool used for notifying about events in DPM. Default value: |
|
Number |
The number of threads that the thread pool used for notifying about events must contain at all times, including both active and idle threads. Default value: |
|
Number |
The maximum number of computation threads that the executor queue used for notifying about events in DPM can hold. If the property is not set, the capacity is practically unlimited. Default value: |
|
Number |
The number of events processed at a time. Default value: |
|
String |
Specifies how long the event processing holds the lock.
This value should be lower than Default value: |
|
String |
Specifies how often event processing polls for new events while holding the lock. Default value: |
|
String |
Defines for how long job files and processing results are stored in DPM. Default value: |
|
Number |
Specifies how often DPM checks for expired job files and removes them. Expressed in milliseconds. Default value: |
|
Number |
Specifies how many times to retry unlocking a job in case of a database error. Default value: |
|
String |
Specifies how long to wait between retries when unlocking a job. Default value: |
|
String |
Determines for how long a DPE can access a particular database. Each database can be used by only one DPE at a time, which is managed by setting a scheduler lock that stores and tracks status logs in a separate table. After this interval expires, the lock is released and another DPE can connect to the same database. This value needs to be higher than Default value: |
|
Number |
Configures how often the job queue is synced with each instance of DPM. The shared job queue contains jobs that are distributed between multiple DPMs. Jobs are then transferred from DPM to a compatible DPE handled by that DPM. Expressed in milliseconds. Default value: |
|
String |
Specifies for how long a job must remain inactive to be considered lost. Such jobs are then resubmitted if they remain in the pre-processing state, otherwise DPM attempts to cancel them. Default value: |
|
String |
Configures for how long jobs can stay in the event subscriber job cache. Default value: |
|
Number |
The number of jobs that are kept in the event subscriber job cache. Default value: |
|
String |
Sets for how long job input files can be stored before being assigned to a job. Default value: |
|
Number |
Determines how often to check for expired job input files in the job input storage. Expressed in milliseconds. Default value: |
|
String |
Defines how often DPM retrieves the status of queued jobs, including their health information and use of resources in order to avoid queue starvation.
This determines how often the following timeout checks are performed: Default value: |
|
String |
Configures for how long a job can remain in the job queue before being submitted to a DPE.
If set to If the property is not set, jobs are kept in the queue indefinitely. Default value: |
|
String |
Determines for how long a job can be recovered after DPE is disconnected.
When the timeout is reached, the job status is set to If DPM cannot reach DPE for a while, the job status first changes to Default value: |
|
String |
Specifies the maximum amount of time a job can stay in one of the following statuses: Default value: |
|
String |
Specifies the maximum amount of time a job can stay in the Default value: |
|
String |
Specifies the timeout for synchronous gRPC requests on DPE such as previewing tables and counting numbers of rows during pre-processing. Increase this timeout value if jobs fail during pre-processing phases. Default value: |
|
Number |
Specifies the number of attempts when unlocking a job in case of a database error. Default value: |
|
String |
Specifies the waiting time between attempts when unlocking a job in case of a database error. Default value: |
|
Boolean |
Allows resubmitting a job lost in Default value: |
|
Number |
The maximum number of rows that should be fetched from the database when paginating. Default value: |
|
Number |
The number of records that can be displayed on a single page in the shared file system. |
Shutdown
Property | Data type | Description |
---|---|---|
|
String |
The type of application shutdown. If set to Default value: |
|
String |
Defines how long a shutdown phase can last. After this time expires, the application shuts down regardless of any active requests. Default value: |
Global runtime configuration
Starting from 14.1.0, we recommend defining the global runtime configuration primarily through DPM Admin Console. For instructions, see DPM and DPE Configuration in DPM Admin Console. The property described in this section now serves only to provide the initial value, not as the primary method for defining the global runtime configuration.
For each job, DPM generates a specific runtime configuration or retrieves it from ONE Desktop. However, it is also possible to modify this configuration by providing a global runtime configuration that is then merged with the generated one. The global runtime configuration is supplied to DPM in the form of a base64-encoded string.
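Producing the base64-encoded string can be done with any standard tool. The following Python sketch shows one way, assuming standard (not URL-safe) base64 over UTF-8 text. The function names are hypothetical, and the XML used here is a neutral placeholder rather than a real runtime configuration.

```python
import base64


def encode_runtime_config(xml_text: str) -> str:
    """Encode runtime configuration XML as a single-line base64 string."""
    return base64.b64encode(xml_text.encode("utf-8")).decode("ascii")


def decode_runtime_config(encoded: str) -> str:
    """Decode the base64 string back to XML, for example to review it."""
    return base64.b64decode(encoded).decode("utf-8")


# Placeholder content only; substitute your own runtime configuration XML.
placeholder_xml = "<config/>"
encoded = encode_runtime_config(placeholder_xml)
print(encoded)
print(decode_runtime_config(encoded) == placeholder_xml)
```

Decoding the value again before deploying it is a quick way to confirm that the string was not truncated or re-wrapped when copied into the properties file.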
If the global runtime configuration requires the use of drivers that are not included in the default installation, you need to add these drivers to the runtime class path.
To do this, include the path to the driver files, such as lib/runtime/jdbc/snowflake/*, in the plugin.executor-launch-model.ataccama.one.launch-type-properties.launch-type.cp.runtime parameter.
If you need to include multiple paths in this parameter, use the plugin.executor-launch-model.ataccama.one.launch-type-properties.launch-type.cpdelim property to specify the delimiter used to separate the paths.
For example, if the plugin.executor-launch-model.ataccama.one.launch-type-properties.launch-type.cpdelim is set to a semicolon (;), you would include the paths as follows:
plugin.executor-launch-model.ataccama.one.launch-type-properties.launch-type.cp.runtime=path1;path2;path3;path4
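If the list of driver paths is maintained elsewhere, the property line can be generated instead of typed by hand. This is a minimal Python sketch with a hypothetical helper name. It assumes that the delimiter passed in matches the configured cpdelim value and that a path containing the delimiter would be split incorrectly, so such paths are rejected.

```python
CP_RUNTIME_KEY = (
    "plugin.executor-launch-model.ataccama.one."
    "launch-type-properties.launch-type.cp.runtime"
)


def cp_runtime_line(paths: list[str], delimiter: str = ";") -> str:
    """Build the cp.runtime property line from a list of class path entries."""
    for path in paths:
        if delimiter in path:
            # Assumption: such a path would be read as two separate entries.
            raise ValueError(f"path {path!r} contains the delimiter {delimiter!r}")
    return f"{CP_RUNTIME_KEY}={delimiter.join(paths)}"


print(cp_runtime_line(["path1", "path2", "path3", "path4"]))
```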
Property | Data type | Description | ||
---|---|---|---|---|
|
String |
The global runtime configuration provided as a base64-encoded string. In XML format, runtime configurations have the following structure: Runtime configuration example - Connecting to MinIO
Runtime configuration example - Connecting to a JDBC data source
When a global runtime configuration is used, all remote executions have access to its content. For instance, if a user can access and edit the relevant post-processing components, they also have full access to the database and the data source based on the credentials provided in the global runtime configuration. For more information, see Importing and Exporting Runtime Configuration. |