DPM CLI
The DPM CLI is a command-line integration tool for the Data Processing Module (DPM), more specifically Executor. It is intended to help you submit and cancel ONE jobs, retrieve a list of jobs and their status information, and change the priority of jobs in the job queue.
The entry point for the tool is bin/start.sh (bin/start.bat for Windows), while the configuration is located in etc/application.properties.
For more information about how to configure the DPM CLI, refer to Configuration reference.
By default, jobs started using the DPM CLI are stored in storage/<jobId>. However, a different location can be specified through input parameters.
Server credentials are passed in one of the following ways, as shown in the example after this list:
- As command-line arguments.
- Through a file.
- Interactively, by providing them in the terminal when prompted.
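For example, the following commands show the three options. The datasourceConfigList action and the -h, -p, -u, -pwd, and --passwordFile arguments are the same ones used in the Examples section of this page; the host, port, credentials, and file name are placeholders.
$ ./start.sh datasourceConfigList -h localhost -p 8531 -u admin -pwd admin
$ ./start.sh datasourceConfigList -h localhost -p 8531 --passwordFile password.txt
$ ./start.sh datasourceConfigList -h localhost -p 8531
In the last case, no credentials are passed on the command line, so the tool prompts for them in the terminal.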
Other input files, such as lookups or large CSV files, can be referenced from ONE Object Storage and therefore do not need to be stored on your local machine when submitting a job.
Basics
The following section describes some commonly used commands and the arguments that they take.
For a full list of commands and arguments, run the command:
$ ./start.sh -h
The guide provides examples for Linux-based operating systems.
If you are using Windows, use the start.bat script instead.
In general, the commands are structured as follows: <action_name> <connection_details> <action_params>.
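For example, in the following command (also shown in Examples), killJob is the action name, -h localhost -p 8531 are the connection details, and --jobId supplies the action parameter:
$ ./start.sh killJob -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656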
Submit jobs
Jobs are submitted using the command submitDqc.
The following sections contain an overview of the arguments that the command takes.
Required arguments include connection details and job input parameters.
Connection details
When specifying connection details, you need to define where DPM is running as well as provide authentication credentials.
Command line argument | Example | Description |
---|---|---|
-h | -h localhost | The server hostname. Typically, this corresponds to the hostname of the DPM gRPC server. Required. |
-p | -p 8531 | The server port. Typically, this corresponds to the port of the DPM gRPC server. Required. |
-u, --user | -u admin | The username for accessing DPM. If not provided, the user is prompted to provide a password. |
-pwd, --password | -pwd admin | The password for accessing DPM. If not provided, the password is checked for in the password file. |
--passwordFile | --passwordFile password.txt | Points to a file containing the password for accessing DPM. If not provided, the user is prompted to provide a password. |
Job input parameters
These parameters supply key information for executing jobs.
Command line argument | Example | Description |
---|---|---|
-mp | | Points to the main ONE plan that should be executed. Required. |
-rt | | Points to the runtime configuration that should be applied. Required. |
-f | | Points to additional input files, such as lookups, components, or CSV files. Multiple files can be specified. |
 | | Points to files from ONE Object Storage. Consists of two values: the first one is the path that Executor uses when running the job, the second one is a link that refers to the actual location of the file in ONE Object Storage. |
-md | -md POSTGRESQL | Used if the job needs access to a database. DPM and DPE add this database driver to the runtime configuration. Make sure the value entered here matches the filename (case-insensitive) of the database driver on your DPM instance. |
 | | If using parametrized components, this argument is used to configure those parameters. Each argument takes two values: a key and a value. |
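Putting the required arguments together, a minimal submission can look as follows. The command mirrors the ones in the Examples section; the plan and runtime configuration paths are placeholders that you need to adjust to your environment.
$ ./start.sh submitDqc -h localhost -p 8531 -u admin -pwd admin -mp /home/<username>/ataccama/workspace/project/config.plan -rt /home/<username>/ataccama/workspace/project/runtime.runtimeConfig -md POSTGRESQL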
Additional job parameters
The following parameters are optional and provide more details about jobs or set job priority.
Command line argument | Example | Description |
---|---|---|
 | | Optional. The name of the job. If not provided, the name of the main ONE plan is used instead. |
 | | Optional. A description of the job. |
 | | Optional. Sets the priority of the job in the DPM job queue. If not provided, a default value is used. |
 | | Optional. Sets a DPE label on the DPM task that is then matched against labels assigned to available DPEs. For more information, see [Run DPM job]. |
Local job settings
The following parameters are optional and allow customizing how local jobs are executed.
Command line argument | Example | Description |
---|---|---|
 | | The location where job logs and results (if zipped) are stored. Defaults to storage/<jobId>. |
--async | --async false | Defines whether the job is submitted asynchronously. Depending on the value, an additional request may be needed to retrieve the job results; to check the job status, run the corresponding command. |
-z | -z true | If set to true, the job results are zipped. |
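For instance, to submit a job with zipped results and without asynchronous submission, you can combine the -z argument (used with the value true in the Examples section) with the --async argument mentioned in Troubleshooting. The paths are placeholders and the values are illustrative.
$ ./start.sh submitDqc -h localhost -p 8531 -u admin -pwd admin -mp /home/<username>/ataccama/workspace/project/config.plan -rt /home/<username>/ataccama/workspace/project/runtime.runtimeConfig -z true --async false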
Spark job settings
These arguments are used when running Spark jobs.
Only the launchType argument is required.
Command line argument | Example | Description |
---|---|---|
launchType | | If running a job on the Spark cluster, set the value accordingly. Required. |
 | | The name of the Spark cluster on which the job should be executed. To get a list of available clusters, run the corresponding command. |
 | | Optional. Sets the Spark cluster credentials. Used if the credentials are not provided in the default configuration or if you want to use another set of credentials. If Spark credentials are not provided, the authentication is skipped. |
 | | Optional. Sets the password for the Spark user. If not provided, the password file is checked instead. |
 | | Points to a file containing the password for the Spark user. If not provided, the user is prompted to provide a password. |
Mapping paths (advanced use)
DPM CLI lets you specify custom paths, which can be useful when job results are zipped and you prefer having simpler paths in the result.
In this case, the first value is used by the Executor for processing in DPM, while the second one points to the actual file that is uploaded to Executor, for example: -f /executor/uses/this/path.csv /real/path/to/file.csv.
To be able to map paths, the following arguments are needed:
Command line argument | Example | Description |
---|---|---|
--workingDir, -wrk | --workingDir / | The absolute path of the working directory from which relative paths are computed. If not set, it defaults to the current working folder. Must be defined as an absolute path. |
--rootDir, -root | --rootDir / | For Unix-like paths, if not specified, the system default is used. If using Windows, this refers to the current drive. |
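For example, the following command (a slightly shortened version of the mapping example in the Examples section, with placeholder paths) maps in.csv to a real file on disk and sets both the working and root directories to /:
$ ./start.sh submitDqc -h localhost -p 8531 --passwordFile password.txt -mp plan.plan /home/<username>/workspace/project/config.plan -rt runtime.runtimeConfig /home/<username>/workspace/project/runtime.runtimeConfig -f in.csv /home/<username>/workspace/project/in.csv --workingDir / --rootDir /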
Test connectivity
The test job (echo job
) verifies connectivity by using the echo
command, which displays a test message in the terminal to confirm that the connection is working properly.
The following sections contain an overview of the arguments that the command takes.
Job input parameters
These parameters supply key information for executing jobs.
Command line argument | Example | Description |
---|---|---|
 | | Executes a test job on the server. |
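In practice, this is the same call that is shown in the Examples section, for instance:
$ ./start.sh echo -h localhost -p 8531 -u admin -pwd admin -e "Hello World!"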
Additional job parameters
The following parameters are optional and provide more details about jobs or set job priority.
Command line argument | Example | Description |
---|---|---|
 | | Optional. The name of the job. If not provided, the name of the main ONE plan is used instead. |
 | | Optional. A description of the job. |
 | | Optional. Sets the priority of the job in the DPM job queue. If not provided, a default value is used. |
 | | Optional. Specifies the duration of the job in milliseconds, with a maximum of 1 minute. |
 | | Optional. Sets the delay for the job before moving to the next phase, in milliseconds, with a maximum of 10 seconds. |
 | | Optional. Sets the delay for the job after completion, in milliseconds, with a maximum of 10 seconds. |
Examples
Run the following command to display the documentation for the DPM CLI tool in the terminal.
$ ./start.sh
To verify that a connection has been established, run the following command:
$ ./start.sh echo -h localhost -p 8531 -u admin -pwd admin -e "Hello World!"
The following command retrieves a list of available data source configurations. In this case, the credentials are interactively provided through the terminal.
$ ./start.sh datasourceConfigList -h localhost -p 8531
To select which data source driver is used to execute the job, provide the -md argument:
$ ./start.sh submitDqc -h localhost -p 8531 --user admin --password admin -mp /home/<username>/ataccama/workspace/project/config.plan -rt /home/<username>/ataccama/workspace/project/runtime.runtimeConfig -z true -md POSTGRESQL
You can specify custom paths for input files, as described in Mapping paths (advanced use). To define two paths, one for the Executor and another one that provides the actual file location, use the following command structure:
$ ./start.sh submitDqc -h localhost -p 8531 --passwordFile password.txt -mp plan.plan /home/<username>/builds/ataccama/workspace/project/config2.plan -rt runtime.runtimeConfig /home/<username>/builds/ataccama/workspace/project/runtime2.runtimeConfig -f in.csv /home/<username>/builds/ataccama/workspace/project/in.csv --workingDir / --rootDir / -z true
If you need to cancel a running job, use the following command:
$ ./start.sh killJob -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656
Configuration reference
The following properties are configured in the <dpm-cli>/etc/application.properties file.
Logging configuration
Available values for all logging properties are INFO, WARN, ERROR, DEBUG, and OFF.
Property | Data type | Description |
---|---|---|
 | String | The root logging level. |
 | String | The logging level for the gRPC libraries. |
 | String | The logging level for the Netty libraries. |
 | String | The logging level for the DPM CLI. |
 | String | The logging level for the main DPM CLI class. By default, the logging is turned off to avoid cluttering the logs. |
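As an illustration only, a logging override in etc/application.properties could look like the lines below. The property names are assumptions based on the common logging.level.* convention and are not confirmed by this page; use the keys present in your default etc/application.properties.
# Hypothetical property names -- verify against the defaults shipped with the DPM CLI.
logging.level.root=INFO
logging.level.io.grpc=WARN
logging.level.io.netty=WARN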
gRPC client configuration
Property | Data type | Description |
---|---|---|
 | String | Limits the size of messages that the gRPC client can process. |
Client TLS/mTLS configuration
Both the HTTP and the gRPC client share the same configuration. The HTTP client is used to communicate with ONE Object Storage, while the gRPC client is used for accessing DPM.
The configuration can be specified differently depending on the client type by changing the prefix from ataccama.client.tls to ataccama.client.http.tls or ataccama.client.grpc.tls.
Property | Data type | Description |
---|---|---|
 | Boolean | Defines whether the gRPC and HTTP clients should use TLS when communicating with the servers. This ensures that the communication is secure (encrypted instead of in plain text) and guarantees the integrity of messages. |
 | String | The full path to a truststore file that contains public keys and certificates against which the client verifies the certificates from the server. |
 | String | The type of the truststore. |
 | String | The password for the truststore. Used if the truststore is encrypted. |
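As a sketch of how the prefix-based overrides described above might be written, the snippet below uses the documented ataccama.client.tls and ataccama.client.grpc.tls prefixes. The property suffixes (enabled, trust-store, and so on) and the values are assumptions for illustration, not confirmed names; check your default etc/application.properties for the exact keys.
# Hypothetical keys and placeholder values -- shown only to illustrate the prefixes.
ataccama.client.tls.enabled=true
ataccama.client.tls.trust-store=/path/to/truststore
ataccama.client.tls.trust-store-password=<truststore password>
# Settings specific to the gRPC client use the longer prefix:
ataccama.client.grpc.tls.enabled=true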
Troubleshooting
When TLS is enabled, an error is thrown in the stream log
Issue: In the current version of ONE, if TLS authentication is enabled on DPM CLI, submitted jobs finish successfully, but an error message of status FAILURE and log type STD_ERROR is displayed in the log output.
Possible solution: As a workaround, you can submit jobs with the --async argument set to false.
However, in this case, you need to send an additional request to retrieve the job results.
For more information, see Local job settings.
The same solution can be applied for both LOCAL and SPARK jobs.
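For example, appending the argument to one of the submission commands from the Examples section (the rest of the command stays the same):
$ ./start.sh submitDqc -h localhost -p 8531 --user admin --password admin -mp /home/<username>/ataccama/workspace/project/config.plan -rt /home/<username>/ataccama/workspace/project/runtime.runtimeConfig --async false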
Incorrect mounting of files
Issue: When submitting a job using DPM CLI without mapping parameters for workspace and root, files might be mounted incorrectly and the process ends with a failure.
Workaround: Add the parameters -wrk / and -root / when submitting your job through DPM CLI, as follows:
start.bat submitDqc -h localhost -p 8531 --user <username> --password <password> -mp "C:\Users\user.name\Downloads\ataccama-one-desktop-13.2.0-windows\workspace\Tutorials\01 Reading input\01.02 Read DB table.plan" -rt "C:\Users\user.name\Downloads\ataccama-one-desktop-13.2.0-windows\workspace\test\runtime2.runtimeConfig" -z false -md ORACLE -wrk / -root /