
DPM CLI

The DPM CLI is a command-line integration tool for the Data Processing Module (DPM), more specifically its Executor, intended to assist with submitting and canceling ONE jobs, retrieving a list of jobs and their status information, and changing job priority in the job queue.

The entry point for the tool is bin/start.sh (bin/start.bat for Windows), while the configuration is located in etc/application.properties. For more information about how to configure the DPM CLI, refer to Configuration reference.

By default, jobs started using the DPM CLI are stored in storage/<jobId>. However, a different location can be specified through input parameters.

Server credentials are passed in one of the following ways, as shown in the example after this list:

  • As command-line arguments.

  • Through a file.

  • Interactively, by providing them in the terminal when prompted.
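
For example, the jobList command (covered later in this guide) could be called with credentials supplied in any of these ways; the host, port, and credentials below are placeholders for your environment:

List jobs: credentials as arguments, from a file, or interactively
$ ./start.sh jobList -h localhost -p 8531 -u admin -pwd admin
$ ./start.sh jobList -h localhost -p 8531 -u admin -pwdf password.txt
$ ./start.sh jobList -h localhost -p 8531 -u admin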

Other input files, such as lookups or large CSV files, can be referenced from ONE Object Storage and therefore do not need to be stored on your local machine when submitting a job.

Basics

The following section describes some commonly used commands and the arguments that they take.

For a full list of commands and arguments, run the command:

List all commands and arguments
$ ./start.sh -h
The guide provides examples for Linux-based operating systems. If you are using Windows, use the start.bat script instead.

In general, the commands are structured as follows: <action_name> <connection_details> <action_params>.
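
For instance, in the following jobStatus call, jobStatus is the action name, the host, port, and user arguments are the connection details, and --jobId is the action parameter (the job identifier is a placeholder):

Command structure example
$ ./start.sh jobStatus -h localhost -p 8531 -u admin --jobId <job_id>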

Submit jobs

Jobs are submitted using the command submitDqc. The following sections contain an overview of the arguments that the command takes. Required arguments include connection details and job input parameters.

Connection details

When specifying connection details, you need to define where DPM is running as well as provide authentication credentials.

--host or -h
  Example: localhost
  The server hostname. Typically, this corresponds to the hostname of the DPM gRPC server. Required.

--port or -p
  Example: 8531
  The server port. Typically, this corresponds to the port of the DPM gRPC server. Required.

--user or -u
  Example: admin
  The username for accessing DPM. If not provided, the user is prompted to provide it in the terminal.

--password or -pwd
  Example: admin
  The password for accessing DPM. If not provided, the file specified in the passwordFile parameter is checked instead.

--passwordFile or -pwdf
  Example: password.txt
  Points to a file containing the password for accessing DPM. If not provided, the user is prompted to provide the password.

Job input parameters

These parameters supply key information for executing jobs.

--plan or -mp
  Example: plan.plan
  Points to the main ONE plan that should be executed. Required.

--runtimeConfig or -rt
  Example: runtime.runtimeConfig
  Points to the runtime configuration that should be applied. Required.

--inputFile or -f
  Example: in.csv
  Points to additional input files, such as lookups, components, or CSV files.
  To specify multiple files, repeat the argument: -f /path/to/component.comp -f /path/to/lookup.lkp.

--inputLink or -l
  Example: -l /executor/path/to/shared.lkp resource://onefs/<storage_id>/lookups/lookup.lkp
  Example: -l /executor/path/to/shared.comp resource://onefs/<storage_id>/components/<subfolder>/component.comp
  Points to files from ONE Object Storage. The argument consists of two values: the first is the path that Executor uses when running the job, and the second is a link that refers to the actual location of the file in the Object Storage.
  The link has the following format: resource://onefs/<storage_id>/<bucket_name>/<object_name>.

--mountDriver or -md
  Example: POSTGRESQL
  Used if the job needs access to a database. DPM and DPE add this database driver to the runtime configuration, in the <drivers> element. If a connection is already defined, it should point to the driver name (<datasource driverName="<driver_name>" ...>).
  Make sure the value entered here matches the filename (case-insensitive) of the database driver on your DPM instance.
  A list of all available data sources can be retrieved using the datasourceConfigList command (no additional parameters required), for example:

$ ./start.sh datasourceConfigList -h localhost -p 8531 -u admin -pwd admin

--argument or -arg
  Example: key value
  Used to set the parameters of parametrized components. Each argument takes two values: a key and a value.
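
Putting these parameters together, a hypothetical submission that pulls a shared lookup from ONE Object Storage could look as follows (the storage ID and all paths are placeholders):

Submit a job: input file from ONE Object Storage
$ ./start.sh submitDqc -h localhost -p 8531 -u admin -pwd admin -mp plan.plan -rt runtime.runtimeConfig -l /executor/path/to/shared.lkp resource://onefs/<storage_id>/lookups/lookup.lkp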

Additional job parameters

The following parameters are optional and provide more details about jobs or set job priority.

--name or -n
  Example: one-job
  Optional. The name of the job. If not provided, the name of the main ONE plan is used instead.

--description or -d
  Example: This is for testing purposes.
  Optional. A description of the job.

--priority or -pri
  Example: 10
  Optional. Sets the priority of the job in the DPM job queue. The default value is 0. The higher the value, the higher the priority.

Alternatively, if the --priority option is not used with submitDqc, you can set the priority of a job in the job queue through the setPriority command after the job has been submitted. The command takes two required arguments: the job identifier (--jobId or -id) and the priority level (--priority or -pri), for example:

$ ./start.sh setPriority -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656 --priority 7

Local job settings

The following parameters are optional and allow customizing how local jobs are executed.

--jobDir or -dir
  Example: /storage/<job_id>
  The location where job logs and results (if zipped) are stored. Defaults to storage/<jobId>.

--async or -a
  Example: false
  If set to true, DPM CLI only submits the job. Otherwise, DPM CLI waits until the job is finished and downloads the results. By default, jobs are executed synchronously (async=false).

Alternatively, if the async option is set to true, you can run waitForJobResult, which waits for the job to complete and downloads the results if the job finished successfully. The command requires the job identifier (--jobId or -id). Optionally, you can also specify where the results should be stored (--jobDir or -dir). For example:

$ ./start.sh waitForJobResult -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656

To check the job status, run the command jobList, which lists all jobs and takes no additional parameters, or jobStatus, which requires the job identifier (--jobId or -id). For jobStatus, if you want to retrieve only the job status and no additional information, add --statusOnly or -s and set it to true, for example:

$ ./start.sh jobStatus -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656 --statusOnly true

--zip or -z
  Example: true
  If set to true, when running a job locally, processing results are zipped and stored in the job directory (the jobDir parameter). Otherwise, they are stored in their original location (the default).
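
For example, a hypothetical asynchronous workflow chains the commands described above: submit without waiting, then fetch the results later (the job identifier is a placeholder):

Submit a job asynchronously, then retrieve the results
$ ./start.sh submitDqc -h localhost -p 8531 -u admin -pwd admin -mp plan.plan -rt runtime.runtimeConfig -a true
$ ./start.sh waitForJobResult -h localhost -p 8531 --jobId <job_id>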

Spark job settings

These arguments are used when running Spark jobs. Only the launchType argument is required.

--launchType or -t
  Example: SPARK
  If running a job on a Spark cluster, set the value to SPARK. Defaults to LOCAL.

--clusterName or -c
  Example: clusterName
  The name of the Spark cluster on which the job should be executed.
  To get a list of available clusters, run the datasourceConfigList command (no additional parameters required), for example:

$ ./start.sh datasourceConfigList -h localhost -p 8531 -u admin -pwd admin

--clusterUser or -cu
  Example: user
  Optional. Sets the user for the Spark cluster.
  Used if the credentials are not provided in the default configuration or if you want to use another set of credentials. If Spark credentials are not provided, authentication is skipped.

--clusterPassword or -cpwd
  Example: password
  Optional. Sets the password for the Spark user. If not provided, the password file is checked instead.

--clusterPasswordFile or -cpwdf
  Example: password.txt
  Points to a file containing the password for the Spark user. If not provided, the user is prompted to provide the password.
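
Combining these arguments, a hypothetical Spark submission might look as follows (the cluster name and credentials are placeholders):

Submit a job to a Spark cluster
$ ./start.sh submitDqc -h localhost -p 8531 -u admin -pwd admin -mp plan.plan -rt runtime.runtimeConfig -t SPARK -c <cluster_name> -cu <cluster_user> -cpwd <cluster_password>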

Mapping paths (advanced use)

DPM CLI lets you specify custom paths, which can be useful when job results are zipped and you prefer simpler paths in the result. In this case, the first value is the path that Executor uses when processing the job, while the second one points to the actual file that is uploaded to Executor, for example: -f /executor/uses/this/path.csv /real/path/to/file.csv.

To map paths, the following arguments are needed:

--workingDir or -wrk
  Example: /
  The absolute path of the working directory from which relative paths are computed. If not set, it defaults to the current working folder. If set to /, all relative paths are resolved to the root. Must be defined as an absolute path.

--rootDir or -root
  Example: /
  For Unix-like paths, if not specified, the system default (/) is used. On Windows, this refers to the current drive: for example, if the current drive is D:/, absolute paths where the drive is not provided use the same drive (D:/some/path). For custom paths, DPE can handle Windows-like paths even on Linux.

Examples

The following command retrieves a list of available data source configurations. In this case, the credentials are provided interactively through the terminal.

List data source configurations
$ ./start.sh datasourceConfigList -h localhost -p 8531

To select which data source driver is used to execute the job, provide the -md argument:

Submit a job: select data source driver
$ ./start.sh submitDqc -h localhost -p 8531 --user admin --password admin -mp /home/<username>/ataccama/workspace/project/config.plan -rt /home/<username>/ataccama/workspace/project/runtime.runtimeConfig -z true -md POSTGRESQL

You can specify custom paths for input files, as described in Mapping paths (advanced use). To define two paths, the first for Executor and the second for the actual file location, use the following command structure:

Submit a job: custom paths
$ ./start.sh submitDqc -h localhost -p 8531 --passwordFile password.txt -mp plan.plan /home/<username>/builds/ataccama/workspace/project/config2.plan -rt runtime.runtimeConfig /home/<username>/builds/ataccama/workspace/project/runtime2.runtimeConfig -f in.csv /home/<username>/builds/ataccama/workspace/project/in.csv --workingDir / --rootDir / -z true

If you need to cancel a running job, use the killJob command:

Cancel a job
$ ./start.sh killJob -h localhost -p 8531 --jobId 28994ec9-f0e3-4331-93d6-49087e3be656

Configuration reference

The following properties are configured in the <dpm-cli>/etc/application.properties file.

Logging configuration

Available values for all logging properties are INFO, WARN, ERROR, DEBUG, OFF.

logging.level.root (String)
  The root logging level.
  Default: ERROR.

logging.level.io.grpc (String)
  The logging level for the gRPC libraries.
  Default: WARN.

logging.level.io.netty (String)
  The logging level for the Netty libraries.
  Default: WARN.

logging.level.com.ataccama.dpm.cli (String)
  The logging level for the DPM CLI.
  Default: INFO.

logging.level.com.ataccama.dpm.cli.MainKt (String)
  The logging level for the main DPM CLI class. By default, the logging is turned off to avoid cluttering the logs.
  Default: OFF.
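
For reference, a minimal logging setup in etc/application.properties matching the defaults above would look like this:

Logging configuration example
logging.level.root=ERROR
logging.level.io.grpc=WARN
logging.level.io.netty=WARN
logging.level.com.ataccama.dpm.cli=INFO
logging.level.com.ataccama.dpm.cli.MainKt=OFF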

gRPC client configuration

ataccama.client.grpc.properties.max-message-size (String)
  Limits the size of messages that the gRPC client can process.
  Default: 10MB. Accepted units: B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes).

Client TLS/mTLS configuration

Both the HTTP and the gRPC client share the same configuration. The HTTP client is used to communicate with ONE Object Storage, while the gRPC client is used for accessing DPM.

The configuration can be specified differently depending on the client type by changing the prefix from ataccama.client.tls to ataccama.client.http.tls or ataccama.client.grpc.tls.
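
For instance, assuming you want TLS only for the gRPC client, the prefixed form might look like this:

Per-client TLS configuration example
ataccama.client.grpc.tls.enabled=true
ataccama.client.http.tls.enabled=false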

ataccama.client.tls.enabled (Boolean)
  Defines whether the gRPC and HTTP clients use TLS when communicating with the servers. Enabling TLS ensures that the communication is encrypted instead of sent in plain text and guarantees the integrity of messages.
  Default: false.

ataccama.client.tls.trust-store (String)
  The full path to a truststore file that contains the public keys and certificates against which the client verifies the certificates from the server. For example, file:/path/to/truststore.p12.

ataccama.client.tls.trust-store-type (String)
  The type of the truststore. Allowed values: PKCS12, JCEKS.

ataccama.client.tls.trust-store-password (String)
  The password for the truststore. Used if the truststore is encrypted.
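
As an illustration, TLS for both clients could be enabled with a configuration along these lines (the truststore path and password are placeholders):

Client TLS configuration example
ataccama.client.tls.enabled=true
ataccama.client.tls.trust-store=file:/path/to/truststore.p12
ataccama.client.tls.trust-store-type=PKCS12
ataccama.client.tls.trust-store-password=<truststore_password>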

Troubleshooting

When TLS is enabled, an error is thrown in the stream log

Issue: In the current version of ONE, if TLS authentication is enabled on DPM CLI, submitted jobs finish successfully, but an error message with status FAILURE and log type STD_ERROR is displayed in the log output.

Possible solution: As a workaround, you can submit jobs with the --async argument set to true. However, in this case, you need to send an additional request to retrieve the job results. For more information, see Local job settings.

The same solution can be applied for both LOCAL and SPARK jobs.

Incorrect mounting of files

Issue: When submitting a job using DPM CLI without the mapping parameters for the working directory and root, files might be mounted incorrectly and the process ends with a failure.

Workaround: Add the -wrk / and -root / parameters when submitting your job, as follows:

start.bat submitDqc -h localhost -p 8531 --user <username> --password <password> -mp "C:\Users\user.name\Downloads\ataccama-one-desktop-13.2.0-windows\workspace\Tutorials\01 Reading input\01.02 Read DB table.plan" -rt "C:\Users\user.name\Downloads\ataccama-one-desktop-13.2.0-windows\workspace\test\runtime2.runtimeConfig" -z false -md ORACLE -wrk / -root /
