
Run DQC on Cluster

Starts MapReduce or Spark processing on a cluster.

Runs a ONE plan or component in a new Java process; the task waits until the process finishes. This lets you run the ONE plan within the new JVM using a specific Java version and runtime parameters (such as memory settings).

The plan is validated before the run: if the plan is invalid, the task fails with the FINISHED_FAILURE state and the details are written to a log. The logs of the process are stored in the task resources folder. The STDOUT log also contains a copy of the process command.

This task currently requires that the drivers of all database connections created in the IDE are placed in <ATACCAMA_home>/runtime/lib/.

You can set advanced properties for MapReduce and/or Spark processing in the .properties configuration file.
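For example, standard Spark and MapReduce settings such as the ones below can typically be tuned this way. The keys shown are standard Spark/MapReduce configuration properties; whether they can be set verbatim in this file, and the exact file location, depend on your cluster connection configuration, so treat this as an illustrative sketch only:

    # Illustrative cluster tuning properties (standard Spark/MapReduce keys)
    spark.executor.memory=4g
    spark.executor.cores=2
    mapreduce.map.memory.mb=2048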

To start Spark processing from ONE Desktop, use the remote executor. The task cannot start Spark processing on a cluster via a native connection.

To trigger Spark processing via a native connection, use the runbde.sh script with the Run Shell Script task.
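A minimal sketch of what the Run Shell Script task content might look like in that case. The script location and arguments below are hypothetical; check the usage of runbde.sh in your distribution before relying on them:

    # Hypothetical invocation; adjust the path and arguments to your installation.
    cd "$ATACCAMA_HOME/bin"
    ./runbde.sh cluster_plan.plan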

Properties

Plan Path (mandatory)
Relative (to the workflow file) or absolute path to the plan or component that should be run on a cluster.
Expression support: semi-expression

Hadoop Source (mandatory)
Name of an existing cluster connection.
Expression support: semi-expression

Execution Engine (mandatory)
Selects the data processing engine (MapReduce or Spark) used to run plans on a cluster.
Expression support: none

Parameters (optional)
Set of parameters to pass to the ONE component.
Expression support: none

Path Variables (optional)
Set of local path variables to use with the current task.
Expression support: none

Java Dir (optional)
Path to the Java JRE or JDK directory to use. The directory must contain either the bin/java or bin/java.exe executable.
If this attribute is not specified, the JRE is autodetected from the java.home property of the current Java process. If that value cannot be resolved, the JAVA_HOME environment variable is used instead. If neither is defined, the task fails during validation.
Expression support: none

Java Options (optional)
Space-separated JVM options to pass to the Java Virtual Machine (see the example after this list).
Expression support: none
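For example, the Java Options value below raises the memory limits of the new JVM and sets a system property; these are standard JVM flags, shown purely as an illustration:

    -Xms512m -Xmx4g -Dfile.encoding=UTF-8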

Parameters properties

Name (mandatory)
Name of the variable.
Expression support: none

Expression (mandatory)
Expression evaluating to the value of the parameter. The expression must evaluate to a STRING data type value; otherwise, an error is reported (see the example after this list).
Expression support: expression
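A sketch of one parameter definition, assuming a plain string literal in the ONE expression language (the exact quoting is an assumption; any expression that evaluates to a STRING works):

    Name:       INPUT_FILE
    Expression: "data/input.csv"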

Path variables properties

Name (mandatory)
Name of the path variable.
Expression support: none

Value (mandatory)
Value of the path variable.
Expression support: semi-expression

Example

[Screenshot: Run DQC on Cluster - Properties]

Set ATACCAMA_HOME on Linux

The Run DQC Process task is a separate Java task, so it does not inherit the ATACCAMA_HOME variable. To set ATACCAMA_HOME for Run DQC Process on Linux, set it as a global environment variable. Make sure the variable is set for the user running the online server; you can verify the setting as shown after the steps below.

To set the global variable:

  1. Navigate to the user's home directory, for example, /home/<user_name>.

  2. Open the hidden file .bash_profile and add the following line:

    export ATACCAMA_HOME=/one20/app/server/runtime

    Make sure the path points to the correct distribution.

  3. Stop the server.

  4. Log out from your Linux account and log back in.

  5. Start the server.
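After logging back in (step 4), you can verify the setting before starting the server; the command below should print the runtime path you configured:

    echo "$ATACCAMA_HOME"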
