Operate On File


The task provides ten file operations, selected with the Operation parameter: Copy, Delete, Exists, Not_exists, Mkdir, Move, Info, Unzip, Zip, and List.

The task treats both /data/tmp and /data/tmp/ as the tmp folder itself. To operate on the folder contents without the folder, use the /data/tmp/* mask.
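
To illustrate the difference, the following Python sketch uses `pathlib` on local POSIX-style paths as an analogy. It assumes that a single `*` in the mask matches exactly one path level, as it does in `PurePosixPath.match`; it does not reproduce how the task itself resolves masks, and `matches_mask` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# A trailing slash does not change the path: both forms name the tmp folder itself.
assert PurePosixPath("/data/tmp") == PurePosixPath("/data/tmp/")


def matches_mask(path: str, mask: str) -> bool:
    """Return True if an absolute path matches an absolute glob-style mask."""
    return PurePosixPath(path).match(mask)


candidates = ["/data/tmp", "/data/tmp/a.csv", "/data/tmp/logs"]
# The /data/tmp/* mask selects the entries inside the folder, not the folder itself.
selected = [p for p in candidates if matches_mask(p, "/data/tmp/*")]
print(selected)  # ['/data/tmp/a.csv', '/data/tmp/logs']
```
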
Since version 12, this task also provides the functionality of the HDFS Operate on File, HDFS Download File, and HDFS Upload File tasks.

The task can also work with remote resources, referenced using the resource://<resourceName>/<path>/<inputFile> syntax:

  • HDFS (if your product contains ONE Spark DPE).

  • Amazon S3 and Azure Data Lake Storage Gen1 servers.

Amazon S3 does not use a folder structure the way other file systems do; it works only with files (objects). File names can contain forward slash (/) characters, and the browser displays these as folders. In other words, when you create a folder on Amazon S3, you create a file with a forward slash in its filename.

This causes issues with operations such as Delete or Move in the Operate on File workflow task: when they remove the last file from a folder, the folder itself can disappear. Therefore, we strongly recommend using the Mkdir operation in your workflows wherever a folder needs to persist.
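
The folder behavior described above can be modeled with a small in-memory simulation of a flat, S3-style namespace in which a folder is visible only as long as some file name starts with its prefix. This is not the AWS SDK: `FlatStore` and its methods are hypothetical, and the sketch assumes, as described above, that creating a folder stores an empty file whose name ends with a forward slash.

```python
class FlatStore:
    """Hypothetical in-memory model of a flat, S3-style object namespace."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes = b"") -> None:
        self.objects[key] = data

    def delete(self, key: str) -> None:
        del self.objects[key]

    def mkdir(self, folder: str) -> None:
        # A "folder" is just an empty object whose name ends with a slash.
        self.put(folder.rstrip("/") + "/")

    def folder_exists(self, folder: str) -> bool:
        # A folder is visible only while at least one key starts with its prefix.
        prefix = folder.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.objects)


# Folder implied only by a file name: it vanishes together with its last file.
store = FlatStore()
store.put("reports/2024.csv", b"data")
store.delete("reports/2024.csv")
print(store.folder_exists("reports"))  # False

# Folder created explicitly: the marker object keeps it visible.
store.mkdir("archive")
store.put("archive/2024.csv", b"data")
store.delete("archive/2024.csv")
print(store.folder_exists("archive"))  # True
```
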

A folder created with the Mkdir operation remains available even after its last file is deleted.

All operations respect file system permissions as expected: if the permissions on the specified source or target are insufficient, the task fails.

Copy

Copies a source file or folder to the destination folder. The operation automatically preserves the timestamps of the target files.

Name Type Description Expression support

Operation

mandatory, must be "COPY"

Operation name.

none

Source File

mandatory

Path or mask specifying the source files or folders. The task fails when no file or folder matches.

Supports wildcards.

semi-expression

Target File

mandatory

Path to the destination folder or file:

  • Folder - The copied file keeps its original name. The folder must exist, otherwise the task fails.

  • File - Parent folder of the target file must exist, otherwise the task fails.

semi-expression

Recursive Flag

optional

Default value: false.

Permission to copy subdirectories.

none

Overwrite Flag

optional

Default value: false.

Permission to overwrite files in the destination folder. The task fails when Overwrite Flag is false and some of the target files or folders already exist.

none

Keep Dir Tree Flag

optional

Default value: false.

Permission to preserve the source file system hierarchy. Applies only if the Source File contains wildcards on multiple path levels (for example, /dir*/*.csv); otherwise, the Keep Dir Tree Flag is ignored.

If true, the source file hierarchy is preserved starting from the first (top-most) level of the Source File path that contains a wildcard. For example, if you copy files from the in folder to the out folder with the Source File value in/dir*/*.csv, the target file hierarchy depends on the Keep Dir Tree Flag value as follows:

  • out/dir<value>/<file_name>.csv if the flag is true.

  • out/<file_name>.csv if the flag is false.

Set the Keep Dir Tree Flag and Overwrite Flag appropriately. If both flags are false and multiple files with the same name match the Source File mask, the task attempts to write all matched files to the same destination folder and fails because of the name conflict. For an example, see Wildcards in Workflow Tasks, section Paths with wildcards on multiple levels.

none
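
The effect of the Keep Dir Tree Flag on target paths can be sketched in Python. This is a hypothetical model for illustration only: `target_path` and `first_wildcard_level` are not part of the product, the sketch treats only `*` and `?` as wildcard characters, and it assumes the matched source paths share the mask's leading path levels.

```python
from pathlib import PurePosixPath

WILDCARDS = set("*?")  # assumption: the wildcard characters recognized in masks


def first_wildcard_level(mask: str) -> int:
    """Index of the first path component of the mask that contains a wildcard."""
    for i, part in enumerate(PurePosixPath(mask).parts):
        if WILDCARDS & set(part):
            return i
    raise ValueError(f"mask has no wildcard: {mask}")


def target_path(source: str, mask: str, target_dir: str, keep_dir_tree: bool) -> str:
    """Hypothetical model of where a matched source file lands in the target folder."""
    src = PurePosixPath(source)
    if not keep_dir_tree:
        # Flat copy: only the file name is kept.
        return str(PurePosixPath(target_dir) / src.name)
    # Keep the hierarchy from the first wildcard level of the mask downwards.
    level = first_wildcard_level(mask)
    return str(PurePosixPath(target_dir, *src.parts[level:]))


matched = ["in/dir1/a.csv", "in/dir2/a.csv", "in/dir2/b.csv"]
for keep in (True, False):
    targets = [target_path(s, "in/dir*/*.csv", "out", keep) for s in matched]
    # Duplicate targets mean a name clash: with Overwrite Flag false, the task fails.
    status = "name clash" if len(targets) != len(set(targets)) else "ok"
    print(keep, targets, status)
```

With the flag set to true, each file keeps its dir<value> parent; with the flag set to false, both a.csv files map to out/a.csv, which is the conflict described above.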

Delete

Deletes the target file or folder. The task fails if it lacks permissions to delete the Target File.

Name Type Description Expression support

Operation

mandatory, must be "DELETE"

Operation name.

none

Target File

mandatory

Path or mask specifying the target files or folders. The task fails when no file or folder matches.

Supports wildcards.

semi-expression

Recursive Flag

optional

Default value: false.

Permission to delete subdirectories.

none

Exists

Verifies the existence of a source file or folder.

Name Type Description Expression support

Operation

mandatory, must be "EXISTS"

Operation name.

none

Source File

mandatory

Path to the source file or directory. The task fails when the source does not exist.

semi-expression

Not exists

Verifies the absence of a source file or folder.

Name Type Description Expression support

Operation

mandatory, must be "NOT_EXISTS"

Operation name.

none

Source File

mandatory

Path to the source file or directory. The task fails when the source exists.

semi-expression

Mkdir

Creates a folder.

Name Type Description Expression support

Operation

mandatory, must be "MKDIR"

Operation name.

none

Target File

mandatory

Path to the folder to create. If some of the parent folders in the path are missing and Recursive Flag is false, the task fails.

semi-expression

Recursive Flag

optional

Default value: false.

Permission to create missing parent folders as needed.

none
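
The Recursive Flag behaves much like the `parents` argument of Python's `Path.mkdir`. The following local-filesystem sketch uses that analogy; `make_dir` is a hypothetical helper, and because the behavior of Mkdir on an already existing folder is not described here, the sketch simply lets `FileExistsError` propagate in that case.

```python
import tempfile
from pathlib import Path


def make_dir(target: Path, recursive: bool) -> bool:
    """Create target; return False if a parent folder is missing and recursive is False."""
    try:
        # parents=True creates missing parent folders, like Recursive Flag = true.
        target.mkdir(parents=recursive)
        return True
    except FileNotFoundError:
        # Raised by mkdir when a parent is missing and parents=False.
        return False


with tempfile.TemporaryDirectory() as root:
    nested = Path(root) / "a" / "b" / "c"
    print(make_dir(nested, recursive=False))  # False: a/ and a/b/ do not exist yet
    print(make_dir(nested, recursive=True))   # True: a/ and a/b/ are created as needed
```
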

Move

Moves a source file or folder to the destination folder. The operation automatically preserves the timestamps of the target files.

The task fails if it lacks permissions to delete the Source File.

Name Type Description Expression support

Operation

mandatory, must be "MOVE"

Operation name.

none

Source File

mandatory

Path or mask specifying the source files or folders. The task fails when no file or folder matches.

Supports wildcards.

semi-expression

Target File

mandatory

Path to the destination folder or file:

  • Folder - The moved file keeps its original name. The folder must exist, otherwise the task fails.

  • File - Parent folder of the target file must exist, otherwise the task fails.

semi-expression

Overwrite Flag

optional

Default value: false.

Permission to overwrite files in the destination folder. The task fails when Overwrite Flag is false and some of the target files or folders already exist.

none

Keep Dir Tree Flag

optional

Default value: false.

Permission to preserve the source file system hierarchy.

Applies only if the Source File contains wildcards on multiple path levels (for example, /dir*/*.csv); otherwise, the Keep Dir Tree Flag is ignored.

If true, the source file hierarchy is preserved starting from the first (top-most) level of the Source File path that contains a wildcard. For example, if you move files from the in folder to the out folder with the Source File value in/dir*/*.csv, the target file hierarchy depends on the Keep Dir Tree Flag value as follows:

  • out/dir<value>/<file_name>.csv if the flag is true.

  • out/<file_name>.csv if the flag is false.

Set the Keep Dir Tree Flag and Overwrite Flag appropriately. If both flags are false and multiple files with the same name match the Source File mask, the task attempts to move all matched files to the same destination folder and fails because of the name conflict. For an example, see Wildcards in Workflow Tasks, section Paths with wildcards on multiple levels.

none
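
Before running a Move with the Keep Dir Tree Flag set to false, you might check the matched files for duplicate names. The following Python sketch does that for a list of paths; `flat_name_clashes` is a hypothetical helper, and the sample paths are illustrative rather than taken from a real task run.

```python
from collections import Counter
from pathlib import PurePosixPath


def flat_name_clashes(matched: list[str]) -> dict[str, int]:
    """File names that would collide in a single target folder, with their counts."""
    counts = Counter(PurePosixPath(p).name for p in matched)
    return {name: n for name, n in counts.items() if n > 1}


matched = ["in/dir1/a.csv", "in/dir2/a.csv", "in/dir2/b.csv"]
clashes = flat_name_clashes(matched)
if clashes:
    # With Keep Dir Tree Flag and Overwrite Flag both false, the Move would fail here.
    print("Clashing names:", clashes)  # Clashing names: {'a.csv': 2}
```
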

File info

Saves information about a source file or folder to Workflow Variables.

Name Type Description Expression support

Operation

mandatory, must be "INFO"

Operation name.

none

Source File

mandatory

Path to the source file or folder. The task fails when the source does not exist.

semi-expression

Task variable

Name Description

Name

Absolute path to the file or folder.

LocalName

Name of the file or folder.

Size

Size of the file or folder in bytes.

Timestamp

Last modification date in yyyy-MM-dd HH:mm:ss format.

The Timestamp variable is not created when saving information about an Amazon S3 folder, because folders on S3 have no timestamps.
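
If a later step processes the Timestamp value outside the workflow, the yyyy-MM-dd HH:mm:ss pattern corresponds to the Python `strptime` format `%Y-%m-%d %H:%M:%S`. The following sketch assumes the value arrives as a plain string and uses `None` for the case where the variable was not created, such as for an S3 folder; `parse_info_timestamp` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional

INFO_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # equivalent of yyyy-MM-dd HH:mm:ss


def parse_info_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a Timestamp value; return None when the variable is missing or empty."""
    if not value:
        return None
    return datetime.strptime(value, INFO_TIMESTAMP_FORMAT)


print(parse_info_timestamp("2023-10-08 12:30:50"))  # 2023-10-08 12:30:50
print(parse_info_timestamp(None))                   # None
```
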

Zip

Compresses source files or folders into a ZIP file.

Name Type Description Expression support

Operation

mandatory, must be "ZIP"

Operation name.

none

Source File

mandatory

Path or mask specifying the source files or folders. The task fails when no file or folder matches.

semi-expression

Target File

mandatory

Path to the target ZIP file. The task fails when Target File is an existing directory (that is, the target ZIP file name cannot be the same as an existing directory name).

semi-expression

Exclude Parameters

optional

Set of masks; files and folders matching any of them are excluded from the ZIP file.

semi-expression

Overwrite Flag

optional

Default value: false.

Permission to overwrite the output ZIP file. The task fails when Overwrite Flag is false and Target File already exists.

none

Recursive Flag

optional

Default value: false.

Permission to zip subdirectories.

none
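
The Zip parameters can be modeled on a local file system with Python's standard `zipfile` and `fnmatch` modules. This is a sketch, not the task's implementation: `zip_folder` is a hypothetical helper, and matching Exclude Parameters masks against file names only is an assumption, since the exact mask semantics are not described here.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def zip_folder(source: Path, target: Path, excludes: list[str],
               recursive: bool, overwrite: bool) -> list[str]:
    """Zip files under source into target; return the archive member names."""
    if target.is_dir():
        raise IsADirectoryError(target)  # the ZIP name cannot be an existing directory
    if target.exists() and not overwrite:
        raise FileExistsError(target)  # mirrors Overwrite Flag = false
    files = source.rglob("*") if recursive else source.glob("*")
    members = []
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(files):
            if not path.is_file():
                continue
            # Assumption: exclude masks are matched against the file name only.
            if any(fnmatch.fnmatch(path.name, mask) for mask in excludes):
                continue
            arcname = path.relative_to(source).as_posix()
            zf.write(path, arcname)
            members.append(arcname)
    return members


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    (src / "sub").mkdir(parents=True)
    for name in ("a.csv", "b.log", "sub/c.csv"):
        (src / name).write_text("x")
    print(zip_folder(src, Path(tmp, "out.zip"), ["*.log"], recursive=True, overwrite=False))
```
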

Unzip

Extracts a source ZIP file to the destination folder.

Name Type Description Expression support

Operation

mandatory, must be "UNZIP"

Operation name.

none

Source File

mandatory

Path to the source ZIP file. The task fails when the source does not exist.

semi-expression

Target File

mandatory

Path to the destination folder. If the target folder structure does not exist, it is created automatically, subject to file system permissions.

semi-expression

Overwrite Flag

optional

Default value: false.

Permission to overwrite existing files in the destination folder. The task fails when Overwrite Flag is false and some of the extracted files already exist in the Target File folder.

none

List

Saves information about a source file or folder to the task variable list. The values are separated by the specified Separator string.

An extended listing is also supported: it adds further information for each file or folder, with the items separated by an extra separator (Extended Info Separator). In that case, each entry contains, in this order: full name, type ('D' for directory, 'F' for file), last modified time in "yyyy-MM-dd HH:mm:ss" format, and size in bytes. For example: C:\data\test.txt|F|2023-10-08 12:30:50|45120;C:\data\images|D|2023-05-11 05:00:11|0.
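
A consumer of the extended listing needs to split it twice: first by the Separator, then by the Extended Info Separator. The following Python sketch parses the example value above, assuming `;` and `|` are the configured separators as in that example; `ListEntry` and `parse_extended_listing` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ListEntry:
    full_name: str
    is_directory: bool
    modified: datetime
    size: int


def parse_extended_listing(value: str, separator: str = ";",
                           info_separator: str = "|") -> List[ListEntry]:
    """Split an extended listing into entries: full name, type, modified time, size."""
    entries = []
    for item in value.split(separator):
        if not item:
            continue  # tolerate an empty value or a trailing separator
        # rsplit keeps any info_separator characters that occur in the full name itself.
        full_name, kind, modified, size = item.rsplit(info_separator, 3)
        entries.append(ListEntry(
            full_name=full_name,
            is_directory=(kind == "D"),
            modified=datetime.strptime(modified, "%Y-%m-%d %H:%M:%S"),
            size=int(size),
        ))
    return entries


listing = r"C:\data\test.txt|F|2023-10-08 12:30:50|45120;C:\data\images|D|2023-05-11 05:00:11|0"
for entry in parse_extended_listing(listing):
    print(entry.full_name, "folder" if entry.is_directory else "file", entry.size)
```

Splitting from the right keeps the parser correct if a path happens to contain the extended separator; a path containing the main Separator would still break it, so choose separators that cannot occur in your file names.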

Name Type Description Expression support

Operation

mandatory, must be "LIST"

Operation name.

none

Source File

mandatory

Path to the source file or folder. The task fails when the source does not exist.

semi-expression

Mask

mandatory

File name mask used to select files and/or folders to list.

semi-expression

List File Flag

optional

Default value: false.

Include files in listing.

none

List Folders Flag

optional

Default value: false.

Include folders in listing.

none

Recursive Flag

optional

Default value: false.

Also lists the contents of all subdirectories. All subdirectories are scanned regardless of the Mask or Condition.

none

Separator

mandatory

String separating the items found in the resulting variable value.

none

Extended Listing Flag

optional

Default value: false.

If true, generates an extended listing with the full name, type ('D' for directory, 'F' for file), last modified time in "yyyy-MM-dd HH:mm:ss" format, and size in bytes. For example: C:\data\test.txt|F|2023-10-08 12:30:50|45120;C:\data\images|D|2023-05-11 05:00:11|0.

none

Extended Info Separator

mandatory

String separating the extended information items for each file or folder in the result.

none

Continue On Error Flag

optional

Default value: false.

If true, when information about a specific file or folder cannot be used, the problem is reported to the log and the task continues instead of failing.

none

Condition

optional

Expression evaluated for each candidate entry to determine whether it is included in the output. Supports the following variables:

Name Expression type Description

name

string

Name of the file or folder.

fullName

string

Complete path of the file or folder.

isDirectory

Boolean

True if the item is a directory, false if it is a file.

size

long

Size of the file or folder in bytes.

timestamp

date/time

Last modification date and time.

Not all file systems return all of these properties, so some values might be empty.

expression
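
The Condition itself is written in the product's expression language, which is not shown here. As a language-neutral illustration of how a per-entry condition filters the listing, the following Python stand-in defines a predicate over fields corresponding to the name, full name, isDirectory, size, and timestamp variables, and treats missing properties as "no match". `Candidate`, `select`, and `large_recent_csv` are hypothetical, and the sample entries are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One listed entry; fields correspond to the Condition variables."""
    name: str
    full_name: str
    is_directory: bool
    size: Optional[int]            # may be empty on some file systems
    timestamp: Optional[datetime]  # may be empty on some file systems


def select(candidates: List[Candidate], condition: Callable[[Candidate], bool]) -> List[str]:
    """Return the full names of the candidates for which the condition is true."""
    return [c.full_name for c in candidates if condition(c)]


def large_recent_csv(c: Candidate) -> bool:
    # Missing properties are treated as "no match" rather than raising an error.
    return (
        not c.is_directory
        and c.name.endswith(".csv")
        and c.size is not None
        and c.size > 1_000
        and c.timestamp is not None
        and c.timestamp >= datetime(2023, 1, 1)
    )


candidates = [
    Candidate("test.csv", "/data/test.csv", False, 45_120, datetime(2023, 10, 8, 12, 30, 50)),
    Candidate("empty.csv", "/data/empty.csv", False, None, None),
    Candidate("images", "/data/images", True, 0, datetime(2023, 5, 11, 5, 0, 11)),
]
print(select(candidates, large_recent_csv))  # ['/data/test.csv']
```
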
