User Community Service Desk Downloads
If you can't find the product or version you're looking for, visit support.ataccama.com/downloads

Custom File Import

This feature allows for customization of CSV and Microsoft Excel file import settings for individual files, folders, or whole connections to ensure all CSV and Excel files are imported and processed properly.

Overview

  • Custom import settings can be configured at a connection, folder, or catalog item level: the settings are configured in the connection browser (found on the Connections tab of a data source, not in the Data Catalog) and are applied for any operation on the relevant items (for example, preview, import, profiling, import to ONE Data, and DQ evaluation).

    Connection settings
  • Settings are applied in a hierarchical manner from connections to folders, subfolders and individual files, where the settings at a lower level override any settings applied at a higher level.

  • When changing import settings at the file level, you can see a preview of the settings applied.

  • The system-detected import settings are used if the Autodetect option is enabled for that file.

  • It is possible to copy and paste settings (custom or auto-detected) between files, folders and connections.

Use case

  1. A consultant creates a data source and configures the connection to file or object storage (for example, S3, Azure Data Lake Storage, or Google Drive).

  2. The consultant (or data analyst) runs the Documentation flow.

  3. The data analyst checks the imported catalog items: in the metadata and the data preview, they spot that one or more catalog items have not been imported properly.

    They ask a data engineer to fix the file import, granting the engineer access to the catalog items and the file import settings in the connection. There is a dedicated task template for this action; see Create task.

  4. The data engineer fixes broken catalog items by updating the file import settings for the file, any (sub)folder or the whole connection.

    • They can change the settings for a file while checking the preview of data read with the modified settings.

    • They can browse the connection, checking adjacent files in the same folder or files in other folders to identify the scope on which to apply the changes.

  5. The data engineer re-runs the Documentation flow and verifies the catalog items have been imported correctly.

Create task

Users without access to the connection browser and file import settings can create tasks requesting changes directly from the affected catalog items. To do this, on the Overview tab, expand File import settings and select Create task.

Create task

A task sidebar opens in which the task name, description of the current import settings, and the catalog item information are automatically populated.

Create task sidebar

Provide more information or change information as required (for example, add suggestions for changes to the import settings or change the task priority).

Configure custom import

Object and file stores can contain hundreds or thousands of files exported from various systems, and the content is always changing. To account for this, it is possible to apply the settings to whole folders or connections as well as to individual files.

In ONE, you can configure custom import settings at the file, folder, or connection level. Using the connection browser, locate the required entry point for the configuration changes (that is, the connection, folder, or file).

To use the connection browser, first find the required source and on the Connections tab:

  1. Expand the connection browser to open the folder view.

  2. Select a folder to open the file view.

  3. Select a file to view its contents.

Expand to view the visual guide
Connection browser

Navigate back from file to folder or folder to connection using the breadcrumb navigation.

Browser breadcrumbs

If you want to change the import settings for the whole connection, select Connection import settings.

Connection import settings

If you want to change the import settings at the file or folder level, select the settings icon next to the relevant file or folder.

Import settings icon

Alternatively, to change settings at the file level, locate an item in the Data Catalog and from the Overview tab, and in File import settings select Go to connection browser.

Go to connection browser

If you edit the import configuration at the file level, a preview is available. Select Refresh to reload the preview.

Preview data custom settings

If import settings have been changed, but the item has not yet been reimported, you see a warning next to the relevant item in the connection browser, but only until the page is reloaded. After page reload there is no indication of whether files have been reimported since the custom settings were applied.

Custom import warning

Import settings

CSV

To enable custom import settings, first turn off Autodetection.

CSV settings

Then define the correct values for your file import:

Property Type Default value Notes

Value separator

String

,

Line separator

String

\r\n

String qualifier

String

"

String qualifier escape

String

"

Text encoding

String

UTF 8

Select from the options available: ISO Latin 1 - ISO-8859-1, ISO Latin 2 - ISO-8859-2, Windows 1250, Windows 1252, UTF 8, UTF 16 Big Endian, UTF 16 Little Endian, EBCDIC, SJIS, and Windows 31J.

For further customization, select Show advanced settings.

Advanced settings
CSV advanced settings
Property Type Default value Notes

Line max read length

Integer

65536

Must be a positive integer.

Values exceeding 600 000 will likely cause import to fail.

Header row present

Boolean

True

Header row

Integer

1

Number of rows in which header is present. Must be a positive integer.

From row

Non-negative integer

2

Row from which data is present. Must be a positive integer.

Number of lines in footer

Integer

0

Must be 0 or a positive integer.

Ignore row RegEx

String

Empty

A regular expression defining rows to be ignored on import.

Error handling strategy

Choice

Read possible

  • Read possible: Read as much data from any file as possible.

  • Stop: Stop reading when any discrepancy in file format is encountered.

This strategy is most suitable for when file formats are expected to be correct and there is no reason for using data from malformed files.

Data Format Parameters

Array separator

String

Empty

Date format locale

String

en_US

For the full list of available locales, see JDK 11 Supported Locales.

Date time format

String

yy-MM-dd HH:mm:ss

Format validation active. Value should follow JAVA datetime format. For more information, see Class SimpleDateFormat

Date format

String

yy/MM/dd

Format validation active.

Decimal separator

String

.

Thousands separator

String

Empty

True value

String

true

False value

String

false

Whitespace replacement string

String

Empty

Microsoft Excel

To enable custom import settings, first turn off Autodetection.

Excel settings

Property Type Default value Notes

Skip empty rows

Boolean

False

Text encoding

String

UTF 8

Select from the options available: ISO Latin 1 - ISO-8859-1, ISO Latin 2 - ISO-8859-2, Windows 1250, Windows 1252, UTF 8, UTF 16 Big Endian, UTF 16 Little Endian, EBCDIC, SJIS, and Windows 31J.

Header row present

Boolean

True

Header row

Integer

1

Rows in which header is present. Must be a positive integer.

From row

Integer

2

Row from which data is present. Must be a positive integer.

From column

Integer

1

Column from which data should start to be read. Must be a positive integer.

To column

Integer

Empty

Column until which data should be read. Must be a positive integer.

Leave this empty to read all available columns: it is only required if you want to restrict which columns should be read.

Copy and paste settings

Copy and paste settings from one folder or file to another in the connection browser. Use the three dots menu to access copy and paste commands.

Copy and paste settings

Revert to inherited settings

Where the import settings have been changed on the file or folder level, you can revert to the settings inherited from the parent folder or connection. Use the three dots menu and select Inherit settings.

Inherit settings

Was this page useful?