Custom File Import
This feature allows for customization of CSV and Microsoft Excel file import settings for individual files, folders, or whole connections to ensure all CSV and Excel files are imported and processed properly.
Overview
-
Custom import settings can be configured at a connection, folder, or catalog item level: the settings are configured in the connection browser (found on the Connections tab of a data source, not in the Data Catalog) and are applied for any operation on the relevant items (for example, preview, import, profiling, import to ONE Data, and DQ evaluation).
-
Settings are applied in a hierarchical manner from connections to folders, subfolders and individual files, where the settings at a lower level override any settings applied at a higher level.
-
When changing import settings at the file level, you can see a preview of the settings applied.
-
The system-detected import settings are used if the Autodetect option is enabled for that file.
-
It is possible to copy and paste settings (custom or auto-detected) between files, folders and connections.
Use case
-
A consultant creates a data source and configures the connection to file or object storage (for example, S3, Azure Data Lake Storage, or Google Drive).
-
The consultant (or data analyst) runs the Documentation flow.
-
The data analyst checks the imported catalog items: in the metadata and the data preview, they spot that one or more catalog items have not been imported properly.
They ask a data engineer to fix the file import, granting the engineer access to the catalog items and the file import settings in the connection. There is a dedicated task template for this action; see Create task.
-
The data engineer fixes broken catalog items by updating the file import settings for the file, any (sub)folder or the whole connection.
-
They can change the settings for a file while checking the preview of data read with the modified settings.
-
They can browse the connection, checking adjacent files in the same folder or files in other folders to identify the scope on which to apply the changes.
-
-
The data engineer re-runs the Documentation flow and verifies the catalog items have been imported correctly.
Create task
Users without access to the connection browser and file import settings can create tasks requesting changes directly from the affected catalog items. To do this, on the Overview tab, expand File import settings and select Create task.
A task sidebar opens in which the task name, description of the current import settings, and the catalog item information are automatically populated.
Provide more information or change information as required (for example, add suggestions for changes to the import settings or change the task priority).
Configure custom import
Object and file stores can contain hundreds or thousands of files exported from various systems, and the content is always changing. To account for this, it is possible to apply the settings to whole folders or connections as well as to individual files.
In ONE, you can configure custom import settings at the file, folder, or connection level. Using the connection browser, locate the required entry point for the configuration changes (that is, the connection, folder, or file).
To use the connection browser, first find the required source and on the Connections tab:
-
Expand the connection browser to open the folder view.
-
Select a folder to open the file view.
-
Select a file to view its contents.
Expand to view the visual guide
Navigate back from file to folder or folder to connection using the breadcrumb navigation.
If you want to change the import settings for the whole connection, select Connection import settings.
If you want to change the import settings at the file or folder level, select the settings icon next to the relevant file or folder.
Alternatively, to change settings at the file level, locate an item in the Data Catalog and from the Overview tab, and in File import settings select Go to connection browser.
|
If you edit the import configuration at the file level, a preview is available. Select Refresh to reload the preview.
|
If import settings have been changed, but the item has not yet been reimported, you see a warning next to the relevant item in the connection browser, but only until the page is reloaded. After page reload there is no indication of whether files have been reimported since the custom settings were applied. |
Import settings
CSV
To enable custom import settings, first turn off Autodetection.
Then define the correct values for your file import:
Property | Type | Default value | Notes |
---|---|---|---|
Value separator |
String |
|
|
Line separator |
String |
|
|
String qualifier |
String |
|
|
String qualifier escape |
String |
|
|
Text encoding |
String |
|
Select from the options available: |
For further customization, select Show advanced settings.
Advanced settings
Property | Type | Default value | Notes | ||
---|---|---|---|---|---|
Line max read length |
Integer |
|
Must be a positive integer.
|
||
Header row present |
Boolean |
|
|||
Header row |
Integer |
|
Number of rows in which header is present. Must be a positive integer. |
||
From row |
Non-negative integer |
|
Row from which data is present. Must be a positive integer. |
||
Number of lines in footer |
Integer |
|
Must be 0 or a positive integer. |
||
Ignore row RegEx |
String |
|
A regular expression defining rows to be ignored on import. |
||
Error handling strategy |
Choice |
|
|
||
Data Format Parameters |
|||||
Array separator |
String |
|
|||
Date format locale |
String |
|
For the full list of available locales, see JDK 11 Supported Locales. |
||
Date time format |
String |
|
Format validation active. Value should follow JAVA datetime format. For more information, see Class SimpleDateFormat |
||
Date format |
String |
|
Format validation active. |
||
Decimal separator |
String |
|
|||
Thousands separator |
String |
|
|||
True value |
String |
|
|||
False value |
String |
|
|||
Whitespace replacement string |
String |
|
Microsoft Excel
To enable custom import settings, first turn off Autodetection.
Property | Type | Default value | Notes |
---|---|---|---|
Skip empty rows |
Boolean |
|
|
Text encoding |
String |
|
Select from the options available: |
Header row present |
Boolean |
|
|
Header row |
Integer |
|
Rows in which header is present. Must be a positive integer. |
From row |
Integer |
|
Row from which data is present. Must be a positive integer. |
From column |
Integer |
|
Column from which data should start to be read. Must be a positive integer. |
To column |
Integer |
|
Column until which data should be read. Must be a positive integer. Leave this empty to read all available columns: it is only required if you want to restrict which columns should be read. |
Was this page useful?