Import Data from a Catalog Item
Importing an existing catalog item from the Knowledge Catalog into ONE Data makes its data available in the application. Once your data is loaded, you can enhance it by adding or removing attributes or by modifying table values.
You can import a catalog item with DQ results, without DQ results (data only), or as deduplicated data.
This is especially useful in these cases:
-
You have a data source with ungoverned data that you can’t manage otherwise.
-
You need to quickly modify some data, such as invalid records, before exporting it back to the data source.
-
The catalog item contains data that can be used as reference data.
ONE Data is a type of data source in ONE, so you can also access the metadata of each ONE Data table in the Knowledge Catalog. These tables are labeled as ONE Data catalog items. To navigate to ONE Data while viewing such a table in the Knowledge Catalog, go to Data > Open in ONE Data.
In the current version, we recommend working with datasets of up to 50k records for optimal performance. The application notifies you if your data load exceeds the recommended system limit.
Import from a catalog item
This option is not supported for catalog items that contain attributes of the binary data type.
To start your import, follow these steps, then continue with the section that matches your chosen load type: Import data with DQ results, Import data only, or Import deduplicated data. Importing DQ results first runs DQ evaluation on the whole catalog item.
-
In ONE Data, use the dropdown next to Create table and select From Catalog Item.
You can also import invalid records from an observed system in the Data Observability module. See Data Observability Dashboards.
-
Find and select the catalog item that you want to use. Use full-text search and filters as needed.
You can filter by applied or suggested glossary terms, DQ percentage, data source or location, number of catalog item attributes or records, last profiling date, catalog item owner, or detected anomalies. You can also choose to view only published catalog items or only drafts. Alternatively, locate the catalog item in the Data Catalog, use the three dots menu, and select Load to ONE Data.
-
Once you select the catalog item, select the data load type:
-
To import data with DQ results, select Full (data with DQ results). In this case, you cannot select which attributes are imported. DQ evaluation is performed on the whole catalog item before a new ONE Data table is created.
-
Next, choose whether to import the whole dataset or only invalid records.
-
All records: Imports all available records and the latest DQ results.
-
Invalid records: Imports all records that failed any of the applied DQ rules.
It’s also possible to load failed records to ONE Data from the catalog item Overview or Data Quality tabs. See Load failed records to ONE Data.
-
-
After you have made your choice, select Next and continue with the section Import data with DQ results to finish creating your table.
-
-
To import data without DQ results, select Data only and then Next. To finish creating your table, continue with the section Import data only.
-
To deduplicate data, select Deduplicated data and then Next. To finish creating your table, continue with the section Import deduplicated data.
-
When loading deduplicated data, any terms and rules applied to the data are also imported to ONE Data, so DQ results on the deduplicated attributes are available in ONE Data.
If your catalog item contains attributes named dmm_record_id or dmm_rank, the import fails validation: these names are reserved for technical attributes in ONE Data, and each attribute name must be unique within a table.
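To catch this conflict before starting an import, you could compare the catalog item's attribute names with the reserved names. The following is a minimal sketch, not part of ONE: the function name `find_reserved_name_conflicts` is hypothetical, it assumes you already have the attribute names as a list of strings, and it assumes exact, case-sensitive matching, which may differ from how the platform validates names.

```python
from typing import Iterable, List

# Names reserved for technical attributes in ONE Data tables (per the note above).
RESERVED_ATTRIBUTE_NAMES = frozenset({"dmm_record_id", "dmm_rank"})


def find_reserved_name_conflicts(attribute_names: Iterable[str]) -> List[str]:
    """Return the attribute names that clash with reserved technical names.

    Hypothetical helper: assumes exact, case-sensitive matching.
    """
    # Set intersection keeps only names that are reserved; sorting gives stable output.
    return sorted(set(attribute_names) & RESERVED_ATTRIBUTE_NAMES)


if __name__ == "__main__":
    attributes = ["customer_id", "email", "dmm_rank"]
    conflicts = find_reserved_name_conflicts(attributes)
    if conflicts:
        print(f"Rename before import: {', '.join(conflicts)}")
```

If the list is non-empty, rename those attributes in the source before importing.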
Import data with DQ results
-
If you selected Full (data with DQ results) and then All records or Invalid records in the previous step, choose one of the two options:
To prevent potential issues with multi-attribute rules, you can’t select which attributes are imported in this case.
-
Create new ONE Data Table
-
Enter a unique name for the table and optionally a description.
-
In Stewardship, select the table owner and roles. For more information, see Stewardship. If you don’t set them, the stewardship configuration is inherited from the data source.
-
-
Overwrite existing ONE Data table: In Target table, find and select which table you want to overwrite.
This deletes all existing data from the selected table. This option is only available if you have previously created a ONE Data table with this catalog item.
-
-
Select Create table. Depending on the size of your catalog item, it might take a few minutes to get everything ready. Because this import option includes running DQ evaluation on the whole table, it typically takes longer than importing data only.
To continue working with the platform in the meantime, select Run in background. This takes you to the newly created ONE Data catalog item in the Knowledge Catalog.
Alternatively, remain on the same page until your ONE Data table is created. A notification lets you know when the import is finished.
If your import fails, the most likely cause is an issue connecting to your data source. Check the error log in Processing Center for more details.
-
Your table is now ready for use. See the Next steps section for more tips about how to proceed.
Import data only
-
If you selected Data only in the previous step, you can now edit which attributes the new ONE Data table should contain.
-
To import all attributes: Select the checkbox in the column header.
-
To import a selection of attributes: Select the attributes individually. You can narrow down the list using the full-text search and sort by the attribute name, data type, comments, or description.
Technical attributes, such as the record identifier in ONE Data tables (dmm_record_id), must be included in the import and cannot be cleared.
-
-
Once you choose the attributes, select Next.
-
Choose one of the two options:
-
Create new ONE Data Table
-
Enter a unique name for the table and optionally a description.
-
In Stewardship, select the table owner and roles. For more information, see Stewardship. If you don’t set them, the stewardship configuration is inherited from the data source.
-
-
Overwrite existing ONE Data table: In Target table, find and select which table you want to overwrite.
This deletes all existing data from the selected table. This option is only available if you have previously created a ONE Data table with this catalog item.
-
-
Select Create table. Depending on the size of your catalog item, it might take a few minutes to get everything ready.
To continue working with the platform in the meantime, select Run in background. This takes you to the newly created ONE Data catalog item in the Knowledge Catalog.
Alternatively, remain on the same page until your ONE Data table is created. A notification lets you know when the import is finished.
If your import fails, the most likely cause is an issue connecting to your data source. Check the error log in Processing Center for more details.
-
Your table is now ready for use. See the Next steps section for more tips about how to proceed.
Import deduplicated data
Before proceeding, familiarize yourself with how deduplication is performed in ONE Data. See the section How deduplication works.
-
If you selected Deduplicated data in the previous step, you can now choose which attributes the new ONE Data table should contain. You need to select at least one attribute to proceed; however, you can edit your selection in the next step. You can narrow down the list using the full-text search and sort by the attribute name, data type, comments, or description.
Technical attributes, such as the record identifier in ONE Data tables (dmm_record_id), must be included in the import and cannot be cleared.
-
Once you choose the attributes, select Next.
-
Select the attribute or combination of attributes by which the data will be deduplicated. If needed, select Add or remove attributes to modify the chosen attributes.
All attributes listed here are included in the table, while the selected ones (Key field) are used as the deduplication key.
-
Once you’re happy with your choice, select Next.
-
Enter a unique name for the table and optionally a description.
-
Select Create table. Depending on the size of your catalog item, it might take a few minutes to get everything ready.
To continue working with the platform in the meantime, select Run in background. This takes you to the newly created ONE Data catalog item in the Knowledge Catalog.
Alternatively, remain on the same page until your ONE Data table is created. A notification lets you know when the import is finished.
If your import fails, the most likely cause is an issue connecting to your data source. Check the error log in Processing Center for more details.
-
Your table is now ready for use. See the Next steps section for more tips about how to proceed.
You can also deduplicate data from the following locations:
How deduplication works
When deduplicating data in ONE, records from a dataset are grouped based on one or several attributes that you define as the deduplication key. Once the table is created, it contains one record for each distinct key value, based on the first occurrence in the original dataset.
If the deduplication key consists of a single attribute, any records where that attribute is null are excluded from the table. If the key is composed of multiple attributes, records are included as long as at least one of the key attributes is not null.
The table can also include other attributes that help describe your data. For each of these additional attributes, records with a non-empty value are prioritized over records where the value is empty.
For example, if the original dataset contains the following records and the grouping attribute is Country code, the deduplicated data retains only the second row.
Country code (deduplication key) | Country name | Note |
---|---|---|
CZE | | The additional attribute is empty, so the record is not included in the table. |
CZE | Czech Republic | The preferred record, where the additional attribute is not empty. |
In addition, an attribute called frequency is added to the new table by default, which stores the number of occurrences of each record. Use this attribute to identify outliers and determine which values should be removed. If you don’t need it, you can delete it once the table is created.
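The deduplication rules described in this section can be illustrated with a short Python sketch. This is not ONE's implementation: the function `deduplicate`, the dictionary-based record format, and treating both `None` and empty strings as empty values are assumptions made for illustration. It also assumes that, for additional attributes, the first non-empty value is picked independently for each attribute, which is one reading of the prioritization rule.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[Any]]


def _is_empty(value: Optional[Any]) -> bool:
    # Assumption: None and the empty string both count as "empty".
    return value is None or value == ""


def deduplicate(
    records: Sequence[Record],
    key: Sequence[str],
    additional: Sequence[str],
) -> List[Record]:
    """Hypothetical sketch: group records by the key and keep one row per group."""
    groups: Dict[Tuple[Any, ...], List[Record]] = {}
    for record in records:
        key_values = tuple(record.get(attr) for attr in key)
        # Drop a record only if every key attribute is null. For a single-attribute
        # key this means "the key is null"; for a composite key, the record is kept
        # as long as at least one key attribute is not null.
        if all(value is None for value in key_values):
            continue
        # Dicts preserve insertion order, so groups follow first occurrence.
        groups.setdefault(key_values, []).append(record)

    result: List[Record] = []
    for key_values, members in groups.items():
        row: Record = dict(zip(key, key_values))
        for attr in additional:
            # Prefer the first non-empty value of each additional attribute.
            row[attr] = next(
                (m.get(attr) for m in members if not _is_empty(m.get(attr))),
                None,
            )
        # Number of source records collapsed into this row.
        row["frequency"] = len(members)
        result.append(row)
    return result


if __name__ == "__main__":
    records = [
        {"country_code": "CZE", "country_name": None},
        {"country_code": "CZE", "country_name": "Czech Republic"},
        {"country_code": None, "country_name": "Unknown"},
    ]
    print(deduplicate(records, key=["country_code"], additional=["country_name"]))
    # [{'country_code': 'CZE', 'country_name': 'Czech Republic', 'frequency': 2}]
```

Using a single `all(...)` check for null keys covers both the single-attribute and the composite-key cases, since a one-element key is "all null" exactly when its only attribute is null.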
Load failed records to ONE Data
It’s also possible to load failed records to ONE Data directly from the catalog item’s Overview or Data Quality tabs.
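Which records count as failed follows the definition given for the Invalid records option: a record is included if it failed any of the applied DQ rules. As an illustration of that selection logic only, the following sketch assumes a hypothetical in-memory shape in which each record has an `id` field and the DQ results are a mapping from record ID to per-rule pass/fail flags. The function `select_failed_records` is hypothetical and does not use any ONE API.

```python
from typing import Any, Dict, List, Mapping

Record = Dict[str, Any]


def select_failed_records(
    records: List[Record],
    rule_results: Mapping[Any, Mapping[str, bool]],
    id_field: str = "id",
) -> List[Record]:
    """Return records that failed at least one applied DQ rule.

    rule_results maps a record identifier to {rule name: passed?}.
    Records with no applied rules are treated as valid.
    """
    failed = []
    for record in records:
        results = rule_results.get(record[id_field], {})
        # A single failed rule is enough to mark the record as invalid.
        if any(not passed for passed in results.values()):
            failed.append(record)
    return failed


if __name__ == "__main__":
    records = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "not-an-email"}]
    results = {1: {"valid_email": True}, 2: {"valid_email": False, "not_empty": True}}
    print(select_failed_records(records, results))
    # [{'id': 2, 'email': 'not-an-email'}]
```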
-
Locate the required catalog item and:
-
In the Overview tab, locate the Data Quality widget, use the three dots menu, and select Load failed records to ONE Data.
-
In the Data Quality tab, select Load failed records to ONE Data.
This option is only available when you are viewing the Latest results.
-
-
Choose one of the two options:
-
Create new ONE Data Table
-
Enter a unique name for the table and optionally a description.
-
In Stewardship, select the table owner and roles. For more information, see Stewardship. If you don’t set them, the stewardship configuration is inherited from the data source.
-
-
Overwrite existing ONE Data table: In Target table, find and select which table you want to overwrite.
This deletes all existing data from the selected table. This option is only available if you have previously created a ONE Data table with this catalog item.
-
-
Select Start loading records to ONE Data. Depending on the size of your catalog item, it might take a few minutes to get everything ready. Because this import option includes running DQ evaluation on the whole table, it typically takes longer than importing data only.
To continue working with the platform in the meantime, select Run in background. This takes you to the newly created ONE Data catalog item in the Knowledge Catalog.
Alternatively, remain on the same page until your ONE Data table is created. A notification lets you know when the import is finished.
If your import fails, the most likely cause is an issue connecting to your data source. Check the error log in Processing Center for more details.
-
Your table is now ready for use. See the Next steps section for more tips about how to proceed.
Tips for filtering attributes
-
Try searching for integer or string to filter attributes by their data type.
-
Unsure about what data you’re working with? Switch to the Data Preview tab to see a live preview of the top 50 records in the catalog item. You can also select attributes directly from this tab.
The preview isn’t available for virtual catalog items.
-
If you have multiple attributes with similar data and the quality overview on the Attributes tab isn’t sufficient, check their detailed profiling and DQ results on the Profile & DQ Insights tab. You can both filter and select attributes from this tab, so there’s no need to switch back and forth between tabs.
This information is available only for data that has been profiled.
Next steps
Once you have successfully created a ONE Data table, start exploring what ONE Data and ONE can offer you.
-
Change the structure or the contents of your table at any point from the Data tab of your table, as explained in Get Started with ONE Data, sections Edit table model and Edit data inline.
-
Check the data quality of your table (DQ Evaluation in ONE Data) or learn how to use it in a rule (Validate Data using ONE Data Tables).
-
If you imported invalid records, learn how to use ONE Data to resolve your data quality issues (Data Remediation with ONE Data).
-
If you want to export the data back to the original data source, see Get Started with ONE Data, section Export data to another data source.
-
You can also share the table with your team members and collaborate with them using tasks and comments. See Get Started with ONE Data, sections Permissions and access and Tasks and comments.