Apache Parquet
Apache Parquet assets are supported by the following data sources:
There is no limit to the size of the Parquet asset you can import. However, only individual Parquet files up to 6 GB can be profiled.
| Unless a Spark engine is used, only the pure Parquet format is supported. Delta Lake Parquet files with Delta logs are not supported. |
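Delta Lake keeps its transaction log in a `_delta_log` subdirectory next to the Parquet data files, so one way to check a folder against the rule above before importing it is to look for that directory. The following is a minimal sketch under that assumption; `is_delta_table` and `check_pure_parquet` are hypothetical helpers for illustration, not part of ONE.

```python
from __future__ import annotations

from pathlib import Path


def is_delta_table(folder: str | Path) -> bool:
    """Return True if the folder looks like a Delta Lake table.

    Delta Lake stores its transaction log in a `_delta_log` subdirectory;
    a folder of plain Parquet files has no such log.
    """
    return (Path(folder) / "_delta_log").is_dir()


def check_pure_parquet(folder: str | Path, spark_engine: bool = False) -> None:
    """Hypothetical pre-import check mirroring the note above (assumption)."""
    if is_delta_table(folder) and not spark_engine:
        raise ValueError(
            f"{folder} is a Delta Lake table; use a Spark engine "
            "or import plain Parquet files instead"
        )
```

The check only inspects the folder layout and does not read any file contents, so it is cheap to run on large folders.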
How Apache Parquet assets are imported
You can import the following Apache Parquet assets into ONE:
- Parquet files
- Parquet tables
- Partitioned Parquet tables
When a Parquet file is imported, a catalog item is created with the attributes from the file.
However, when Parquet assets other than files are imported, ONE analyses the asset and then creates a catalog item with attributes based on that asset. For example, a Parquet folder that contains a partitioned table creates a catalog item with the attributes from the Parquet files and the partition columns.
A Parquet folder that contains a Parquet table creates a catalog item with the attributes from the Parquet table.
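To make the distinction between the three asset types concrete, the sketch below classifies a path and combines file columns with partition columns. It assumes Hive-style partitioning (subdirectories named `key=value`) and `.parquet` file extensions; neither is stated in this page, and `classify` and `catalog_attributes` are hypothetical helpers, not how ONE is implemented. Reading the actual column names from a file would need a Parquet library (for example `pyarrow.parquet.read_schema`), so here they are passed in directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ParquetAsset:
    kind: str                     # "file", "table", or "partitioned table"
    data_files: list[Path]        # Parquet files that carry the column schema
    partition_columns: list[str] = field(default_factory=list)


def classify(path: str | Path) -> ParquetAsset:
    p = Path(path)
    if p.is_file():
        return ParquetAsset("file", [p])
    files = sorted(p.rglob("*.parquet"))
    partition_columns: list[str] = []
    for f in files:
        # Assumed Hive-style layout: each directory level is named key=value.
        for part in f.relative_to(p).parent.parts:
            if "=" in part:
                key = part.split("=", 1)[0]
                if key not in partition_columns:
                    partition_columns.append(key)
    kind = "partitioned table" if partition_columns else "table"
    return ParquetAsset(kind, files, partition_columns)


def catalog_attributes(file_columns: list[str], asset: ParquetAsset) -> list[str]:
    """File columns first, then partition columns not already present in the files."""
    return file_columns + [c for c in asset.partition_columns if c not in file_columns]
```

For a folder laid out as `year=2024/month=01/part-0.parquet` whose files contain `id` and `amount`, this sketch reports a partitioned table with attributes `id`, `amount`, `year`, `month`.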
| Parquet file import can only recognize: |
Profiling
| Profiling can only process individual Parquet files up to 6 GB. |
- Full profiling is supported for individual Parquet files, Parquet tables, and partitioned Parquet tables.
- Sample profiling is supported for Parquet tables and partitioned Parquet tables, but runs only on the first Parquet file found in the folder.
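The profiling rules above can be modeled as a small selection function: full profiling reads every file, sample profiling reads only the first file and is not available for a single file, and every file read must be within the 6 GB limit. This is a hypothetical sketch, not ONE's implementation. The page does not say how "first" is determined or whether 6 GB means decimal or binary gigabytes, so sorted path order and 6 × 1024³ bytes are assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Per-file profiling limit from the text; binary GiB is an assumption.
PROFILING_LIMIT_BYTES = 6 * 1024**3


def files_to_profile(
    data_files: list[Path],
    mode: str,
    kind: str,
    limit_bytes: int = PROFILING_LIMIT_BYTES,
) -> list[Path]:
    """Return the files a profiling run would read (hypothetical model).

    kind is "file", "table", or "partitioned table"; mode is "full" or "sample".
    """
    if mode == "full":
        candidates = list(data_files)
    elif mode == "sample":
        if kind == "file":
            raise ValueError("sample profiling is not supported for individual files")
        # "First file found" is modeled as the first path in sorted order.
        candidates = sorted(data_files)[:1]
    else:
        raise ValueError(f"unknown profiling mode: {mode!r}")
    too_large = [f for f in candidates if f.stat().st_size > limit_bytes]
    if too_large:
        raise ValueError(f"files over the profiling size limit: {too_large}")
    return candidates
```

Because sample profiling reads a single file, its results can differ noticeably from full profiling when the files in a folder hold different slices of the data, such as different partitions.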