Apache Parquet
Apache Parquet assets are supported by the following data sources:
- Amazon S3
  In versions prior to 14.5.1, only the AWS access key authentication type is supported.
- ADLS Gen2
There is no limit to the size of the Parquet asset you can import. However, only individual Parquet files up to 6 GB can be profiled.
Unless a Spark engine is used, only the pure Parquet format is supported. Delta Lake Parquet files with Delta logs are not supported.
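To check in advance whether a folder holds a Delta Lake table rather than pure Parquet, you can look for the `_delta_log` directory that Delta Lake uses for its transaction log. The following is a minimal sketch, not part of ONE: the function name `contains_delta_log` is hypothetical, and it assumes you already have a listing of object keys (for example, from your S3 or ADLS Gen2 tooling).

```python
from typing import Iterable


def contains_delta_log(object_keys: Iterable[str]) -> bool:
    """Return True if any key has a `_delta_log` path segment.

    Delta Lake keeps its transaction log in a `_delta_log` directory
    at the table root, so its presence suggests a Delta table.
    """
    return any("_delta_log" in key.split("/") for key in object_keys)


# Example listings (made-up keys, for illustration only).
pure_parquet = ["sales/part-0000.parquet", "sales/part-0001.parquet"]
delta_table = [
    "orders/_delta_log/00000000000000000000.json",
    "orders/part-0000.snappy.parquet",
]

print(contains_delta_log(pure_parquet))
print(contains_delta_log(delta_table))
```

Matching on a whole path segment avoids false positives from file names that merely contain the text `_delta_log`.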
How Apache Parquet assets are imported
You can import the following Apache Parquet assets into ONE:
- Parquet files
- Parquet tables
- Partitioned Parquet tables
When a Parquet file is imported, a catalog item is created with the attributes from the file.
However, when Parquet assets other than files are imported, ONE analyses the asset and then creates a catalog item with attributes based on that asset. For example, a Parquet folder that contains a partitioned table creates a catalog item with the attributes from the Parquet files and the partition columns.
A Parquet folder that contains a Parquet table creates a catalog item with the attributes from the Parquet table.
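To illustrate how partition columns can come from the folder layout rather than from the files themselves, here is a minimal sketch. It assumes Hive-style partitioning, where directory names take the form `column=value`. This page does not state which partitioning layout ONE expects, so treat both the layout and the helper name `partition_columns` as assumptions.

```python
from pathlib import PurePosixPath
from typing import Dict


def partition_columns(relative_path: str) -> Dict[str, str]:
    """Extract `column=value` pairs from the directories of a file path.

    Assumes Hive-style partition directories. Directories without an
    `=` sign (such as the table root) are ignored.
    """
    columns: Dict[str, str] = {}
    for segment in PurePosixPath(relative_path).parent.parts:
        name, separator, value = segment.partition("=")
        if separator and name:
            columns[name] = value
    return columns


# Made-up path for illustration only.
print(partition_columns("sales/year=2024/month=01/part-0000.parquet"))
```

Under this assumption, a catalog item for the table would combine the attributes stored in the Parquet files with `year` and `month` taken from the directory names.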
Parquet file import can only recognize:
Profiling
Profiling can only process individual Parquet files up to 6 GB.
- Full profiling is supported for individual Parquet files, Parquet tables, and partitioned Parquet tables.
- Sample profiling is supported for Parquet tables and partitioned Parquet tables, but only runs on the first Parquet file found in the folder.
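Before profiling a folder, it can help to see which files exceed the 6 GB limit and which file a sample run would likely use. The sketch below works on a local or mounted copy of the folder and rests on two assumptions: that 6 GB means 6 × 1024³ bytes, and that "first file found" corresponds to sorted path order. ONE's actual traversal order is not documented here, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: the 6 GB profiling limit is interpreted as 6 GiB.
PROFILING_LIMIT_BYTES = 6 * 1024**3


def parquet_files(folder: Path) -> List[Path]:
    """Return all .parquet files under the folder in sorted path order."""
    return sorted(p for p in folder.rglob("*.parquet") if p.is_file())


def sample_profiling_candidate(folder: Path) -> Optional[Path]:
    """Return the first Parquet file in sorted order, or None if there is none.

    Assumption: sorted order approximates "the first Parquet file found".
    """
    files = parquet_files(folder)
    return files[0] if files else None


def files_over_limit(folder: Path, limit: int = PROFILING_LIMIT_BYTES) -> List[Path]:
    """Return the Parquet files whose size exceeds the profiling limit."""
    return [p for p in parquet_files(folder) if p.stat().st_size > limit]
```

If `files_over_limit` returns any paths, those files would need to be split or rewritten into smaller files before they can be profiled.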
Browsing
You can preview data only for small Parquet files, up to 5 MB by default.
The maximum file size can be configured using the property ataccama.one.parquet.preview-maximum-size in dpe/etc/application.properties.