
Apache Parquet

Apache Parquet assets are supported by the following data sources:

  • Amazon S3

    In versions prior to 14.5.1, only the AWS access key authentication type is supported.
  • ADLS Gen2

There is no limit to the size of the Parquet asset you can import. However, only individual Parquet files up to 6 GB can be profiled.
Unless a Spark engine is used, only the pure Parquet format is supported; Delta Lake Parquet files with Delta logs are not.
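
A pure Parquet folder can be told apart from a Delta Lake table by the _delta_log transaction-log directory that Delta Lake keeps at the table root. The following Python sketch shows that check, together with a cheap test for the PAR1 magic bytes that open and close every Parquet file. It only illustrates the distinction described above: the function names are hypothetical, and this is not necessarily how ONE itself detects formats.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # every Parquet file starts and ends with these 4 bytes


def looks_like_parquet(path: Path) -> bool:
    """Cheap format check based on the Parquet magic bytes (does not validate the footer)."""
    # Anything shorter than leading magic + 4-byte footer length + trailing magic
    # cannot be a Parquet file.
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def is_delta_table(folder: Path) -> bool:
    """Delta Lake keeps its transaction log in a _delta_log directory at the table root."""
    return (folder / "_delta_log").is_dir()
```

For example, is_delta_table(Path("Orders")) returning True would point to a folder that, per the limitation above, can only be handled with a Spark engine.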

How Apache Parquet assets are imported

You can import the following Apache Parquet assets into ONE:

  • Parquet files

  • Parquet tables

  • Partitioned Parquet tables

When a Parquet file is imported, a catalog item is created with the attributes from the file.

However, when Parquet assets other than individual files are imported, ONE first analyzes the asset and then creates a catalog item whose attributes are derived from the asset's structure. For example, a Parquet folder that contains a partitioned table creates a catalog item with the attributes from the Parquet files plus the partition columns.

A Parquet folder that contains a Parquet table creates a catalog item with the attributes from the Parquet table.
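
To make the partitioned-table case concrete, the following Python sketch derives the attribute list for a file in a Hive-style partitioned folder: the columns stored in the Parquet file, followed by the partition columns encoded in the key=value directory names. The column names and paths are illustrative, the attribute ordering is an assumption, and ONE's actual analysis may differ.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def partition_columns(table_root: str, file_path: str) -> dict[str, str]:
    """Read Hive-style key=value directory names between the table root and the file."""
    relative = PurePosixPath(file_path).relative_to(table_root)
    columns: dict[str, str] = {}
    for directory in relative.parts[:-1]:  # the last part is the file name
        key, sep, value = directory.partition("=")
        if sep and key:
            columns[key] = value
    return columns


def catalog_attributes(file_columns: list[str], table_root: str, file_path: str) -> list[str]:
    """File columns first, then any partition columns not already present in the file."""
    attributes = list(file_columns)
    for key in partition_columns(table_root, file_path):
        if key not in attributes:
            attributes.append(key)
    return attributes
```

For a file Orders/Year=2024/Month=01/part-0000.parquet containing order_id and amount, catalog_attributes returns order_id, amount, Year, and Month.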

Parquet import recognizes only the following folder structures:

  1. A folder with Parquet files. For example: ./Orders/{Parquet Part Files}.

  2. A folder with partitioned Parquet files, where data is partitioned by specific columns such as year and month. For example: ./Orders/{Year=2024}/{Month=01}/{Parquet Part Files}.
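
The two recognized layouts can be distinguished from a folder listing alone. The Python sketch below classifies a list of paths, relative to the imported folder, as flat, partitioned, or unsupported. It assumes that data files are identified by the .parquet extension (so marker files such as _SUCCESS are ignored) and that every partitioned file uses the same sequence of partition keys; both are simplifications, not documented ONE behavior.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def _partition_key(directory: str) -> str | None:
    """Return the key of a key=value directory name, or None if it is not one."""
    key, sep, _ = directory.partition("=")
    return key if sep and key else None


def classify_layout(relative_paths: list[str]) -> str:
    """Classify a folder listing as 'flat', 'partitioned' or 'unsupported'."""
    files = [PurePosixPath(p) for p in relative_paths if p.endswith(".parquet")]
    if not files:
        return "unsupported"
    # Layout 1: all part files sit directly in the folder.
    if all(len(f.parts) == 1 for f in files):
        return "flat"
    # Layout 2: every file sits under key=value directories with the same keys.
    key_sequences = set()
    for f in files:
        keys = tuple(_partition_key(d) for d in f.parts[:-1])
        if not keys or None in keys:
            return "unsupported"
        key_sequences.add(keys)
    return "partitioned" if len(key_sequences) == 1 else "unsupported"
```

A listing that mixes top-level part files with partition directories, or that nests files under ordinary folder names, falls outside both layouts and is reported as unsupported.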

Profiling

Profiling can only process individual Parquet files up to 6 GB.
  • Full profiling is supported for individual Parquet files, Parquet tables, and partitioned Parquet tables.

  • Sample profiling is supported for Parquet tables and partitioned Parquet tables, but only runs on the first Parquet file found in the folder.
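
The profiling rules above can be summarized in a small Python sketch that decides which files a profiling run would read. Full profiling considers every file, sample profiling considers only the first file in the listing, and files over the 6 GB limit are set aside. Interpreting 6 GB as 6 * 1024**3 bytes, and reporting oversized files instead of failing the run, are assumptions made for illustration; the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: "6 GB" interpreted as binary gigabytes.
MAX_PROFILED_FILE_BYTES = 6 * 1024**3


@dataclass
class ProfilingPlan:
    to_profile: list[str] = field(default_factory=list)  # files the run would read
    too_large: list[str] = field(default_factory=list)   # files excluded by the size limit


def plan_profiling(files: list[tuple[str, int]], mode: str,
                   limit: int = MAX_PROFILED_FILE_BYTES) -> ProfilingPlan:
    """files: (name, size_in_bytes) pairs in the order they were found in the folder."""
    if mode not in ("full", "sample"):
        raise ValueError(f"unknown profiling mode: {mode!r}")
    # Sample profiling only looks at the first Parquet file found.
    candidates = files[:1] if mode == "sample" else files
    plan = ProfilingPlan()
    for name, size in candidates:
        (plan.to_profile if size <= limit else plan.too_large).append(name)
    return plan
```

An individual Parquet file is simply the one-element case: pass a list containing that file alone.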

Browsing

You can preview data only for small Parquet files (up to 5 MB by default). To change the maximum file size, set the ataccama.one.parquet.preview-maximum-size property in dpe/etc/application.properties.
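
The following Python sketch illustrates the preview rule: a file can be previewed only if its size does not exceed the configured maximum, falling back to the 5 MB default when ataccama.one.parquet.preview-maximum-size is not set. This page does not describe the property's value format, so the sketch assumes Spring-style sizes such as 10MB or a plain byte count, and treats MB as 1024 * 1024 bytes; verify both assumptions against the product documentation. The properties reader is deliberately minimal.

```python
from __future__ import annotations

import re

PROPERTY = "ataccama.one.parquet.preview-maximum-size"
# Documented default is 5 MB; the binary interpretation is an assumption.
DEFAULT_MAX_BYTES = 5 * 1024 * 1024
# Assumption: Spring-style size units (KB = 1024 bytes, and so on).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def read_property(text: str, key: str) -> str | None:
    """Minimal .properties reader: key=value or key: value lines, # and ! comments.
    Line continuations and escapes are not handled; the last occurrence wins."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        m = re.match(r"([^=:\s]+)\s*[=:]\s*(.*)", line)
        if m and m.group(1) == key:
            value = m.group(2).strip()
    return value


def parse_size(value: str) -> int:
    """Parse '10MB', '512KB' or a plain byte count such as '1048576' (assumed format)."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)?\s*", value, re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = m.groups()
    return int(number) * _UNITS[(unit or "B").upper()]


def can_preview(file_size: int, properties_text: str = "") -> bool:
    """True if a file of file_size bytes is within the preview limit."""
    raw = read_property(properties_text, PROPERTY)
    limit = parse_size(raw) if raw else DEFAULT_MAX_BYTES
    return file_size <= limit
```

With no override, a 4 MB file qualifies for preview and a 6 MB file does not; raising the property to 10MB (under the assumed format) would make the 6 MB file previewable.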