Apache Avro
Apache Avro assets are supported by the following data sources:
- Google Cloud Storage
- Azure Data Lake Storage Gen2
- Microsoft OneLake
- Amazon S3
There is no limit to the size of the Avro asset you can import. However, only individual Avro files up to 6 GB can be profiled.
How Apache Avro assets are imported
You can import the following Apache Avro assets into ONE:
- Avro files
- Avro tables
- Partitioned Avro tables
When an Avro file is imported, a catalog item is created with the attributes from the file.
When Avro assets other than files are imported, ONE first analyzes the asset and then creates a catalog item with attributes based on it. For example, an Avro folder that contains a partitioned table creates a catalog item with the attributes from the Avro files and the partition columns.
An Avro folder that contains an Avro table creates a catalog item with the attributes from the Avro table.
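The way a partitioned table contributes both file attributes and partition columns can be illustrated with a minimal sketch. It assumes a Hive-style directory layout (`key=value` folder names), which this page does not confirm for ONE. The function names `partition_columns` and `catalog_attributes` are hypothetical and are not part of any ONE API.

```python
from pathlib import PurePosixPath


def partition_columns(file_path: str, table_root: str) -> dict[str, str]:
    """Collect key=value folder names found between table_root and the file.

    Assumption: partitions are encoded Hive-style, e.g. year=2024/month=01.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    columns = {}
    for part in relative.parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            columns[key] = value
    return columns


def catalog_attributes(file_attributes: list[str], file_path: str, table_root: str) -> list[str]:
    """Combine attributes read from the Avro file with the partition columns."""
    names = list(file_attributes)
    for column in partition_columns(file_path, table_root):
        if column not in names:
            names.append(column)
    return names
```

For a file at `sales/year=2024/month=01/part-0000.avro` under the table root `sales`, the sketch adds `year` and `month` to the attributes read from the file itself. A plain Avro table with no `key=value` folders yields only the file attributes.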
Avro file import can only recognize:
Profiling
Profiling can only process individual Avro files up to 6 GB.
- Full profiling is supported for individual Avro files, Avro tables, and partitioned Avro tables.
- Sample profiling is supported for Avro tables, but only runs on the first Avro file found in the folder. Partitioned Avro tables do not support sample profiling.
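Before importing, it can help to know which files exceed the 6 GB profiling limit. The following is a minimal sketch, assuming the files are available on a local or mounted path and that "6 GB" means 6 * 1024**3 bytes (the page does not say whether the limit is binary or decimal). The names `exceeds_profiling_limit` and `oversized_avro_files` are hypothetical helpers, not ONE functionality.

```python
import os

# Assumption: 6 GB is interpreted as 6 * 1024**3 bytes.
PROFILING_LIMIT_BYTES = 6 * 1024**3


def exceeds_profiling_limit(size_bytes: int, limit_bytes: int = PROFILING_LIMIT_BYTES) -> bool:
    """Return True if a single file is larger than the profiling limit."""
    return size_bytes > limit_bytes


def oversized_avro_files(folder: str, limit_bytes: int = PROFILING_LIMIT_BYTES) -> list[str]:
    """List .avro files under folder that are larger than limit_bytes."""
    oversized = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.endswith(".avro"):
                path = os.path.join(root, name)
                if exceeds_profiling_limit(os.path.getsize(path), limit_bytes):
                    oversized.append(path)
    return oversized
```

Because the limit applies to individual files, a table made of many smaller files passes this check even if its total size is well above 6 GB.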
Browsing
You can preview data only for small Avro files, up to 5 MB by default.
The maximum file size can be configured using the property ataccama.one.avro.preview-maximum-size in dpe/etc/application.properties.