Ataccama Lineage Format
This page describes the structure of the Ataccama lineage file produced by the lineage scanner. Because this structure was designed to be human-friendly, it can also be used to supply custom lineage metadata to Ataccama ONE.
Ataccama lineage file structure
The lineage file is a zip container holding several plain data files with the lineage metadata. The following UTF-8 encoded data files are located directly in the root of the archive:
- assets.csv
- flows.csv
- transformations.csv
- connections.csv
- export.json
A detailed description of each file structure follows.
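As a rough illustration of the layout described above, the following Python sketch checks that a zip archive carries the five data files directly in its root and that each one decodes as UTF-8. The helper names `check_lineage_archive` and `build_archive` are hypothetical and are not part of any Ataccama tooling, and the sketch assumes that all five files are required.

```python
import io
import zipfile

# File names taken from the list above; they are expected in the archive root.
EXPECTED_FILES = (
    "assets.csv",
    "flows.csv",
    "transformations.csv",
    "connections.csv",
    "export.json",
)


def check_lineage_archive(source):
    """Return a list of problems found in the archive (empty if none).

    `source` is a path or a binary file-like object. Hypothetical helper,
    not an Ataccama API.
    """
    problems = []
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        for name in EXPECTED_FILES:
            if name not in names:
                problems.append(f"missing in archive root: {name}")
                continue
            try:
                archive.read(name).decode("utf-8")
            except UnicodeDecodeError:
                problems.append(f"not valid UTF-8: {name}")
    return problems


def build_archive(files):
    """Pack a {file name: text} mapping into an in-memory zip, UTF-8 encoded."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text.encode("utf-8"))
    buffer.seek(0)
    return buffer
```

A file stored in a subfolder (for example `sub/flows.csv`) is reported as missing, because only entries in the archive root are accepted.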
Lineage data files description
The following diagram shows the entity relation model of the data files:
assets.csv
The list of all assets identified in all scanned source systems.
| Name | Type | Description |
|---|---|---|
| | string not null unique | Artificial or natural unique identifier of the asset in the source system. Does not necessarily contain any information describing the asset. |
| | enum not null | Type of the asset from the data lineage perspective. Possible values: |
| | string not null | Name of the asset. |
| | string null | Unique identifier of a parent asset. Null for the highest node in the hierarchy. |
| | string not null | Unique identifier of the connection from which the asset was scanned, or of an artificial connection created by deduction when the source connection wasn't recognized. When the scanned source system can connect to other systems, the connections to these systems are extracted and used as well. |
| | json null | Additional metadata related to the asset in JSON format following the structure: Supported attributes: |
| | enum not null | Defines the action which should be applied to the asset when the data are processed by the consuming application. Possible values: |
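The identifier and parent rules in the table above (unique identifiers, a null parent for the top node) can be sanity-checked before a custom file is supplied. The sketch below is only an illustration: the column names `id` and `parent_id` in the sample are placeholders rather than the real header names, and it assumes a null value is written as an empty CSV field.

```python
import csv
import io


def find_hierarchy_problems(rows, id_col, parent_col):
    """Report duplicate identifiers and parents that do not exist.

    `rows` is a list of dicts (for example from csv.DictReader). The column
    names are passed in because the ones used here are assumptions.
    """
    problems = []
    seen = set()
    for row in rows:
        asset_id = row[id_col]
        if asset_id in seen:
            problems.append(f"duplicate id {asset_id!r}")
        seen.add(asset_id)
    for row in rows:
        parent = row[parent_col]
        # An empty field is treated as null, i.e. a top-level asset.
        if parent and parent not in seen:
            problems.append(
                f"unknown parent {parent!r} for asset {row[id_col]!r}"
            )
    return problems


# Hypothetical sample data; "id" and "parent_id" are placeholder headers.
SAMPLE = "id,parent_id\ndb1,\nschema1,db1\ntable1,schema9\n"
sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_hierarchy_problems(sample_rows, "id", "parent_id"))
```

Here `table1` points to `schema9`, which is not listed as an asset, so it is the only row reported.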
flows.csv
The list of all flows identified in all scanned source systems.
| Name | Type | Description |
|---|---|---|
| | enum not null | Type of the edge. Possible values: |
| | string not null | Unique identifier of the source asset. |
| | string not null | Unique identifier of the target asset. |
| | json null | Additional metadata related to the flow in JSON format following the structure: |
| | json null | JSON array of transformation unique identifiers. |
| | enum not null | Defines the action which should be applied to the flow when the data are processed by the consuming application. Possible values: |
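Because each flow refers to a source asset, a target asset, and optionally a JSON array of transformation identifiers, a custom file can be cross-checked against the other data files. The following sketch assumes that every referenced identifier must exist in assets.csv or transformations.csv. The function name `check_flows` and the column names `source`, `target`, and `transformations` are hypothetical placeholders.

```python
import json


def check_flows(flows, asset_ids, transformation_ids,
                source_col, target_col, transformations_col):
    """Report flows that reference unknown assets or transformations.

    `flows` is a list of dicts; the column names are parameters because
    the ones used in the sample below are assumptions.
    """
    problems = []
    for index, flow in enumerate(flows):
        for col in (source_col, target_col):
            if flow[col] not in asset_ids:
                problems.append(
                    f"flow {index}: unknown asset {flow[col]!r} in column {col!r}"
                )
        raw = flow.get(transformations_col)
        if raw:  # the column is nullable, so an empty value is skipped
            for transformation_id in json.loads(raw):
                if transformation_id not in transformation_ids:
                    problems.append(
                        f"flow {index}: unknown transformation {transformation_id!r}"
                    )
    return problems


# Hypothetical sample data with placeholder column names.
sample_flows = [
    {"source": "a", "target": "b", "transformations": '["t1"]'},
    {"source": "a", "target": "zzz", "transformations": ""},
    {"source": "b", "target": "a", "transformations": '["t2"]'},
]
flow_problems = check_flows(
    sample_flows, {"a", "b"}, {"t1"}, "source", "target", "transformations"
)
print(flow_problems)
```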
transformations.csv
The list of all transformations identified in the source systems.
| Name | Type | Description |
|---|---|---|
| | string not null unique | The unique identifier of the transformation identified in the source system. |
| | enum not null | The type of the transformation. Possible values: The attribute level lineage pointer from the flows.csv file should point to The expected hierarchy of the items in this table is |
| | string not null | The text (body) of the transformation. |
| | string null | The parent transformation unique identifier. Null for the highest transformation in the hierarchy. |
| | enum not null | The complexity of the transformation. Possible values: |
| | json null | The list of all positions of the transformation text within the parent transformation text. Contains the array of the position objects in JSON format: |
| | json null | Additional metadata related to the transformation in JSON format following the structure: |
| | enum not null | Defines the action which should be applied to the transformation when the data are processed by the consuming application. Possible values: |
connections.csv
The list of all scanned source system connections, extended by all connections identified in source systems that can connect to other systems (e.g., BI tools or ETL tools) and by artificial connections to deduced assets.
| Name | Type | Description |
|---|---|---|
| | string not null unique | Unique identifier of the connection. |
| | enum not null | Type of the connection. Possible values: |
| | string not null | Name of the connection. |
| | string not null | Technology name of the source system (e.g., |
| | json null | Additional metadata related to the connection in JSON format following the structure: |
| | enum not null | Defines the action which should be applied to the connection when the data are processed by the consuming application. Possible values: |
Example Ataccama lineage files
Azure Synapse Analytics
Manually created: ataccama_lineage_azure_custom.zip
MS SQL Server
Manually created: ataccama_mssql_custom.zip
S3 and MS Excel
Generated: ataccama_lineage_s3_custom.zip