# Configure Apache Airflow with OpenLineage
This guide shows you how to configure Apache Airflow to send OpenLineage events to your Ataccama ONE Agentic orchestrator connection.
Before you begin, ensure you have generated an API key and copied your endpoint URL.
## Install the OpenLineage provider

Choose the appropriate package based on your Airflow version:

- For versions older than 2.7:

  ```bash
  pip install openlineage-airflow
  ```

- For version 2.7 and newer (recommended):

  ```bash
  pip install apache-airflow-providers-openlineage
  ```
## Enable OpenLineage

Ensure OpenLineage is enabled in your `airflow.cfg`:

```ini
[openlineage]
disabled = False
```

> **Note:** `disabled = False` is the default value, so you can also omit this setting entirely if you want OpenLineage enabled.
## Configure the transport

The transport defines how Airflow sends OpenLineage events to Ataccama ONE Agentic. You can configure it through:

- **External YAML/JSON file** - a cleaner approach, recommended for managing secrets
- **Inline JSON in `airflow.cfg`** - direct configuration via the `transport` key in the `[openlineage]` section
- **Environment variable** - using `AIRFLOW__OPENLINEAGE__TRANSPORT`, useful for containerized deployments

The methods below are environment-agnostic; use whichever best suits your deployment approach.

> **Important:** For production environments, avoid committing API keys to source control. Use external configuration files with restricted access permissions or environment variables managed by your secrets management system.
### Method 1: External configuration file (verified)

Create an `openlineage.yml` file:

```yaml
transport:
  type: http
  url: https://<YOUR_INSTANCE>.ataccama.one/
  endpoint: gateway/openlineage/<CONNECTION_ID>/events
  auth:
    type: api_key
    apiKey: <YOUR_API_KEY>
```

> **Note:** Split your OpenLineage endpoint URL into `url` (base) and `endpoint` (path). For example, `<YOUR_INSTANCE>.ataccama.one/gateway/openlineage/<CONNECTION_ID>/events` becomes `url: https://<YOUR_INSTANCE>.ataccama.one/` and `endpoint: gateway/openlineage/<CONNECTION_ID>/events`.
You can use this file in two ways:

#### Option A: Automatic detection

Place `openlineage.yml` directly in your Airflow config directory. Many Airflow images/providers will automatically detect and load it.

#### Option B: Explicit path

Reference the file explicitly in `airflow.cfg`:

```ini
[openlineage]
disabled = False
transport = /opt/airflow/openlineage.yml
```
### Method 2: Inline configuration in airflow.cfg

> **Note:** The following method is documented in Apache Airflow but has not been verified with Ataccama ONE Agentic.

```ini
[openlineage]
disabled = False
transport = {"type": "http", "url": "https://<YOUR_INSTANCE>.ataccama.one/", "endpoint": "gateway/openlineage/<CONNECTION_ID>/events", "auth": {"type": "api_key", "apiKey": "<YOUR_API_KEY>"}}
```
### Method 3: Environment variables

> **Note:** The following method is documented in Apache Airflow but has not been verified with Ataccama ONE Agentic.

```bash
export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "https://<YOUR_INSTANCE>.ataccama.one/", "endpoint": "gateway/openlineage/<CONNECTION_ID>/events", "auth": {"type": "api_key", "apiKey": "<YOUR_API_KEY>"}}'

# Optionally, explicitly enable OpenLineage (disabled=False is the default)
export AIRFLOW__OPENLINEAGE__DISABLED=False
```
## Restart Airflow

After configuration, restart your Airflow scheduler and workers for the changes to take effect.
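To confirm the endpoint and API key work independently of Airflow, you can emit a single test event with the OpenLineage Python client (`pip install openlineage-python`). The following is a minimal sketch for illustration only, not part of the verified setup; it reuses the same transport shape and placeholders as the configuration above, and the job/namespace names are made up.

```python
# Minimal connectivity check using the OpenLineage Python client.
# Assumes: pip install openlineage-python. Placeholders (<YOUR_INSTANCE>,
# <CONNECTION_ID>, <YOUR_API_KEY>) are the same ones used above.
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Job, Run, RunEvent, RunState
from openlineage.client.transport.http import HttpConfig, HttpTransport

# Same transport shape as the airflow.cfg / environment-variable JSON above.
config = HttpConfig.from_dict({
    "type": "http",
    "url": "https://<YOUR_INSTANCE>.ataccama.one/",
    "endpoint": "gateway/openlineage/<CONNECTION_ID>/events",
    "auth": {"type": "api_key", "apiKey": "<YOUR_API_KEY>"},
})
client = OpenLineageClient(transport=HttpTransport(config))

# Emit one START event for a dummy job; if it appears in the
# Data Observability module, the transport and API key are working.
client.emit(
    RunEvent(
        eventType=RunState.START,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid4())),
        job=Job(namespace="connectivity-test", name="openlineage_smoke_test"),
        producer="manual-smoke-test",
    )
)
```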
## Advanced: Running dbt Core within Airflow

When Airflow orchestrates dbt Core, you have multiple options for emitting OpenLineage events. The Airflow OpenLineage configuration described above applies to all these scenarios.

### Understanding your options

Choose the approach that matches your requirements for lineage granularity:
| Option | Detail level | Recommended for |
|---|---|---|
| Airflow provider only | DAG and task boundaries | Simple workflows where dbt model-level details aren’t needed |
| dbt with `dbt-ol` | Model-level detail from dbt + task-level from Airflow | Detailed lineage without restructuring DAGs |
| Astronomer Cosmos | Native Airflow task per dbt model | Teams wanting Airflow-native observability (retries, SLAs) for each model |
### Option 1: Airflow provider only (simplest)

This option uses only the Airflow OpenLineage configuration you’ve already set up. Airflow emits lineage events for DAG and task lifecycles. When Airflow tasks execute dbt (for example, using `BashOperator` to run `dbt run`), Airflow reports task-level events, but dbt-specific metadata (models, tests, column-level lineage) is not included.

**When to use:** You only need to see which Airflow tasks ran dbt and their success/failure status.

**Configuration:** Use the OpenLineage transport configuration from the previous section. No additional setup is required.

**Result:** You’ll see Airflow task-level lineage in the Data Observability module, but without detailed dbt model information.
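For illustration, here is a minimal sketch of such a DAG. It assumes dbt is installed in the Airflow environment and a dbt project lives at `/opt/airflow/dbt` (a hypothetical path); lineage for the task comes entirely from the Airflow provider configured above.

```python
# Minimal sketch: Airflow runs dbt via BashOperator; the OpenLineage
# provider emits task-level events only. The project and profiles
# paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_run",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "cd /opt/airflow/dbt && "
            "dbt run --profiles-dir /opt/airflow/dbt"
        ),
    )
```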
### Option 2: Call dbt with `dbt-ol` inside Airflow tasks

This option provides model-level lineage in addition to Airflow’s task-level lineage. Airflow emits its own task events, and the dbt integration emits dataset- and model-level events.

**When to use:** You want detailed dbt lineage (models, dependencies, column-level details) while keeping your existing DAG structure.

**Configuration:** To be confirmed; a hypothetical sketch follows below.

**Result:** You’ll see two related runs in the Data Observability module: one from Airflow (task-level) and one from dbt (model-level).
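This setup has not been verified with Ataccama ONE Agentic. As an assumption based on the OpenLineage dbt integration (`pip install openlineage-dbt`), a task would invoke the `dbt-ol` wrapper instead of `dbt` and point it at the same transport configuration file via the `OPENLINEAGE_CONFIG` environment variable; all paths below are hypothetical.

```python
# Hypothetical sketch (unverified with Ataccama ONE Agentic): the dbt-ol
# wrapper from the openlineage-dbt package emits model-level events, while
# the Airflow provider still emits task-level events for the same task.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_run_with_dbt_ol",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        # OPENLINEAGE_CONFIG points dbt-ol at the same transport file
        # used by the Airflow provider (Method 1 above).
        env={"OPENLINEAGE_CONFIG": "/opt/airflow/openlineage.yml"},
        append_env=True,  # keep the rest of the worker environment
        bash_command=(
            "cd /opt/airflow/dbt && "
            "dbt-ol run --profiles-dir /opt/airflow/dbt"
        ),
    )
```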
### Option 3: Use Astronomer Cosmos

Cosmos converts individual dbt models into separate Airflow tasks, giving you native Airflow observability for each model. The Airflow OpenLineage provider automatically emits detailed events per model without requiring the `dbt-ol` wrapper.

**When to use:** You prefer native Airflow task features (retries, SLAs, dependencies) mapped directly to dbt models and want to avoid managing dbt profiles separately from Airflow connections.

**Configuration:** Follow the Cosmos lineage configuration guide.

**Result:** Each dbt model appears as an individual Airflow task with full lineage, making it easy to retry specific models and track dependencies.
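As a rough sketch of what a Cosmos-based DAG can look like, assuming `pip install astronomer-cosmos` and a `profiles.yml` shipped with the project (the project path, profile name, and target are all hypothetical):

```python
# Hypothetical sketch using Astronomer Cosmos: each dbt model becomes its
# own Airflow task, and the OpenLineage provider emits events per task.
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

dbt_lineage_dag = DbtDag(
    dag_id="dbt_models_via_cosmos",
    project_config=ProjectConfig("/opt/airflow/dbt"),
    profile_config=ProfileConfig(
        profile_name="my_dbt_profile",  # hypothetical profile name
        target_name="prod",
        profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",
    ),
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
)
```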
## Troubleshooting

- **No events appear:** Ensure `disabled = False` is set in the `[openlineage]` section. Check the Airflow scheduler and worker logs for OpenLineage errors.
- **Authentication fails:** Verify your API key is correct and the transport format matches your provider version.
- **Provider not found:** The official Airflow Docker images may already include the provider. Otherwise, install `apache-airflow-providers-openlineage` explicitly.