Amazon Athena Connection

This guide describes how to connect to Amazon Athena.

Amazon Athena is an interactive query service that lets you analyze data directly in Amazon S3 using standard SQL. For more information, see the official Amazon Athena documentation.

This connector supports cross-account access through IAM role authentication configured at the JDBC driver level.

Prerequisites

To set up a connection between ONE and Amazon Athena, make sure the following conditions are met:

You have an AWS IAM user or role with the permissions required to access Athena, AWS Glue, and the S3 bucket used for query results. See Configure AWS permissions.
You have an S3 location to use as the query output location. For more information, see Specify a query result location in the AWS documentation.

Avoid creating Athena object names with leading spaces, as these can be difficult to detect and can cause usability issues. For more information, see Names for tables, databases, and columns in the AWS documentation.

Configure AWS permissions

The IAM user or role used to connect to Athena needs the following permissions:

Athena: StartQueryExecution, GetQueryExecution, GetQueryResults, StopQueryExecution, ListQueryExecutions, GetWorkGroup, ListWorkGroups, ListDatabases, GetDatabase, ListTableMetadata, GetTableMetadata.
AWS Glue: GetDatabase, GetDatabases, GetTable, GetTables, GetPartition, GetPartitions, CreateTable, DeleteTable, UpdateTable.
Amazon S3 (for the query output bucket): GetObject, PutObject, DeleteObject, ListBucket, GetBucketLocation.

For more information about Athena IAM permissions, see Identity and access management in Athena.

Example IAM policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AthenaAccess",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:StopQueryExecution",
                "athena:ListQueryExecutions",
                "athena:GetWorkGroup",
                "athena:ListWorkGroups",
                "athena:ListDatabases",
                "athena:GetDatabase",
                "athena:ListTableMetadata",
                "athena:GetTableMetadata"
            ],
            "Resource": "*"
        },
        {
            "Sid": "GlueAccess",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:CreateTable",
                "glue:DeleteTable",
                "glue:UpdateTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<your-athena-bucket>",
                "arn:aws:s3:::<your-athena-bucket>/*"
            ]
        }
    ]
}

Create a source

To connect to Amazon Athena:

Navigate to Data Catalog > Sources.
Select Create.
Provide the following:
- Name: The source name.
- Description: A description of the source.
- Deployment (Optional): Choose the deployment type.
  
  You can add new values if needed. See Lists of Values.
- Stewardship: The source owner and roles. For more information, see Stewardship.

Alternatively, add a connection to an existing data source. See Connect to a Source.

Add a connection

Select Add Connection.
In Connection type, choose Relational Database > Amazon Athena.

Provide the following:

Name: A meaningful name for your connection. This is used to indicate the location of catalog items.
Description (Optional): A short description of the connection.
Dpe label (Optional): Assign the processing of a data source to a particular data processing engine (DPE) by entering the DPE label assigned to the engine. For more information, see DPM and DPE Configuration in DPM Admin Console.
Override catalog item name: Select this option if you want metadata imports to overwrite the names of existing catalog items with the latest names from the data source.

If the option is not selected, catalog item names you set in ONE are preserved when metadata is reimported. See Edit catalog item metadata.

JDBC: A JDBC connection string for your Athena instance. The connection string must include the AWS region, the S3 output location, and the database.

The format of the connection string depends on the authentication method:

Username and password authentication (IAM access keys):

jdbc:athena://Region=<region>;OutputLocation=s3://<bucket>/<path>/;Database=<database>

Example:

jdbc:athena://Region=eu-central-1;OutputLocation=s3://test-athena-mission/test-database/;Database=test_database

Integrated credentials (instance profile):

jdbc:athena://Region=<region>;OutputLocation=s3://<bucket>/<path>/;Database=<database>;CredentialsProvider=InstanceProfile

Example:

jdbc:athena://Region=eu-central-1;OutputLocation=s3://test-athena-mission/test-database/;Database=test_database;CredentialsProvider=InstanceProfile

For the full list of supported parameters, see JDBC v3 driver connection parameters.

In Additional settings, select Enable exporting and loading of data if you want to export data from this connection and use it in ONE Data or outside of ONE.

If you want to export data to this source, you also need to configure write credentials. See Connection credentials.

Consider the security and privacy risks of allowing the export of data to other locations.

Add credentials

Different sets of credentials can be used for different tasks. One set of credentials must be set as default for each connection.

To determine whether you need to configure more than a single set of credentials, see Connection credentials.

Select Add Credentials.
Choose an authentication method and proceed with the corresponding step:
- Username and password: Use IAM access key credentials.
- Integrated credentials: Use the instance profile attached to the host where the DPE runs.

Username and password

Use this option to authenticate with IAM access key credentials.

Select Username and password and provide the following:
- Name (Optional): A name for this set of credentials.
- Description (Optional): A description for this set of credentials.
- Username: The IAM access key ID.
- Password: The IAM secret access key.
  
  To use a secret management service to provide these values, see Secret management service credentials.
If you want to use this set of credentials by default when connecting to the data source, select Set as default.

See also Connection credentials.
Proceed with Test the connection.

Integrated credentials

Use this option to authenticate with the instance profile of the host where the DPE runs. No additional values need to be provided in ONE; the JDBC connection string must include CredentialsProvider=InstanceProfile.

Select Integrated credentials and provide the following:
- Name (Optional): A name for this set of credentials.
- Description (Optional): A description for this set of credentials.
If you want to use this set of credentials by default when connecting to the data source, select Set as default.

See also Connection credentials.
Proceed with Test the connection.

Secret management service credentials

Select Username and password and provide the following:
- Name (Optional): A name for this set of credentials.
- Description (Optional): A description for this set of credentials.
- Select a secret management service: Specify which secret management service should be used to retrieve the credential values. For more information, see Secret Management Services.
- Username: Enable Use secret management service and provide the name the username is stored under in your selected secret management service.
- Password: Enable Use secret management service and provide the name the password is stored under in your selected secret management service.
If you want to use this set of credentials by default when connecting to the data source, select Set as default.

See also Connection credentials.
Proceed with Test the connection.

Test the connection

To test and verify whether the data source connection has been correctly configured, select Test Connection.

If the connection is successful, continue with the following step. Otherwise, verify that your configuration is correct and that the data source is running.

Save and publish

Once you have configured your connection, save and publish your changes. If you provided all the required information, the connection is now available for other users in the application.

In case your configuration is missing required fields, you can view a list of detected errors instead. Review your configuration and resolve the issues before continuing.

Next steps

You can now browse and profile assets from your connection.

In Data Catalog > Sources, find and open the source you just configured. Switch to the Connections tab and select Document. Alternatively, opt for Import or Discover documentation flow.

Or, to import or profile only some assets, select Browse on the Connections tab. Choose the assets you want to analyze and then the appropriate profiling option.

Was this page useful?