Amazon S3 Connection

Currently, only CSV, Apache Parquet, and Microsoft Excel files are supported.

In versions prior to 14.5.1, only the AWS Access Key authentication type is supported for Parquet files.

Create a source

To connect to Amazon S3:

  1. Navigate to Data Catalog > Sources.

  2. Select Create.

  3. Provide the following:

    • Name: The source name.

    • Description: A description of the source.

    • Deployment (Optional): Choose the deployment type.

      You can add new values if needed. See Lists of Values.

    • Stewardship: The source owner and roles. For more information, see Stewardship.

Alternatively, add a connection to an existing data source. See Connect to a Source.

Add a connection

  1. Select Add Connection.

  2. In Select connection type, choose File Systems > Amazon S3.

  3. Provide the following:

    • Name: A meaningful name for your connection. This is used to indicate the location of catalog items.

    • Description (Optional): A short description of the connection.

    • Bucket name: Provide the name of the Amazon S3 bucket that you want to use.

    • Region: Select the region to which the bucket belongs.

  4. In Additional settings, select Allow exporting and loading of Data if you want to export data from this connection and use it in ONE Data or outside of ONE.

    If you want to export data to this source, you also need to configure write credentials.
    Consider the security and privacy risks of allowing the export of data to other locations.
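Before entering the bucket name, it can help to check that it is at least well-formed. The following is a minimal sketch, assuming the standard naming rules for general-purpose S3 buckets (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit). The function name is hypothetical and is not part of the product.

```python
import re

# Assumed rules for general-purpose S3 bucket names: 3-63 characters,
# lowercase letters, digits, dots, and hyphens, starting and ending
# with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Names formatted like an IPv4 address are not allowed.
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic shape checks above.

    This does not check that the bucket exists or is accessible.
    """
    if not _BUCKET_RE.match(name):
        return False
    if ".." in name or _IP_RE.match(name):
        return False
    return True
```

A check like this only catches typos. Whether the bucket exists and is reachable is verified later, when you test the connection.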

Add credentials

When selecting the authentication method, take note of the following:

  • In versions prior to 14.5.1, the only supported authentication type for Apache Parquet assets is AWS Access Key credentials.

  • If you want to authenticate using an AWS Instance IAM Role, Data Processing Engine (DPE) must be installed in your Amazon Web Services subscription on a virtual machine (VM) instance.

    If you have multiple DPEs running, you might need to specify additional constraints. See Constraints Configuration.

  • If you want to authenticate with an AWS Web Identity IAM Role, DPE must be installed in your Amazon Web Services subscription with the appropriate service account. This method is recommended for AWS EKS clusters rather than individual VM instances and is supported only for hybrid or self-managed deployments.

    Storing credentials is not needed as they are obtained from the environment where DPE is running.

  1. Select Add Credentials.

  2. Choose an authentication method and proceed with the corresponding step:

AWS Access Key Credentials

The connecting user requires the following permissions in AWS:

  • s3:ListBucket: Allows listing the objects in a bucket.

  • s3:GetObject: Allows downloading files to DPE when browsing or importing data.
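As an illustration, the two permissions above could be granted with an IAM policy along the following lines. This is a sketch only: the bucket name `example-bucket` and the helper name are placeholders, and your organization might scope resources more narrowly (for example, to a key prefix). Note that s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to the objects inside it.

```python
import json


def build_read_policy(bucket: str) -> dict:
    """Build a minimal read-only IAM policy document for one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Downloading is an object-level action.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(build_read_policy("example-bucket"), indent=2))
```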

  1. Provide the following:

    • Name (Optional): A name for this set of credentials.

    • Description (Optional): A description for this set of credentials.

    • Access key: Specify the AWS access key ID.

    • Secret key: Specify the AWS secret access key.

  2. If you want to use this set of credentials by default when connecting to the data source, select Set as default.

    One set of credentials must be set as default for each connection. Otherwise, monitoring and DQ evaluation fail, and previewing data in the catalog is not possible.

  3. Optionally, enable assuming a role, as described in Assumed role.

  4. Proceed with Test the connection.

AWS EC2 VM Instance IAM Role

  1. Provide the following:

    • Name (Optional): A name for this set of credentials.

    • Description (Optional): A description for this set of credentials.

  2. If you want to use this set of credentials by default when connecting to the data source, select Set as default.

    One set of credentials must be set as default for each connection. Otherwise, monitoring and DQ evaluation fail, and previewing data in the catalog is not possible.

  3. Optionally, enable assuming a role, as described in Assumed role.

  4. Proceed with Test the connection.

AWS Web Identity IAM Role

  1. Provide the following:

    • Name (Optional): A name for this set of credentials.

    • Description (Optional): A description for this set of credentials.

  2. If you want to use this set of credentials by default when connecting to the data source, select Set as default.

    One set of credentials must be set as default for each connection. Otherwise, monitoring and DQ evaluation fail, and previewing data in the catalog is not possible.

  3. Optionally, enable assuming a role, as described in Assumed role.

  4. Proceed with Test the connection.

Assumed role

All S3 credential types (AWS Access Key, EC2 Instance Role, and Web Identity Role) offer the option to assume a role from a different AWS account instead of using the user’s known identity.

To configure this:

  1. Select Enable Assumed Role.

  2. Provide the following:

    • Amazon Resource Name (ARN): The ARN of the IAM role you want to assume, for example: arn:aws:iam::123456789123:role/myAwesomeRole.

    • External ID (Optional): If the role in AWS is configured to use an external ID for the trust relationship between accounts, enter it here. For example: prod-eu.

    • Session Name (Optional): A name for the session to identify the connection in AWS logs.

      It should consist of uppercase and lowercase alphanumeric characters with no spaces. You can also include any of the following characters: _ = , . @ -.

      You can also provide a default value in the field, such as Ataccama_One.

    • STS Region: The AWS region, such as eu-central-1 or us-east-1, in which you want to use the STS service.
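To show how these fields relate to each other, the sketch below validates them and assembles a parameter set. It assumes the values map onto the AWS STS AssumeRole parameters RoleArn, RoleSessionName, and ExternalId, and that STS Region selects a regional endpoint of the form `https://sts.<region>.amazonaws.com` (standard AWS partition). The helper names and the ARN pattern are hypothetical; how the application uses these values internally is not described here.

```python
import re
from typing import Optional

# Session name characters as listed above: alphanumerics plus _ = , . @ -
_SESSION_NAME_RE = re.compile(r"^[A-Za-z0-9_=,.@-]+$")
# Loose shape check for an IAM role ARN with a 12-digit account ID (assumed pattern).
_ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def build_assume_role_params(
    role_arn: str,
    session_name: str = "Ataccama_One",
    external_id: Optional[str] = None,
) -> dict:
    """Validate the inputs and return AssumeRole-style parameters."""
    if not _ROLE_ARN_RE.match(role_arn):
        raise ValueError(f"Not a role ARN: {role_arn!r}")
    if not _SESSION_NAME_RE.match(session_name):
        raise ValueError(f"Invalid session name: {session_name!r}")
    params = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if external_id:
        # Only sent when the role's trust relationship uses an external ID.
        params["ExternalId"] = external_id
    return params


def sts_endpoint(region: str) -> str:
    """Return the regional STS endpoint URL (standard partition assumed)."""
    return f"https://sts.{region}.amazonaws.com"
```

For example, `build_assume_role_params("arn:aws:iam::123456789123:role/myAwesomeRole", external_id="prod-eu")` returns a dictionary containing the role ARN, the default session name, and the external ID, while a session name containing a space raises a `ValueError`.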

Add write credentials

Write credentials are required if you want to export data to this source.

To configure these, in Write credentials, select Add Credentials and follow the steps for the chosen authentication method (see Add credentials).

Make sure to set one set of write credentials as default. Otherwise, this connection isn’t shown when configuring data export.

Test the connection

To test and verify whether the data source connection has been correctly configured, select Test Connection.

If the connection is successful, continue with the following step. Otherwise, verify that your configuration is correct and that the data source is running.

Save and publish

Once you have configured your connection, save and publish your changes. If you provided all the required information, the connection is now available for other users in the application.

If your configuration is missing required fields, a list of detected errors is shown instead. Review your configuration and resolve the issues before continuing.

Next steps

You can now browse and profile assets from your Amazon S3 connection.

In Data Catalog > Sources, find and open the source you just configured. Switch to the Connections tab and select Document. Alternatively, opt for the Import or Discover documentation flow.

Or, to import or profile only some assets, select Browse on the Connections tab. Choose the assets you want to analyze and then the appropriate profiling option.
