
Azure AD Workload Identity for Hybrid DPE on AKS

Azure AD Workload Identity is a Kubernetes-native authentication method that allows pods running on Azure Kubernetes Service (AKS) to securely access Azure resources without storing credentials. The authentication is handled through federated identity credentials and OpenID Connect (OIDC) tokens.

This guide explains how to configure your hybrid Data Processing Engine (DPE) deployment to use Azure AD Workload Identity for authenticating to supported Azure data services.

While this guide focuses on AKS, Azure AD Workload Identity is also supported in other Kubernetes environments, such as GKE, EKS, or self-managed clusters. The setup is similar across environments; the main difference is how to enable and obtain the OIDC issuer URL. For details, see Managed Clusters - Azure AD Workload Identity.

Supported connections

Azure AD Workload Identity authentication is supported for the following connection types:

  • Azure Data Lake Storage Gen2

  • Azure SQL Database

  • Azure Synapse Analytics

Azure AD Workload Identity is not supported for Azure Key Vault integration in hybrid DPE deployments. For Key Vault, use client credential (service principal) authentication instead.

For other Azure data sources not listed here, use client credential authentication, which is supported universally.

Prerequisites

Before configuring Azure AD Workload Identity for your hybrid DPE, ensure you have:

  • DPE version 16.3.0 or later, deployed and running.

  • Helm chart version 1603.0.105 or later.

  • Helm installed and configured to manage your DPE deployment.

  • AKS cluster with OIDC issuer enabled.

  • An Azure subscription with permissions to create and configure managed identities and role assignments.

  • kubectl access to your AKS cluster with appropriate permissions.
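You can check both version prerequisites with a version-aware comparison. The following is a minimal sketch. It assumes GNU-compatible `sort -V` and that you look up the installed versions yourself (for example, the chart version appears in the CHART column of `helm list --namespace <namespace>`). `version_at_least` is a hypothetical helper name, not part of DPE or Helm.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
  local actual="$1" required="$2" lowest
  # sort -V compares dot-separated numeric components, so 16.10.0 > 16.3.0.
  # If the required version sorts first (or ties), the actual one is new enough.
  lowest="$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)"
  [ "$lowest" = "$required" ]
}

# Usage:
# version_at_least "<installed-dpe-version>" "16.3.0" && echo "DPE version OK"
# version_at_least "<installed-chart-version>" "1603.0.105" && echo "Chart version OK"
```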

Azure configuration

Step 1: Configure Azure AD Workload Identity on AKS

Your AKS cluster must have Azure AD Workload Identity configured. To set up your cluster, refer to the official Azure documentation: Introduction - Azure AD Workload Identity.

The key requirements are:

  • AKS cluster with OIDC issuer enabled.

  • User-assigned managed identity (system-assigned managed identity is not supported).

  • Kubernetes service account annotated with the identity.

  • Federated identity credential in Azure AD.
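If the OIDC issuer or workload identity is not yet enabled on an existing cluster, the Azure CLI can enable both. It can then print the issuer URL that the federated identity credential needs in Step 3. This is a sketch: the function names are hypothetical wrappers around standard `az aks` commands. `az aks update` modifies the cluster, so check the linked Azure documentation for your cluster configuration before running it.

```shell
# Hypothetical wrapper: enable the OIDC issuer and the workload identity webhook.
enable_aks_workload_identity() {
  local resource_group="$1" cluster_name="$2"
  az aks update \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --enable-oidc-issuer \
    --enable-workload-identity
}

# Hypothetical wrapper: print the cluster's OIDC issuer URL (used in Step 3).
get_oidc_issuer_url() {
  local resource_group="$1" cluster_name="$2"
  az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query "oidcIssuerProfile.issuerUrl" \
    --output tsv
}

# Usage:
# enable_aks_workload_identity <resource-group> <aks-cluster-name>
# AKS_OIDC_ISSUER="$(get_oidc_issuer_url <resource-group> <aks-cluster-name>)"
```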

Step 2: Create user-assigned managed identity

Create a user-assigned managed identity in Azure that will be used by your DPE pods.

Create the user-assigned managed identity with the Azure CLI (for example, from Azure Cloud Shell), not through the Azure Portal. System-assigned managed identities are not supported for this configuration.

To run the following commands, you need:

  • Your Azure resource group name.

  • Your AKS cluster name.

# Enable managed identity on the AKS cluster
az aks update --resource-group <resource-group> --name <aks-cluster-name> --enable-managed-identity

# List identity profiles available for your AKS cluster
az aks show -g <resource-group> -n <aks-cluster-name> --query identityProfile

The output displays the identity information you need for DPE configuration:

{
  "kubeletidentity": {
    "clientId": "5d13edda-62f5-4b4e-b6e8-9511271bff8f",
    "objectId": "b0435cd0-ff59-4397-aa35-ebafeed22a8d",
    "resourceId": "/subscriptions/.../providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
  }
}

Note the following values for DPE configuration:

  • Client ID: The clientId value from the output.

  • Tenant ID: Your Azure AD tenant ID.
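Instead of copying the values from the JSON output, you can query them directly. This sketch assumes you use the kubelet identity returned above and that the Azure CLI is signed in to the tenant that owns it. The function names are hypothetical. If you use a separate user-assigned identity instead, `az identity show --name <identity-name> --resource-group <resource-group> --query clientId --output tsv` returns its client ID.

```shell
# Hypothetical wrapper: print kubeletidentity.clientId for the cluster.
get_kubelet_client_id() {
  local resource_group="$1" cluster_name="$2"
  az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query "identityProfile.kubeletidentity.clientId" \
    --output tsv
}

# Hypothetical wrapper: tenant of the account the Azure CLI is signed in to.
get_tenant_id() {
  az account show --query "tenantId" --output tsv
}

# Usage:
# CLIENT_ID="$(get_kubelet_client_id <resource-group> <aks-cluster-name>)"
# TENANT_ID="$(get_tenant_id)"
```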

Step 3: Configure federated identity credentials

Set up federated identity credentials to establish trust between your AKS cluster and the managed identity. This involves configuring the following:

  • OIDC issuer URL from your AKS cluster.

  • Kubernetes namespace where DPE is deployed.

  • Kubernetes service account name.

  • Subject identifier, in the format system:serviceaccount:<namespace>:<service-account-name>.

For detailed steps, refer to the Federated Identity Credential - Azure AD Workload Identity documentation.
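With the Azure CLI, creating the federated identity credential looks roughly like the following sketch. The function names are hypothetical. `<identity-name>` is the last segment of the resourceId from Step 2. For the kubelet identity, its resource group is typically the cluster's node resource group (`az aks show --resource-group <resource-group> --name <aks-cluster-name> --query nodeResourceGroup --output tsv`). This guide does not give the DPE service account name, so look it up with `kubectl get serviceaccounts --namespace <namespace>`. `api://AzureADTokenExchange` is the default audience for Azure AD Workload Identity.

```shell
# Hypothetical helper: subject identifier Azure AD matches against the pod's token.
wi_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical wrapper around az identity federated-credential create.
create_dpe_federated_credential() {
  local credential_name="$1"   # any name, for example dpe-federated-credential
  local identity_name="$2"     # last segment of the identity's resourceId
  local identity_rg="$3"       # resource group that contains the identity
  local issuer_url="$4"        # AKS OIDC issuer URL
  local namespace="$5"         # namespace where DPE is deployed
  local service_account="$6"   # service account used by the DPE pods
  az identity federated-credential create \
    --name "$credential_name" \
    --identity-name "$identity_name" \
    --resource-group "$identity_rg" \
    --issuer "$issuer_url" \
    --subject "$(wi_subject "$namespace" "$service_account")" \
    --audience "api://AzureADTokenExchange"
}
```

The subject must match the namespace and service account exactly. A mismatch typically surfaces as an AADSTS70021 error ("No matching federated identity record found") when DPE authenticates.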

Step 4: Assign Azure RBAC roles

Grant the managed identity permissions to access Azure resources.

For more information about Azure RBAC roles for blob data, see Assign an Azure role for access to blob data.

Azure Data Lake Storage Gen2

You can assign roles using Azure CLI or the Azure Portal.

Option 1: Azure CLI

az role assignment create \
  --assignee <client-id> \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>

Option 2: Azure Portal
  1. Navigate to your ADLS storage account.

  2. Go to Access Control (IAM).

  3. Select Add > Add role assignment.

  4. Select the Storage Blob Data Contributor role.

  5. Assign access to the managed identity you created.
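The following sketch is a CLI variant of Option 1. It resolves the storage account's resource ID instead of assembling the scope path by hand. It then lists the identity's role assignments so you can confirm the result. Function names are hypothetical. New role assignments can take several minutes to take effect.

```shell
# Hypothetical wrapper: assign Storage Blob Data Contributor on one storage account.
assign_blob_contributor() {
  local client_id="$1" storage_account="$2" resource_group="$3" scope
  # Resolve the storage account's full resource ID to use as the scope.
  scope="$(az storage account show \
    --name "$storage_account" \
    --resource-group "$resource_group" \
    --query id --output tsv)" || return 1
  az role assignment create \
    --assignee "$client_id" \
    --role "Storage Blob Data Contributor" \
    --scope "$scope"
}

# Hypothetical wrapper: show every role assignment held by the identity.
list_role_assignments() {
  az role assignment list --assignee "$1" --all --output table
}

# Usage:
# assign_blob_contributor <client-id> <storage-account> <resource-group>
# list_role_assignments <client-id>
```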

Azure SQL Database

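Adjust role assignments based on your specific access requirements (for example, Reader, Contributor, or custom roles). These Azure RBAC roles govern the SQL server resource itself. To read data, the managed identity typically also needs a database user, created with CREATE USER [<identity-name>] FROM EXTERNAL PROVIDER and added to a database role such as db_datareader.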

DPE configuration

Configure Azure AD Workload Identity in DPE

Add the following configuration to your DPE Helm values file. Replace <client-id> and <tenant-id> with the values from your Azure managed identity.

features:
  azureAdWorkloadIdentity:
    enabled: true
    clientId: <client-id>
    tenantId: <tenant-id>

extraEnv:
  - name: AZURE_TENANT_ID
    value: "<tenant-id>"

A Kubernetes service account can only be federated to a single Azure identity through a federated credential.

If you need to access multiple Azure resources, ensure this identity has the appropriate role assignments for all required resources.
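If you script your deployment, you can render the same values from shell variables, such as the client and tenant IDs noted in Step 2. This is a sketch with a hypothetical function name. Writing to a separate file lets you pass it as an additional `-f` argument to `helm upgrade`, where later files override earlier ones. This avoids overwriting an existing values.yaml.

```shell
# Hypothetical helper: write the workload identity Helm values to a file.
write_wi_values() {
  local client_id="$1" tenant_id="$2" out_file="$3"
  cat > "$out_file" <<EOF
features:
  azureAdWorkloadIdentity:
    enabled: true
    clientId: ${client_id}
    tenantId: ${tenant_id}

extraEnv:
  - name: AZURE_TENANT_ID
    value: "${tenant_id}"
EOF
}

# Usage:
# write_wi_values "$CLIENT_ID" "$TENANT_ID" workload-identity-values.yaml
# helm upgrade --install <release-name> <chart> --namespace <namespace> \
#   -f values.yaml -f workload-identity-values.yaml
```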

Apply the configuration

Update your DPE deployment with the new Helm values:

helm upgrade --install <release-name> <chart> \
  --namespace <namespace> \
  -f values.yaml

Verify that the DPE pods restart and are running correctly:

kubectl get pods -n <namespace>

To view detailed DPE deployment information:

kubectl describe deployment dpe --namespace <namespace>
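To confirm that the new configuration reached the running pods, you can wait for the rollout. Then check for the environment variables that the Azure AD Workload Identity webhook injects into pods that use it. AZURE_TENANT_ID is also set through extraEnv, so AZURE_CLIENT_ID and AZURE_FEDERATED_TOKEN_FILE are the more telling ones. This sketch assumes the deployment is named dpe, as in the command above, and that the DPE container image provides `printenv`. The function name is hypothetical.

```shell
# Hypothetical helper: wait for the rollout, then check injected variables.
check_dpe_workload_identity() {
  local namespace="$1"
  # Wait up to 5 minutes for the updated dpe deployment to finish rolling out.
  kubectl rollout status deployment/dpe --namespace "$namespace" --timeout=300s || return 1
  # Print the webhook-injected variables; printenv exits non-zero if any is missing.
  kubectl exec deployment/dpe --namespace "$namespace" -- \
    printenv AZURE_CLIENT_ID AZURE_FEDERATED_TOKEN_FILE
}

# Usage: check_dpe_workload_identity <namespace>
```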

Using Azure AD Workload Identity in Ataccama ONE

When using Azure AD Workload Identity, you do not need to configure Azure Key Vault in Secret Management Services. Authentication to Azure resources is handled automatically through the workload identity configured in your DPE deployment.

Connection configuration

When creating connections to Azure resources:

  1. Follow the standard connection setup process for your Azure service.

  2. In the Authentication section, select Azure AD Workload Identity as the authentication method.

The connection uses the workload identity configured in your DPE deployment to authenticate to the Azure resource.

Validation

To verify your Azure AD Workload Identity configuration is working correctly:

  1. Test connection in Ataccama ONE:

    • Create a test connection to one of your Azure resources using Azure AD Workload Identity authentication.

    • Verify connectivity using the Test Connection option. If the configuration is correct, the connection authenticates without requiring additional credentials.

  2. Check pod logs for authentication-related errors:

    # Get the pod name
    kubectl get pods -n <namespace>
    
    # View logs (use -f flag to follow)
    kubectl logs <dpe-pod-name> -n <namespace> -f
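To narrow long logs down to likely authentication problems, you can pipe them through a filter. The patterns are assumptions. AADSTS-prefixed codes are Azure AD error codes; for example, AADSTS70021 appears when no federated credential matches the pod's service account. The exception and class names come from the Azure Identity library for Java, which DPE may or may not log verbatim. The function name is hypothetical.

```shell
# Hypothetical filter: case-insensitive match on Azure AD error codes and
# Azure Identity (Java) class names. Exits 1 when no line matches.
filter_auth_errors() {
  grep -iE 'AADSTS[0-9]+|ClientAuthenticationException|CredentialUnavailableException|WorkloadIdentityCredential|federated'
}

# Usage:
# kubectl logs <dpe-pod-name> -n <namespace> | filter_auth_errors
```

No output (exit status 1) means that no line matched the patterns.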

Troubleshooting

Authentication failures when testing connection

  • Verify the managed identity has the correct role assignments for the target Azure resource.

  • Check that the federated identity credentials are correctly configured with the right namespace and service account.

  • Ensure the DPE Helm values have the correct clientId and tenantId.
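For the second and third points, you can run two quick checks. First, compare the subjects of the identity's federated credentials with the DPE namespace and service account. Second, read the client ID annotation that the workload identity webhook uses from the service account. Function names are hypothetical. The annotation check assumes the DPE chart sets the standard `azure.workload.identity/client-id` annotation on its service account.

```shell
# Hypothetical check: does any federated credential on the identity have the expected subject?
check_fic_subject() {
  local identity_name="$1" identity_rg="$2" namespace="$3" service_account="$4"
  local expected="system:serviceaccount:${namespace}:${service_account}"
  # List the subject of every federated credential and look for an exact, whole-line match.
  if az identity federated-credential list \
       --identity-name "$identity_name" \
       --resource-group "$identity_rg" \
       --query "[].subject" --output tsv | grep -qxF "$expected"; then
    echo "OK: federated credential found for $expected"
  else
    echo "MISSING: no federated credential with subject $expected" >&2
    return 1
  fi
}

# Hypothetical helper: should print the same value as clientId in your DPE Helm values.
get_sa_client_id() {
  local namespace="$1" service_account="$2"
  kubectl get serviceaccount "$service_account" --namespace "$namespace" \
    -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'
}
```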

DPE pods not starting after configuration

  • Review pod logs for errors:

    kubectl logs <pod-name> -n <namespace>

  • Verify the Helm values syntax is correct.
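Two commands often help here. Rendering the chart locally reports YAML and template errors without touching the cluster. Listing recent namespace events shows scheduling, image, and admission-webhook failures. Function names are hypothetical, and Helm must be able to resolve `<chart>`, for example through an added chart repository.

```shell
# Hypothetical helper: render the chart locally to catch values/template errors.
validate_dpe_values() {
  local release="$1" chart="$2" values_file="$3"
  # Errors are reported on stderr and the function returns non-zero;
  # nothing is applied to the cluster.
  helm template "$release" "$chart" -f "$values_file" > /dev/null
}

# Hypothetical helper: list namespace events, most recent last.
recent_events() {
  local namespace="$1"
  kubectl get events --namespace "$namespace" --sort-by=.lastTimestamp
}

# Usage:
# validate_dpe_values <release-name> <chart> values.yaml
# recent_events <namespace>
```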

Azure AD Workload Identity option not available

  • Confirm you’re running DPE version 16.3.0 or later.

  • Verify the azureAdWorkloadIdentity.enabled flag is set to true in your Helm values.
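To confirm the flag reached the deployed release and not only your local file, read the release's user-supplied values back from Helm. This sketch uses a hypothetical function name. Its awk matching assumes the space-indented YAML that `helm get values --output yaml` prints.

```shell
# Hypothetical check: is azureAdWorkloadIdentity.enabled true in the deployed release?
check_wi_enabled() {
  local release="$1" namespace="$2"
  helm get values "$release" --namespace "$namespace" --output yaml |
    awk '
      /^ *azureAdWorkloadIdentity:/ { inblock = 1; indent = match($0, /[^ ]/); next }
      inblock {
        # A line indented no deeper than the key ends the block.
        if (match($0, /[^ ]/) <= indent) { inblock = 0 }
        else if ($0 ~ /^ *enabled: *true *$/) { found = 1 }
      }
      END { exit(found ? 0 : 1) }
    '
}

# Usage: check_wi_enabled <release-name> <namespace> && echo "flag is set"
```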

ADLS: Endpoint does not support BlobStorageEvents or SoftDelete

This error can appear when you browse an ADLS Gen2 data source in Ataccama ONE and the storage account has soft delete for blobs or Blob Storage events enabled.

To resolve:

  1. Navigate to your storage account in Azure Portal.

  2. Go to Data Protection.

  3. Turn off Enable soft delete for blobs.

  4. If blob event subscriptions are configured, turn off or remove them under Events.
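You can make the same changes with the Azure CLI. This sketch uses hypothetical function names. Disabling soft delete removes blob recovery protection for the whole storage account, so confirm that this is acceptable before running it.

```shell
# Hypothetical wrapper: same effect as clearing "Enable soft delete for blobs".
disable_blob_soft_delete() {
  local account="$1" resource_group="$2"
  az storage account blob-service-properties update \
    --account-name "$account" \
    --resource-group "$resource_group" \
    --enable-delete-retention false
}

# Hypothetical wrapper: list Event Grid subscriptions whose source is this account.
list_blob_event_subscriptions() {
  local account="$1" resource_group="$2" storage_id
  storage_id="$(az storage account show --name "$account" \
    --resource-group "$resource_group" --query id --output tsv)" || return 1
  az eventgrid event-subscription list --source-resource-id "$storage_id" --output table
}
```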

Alternatively, ensure your storage account has hierarchical namespace enabled (see the following section).

For more information, see Known issues with Azure Data Lake Storage.

ADLS: Hierarchical namespace required

If you encounter errors such as java.lang.NumberFormatException for date strings, or other unexpected behavior when accessing ADLS Gen2, ensure your storage account has hierarchical namespace (HNS) configured.

Hierarchical namespace is a key capability of Azure Data Lake Storage Gen2 that enables file system semantics. Without it, certain ADLS Gen2 features might not work correctly.

For a new storage account, select Enable hierarchical namespace on the Advanced tab during creation. For existing accounts, see Azure Data Lake Storage hierarchical namespace for upgrade options.

Enabling hierarchical namespace on an existing account is a one-way operation and cannot be reverted.
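With the Azure CLI, you can check whether an account already has hierarchical namespace, or create a new account with it enabled. Function names, the `Standard_LRS` SKU, and the location argument are illustrative assumptions; use the values your organization requires.

```shell
# Hypothetical wrapper: prints "true" when hierarchical namespace is enabled.
hns_enabled() {
  local account="$1" resource_group="$2"
  az storage account show --name "$account" --resource-group "$resource_group" \
    --query isHnsEnabled --output tsv
}

# Hypothetical wrapper: create a StorageV2 account with hierarchical namespace.
create_adls_account() {
  local account="$1" resource_group="$2" location="$3"
  az storage account create \
    --name "$account" \
    --resource-group "$resource_group" \
    --location "$location" \
    --kind StorageV2 \
    --sku Standard_LRS \
    --enable-hierarchical-namespace true
}
```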

For additional troubleshooting, refer to the Azure AD Workload Identity troubleshooting guide.
