Azure AD Workload Identity for Hybrid DPE on AKS
Azure AD Workload Identity is a Kubernetes-native authentication method that allows pods running on Azure Kubernetes Service (AKS) to securely access Azure resources without storing credentials. The authentication is handled through federated identity credentials and OpenID Connect (OIDC) tokens.
This guide explains how to configure your hybrid Data Processing Engine (DPE) deployment to use Azure AD Workload Identity for authenticating to supported Azure data services.
| While this guide focuses on AKS, Azure AD Workload Identity is also supported in other Kubernetes environments, such as GKE, EKS, or self-managed clusters. The setup is similar across environments; the main difference is how to enable and obtain the OIDC issuer URL. For details, see Managed Clusters - Azure AD Workload Identity. |
Supported connections
Azure AD Workload Identity authentication is supported for the following connection types:
- Azure Data Lake Storage Gen2
- Azure SQL Database
- Azure Synapse Analytics
| Azure AD Workload Identity is not supported for Azure Key Vault integration in hybrid DPE deployments. For Key Vault, use client credential (service principal) authentication instead. For other Azure data sources not listed here, use client credential authentication, which is supported universally. |
Prerequisites
Before configuring Azure AD Workload Identity for your hybrid DPE, ensure you have:
- DPE version 16.3.0 or later, up and running.
- Helm chart version 1603.0.105 or later.
- Helm installed and configured to manage your DPE deployment.
- AKS cluster with OIDC issuer enabled.
- An Azure subscription with permissions to create and configure managed identities and role assignments.
- kubectl access to your AKS cluster with appropriate permissions.
Azure configuration
Step 1: Configure Azure AD Workload Identity on AKS
Your AKS cluster must have Azure AD Workload Identity configured. To set up your cluster, refer to the official Azure documentation: Introduction - Azure AD Workload Identity.
The key requirements are:
- AKS cluster with OIDC issuer enabled.
- User-assigned managed identity (system-assigned managed identity is not supported).
- Kubernetes service account annotated with the identity.
- Federated identity credential in Azure AD.
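As a rough sketch of the first requirement, the following wraps the two Azure CLI calls that enable the OIDC issuer and workload identity on an existing cluster and then read back the issuer URL. The function name and its arguments are illustrative and not part of the DPE tooling; the Azure documentation referenced above remains the authoritative procedure.

```shell
# Sketch: enable the OIDC issuer and workload identity on an existing AKS
# cluster, then print the issuer URL needed later for the federated credential.
# The function name and argument order are illustrative assumptions.
enable_workload_identity() {
  rg="$1"       # resource group name
  cluster="$2"  # AKS cluster name

  az aks update \
    --resource-group "$rg" \
    --name "$cluster" \
    --enable-oidc-issuer \
    --enable-workload-identity || return 1

  # Print only the OIDC issuer URL of the cluster.
  az aks show \
    --resource-group "$rg" \
    --name "$cluster" \
    --query "oidcIssuerProfile.issuerUrl" \
    --output tsv
}

# Usage (hypothetical names):
#   enable_workload_identity my-resource-group my-aks-cluster
```

Keeping the issuer URL lookup in the same helper makes it easy to capture the value in a variable for the federated credential step.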
Step 2: Create user-assigned managed identity
Create a user-assigned managed identity in Azure that will be used by your DPE pods.
| Create the user-assigned managed identity from Azure Cloud Shell, not through the Azure Portal. System-assigned managed identities are not supported for this configuration. |
To run the following commands, you need:
- Your Azure resource group name.
- Your AKS cluster name.
# Enable managed identity on the AKS cluster
az aks update --resource-group <resource-group> --name <aks-cluster-name> --enable-managed-identity
# List identity profiles available for your AKS cluster
az aks show -g <resource-group> -n <aks-cluster-name> --query identityProfile
The output displays the identity information you need for DPE configuration:
{
  "kubeletidentity": {
    "clientId": "5d13edda-62f5-4b4e-b6e8-9511271bff8f",
    "objectId": "b0435cd0-ff59-4397-aa35-ebafeed22a8d",
    "resourceId": "/subscriptions/.../providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"
  }
}
Note the following values for DPE configuration:
- Client ID: The clientId value from the output.
- Tenant ID: Your Azure AD tenant ID.
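If you prefer not to copy these values by hand, a minimal sketch such as the following can read them with the Azure CLI. It assumes that the identity to use is the kubelet identity shown in the output above and that the managed identity lives in the tenant of the currently active subscription; the function name is hypothetical.

```shell
# Sketch: print the client ID and tenant ID needed for the DPE Helm values.
# Assumes the kubelet identity from `az aks show` is the identity to use and
# that the active subscription belongs to the same tenant.
get_wi_ids() {
  rg="$1"       # resource group name
  cluster="$2"  # AKS cluster name

  client_id="$(az aks show -g "$rg" -n "$cluster" \
    --query "identityProfile.kubeletidentity.clientId" -o tsv)" || return 1
  tenant_id="$(az account show --query tenantId -o tsv)" || return 1

  printf 'clientId: %s\ntenantId: %s\n' "$client_id" "$tenant_id"
}

# Usage (hypothetical names):
#   get_wi_ids my-resource-group my-aks-cluster
```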
Step 3: Configure federated identity credentials
Set up federated identity credentials to establish trust between your AKS cluster and the managed identity. This involves configuring the following:
- OIDC issuer URL from your AKS cluster.
- Kubernetes namespace where DPE is deployed.
- Kubernetes service account name.
- Subject identifier format.
For detailed steps, refer to the Federated Identity Credential - Azure AD Workload Identity documentation.
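To illustrate how the four inputs listed above fit together, here is a hedged sketch using the Azure CLI. The subject identifier for a Kubernetes service account follows the pattern `system:serviceaccount:<namespace>:<service-account-name>`. The function names and the credential name `dpe-federated-credential` are illustrative assumptions; the service account name used by your DPE deployment depends on your Helm configuration.

```shell
# Build the subject identifier for a Kubernetes service account.
wi_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Sketch: create a federated identity credential on a user-assigned
# managed identity. The credential name below is a hypothetical example.
create_federated_credential() {
  rg="$1"         # resource group of the managed identity
  identity="$2"   # managed identity name
  issuer="$3"     # OIDC issuer URL of the cluster
  namespace="$4"  # namespace where DPE is deployed
  sa="$5"         # Kubernetes service account name

  az identity federated-credential create \
    --name "dpe-federated-credential" \
    --identity-name "$identity" \
    --resource-group "$rg" \
    --issuer "$issuer" \
    --subject "$(wi_subject "$namespace" "$sa")" \
    --audiences "api://AzureADTokenExchange"
}

# Usage (hypothetical values):
#   create_federated_credential my-rg my-identity "$ISSUER_URL" dpe dpe-sa
```

A mismatch in the namespace or service account part of the subject is a common reason for token exchange failures, so it helps to generate the subject from the same values you pass to Helm.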
Step 4: Assign Azure RBAC roles
Grant the managed identity permissions to access Azure resources.
For more information about Azure RBAC roles for blob data, see Assign an Azure role for access to blob data.
Azure Data Lake Storage Gen2
You can assign roles using Azure CLI or the Azure Portal.
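As a sketch of the Azure CLI route, the following assigns a blob data role to the managed identity at storage account scope. The role name `Storage Blob Data Contributor` is an assumption for read and write access; choose the role that matches what your DPE needs. The function names are illustrative.

```shell
# Build the resource ID (scope) of a storage account.
storage_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
    "$1" "$2" "$3"
}

# Sketch: grant the managed identity a blob data role on the storage account.
# The role is an assumption; a reader role may be enough for read-only use.
assign_blob_role() {
  principal_id="$1"  # objectId of the managed identity
  scope="$2"         # storage account resource ID

  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope "$scope"
}

# Usage (hypothetical values):
#   assign_blob_role "$OBJECT_ID" "$(storage_scope "$SUBSCRIPTION_ID" my-rg mystorageaccount)"
```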
DPE configuration
Configure Azure AD Workload Identity in DPE
Add the following configuration to your DPE Helm values file.
Replace <client-id> and <tenant-id> with the values from your Azure managed identity.
features:
  azureAdWorkloadIdentity:
    enabled: true
    clientId: <client-id>
    tenantId: <tenant-id>
extraEnv:
  - name: AZURE_TENANT_ID
    value: "<tenant-id>"
| A Kubernetes service account can only be federated to a single Azure identity through a federated credential. If you need to access multiple Azure resources, ensure this identity has the appropriate role assignments for all required resources. |
Apply the configuration
Update your DPE deployment with the new Helm values:
helm upgrade --install <release-name> <chart> \
--namespace <namespace> \
-f values.yaml
Verify that the DPE pods restart and are running correctly:
kubectl get pods -n <namespace>
To view detailed DPE deployment information:
kubectl describe deployment dpe --namespace <namespace>
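One additional check, sketched below, is to list the Azure-related environment variable names defined on a DPE pod. Workload identity setups typically expose variables such as `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_FEDERATED_TOKEN_FILE`; which of these your DPE chart sets is an assumption here, so treat a missing variable as a hint to investigate rather than a definite error. The function name is hypothetical.

```shell
# Sketch: list environment variable names starting with AZURE_ that are
# defined in the pod spec. Returns non-zero if none are found.
list_azure_env() {
  namespace="$1"  # namespace where DPE is deployed
  pod="$2"        # DPE pod name

  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .spec.containers[*].env[*]}{.name}{"\n"}{end}' \
    | grep '^AZURE_'
}

# Usage (hypothetical names):
#   list_azure_env dpe dpe-7c9f8d6b5-abcde
```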
Using Azure AD Workload Identity in Ataccama ONE
When using Azure AD Workload Identity, you do not need to configure Azure Key Vault in Secret Management Services. Authentication to Azure resources is handled automatically through the workload identity configured in your DPE deployment.
Connection configuration
When creating connections to Azure resources:
- Follow the standard connection setup process for your Azure service.
- In the Authentication section, select Azure AD Workload Identity as the authentication method.
The connection uses the workload identity configured in your DPE deployment to authenticate to the Azure resource.
Validation
To verify your Azure AD Workload Identity configuration is working correctly:
- Test the connection in Ataccama ONE:
  - Create a test connection to one of your Azure resources using Azure AD Workload Identity authentication.
  - Verify connectivity using the Test Connection option. If successful, the connection authenticates without requiring additional credentials.
- Check pod logs for authentication-related errors:
# Get the pod name
kubectl get pods -n <namespace>
# View logs (use -f flag to follow)
kubectl logs <dpe-pod-name> -n <namespace> -f
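Because DPE logs can be long, a small filter can help surface the relevant lines. The sketch below is an assumption about what is worth matching: Azure AD error codes (prefixed with `AADSTS`), mentions of federated or workload identity, and the Azure Storage error code `AuthorizationPermissionMismatch`. Adjust the pattern to what your logs actually contain.

```shell
# Sketch: keep only log lines that look related to Azure authentication.
# Reads log text on standard input; the pattern is an illustrative assumption.
filter_auth_errors() {
  grep -iE 'AADSTS[0-9]+|federated|workload identity|AuthorizationPermissionMismatch'
}

# Usage (hypothetical names):
#   kubectl logs dpe-7c9f8d6b5-abcde -n dpe --since=30m | filter_auth_errors
```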
Troubleshooting
Authentication failures when testing connection
- Verify the managed identity has the correct role assignments for the target Azure resource.
- Check that the federated identity credentials are correctly configured with the right namespace and service account.
- Ensure the DPE Helm values have the correct clientId and tenantId.
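The first two checks can be sketched with the Azure CLI as follows. The function name is hypothetical, and the output columns selected for the federated credentials are a convenience choice.

```shell
# Sketch: show the role assignments and federated credentials of a
# user-assigned managed identity for manual review.
inspect_identity() {
  rg="$1"         # resource group of the managed identity
  identity="$2"   # managed identity name
  client_id="$3"  # clientId used in the DPE Helm values

  # Role assignments across all scopes in the subscription.
  az role assignment list --assignee "$client_id" --all --output table

  # Issuer and subject must match the cluster, namespace, and service account.
  az identity federated-credential list \
    --identity-name "$identity" \
    --resource-group "$rg" \
    --query "[].{name:name, issuer:issuer, subject:subject}" \
    --output table
}

# Usage (hypothetical values):
#   inspect_identity my-rg my-identity "$CLIENT_ID"
```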
DPE pods not starting after configuration
- Review pod logs for errors:
kubectl logs <pod-name> -n <namespace>
- Verify the Helm values syntax is correct.
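One way to check the values file before upgrading is to render the chart locally with `helm template`, which fails on YAML syntax errors and on template errors without touching the cluster. The sketch below assumes you have access to the same chart reference you use for `helm upgrade`; the function name is illustrative.

```shell
# Sketch: render the chart locally with the given values file and report
# whether rendering succeeded. Nothing is applied to the cluster.
validate_values() {
  release="$1"  # Helm release name
  chart="$2"    # chart reference used for the DPE deployment
  values="$3"   # path to the values file

  helm template "$release" "$chart" -f "$values" > /dev/null \
    && echo "values render OK"
}

# Usage (hypothetical names):
#   validate_values dpe ./dpe-chart values.yaml
```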
Azure AD Workload Identity option not available
- Confirm you’re running DPE version 16.3.0 or later.
- Verify the azureAdWorkloadIdentity.enabled flag is set to true in your Helm values.
ADLS: Endpoint does not support BlobStorageEvents or SoftDelete
This error can appear when browsing the data source in Ataccama ONE. It occurs when an ADLS Gen2 linked service is used with a storage account that has soft delete or blob storage events enabled.
To resolve:
- Navigate to your storage account in Azure Portal.
- Go to Data Protection.
- Turn off Enable soft delete for blobs.
- If blob event subscriptions are configured, turn off or remove them under Events.
Alternatively, ensure your storage account has hierarchical namespace enabled (see the following section).
For more information, see Known issues with Azure Data Lake Storage.
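If you prefer the Azure CLI over the portal for the soft delete step, a minimal sketch looks like this. It first prints the current blob delete retention policy and then turns it off. The function name is hypothetical, and disabling soft delete affects data protection for the whole storage account, so confirm this is acceptable before running it.

```shell
# Sketch: show and then disable blob soft delete on a storage account.
disable_blob_soft_delete() {
  rg="$1"       # resource group of the storage account
  account="$2"  # storage account name

  # Current delete retention (soft delete) policy for blobs.
  az storage account blob-service-properties show \
    --account-name "$account" \
    --resource-group "$rg" \
    --query "deleteRetentionPolicy" || return 1

  az storage account blob-service-properties update \
    --account-name "$account" \
    --resource-group "$rg" \
    --enable-delete-retention false
}

# Usage (hypothetical names):
#   disable_blob_soft_delete my-rg mystorageaccount
```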
ADLS: Hierarchical namespace required
If you encounter errors such as java.lang.NumberFormatException for date strings, or other unexpected behavior when accessing ADLS Gen2, ensure your storage account has hierarchical namespace (HNS) configured.
Hierarchical namespace is a key capability of Azure Data Lake Storage Gen2 that enables file system semantics. Without it, certain ADLS Gen2 features might not work correctly.
When creating a new storage account, enable hierarchical namespace by selecting Enable hierarchical namespace on the Advanced tab. For existing accounts, see Azure Data Lake Storage hierarchical namespace for upgrade options.
| Enabling hierarchical namespace on an existing account is a one-way operation and cannot be reverted. |
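To check whether an existing storage account already has hierarchical namespace enabled, you can query its `isHnsEnabled` property. The following is a small sketch with a hypothetical function name; it only reads the setting and does not change the account.

```shell
# Sketch: report whether hierarchical namespace is enabled on a storage
# account. Returns non-zero when it is not enabled.
check_hns() {
  rg="$1"       # resource group of the storage account
  account="$2"  # storage account name

  enabled="$(az storage account show \
    --name "$account" \
    --resource-group "$rg" \
    --query "isHnsEnabled" \
    --output tsv)" || return 1

  case "$enabled" in
    true|True) echo "hierarchical namespace: enabled" ;;
    *)         echo "hierarchical namespace: NOT enabled"; return 1 ;;
  esac
}

# Usage (hypothetical names):
#   check_hns my-rg mystorageaccount
```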
For additional troubleshooting, refer to the Azure AD Workload Identity troubleshooting guide.