Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
The AI toolchain operator (KAITO) is a managed add-on that simplifies the experience of running open-source and private AI models on your AKS cluster. KAITO reduces the time to onboard models and provision resources, enabling faster AI model prototyping and development rather than infrastructure management.
This article shows you how to enable the AI toolchain operator add-on and deploy an AI model for inferencing on AKS.
Important
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
Before you begin
- This article assumes a basic understanding of Kubernetes concepts. For more information, see Kubernetes core concepts for AKS.
- For all hosted model preset images and default resource configuration, see the KAITO GitHub repository.
- The AI toolchain operator add-on currently supports KAITO version 0.4.4, please make a note of this in considering your choice of model from the KAITO model repository.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin.
If you have multiple Azure subscriptions, make sure you select the correct subscription in which the resources will be created and charged using the az account set command.
Note
Your Azure subscription must have GPU VM quota recommended for your model deployment in the same Azure region as your AKS resources.
Azure CLI version 2.47.0 or later installed and configured. Run
az --version
to find the version. If you need to install or upgrade, see Install Azure CLI.The Kubernetes command-line client, kubectl, installed and configured. For more information, see Install kubectl.
Install the Azure CLI preview extension
Install the Azure CLI preview extension using the az extension add command.
az extension add --name aks-preview
Update the extension to make sure you have the latest version using the az extension update command.
az extension update --name aks-preview
Register the AI toolchain operator add-on feature flag
Register the AIToolchainOperatorPreview feature flag using the az feature register command.
az feature register --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"
It takes a few minutes for the registration to complete.
Verify the registration using the az feature show command.
az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"
Export environment variables
To simplify the configuration steps in this article, you can define environment variables using the following commands. Make sure to replace the placeholder values with your own.
export AZURE_SUBSCRIPTION_ID="mySubscriptionID" export AZURE_RESOURCE_GROUP="myResourceGroup" export AZURE_LOCATION="myLocation" export CLUSTER_NAME="myClusterName"
Enable the AI toolchain operator add-on on an AKS cluster
The following sections describe how to create an AKS cluster with the AI toolchain operator add-on enabled and deploy a default hosted AI model.
Create an AKS cluster with the AI toolchain operator add-on enabled
Create an Azure resource group using the az group create command.
az group create --name $AZURE_RESOURCE_GROUP --location $AZURE_LOCATION
Create an AKS cluster with the AI toolchain operator add-on enabled using the az aks create command with the
--enable-ai-toolchain-operator
and--enable-oidc-issuer
flags.az aks create --location $AZURE_LOCATION \ --resource-group $AZURE_RESOURCE_GROUP \ --name $CLUSTER_NAME \ --enable-oidc-issuer \ --enable-ai-toolchain-operator \ --generate-ssh-keys
Note
AKS creates a managed identity once you enable the AI toolchain operator add-on. The managed identity is used to create GPU node pools in the managed AKS cluster. Proper permissions need to be set for it manually following the steps introduced in the following sections.
On an existing AKS cluster, you can enable the AI toolchain operator add-on using the az aks update command.
az aks update --name $CLUSTER_NAME \ --resource-group $AZURE_RESOURCE_GROUP \ --enable-oidc-issuer \ --enable-ai-toolchain-operator
Connect to your cluster
Configure
kubectl
to connect to your cluster using the az aks get-credentials command.az aks get-credentials --resource-group $AZURE_RESOURCE_GROUP --name $CLUSTER_NAME
Verify the connection to your cluster using the
kubectl get
command.kubectl get nodes
Export environment variables
Export environment variables for the MC resource group, principal ID identity, and KAITO identity using the following commands:
export MC_RESOURCE_GROUP=$(az aks show --resource-group $AZURE_RESOURCE_GROUP \ --name $CLUSTER_NAME \ --query nodeResourceGroup \ -o tsv) export PRINCIPAL_ID=$(az identity show --name "ai-toolchain-operator-${CLUSTER_NAME}" \ --resource-group $MC_RESOURCE_GROUP \ --query 'principalId' \ -o tsv) export KAITO_IDENTITY_NAME="ai-toolchain-operator-${CLUSTER_NAME}"
Get the AKS OpenID Connect (OIDC) Issuer
Get the AKS OIDC Issuer URL and export it as an environment variable:
export AKS_OIDC_ISSUER=$(az aks show --resource-group $AZURE_RESOURCE_GROUP \ --name $CLUSTER_NAME \ --query "oidcIssuerProfile.issuerUrl" \ -o tsv)
Create role assignment for the service principal
Create a new role assignment for the service principal using the az role assignment create command.
az role assignment create --role "Contributor" \ --assignee $PRINCIPAL_ID \ --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourcegroups/$AZURE_RESOURCE_GROUP"
Establish a federated identity credential
Create the federated identity credential between the managed identity, AKS OIDC issuer, and subject using the az identity federated-credential create command.
az identity federated-credential create --name "kaito-federated-identity" \ --identity-name $KAITO_IDENTITY_NAME \ -g $MC_RESOURCE_GROUP \ --issuer $AKS_OIDC_ISSUER \ --subject system:serviceaccount:"kube-system:kaito-gpu-provisioner" \ --audience api://AzureADTokenExchange
Note
Before this step is complete, the
gpu-provisioner
controller pod will remain in a crash loop status. Once the federated credential is created, thegpu-provisioner
controller pod will reach a running state and you will be able to verify that the deployment is running in the following steps.
Verify that your deployment is running
Restart the KAITO GPU provisioner deployment on your pods using the
kubectl rollout restart
command:kubectl rollout restart deployment/kaito-gpu-provisioner -n kube-system
Verify that the GPU provisioner deployment is running using the
kubectl get
command:kubectl get deployment -n kube-system | grep kaito
Deploy a default hosted AI model
Deploy the Falcon 7B-instruct model preset from the KAITO model repository using the
kubectl apply
command.kubectl apply -f https://raw.githubusercontent.com/Azure/kaito/main/examples/inference/kaito_workspace_falcon_7b-instruct.yaml
Track the live resource changes in your workspace using the
kubectl get
command.kubectl get workspace workspace-falcon-7b-instruct -w
Note
As you track the KAITO workspace deployment, note that machine readiness can take up to 10 minutes, and workspace readiness up to 20 minutes depending on the size of your model.
Check your inference service and get the service IP address using the
kubectl get svc
command.export SERVICE_IP=$(kubectl get svc workspace-falcon-7b-instruct -o jsonpath='{.spec.clusterIP}')
Test the Falcon 7B-instruct inference service with a sample input of your choice using the OpenAI chat completions API format:
kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://$SERVICE_IP/v1/completions -H "Content-Type: application/json" \ -d '{ "model": "falcon-7b-instruct", "prompt": "What is Kubernetes?", "max_tokens": 10 }'
Clean up resources
If you no longer need these resources, you can delete them to avoid incurring extra Azure compute charges.
Delete the KAITO workspace using the
kubectl delete workspace
command.kubectl delete workspace workspace-falcon-7b-instruct
You need to manually delete the GPU node pools provisioned by the KAITO deployment. Use the node label created by Falcon-7b instruct workspace to get the node pool name using the
az aks nodepool list
command. In this example, the node label is "kaito.sh/workspace": "workspace-falcon-7b-instruct".az aks nodepool list --resource-group $AZURE_RESOURCE_GROUP --cluster-name $CLUSTER_NAME
Delete the node pool with this name from your AKS cluster and repeat the steps in this section for each KAITO workspace that will be removed.
Common troubleshooting scenarios
After applying the KAITO model inference workspace, your resource readiness and workspace conditions might not update to True
for the following reasons:
You might not have sufficient permissions to operate on the AKS cluster. Ensure that the
ai-toolchain-operator-$CLUSTER_NAME
identity has been assignedContributor
role to your Azure resource group. Run theaz role assignment list
command and confirm that the result is non-empty:az role assignment list --scope /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_RESOURCE_GROUP
Your Azure subscription doesn't have quota for the minimum GPU instance type specified in your KAITO workspace. You'll need to request a quota increase for the GPU VM family in your Azure subscription.
The GPU instance type isn't available in your AKS region. Confirm the GPU instance availability in your specific region and switch the Azure region if your GPU VM family isn't available.
Next steps
Learn more about KAITO model deployment options below:
- Fine tune a model with the AI toolchain operator add-on on AKS.
- Learn about MLOps best practices for your AI pipelines on AKS
- Onboard a custom model for KAITO inference on AKS.
Azure Kubernetes Service