Getting an error after deploying a model in Azure AI Foundry

Question

Boddu Madan Gopal 0

I fine-tuned a model using Azure AI Foundry and deployed it, but when I try to use the model I get the following error:

The request failed with status code: 429

Content-Length: 79

Content-Type: application/json

Request-Context: appId=cid-v1:b80100f5-1286-423d-967f-3517f665186d

Date: Sun, 11 May 2025 12:44:26 GMT

Connection: close

{"statusCode": 429, "message": "Rate Limit on Number of Requests is exceeded."}

Suwarna S Kale 2,591 Reputation points

2025-05-11T17:46:06.8933333+00:00

Hello Boddu Madan Gopal,

Thank you for posting your question in the Microsoft Q&A forum.

The HTTP 429 "Rate Limit Exceeded" error when calling your fine-tuned Azure AI Foundry model indicates that your requests have surpassed the quota or throttling limits imposed on the deployed endpoint. This typically occurs due to:

Request Bursting: Sending too many calls in a short timeframe, triggering Azure’s rate-limiting policies.

Tier Restrictions: Free or low-tier SKUs (e.g., S1) enforce stricter limits compared to production-grade tiers.

Model-Specific Quotas: Fine-tuned models may have lower default thresholds than base models.

To resolve this:

Implement exponential backoff in your code to space out retries.

Upgrade your SKU (e.g., to S3) via the Azure portal for higher limits.

Monitor usage in Azure Metrics to identify spikes and adjust call patterns.

Request a quota increase via Azure Support if needed.

For immediate mitigation, reduce request frequency or batch inputs.
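
One way to reduce request frequency on the client side is to enforce a minimum gap between calls. The sketch below is only an illustration under stated assumptions: `MinIntervalPacer` and its parameters are hypothetical names, not part of any Azure SDK, and the requests-per-minute value has to come from the limit that actually applies to your deployment.

```python
import threading
import time


class MinIntervalPacer:
    """Hypothetical helper: spaces calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock          # injectable so the pacing can be tested without real waits
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = None    # earliest time at which the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds spent waiting."""
        with self._lock:             # holding the lock serializes callers across threads
            now = self._clock()
            if self._next_allowed is None or now >= self._next_allowed:
                waited = 0.0
                start = now
            else:
                waited = self._next_allowed - now
                self._sleep(waited)
                start = self._next_allowed
            self._next_allowed = start + self.min_interval
            return waited
```

Create one pacer per endpoint, for example `pacer = MinIntervalPacer(requests_per_minute=20)` (20 is a placeholder), and call `pacer.wait()` immediately before each request. This does not replace retries on 429; it only makes them less likely.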

References:

https://learn.microsoft.com/en-us/azure/architecture/patterns/retry

https://learn.microsoft.com/en-us/azure/ai-services/agents/quotas-limits

If this answer helped, please remember to click "Accept Answer", as this may help other community members who face a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.

Answer 1

Hi Boddu Madan Gopal,

You are getting error 429: "Rate Limit on Number of Requests is exceeded."

This means the number of requests to your deployed model exceeds the allowed rate limit configured for the Azure AI Foundry deployment or the underlying Azure OpenAI resource.

Common Causes and Fixes:

1. Too Many Requests in a Short Time

· Cause: Your client or app is making requests too rapidly, breaching the Requests Per Minute (RPM) or Tokens Per Minute (TPM) limit.

· Fix:

o Add a retry mechanism with exponential backoff in your client code.

o Limit the frequency of requests or add delay between them.

2. Insufficient Quota on Azure OpenAI Resource

· Cause: Your Azure subscription or region has a limited quota (e.g., 20 RPM / 40k TPM).

· Fix:

o Check your quota in the Azure Portal: Go to Azure Portal > Azure OpenAI resource > Quotas.

o If needed, submit a quota increase request.

3. Concurrency Limit

· Cause: AI Foundry endpoints might restrict the number of concurrent requests.

· Fix:

o Use a queue mechanism to throttle concurrent requests.

o Check deployment settings for concurrency limits.

4. Incorrect Deployment Configuration in AI Foundry

· Fix:

o Go to your model in Azure AI Foundry.

o Review Deployment Settings → Check Rate Limits, Auto-scaling, or SKU capacity.

o Scale up the deployment if you're hitting capacity limits.

Add a retry loop with exponential backoff:

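import time
import requests

def call_model_with_retry(url, headers, payload, retries=5):
    """POST to the endpoint, retrying with exponential backoff on HTTP 429."""
    for attempt in range(retries):
        # The 60-second timeout is an illustrative value; adjust it for your workload.
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            return response
        if attempt < retries - 1:
            # Wait 1, 2, 4, 8, ... seconds; skip the wait after the final attempt.
            wait_time = 2 ** attempt
            print(f"Rate limit hit. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
    raise RuntimeError("Exceeded maximum retries due to rate limit.")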
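
The queue mechanism suggested under "Concurrency Limit" can be sketched with a bounded thread pool, which queues pending work and keeps only a fixed number of calls in flight. This is a minimal illustration, not an Azure-specific API: `run_throttled` and `max_concurrent` are hypothetical names, and `call` stands for whatever function sends one request (for example, a wrapper around the retry function above).

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(call, payloads, max_concurrent=2):
    """Run call(payload) for every payload with at most max_concurrent calls in flight.

    Results are returned in the same order as the input payloads.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # The executor queues pending payloads internally and never runs more
    # than max_workers of them at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(call, payloads))
```

The value of `max_concurrent` is an assumption you would tune against the concurrency your deployment actually allows. If `call` raises, the exception surfaces when the results are collected.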

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

Thank you!
