Getting an error after deploying a model in Azure AI Foundry

Boddu Madan Gopal 0 Reputation points
2025-05-11T13:07:48.95+00:00

I have fine-tuned a model using Azure AI Foundry and deployed it. When I try to use the model, I get this error:

The request failed with status code: 429

Content-Length: 79

Content-Type: application/json

Request-Context: appId=cid-v1:b80100f5-1286-423d-967f-3517f665186d

Date: Sun, 11 May 2025 12:44:26 GMT

Connection: close

{"statusCode": 429, "message": "Rate Limit on Number of Requests is exceeded."}

1 answer

  1. Prashanth Veeragoni 4,360 Reputation points Microsoft External Staff Moderator
    2025-05-12T00:42:07.7666667+00:00

    Hi Boddu Madan Gopal,

    You are getting error 429: "Rate Limit on Number of Requests is exceeded."

    This means the number of requests to your deployed model exceeds the allowed rate limit configured for the Azure AI Foundry deployment or the underlying Azure OpenAI resource.

    Common Causes and Fixes:

    1. Too Many Requests in a Short Time

    - Cause: Your client or app is making requests too rapidly, breaching the Requests Per Minute (RPM) or Tokens Per Minute (TPM) limit.

    - Fix:
      - Add a retry mechanism with exponential backoff in your client code.
      - Limit the frequency of requests or add a delay between them.
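    To limit request frequency, one option is to space calls out on the client side so they never exceed the deployment's RPM cap. Below is a minimal sketch; `RequestPacer` is a hypothetical helper (not part of any Azure SDK), and the RPM value is an assumption you would replace with your deployment's actual limit.

    ```python
    import time

    class RequestPacer:
        """Enforces a minimum interval between requests to stay under an RPM cap.

        Hypothetical helper for illustration; adjust requests_per_minute to
        match your own deployment's quota.
        """
        def __init__(self, requests_per_minute):
            self.min_interval = 60.0 / requests_per_minute
            self.last_call = 0.0

        def wait(self):
            # Sleep just long enough so calls are spaced at least
            # min_interval seconds apart, then record the call time.
            elapsed = time.monotonic() - self.last_call
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last_call = time.monotonic()
    ```

    You would then call `pacer.wait()` immediately before each request to the model endpoint.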

    2. Insufficient Quota on Azure OpenAI Resource

    - Cause: Your Azure subscription or region has a limited quota (e.g., 20 RPM / 40k TPM).

    - Fix:
      - Check your quota in the Azure Portal: Go to Azure Portal > Azure OpenAI resource > Quotas.
      - If needed, submit a quota increase request.

    3. Concurrency Limit

    - Cause: AI Foundry endpoints might restrict the number of concurrent requests.

    - Fix:
      - Use a queue mechanism to throttle concurrent requests.
      - Check deployment settings for concurrency limits.
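    A simple way to throttle concurrency is a semaphore that caps how many calls are in flight at once. This is a generic sketch, not an Azure-specific API; `MAX_CONCURRENT` is an assumed value you would set below your deployment's actual concurrency limit.

    ```python
    import threading
    from concurrent.futures import ThreadPoolExecutor

    # Assumed cap for illustration; set this below your deployment's
    # actual concurrency limit.
    MAX_CONCURRENT = 4
    _slots = threading.Semaphore(MAX_CONCURRENT)

    def throttled_call(fn, *args, **kwargs):
        """Run fn while holding a semaphore slot, so at most
        MAX_CONCURRENT calls execute at the same time."""
        with _slots:
            return fn(*args, **kwargs)
    ```

    With this in place, you can submit many tasks to a `ThreadPoolExecutor` as `ex.submit(throttled_call, call_model, payload)` and only `MAX_CONCURRENT` of them will hit the endpoint simultaneously.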

    4. Incorrect Deployment Configuration in AI Foundry

    - Fix:
      - Go to your model in Azure AI Foundry.
      - Review Deployment Settings → check Rate Limits, Auto-scaling, or SKU capacity.
      - Scale up the deployment if you're hitting capacity limits.

    Add a retry loop with exponential backoff:

    import time
    import requests

    def call_model_with_retry(url, headers, payload, retries=5):
        """POST to the model endpoint, retrying on HTTP 429 with backoff."""
        for attempt in range(retries):
            response = requests.post(url, headers=headers, json=payload)
            if response.status_code != 429:
                return response
            # Prefer the server-suggested Retry-After delay when present;
            # otherwise fall back to exponential backoff (1, 2, 4, 8, ... s).
            retry_after = response.headers.get("Retry-After")
            wait_time = float(retry_after) if retry_after else 2 ** attempt
            print(f"Rate limit hit. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
        raise RuntimeError("Exceeded maximum retries due to rate limiting.")
    
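    Plain exponential backoff can cause many clients to retry in lockstep, so it is common to add random "jitter" to the wait times. A minimal sketch of a jittered, capped backoff schedule (the function name and defaults are illustrative, not from any Azure SDK):

    ```python
    import random

    def backoff_delays(retries=5, base=2.0, cap=60.0):
        """Yield one wait time per retry attempt: uniformly random
        ("full jitter") between 0 and the exponential value base**attempt,
        capped so long retry chains never sleep excessively."""
        for attempt in range(retries):
            yield random.uniform(0, min(cap, base ** attempt))
    ```

    You could plug these delays into the retry loop above in place of the fixed `2 ** attempt` wait.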

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "was this answer helpful".

    Thank you! 

