You are getting error 429: rate limit on the number of requests exceeded.
This means the number of requests to your deployed model exceeds the allowed rate limit configured for the Azure AI Foundry deployment or the underlying Azure OpenAI resource.
Common Causes and Fixes:
1. Too Many Requests in a Short Time
· Cause: Your client or app is making requests too rapidly, breaching the Requests Per Minute (RPM) or Tokens Per Minute (TPM) limit.
· Fix:
o Add a retry mechanism with exponential backoff in your client code.
o Limit the frequency of requests, or add a delay between them.
2. Insufficient Quota on Azure OpenAI Resource
· Cause: Your Azure subscription or region has a limited quota (e.g., 20 RPM / 40k TPM).
· Fix:
o Check your quota in the Azure Portal: go to your Azure OpenAI resource > Quotas.
o If needed, submit a quota increase request.
3. Concurrency Limit
· Cause: AI Foundry endpoints might restrict the number of concurrent requests.
· Fix:
o Use a queue mechanism to throttle concurrent requests.
o Check deployment settings for concurrency limits.
4. Incorrect Deployment Configuration in AI Foundry
· Fix:
o Go to your model in Azure AI Foundry.
o Review Deployment Settings → Check Rate Limits, Auto-scaling, or SKU capacity.
o Scale up the deployment if you're hitting capacity limits.
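For cause 1, one way to "add delay between requests" on the client side is a small throttle that enforces a minimum interval between request starts. This is a sketch, not an Azure SDK feature: the `RequestThrottle` class is hypothetical, and you should set `max_requests_per_minute` somewhat below your deployment's actual RPM limit.

```python
import threading
import time


class RequestThrottle:
    """Hypothetical helper: lets at most `max_requests_per_minute` requests start per minute."""

    def __init__(self, max_requests_per_minute):
        self.min_interval = 60.0 / max_requests_per_minute  # seconds between request starts
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self):
        """Block until the next request may start; returns the number of seconds slept."""
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next slot while holding the lock so concurrent callers queue up.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay > 0:
            time.sleep(delay)
        return delay


# Usage: call throttle.wait() immediately before each request to the model.
throttle = RequestThrottle(max_requests_per_minute=20)
```

Spacing requests evenly avoids bursts that can trip the per-minute limit even when the average rate is within quota.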
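For cause 2, before requesting a quota increase it can help to estimate whether your expected traffic fits the quota at all, since either the RPM or the TPM limit can trigger a 429. A rough sketch, assuming you know your average tokens per request (prompt plus completion); `check_quota` and the traffic figures are hypothetical, and the limits reuse the example quota from cause 2:

```python
def check_quota(requests_per_minute, avg_tokens_per_request, rpm_limit, tpm_limit):
    """Return a list describing which limits (if any) the estimated load would exceed."""
    tokens_per_minute = requests_per_minute * avg_tokens_per_request
    breaches = []
    if requests_per_minute > rpm_limit:
        breaches.append(f"RPM: {requests_per_minute} > {rpm_limit}")
    if tokens_per_minute > tpm_limit:
        breaches.append(f"TPM: {tokens_per_minute} > {tpm_limit}")
    return breaches


# Example quota from cause 2: 20 RPM / 40k TPM.
# 15 requests/min at ~3,000 tokens each = 45,000 TPM: TPM is exceeded even though RPM is not.
print(check_quota(15, 3000, rpm_limit=20, tpm_limit=40_000))  # prints ['TPM: 45000 > 40000']
```

If only TPM is exceeded, shortening prompts or lowering the maximum completion length may help as much as a quota increase.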
For example, a retry loop with exponential backoff in Python, using the requests library:
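```python
import time

import requests


def call_model_with_retry(url, headers, payload, retries=5):
    """POST to the model endpoint, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            return response  # success, or a non-rate-limit error for the caller to handle
        if attempt < retries - 1:
            wait_time = 2 ** attempt  # 1, 2, 4, 8, ... seconds
            print(f"Rate limit hit. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
    raise RuntimeError("Exceeded maximum retries due to rate limit.")
```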
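A possible refinement of this retry loop: honor the `Retry-After` header if the service includes one on the 429 response, and add random jitter so that many clients don't retry in lockstep. This is a sketch under assumptions: `backoff_delay`, `base`, and `cap` are hypothetical names and values, and you should confirm which retry headers your endpoint actually returns.

```python
import random


def backoff_delay(attempt, headers, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Prefers a numeric Retry-After header (seconds) when present; otherwise
    uses exponential backoff with "full jitter", capped at `cap` seconds.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to jittered backoff
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

To use it, replace `wait_time = 2 ** attempt` with `wait_time = backoff_delay(attempt, response.headers)`. The headers object on a requests response is case-insensitive, so a lowercase `retry-after` header is also found.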
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".
Thank you!