Azure AI Language Retry policy: Reactive or Proactive

Question

Umer Rashid 150 Reputation points

Hello,

As I understand it, Azure AI Language can handle short bursts of traffic.

I am wondering whether I should apply a retry policy proactively, as soon as the API rate limit is reached, or let Azure absorb short bursts and apply a retry policy with exponential backoff only after receiving an HTTP 429 error.

Please let me know the recommended approach for handling rate limits.

Regards,

Umer

Manas Mohanty 3,545 Reputation points Microsoft External Staff

2025-04-17T14:57:37.8933333+00:00

Hi Umer Rashid

From the perspective of a real-time application (one that is in production or receiving high traffic), we should configure our own custom retry policy, preferably with exponential backoff, instead of relying on the default retry policy to handle rate limits.

The default policy is limited to 3 retries using an exponential strategy with an initial delay of 0.8 seconds and a maximum delay of 1 minute, so it might handle intermittent bursts.
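
To make those numbers concrete, here is a rough sketch of the delay schedule they imply. It assumes the delay simply doubles on each retry and is capped at the maximum; the function name `backoff_delays` is made up for illustration, and real SDK policies typically add random jitter, so actual waits will differ.

```python
from typing import List


def backoff_delays(initial: float, max_delay: float, retries: int) -> List[float]:
    """Return the nominal wait before each retry, doubling each time up to max_delay."""
    return [min(initial * (2 ** attempt), max_delay) for attempt in range(retries)]


if __name__ == "__main__":
    # Values quoted above: 3 retries, 0.8 s initial delay, 60 s cap.
    print(backoff_delays(0.8, 60.0, 3))
```

Under that assumption the three nominal waits add up to only a few seconds, which is why a default of this kind may ride out a brief burst but not a full minute of throttling.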

HTTP 429 rate-limit errors occur whenever you exceed your requests-per-minute (RPM) limit.

For example, if you are in the S tier and send 1000 requests at once, you would not be able to send another request for 59 seconds.

It is better to send inputs in small batches that stay under the quota limits and to adjust your exponential backoff retry logic accordingly.
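
One way to stay under the quota on the client side is to split the input into small batches and pace the calls. The sketch below is only an illustration: `chunked` and `MinuteWindowLimiter` are hypothetical helpers, not part of any Azure SDK, and it assumes a sliding 60-second window, which may not match how the service counts requests.

```python
import time
from collections import deque
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class MinuteWindowLimiter:
    """Track request times and report how long to wait before the next one."""

    def __init__(self, max_requests: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()

    def wait_time(self) -> float:
        """Seconds to wait so that one more request stays within the window limit."""
        now = self.clock()
        # Forget requests that have already left the sliding window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.sent.append(self.clock())
```

In use, a loop over `chunked(documents, batch_size)` would call `time.sleep(limiter.wait_time())` and then `limiter.record()` before sending each batch, with `max_requests` set to the tier's limit (1000 per minute for the S tier mentioned in this thread).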

Additionally, you can use asynchronous calls.

Reference -

https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/Azure.Core/samples/Configuration.md

Rate limits

Hope it addresses your query.

Thank you.

Umer Rashid 150 Reputation points

2025-04-17T15:10:21.2233333+00:00

Hi @Manas Mohanty

Thanks for the prompt reply.

If I understand correctly, the recommended approach is to apply exponential backoff as soon as we reach the rate limit (i.e. 1000 requests per minute in the S tier), rather than waiting for Azure to return an HTTP 429 error before applying our retry policy.

Please confirm whether I have understood correctly.

Thanks,

Umer Rashid 150 Reputation points

2025-04-18T05:52:33.8866667+00:00

Hi @Manas Mohanty

Thanks for the explanation.

So it seems we should apply our custom retry policy on receiving an HTTP 429 error, not whenever we exceed the rate limit (i.e. 1000 requests per minute in the S tier), as Azure can handle short bursts.

Manas Mohanty 3,545 Reputation points Microsoft External Staff

2025-04-18T06:44:58.9933333+00:00

Yes Umer Rashid

A custom policy only changes the behavior of the default retry policy (number of retries, delay, backoff time, etc.). We can tune our code to react to 429 rate-limit errors by waiting until the start of the next minute.

In production environments we should use a custom policy rather than relying on the default retry policy.
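
As a rough illustration of this reactive approach, the sketch below retries only after a rate-limit failure is actually observed, waiting longer each time. `RateLimited`, `call_with_backoff` and their parameters are hypothetical names for this example; in real code the exception would be whatever the SDK raises for HTTP 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the error raised when the service answers HTTP 429."""


def call_with_backoff(func: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 0.8, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying with capped exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Add up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    # Only reachable if max_retries is negative, which is a caller error.
    raise ValueError("max_retries must be >= 0")
```

The `sleep` parameter exists only so the wrapper can be exercised in tests without real waiting.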

Please accept the below answer if the pointers were useful to you.

Thank you.

Accepted answer

Answer 1

Hi Umer Rashid

Both the default and a custom retry policy retry automatically when the client hits a 429 rate limit. But if you want to wait until the start of the next minute based on the 429 error you receive, please check out the code below.

For example:

If you send 300 requests initially, another 300 requests 5 seconds later, and then 400 requests 5 seconds after that, you will have sent a total of 1000 requests within a 10-second window, so you have to wait 50 seconds, as the error message will suggest.

You can then call the client, layered with the custom retry policy, again after 50 seconds.
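
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch that assumes the quota is counted over a fixed 60-second window starting at the first request; `seconds_until_reset` is a hypothetical helper, and the service's actual accounting may differ.

```python
from typing import List, Tuple


def seconds_until_reset(bursts: List[Tuple[float, int]], limit: int,
                        window: float = 60.0) -> float:
    """Given (send_time, request_count) bursts, return how long to wait after the
    burst that uses up the quota, or 0.0 if the quota is never used up."""
    window_start = bursts[0][0]
    total = 0
    for send_time, count in bursts:
        total += count
        if total >= limit:
            return max(0.0, window_start + window - send_time)
    return 0.0


# 300 requests at t=0 s, 300 at t=5 s, 400 at t=10 s, with a limit of 1000 per minute.
print(seconds_until_reset([(0, 300), (5, 300), (10, 400)], limit=1000))  # 50.0
```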

Assumed error response:

{'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 50 seconds.'}}

The code below extracts the wait time (50) from the error message above, sleeps for that many seconds, and then retries until the maximum number of attempts has been reached.

import os
import re
import time

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError
from azure.core.pipeline.policies import RetryPolicy

# Placeholder configuration: the environment variable names are illustrative only
endpoint = os.environ["LANGUAGE_ENDPOINT"]
api_key = os.environ["LANGUAGE_KEY"]

# Number of attempts made by the outer loop in analyze_text
MAX_ATTEMPTS = 5

# Define the retry policy
retry_policy = RetryPolicy(
    retry_total=5,  # Total number of retries
    retry_backoff_factor=0.8,  # Factor to calculate the backoff time
    retry_backoff_max=16,  # Maximum backoff time in seconds
    retry_mode='exponential'  # Use exponential backoff
)

# Create the Text Analytics client with the retry policy
client = TextAnalyticsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key),
    retry_policy=retry_policy
)

def analyze_text(client, documents):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return client.analyze_sentiment(documents=documents)
        except HttpResponseError as e:
            # Re-raise anything other than a rate limit, and give up after the last attempt
            if e.status_code != 429 or attempt == MAX_ATTEMPTS:
                raise
            # Look for the wait time in the error text, e.g. "Try again in 50 seconds."
            # (assumes the service message is surfaced in the exception message)
            match = re.search(r'in (\d+) seconds', e.message or '')
            if match:
                retry_after = int(match.group(1))
            else:
                # Fall back to the Retry-After header, defaulting to 1 second
                headers = e.response.headers if e.response is not None else {}
                retry_after = int(headers.get("Retry-After", 1))
            print(f"Rate limit exceeded. Retrying after {retry_after} seconds.")
            time.sleep(retry_after)

# Example usage
documents = ["Hello world!"]
response = analyze_text(client, documents)
print(response)
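
The message-parsing step is the easiest part to get wrong, so here is a small standalone sketch of it that can be tested without calling the service. It assumes the message has the form shown in the assumed error response above; `parse_wait_seconds` is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_wait_seconds(message: str) -> Optional[int]:
    """Extract N from '... Try again in N seconds.'; return None if the pattern is absent."""
    match = re.search(r"in (\d+) seconds?", message)
    return int(match.group(1)) if match else None


print(parse_wait_seconds("Rate limit is exceeded. Try again in 50 seconds."))  # 50
print(parse_wait_seconds("Some other error"))  # None
```

Returning `None` rather than raising lets the caller fall back to the Retry-After header or a default wait when the message format is different.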

Hope this adds more clarity.

Thank you.
