Hello Mahsa,
Greetings and welcome to Microsoft Q&A!
I understand that you are facing performance issues with the Azure OpenAI API, particularly with the gpt-4o model. Here are some best practice recommendations to consider.
Make sure you are using the latest and most optimized model for your needs. For lower latency, the GPT-4o Mini model is recommended. If you haven’t already, consider switching to this model to improve performance.
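If you decide to try GPT-4o Mini, a minimal sketch of the call looks like this. It assumes the openai Python package (v1+), environment variables for your endpoint and key, and a deployment named gpt-4o-mini in your resource; adjust these to match your own setup:

```python
import os
from openai import AzureOpenAI

# Assumed setup: endpoint and key come from environment variables, and a
# deployment named "gpt-4o-mini" already exists in your Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # Azure expects the *deployment* name here, not the base model name
    messages=[{"role": "user", "content": "Summarize Azure OpenAI in one sentence."}],
)
print(response.choices[0].message.content)
```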
Reducing the max_tokens parameter can significantly improve response time. The fewer tokens the model generates, the faster the response. To enhance performance, set this parameter to the smallest value necessary for your requests.
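For example, reusing the client from the sketch above, you might cap generation like this (the deployment name and the limit of 50 are placeholders to tune for your use case):

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Give a one-line definition of latency."}],
    max_tokens=50,  # cap generation; fewer generated tokens means a faster response
)
print(response.choices[0].message.content)
```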
Enable streaming by setting stream: true in your requests. This allows tokens to be returned as they are generated, improving the perceived response time for end users.
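A minimal streaming sketch, again reusing the client from the first example (the guard on empty choices accounts for chunks that carry only content-filter results on Azure):

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain token streaming briefly."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. content-filter results) arrive with no choices,
    # so guard before reading the delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```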
If you need to make multiple requests, try batching them into a single call. This can help reduce the number of requests and improve overall response times.
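One simple way to do this, sketched with the same client as above, is to fold several small questions into a single prompt so three round trips become one (the questions here are just placeholders):

```python
questions = [
    "What is a vector database?",
    "What does max_tokens control?",
    "What is application-level caching?",
]
# Combine several small questions into one request instead of three round trips.
combined = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Answer each question briefly:\n{combined}"}],
)
print(response.choices[0].message.content)
```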
When processing large files like PDFs, optimize your ingestion process to minimize latency. If converting PDFs to images, evaluate whether the conversion is causing delays. Consider adjusting the chunk size or using optimized methods for handling tables and structured data when applicable.
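As a rough illustration, a fixed-size chunker with overlap might look like the sketch below. The chunk_text helper and the sizes are assumptions to tune for your documents, and the PDF text-extraction step itself (whatever library you use) is out of scope here:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split extracted PDF text into overlapping chunks to keep each request small."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# `extracted` would come from your own PDF-extraction step.
extracted = "...full text pulled from the PDF..."
for chunk in chunk_text(extracted):
    pass  # send each chunk to the model, or embed it, as your pipeline requires
```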
Combining different workloads on the same endpoint can impact latency. To enhance performance, consider separating deployments based on workload types whenever possible.
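As a sketch, the routing can be as simple as selecting a deployment name per workload type; the deployment names below are hypothetical and the client is the one created in the first example:

```python
# Hypothetical deployments: one reserved for interactive chat, one for bulk jobs,
# so heavy batch traffic does not add latency to user-facing requests.
INTERACTIVE_DEPLOYMENT = "gpt-4o-mini-chat"
BATCH_DEPLOYMENT = "gpt-4o-batch"

def ask(prompt: str, interactive: bool = True) -> str:
    response = client.chat.completions.create(
        model=INTERACTIVE_DEPLOYMENT if interactive else BATCH_DEPLOYMENT,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```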
Azure OpenAI does not currently offer built-in caching for API calls. However, you can implement a custom caching mechanism at the application level to store and retrieve responses efficiently for repeated queries.
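A minimal in-memory sketch of such a cache, using the same client as above, is shown below; for production you would likely swap the plain dict for Redis or a similar shared store:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str) -> str:
    """Return a stored answer for repeated prompts; call the API only on a cache miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    _cache[key] = answer
    return answer
```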
For more details, kindly refer to this documentation: Improve performance.
I hope this helps. If you have any further queries, do let us know.
Thank you!