Azure AI Foundry - Vector Index creation failing when building on 2 pdf documents

Sunil Nagireddy 0 Reputation points
2025-04-30T16:44:28.9766667+00:00

Azure AI Foundry - Vector Index creation failing:

I was trying to create a Vector Index on a source containing 2 PDF files consisting of text and tables. After running for 30 minutes it fails with this error:

Cracking and chunking - Data ingestion failed.

But when I build the index on only one PDF document, it completes successfully in 8 minutes.

One PDF document is 170 KB and the other is 240 KB.

The log file contains the following error message (log file attached as well):

RuntimeError: Failed to embed 16 documents after 600s and 9 retries. Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current AIServices S0 pricing tier. Please retry after 60 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. For Free Account customers, upgrade to Pay as you Go here: https://aka.ms/429TrialUpgrade.'}}

The requirement is to build a vector index on many PDF documents, but even with 2 documents we are not able to build the index. Please advise how to resolve this issue.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Suwarna S Kale 2,211 Reputation points
    2025-05-01T03:25:23.14+00:00

    Hello Sunil Nagireddy,

    Thank you for posting your question in the Microsoft Q&A forum. 

    The error indicates your Azure OpenAI embeddings API is hitting rate limits (HTTP 429) when processing multiple PDFs, despite their small size. This occurs because the default S0 pricing tier has strict call rate restrictions.

    To resolve this, increase the tokens-per-minute (TPM) quota on your embedding model deployment, via the deployment/endpoint page in the Azure portal.

    Alternatively, implement client-side retry logic with exponential backoff in your indexing script to handle transient throttling gracefully. 
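    A minimal sketch of that retry pattern follows. The `embed_fn` callable is a placeholder for your actual embeddings call (e.g. the Azure OpenAI embeddings endpoint); the retry counts and delays are illustrative, not prescribed values.

    ```python
    import random
    import time

    def embed_with_backoff(embed_fn, inputs, max_retries=9, base_delay=1.0):
        """Call embed_fn(inputs), retrying rate-limit (429) failures with
        exponential backoff plus jitter.

        Assumes the client surfaces throttling as an exception whose message
        mentions '429', as in the log excerpt above.
        """
        for attempt in range(max_retries + 1):
            try:
                return embed_fn(inputs)
            except RuntimeError as err:
                if "429" not in str(err) or attempt == max_retries:
                    raise  # not a throttle, or retries exhausted
                # double the wait each attempt, with jitter to avoid bursts
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                time.sleep(delay)
    ```

    Honoring the `Retry-After` header (the error asks for 60 seconds) instead of a computed delay is an equally valid variant when the client exposes it.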

    For batch processing, pre-chunk documents into smaller segments before sending them to the embeddings API, and cap the number of items per request to reduce concurrent load. Additionally, verify your text extraction method: tables in PDFs may generate excessive chunks, triggering rate limits. If possible, preprocess tables into structured text before embedding. 
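    The chunking and batching steps above can be sketched as below. The character-based splitter and the batch size of 16 (matching the "16 documents" in the error) are assumptions for illustration; production pipelines usually split on sentence or token boundaries.

    ```python
    def chunk_text(text, max_chars=1000, overlap=100):
        """Naive fixed-size character chunking with overlap between
        consecutive chunks, so context is not cut mid-thought."""
        chunks = []
        step = max_chars - overlap
        for start in range(0, len(text), step):
            chunks.append(text[start:start + max_chars])
        return chunks

    def batched(items, batch_size=16):
        """Yield fixed-size batches so each embeddings call stays small
        and calls can be paced to fit the TPM quota."""
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]
    ```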

    For production-scale indexing, consider asynchronous processing with queue-based workflows (e.g., Azure Functions + Storage Queues) to distribute load. Monitor usage via Azure OpenAI’s metrics blade to align capacity with demand. If these steps fail, request a quota increase via the linked form in the error message. Structured batching and tier upgrades typically resolve such throughput issues. 
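    The queue-based pattern can be illustrated with Python's standard library; in a real deployment the in-memory queue would be an Azure Storage Queue and each worker an Azure Function invocation. `embed_fn` again stands in for the embeddings call, and the worker count and pacing interval are assumptions to tune against your quota.

    ```python
    import queue
    import threading
    import time

    def run_embedding_workers(chunk_batches, embed_fn, num_workers=2, pause=0.5):
        """Distribute batches across a small worker pool, pausing between
        calls so the aggregate request rate stays under the rate limit."""
        work = queue.Queue()
        for batch in chunk_batches:
            work.put(batch)

        results, lock = [], threading.Lock()

        def worker():
            while True:
                try:
                    batch = work.get_nowait()
                except queue.Empty:
                    return  # no more work
                vectors = embed_fn(batch)
                with lock:
                    results.extend(vectors)
                time.sleep(pause)  # simple pacing between calls

        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
    ```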

    If the above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated. 

