Azure AI Foundry - Vector Index creation failing when building on 2 pdf documents

Sunil Nagireddy 0 Reputation points
2025-04-30T16:44:28.9766667+00:00

Azure AI Foundry - Vector Index creation failing:

I was trying to create a Vector Index on a source containing 2 PDF files consisting of text and tables. After running for 30 minutes it fails with this error:

Cracking and chunking - Data ingestion failed.

But when I build the index on only one PDF document, it completes successfully in 8 minutes.

One PDF document is 170 KB and the other is 240 KB.

The log file contains the following error message (log file attached as well):

RuntimeError: Failed to embed 16 documents after 600s and 9 retries. Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current AIServices S0 pricing tier. Please retry after 60 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. For Free Account customers, upgrade to Pay as you Go here: https://aka.ms/429TrialUpgrade.'}}

The requirement is to build a vector index on many PDF documents, but even with 2 documents we are not able to build the index. Please advise how to resolve this issue.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Suwarna S Kale 2,211 Reputation points
    2025-05-01T03:25:23.14+00:00

    Hello Sunil Nagireddy,

    Thank you for posting your question in the Microsoft Q&A forum. 

    The error indicates your Azure OpenAI embeddings API is hitting rate limits (HTTP 429) when processing multiple PDFs, despite their small size. This occurs because the default S0 pricing tier has strict call rate restrictions.

    To resolve this, increase the tokens-per-minute (TPM) quota on your embedding model deployment, via the deployment/endpoint page in the Azure portal.

    Alternatively, implement client-side retry logic with exponential backoff in your indexing script to handle transient throttling gracefully. 
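    A minimal sketch of that retry pattern follows. The `embed_fn` callable is a placeholder for your actual embeddings call (e.g. the Azure OpenAI embeddings endpoint); the retry counts and delays are illustrative, not prescribed values.

    ```python
    import random
    import time

    def embed_with_backoff(embed_fn, inputs, max_retries=9, base_delay=1.0):
        """Call embed_fn(inputs), retrying rate-limit (429) failures with
        exponential backoff plus jitter.

        Assumes the client surfaces throttling as an exception whose message
        mentions '429', as in the log excerpt above.
        """
        for attempt in range(max_retries + 1):
            try:
                return embed_fn(inputs)
            except RuntimeError as err:
                if "429" not in str(err) or attempt == max_retries:
                    raise  # not a throttle, or retries exhausted
                # double the wait each attempt, with jitter to avoid bursts
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                time.sleep(delay)
    ```

    Honoring the `Retry-After` header (the error asks for 60 seconds) instead of a computed delay is an equally valid variant when the client exposes it.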

    For batch processing, pre-chunk documents into smaller segments before sending them to the embeddings API, and cap the number of items per request to reduce concurrent load. Additionally, verify your text extraction method: tables in PDFs may generate excessive chunks, triggering rate limits. If possible, preprocess tables into structured text before embedding. 
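    The chunking and batching steps above can be sketched as below. The character-based splitter and the batch size of 16 (matching the "16 documents" in the error) are assumptions for illustration; production pipelines usually split on sentence or token boundaries.

    ```python
    def chunk_text(text, max_chars=1000, overlap=100):
        """Naive fixed-size character chunking with overlap between
        consecutive chunks, so context is not cut mid-thought."""
        chunks = []
        step = max_chars - overlap
        for start in range(0, len(text), step):
            chunks.append(text[start:start + max_chars])
        return chunks

    def batched(items, batch_size=16):
        """Yield fixed-size batches so each embeddings call stays small
        and calls can be paced to fit the TPM quota."""
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]
    ```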

    For production-scale indexing, consider asynchronous processing with queue-based workflows (e.g., Azure Functions + Storage Queues) to distribute load. Monitor usage via Azure OpenAI’s metrics blade to align capacity with demand. If these steps fail, request a quota increase via the linked form in the error message. Structured batching and tier upgrades typically resolve such throughput issues. 
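    The queue-based pattern can be illustrated with Python's standard library; in a real deployment the in-memory queue would be an Azure Storage Queue and each worker an Azure Function invocation. `embed_fn` again stands in for the embeddings call, and the worker count and pacing interval are assumptions to tune against your quota.

    ```python
    import queue
    import threading
    import time

    def run_embedding_workers(chunk_batches, embed_fn, num_workers=2, pause=0.5):
        """Distribute batches across a small worker pool, pausing between
        calls so the aggregate request rate stays under the rate limit."""
        work = queue.Queue()
        for batch in chunk_batches:
            work.put(batch)

        results, lock = [], threading.Lock()

        def worker():
            while True:
                try:
                    batch = work.get_nowait()
                except queue.Empty:
                    return  # no more work
                vectors = embed_fn(batch)
                with lock:
                    results.extend(vectors)
                time.sleep(pause)  # simple pacing between calls

        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
    ```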

    If the above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated. 

