Azure AI Search Indexer Timeout Issue with PDF Document

Su Myat Hlaing 160

I'm experiencing persistent indexer timeout issues with my Azure AI Search setup. Here's my workflow:

Node server uploads files to Azure Blob Storage
Azure Function triggers automatically on blob upload to index the document
Function executes successfully (confirmed in logs, ~1056ms duration)
However, the indexer consistently times out

Despite resetting and manually running the indexer multiple times, it continues to time out. The document in question is a PDF file, and I cannot find it through search in Azure AI Search at all. Interestingly, this setup was working properly on previous days, so I'm not sure if the timeout is the actual problem or if something else is happening.

Any troubleshooting suggestions or recommended configuration adjustments would be greatly appreciated.

Thank you!

Bhargavi Naragani 3,165 Reputation points Microsoft External Staff

2025-04-24T07:10:43.26+00:00

Hi @Su Myat Hlaing ,

It seems like you're encountering a timeout issue with your Azure AI Search indexer when processing a PDF document. Here are some troubleshooting suggestions based on common issues related to indexer timeouts:

Ensure that your PDF document does not exceed the maximum size limit for document extraction, which is 134,217,728 bytes for your current service tier. If it does, the indexer will not be able to process it.

Verify that the content type of the PDF is supported. Unsupported content types will cause the indexer to skip the document.

Since you mentioned that this setup was working previously, it could be a transient issue. Consider waiting and trying to run the indexer again later. Setting your indexer to run on a schedule may help mitigate these transient errors.

If you're using a custom skill for processing the document, check if it is returning results consistently and ensure that it is not stuck in an infinite loop. The default timeout for a custom skill is 30 seconds, but you can increase this up to a maximum of 230 seconds if necessary.

If your custom skill processes multiple documents at once, consider reducing the batch size to ensure that it can execute within the timeout limits.

Use the Azure portal or the REST API to check the indexer's status and see if there are any specific errors being reported.

Ensure that field mappings or change tracking values are not causing the document to be skipped. If the document was updated after the indexer ran, it may not appear in the search index until the next scheduled run.
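For the status check suggested above, here is a minimal Python sketch of how the per-document errors could be pulled out of a status response. It assumes the JSON has already been fetched from the indexer's `status` endpoint (`GET https://<service>.search.windows.net/indexers/<indexer>/status` with an `api-version` query parameter and an `api-key` header), and that the field names (`lastResult`, `itemsProcessed`, `itemsFailed`, `errors`, `key`, `errorMessage`) match what your service returns. The function name and the sample payload are hypothetical.

```python
from typing import Any


def summarize_indexer_status(status: dict[str, Any]) -> list[str]:
    """Return a summary line plus one line per error from the last indexer run."""
    last = status.get("lastResult") or {}
    lines = [
        f"run status: {last.get('status', 'unknown')}, "
        f"processed: {last.get('itemsProcessed', 0)}, "
        f"failed: {last.get('itemsFailed', 0)}"
    ]
    # Each error entry identifies the failing document and the reason.
    for err in last.get("errors") or []:
        lines.append(f"{err.get('key', '<no key>')}: {err.get('errorMessage', '')}")
    return lines


# Hypothetical payload, shaped like a status response (not real output).
sample = {
    "status": "running",
    "lastResult": {
        "status": "transientFailure",
        "itemsProcessed": 3,
        "itemsFailed": 1,
        "errors": [
            {
                "key": "example-document-key",
                "errorMessage": "Skill did not execute within the time limit",
            }
        ],
    },
}

for line in summarize_indexer_status(sample):
    print(line)
```

If the error lines name a specific skill, that narrows down which part of the skillset to look at first.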

Implementing these suggestions should help you identify and resolve the timeout issue with your indexer.

Error: Skill did not execute within the time limit
Missing documents
Transient errors

Hope this information is helpful, let me know if you have any further queries.

Su Myat Hlaing 160

Hi @Bhargavi Naragani, Thank you for your detailed response!

To clarify some points:

Regarding file size: I want to confirm - is that 134,217,728 bytes limit for the total size of all files in my blob storage that are indexed by AI search? The PDF I'm trying to index is only around 1000KB (~1MB), which seems well below the limit you mentioned. So I don't think individual file size is the issue.

Content type: PDFs have worked fine previously and are still working for files that were indexed on previous days. This particular issue seems to be only with new files.

Regarding your mention of custom skills: I am using skills in my setup. Here's my current skillset configuration:

- SplitSkill: Splitting document content into pages (5000 char max length)
- KeyPhraseExtractionSkill: Extracting key phrases from each page
- OcrSkill: Extracting text from images in the document
- MergeSkill: Merging OCR text with document content

About timeout and batch size: Where exactly would I need to increase the timeout from the default 30 seconds to a maximum of 230 seconds? Is this a setting in the indexer configuration or somewhere else? Also, how do I reduce the batch size for my custom skills to ensure they execute within timeout limits?

Regarding field mappings: I'm confident this isn't the issue because I'm testing with the exact same document that worked well previously.

One thing I should mention is that I uploaded many files to blob storage at once through an automated crawling process. Could this have caused some kind of queue or processing issue? I've deleted some of these files, but since the retention period is 6 days, they won't be completely removed until then - could this be affecting the indexer?

Would you recommend any adjustments to my skillset configuration or other settings that might help resolve the timeout issue?

Thanks again for your help!

{
  "@odata.etag": "\"0x8DD7D45025A4C72\"",
  "name": "skillset1719988983064",
  "description": "",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#1",
      "context": "/document/content",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 5000,
      "pageOverlapLength": 0,
      "maximumPagesToTake": 0,
      "unit": "characters",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "#2",
      "context": "/document/content/pages/*",
      "defaultLanguageCode": "en",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content/pages/*",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "keyPhrases",
          "targetName": "keyphrases"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "name": "#3",
      "description": "Extracts text from images",
      "context": "/document/normalized_images/*",
      "textExtractionAlgorithm": "Printed",
      "lineEnding": "Space",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "ocr_text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "name": "#4",
      "description": "Merge OCR text with document content",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content",
          "inputs": []
        },
        {
          "name": "itemsToInsert",
          "source": "/document/normalized_images/*/text",
          "inputs": []
        },
        {
          "name": "offsets",
          "source": "/document/normalized_images/*/contentOffset",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "mergedText",
          "targetName": "merged_text"
        }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.DefaultCognitiveServices"
  }
}

Answer 1

@Su Myat Hlaing

From your error message "Skill did not execute within the time limit", the problem is almost certainly happening within one of the skills, likely:

OcrSkill (which can be slow on image-heavy PDFs) or MergeSkill (which depends on large inputs and potentially offset alignment).

This timeout isn't about the total blob size or indexer schedule; it's due to the skill pipeline’s execution duration per document.

You can raise the timeout for your custom skills to a maximum of 230 seconds by updating the skillset definition through the REST API and setting the timeout property inside the skill. Reference: Custom skill timeout

Example:

"timeout": "PT180S" // ISO 8601 format: 3 minutes

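You’d add this to each custom skill where needed. As far as I know, built-in skills such as OcrSkill and MergeSkill do not expose a timeout property, so for those the batch size and document size adjustments below are the relevant levers.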
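As a rough illustration of where the property goes, here is a Python sketch that sets `timeout` on a skillset definition before it is sent back with a PUT request. It only touches custom Web API skills (`#Microsoft.Skills.Custom.WebApiSkill`), on the assumption that this is the skill type that accepts `timeout`. The helper name, the sample skill names, and the `uri` are hypothetical, and the 230-second cap is the figure quoted in this thread.

```python
import copy
from typing import Any

WEB_API_SKILL = "#Microsoft.Skills.Custom.WebApiSkill"
MAX_TIMEOUT_SECONDS = 230  # upper bound quoted in this thread


def with_custom_skill_timeout(skillset: dict[str, Any], seconds: int) -> dict[str, Any]:
    """Return a copy of the skillset with `timeout` set on every custom Web API skill."""
    if not 1 <= seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    updated = copy.deepcopy(skillset)  # leave the caller's definition untouched
    for skill in updated.get("skills", []):
        if skill.get("@odata.type") == WEB_API_SKILL:
            skill["timeout"] = f"PT{seconds}S"  # ISO 8601 duration
    return updated


# Hypothetical skillset: one built-in skill and one custom skill.
example = {
    "name": "example-skillset",
    "skills": [
        {"@odata.type": "#Microsoft.Skills.Vision.OcrSkill", "name": "#3"},
        {
            "@odata.type": WEB_API_SKILL,
            "name": "my-custom-skill",
            "uri": "https://example.invalid/api/enrich",
        },
    ],
}

updated = with_custom_skill_timeout(example, 180)
print(updated["skills"][1]["timeout"])  # PT180S
print("timeout" in updated["skills"][0])  # False
```

The helper returns a modified copy rather than editing in place, so the original definition stays available for comparison before you upload the change.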

The indexer processes documents in batches. If multiple large or complex documents are processed at once, the cognitive skills pipeline may time out. You can reduce the batch size like this:

"parameters": {
  "batchSize": 1
}

This slows down processing a bit but improves reliability. Reference: Indexer parameters
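To show where `batchSize` sits relative to the rest of the indexer definition, here is a small Python sketch that merges it into an existing definition without dropping other parameters. The helper name and the sample definition (indexer, data source, and index names, plus the `imageAction` setting) are hypothetical placeholders. The result would be sent back with a PUT to the indexer's endpoint.

```python
import copy
from typing import Any


def with_batch_size(indexer: dict[str, Any], batch_size: int) -> dict[str, Any]:
    """Return a copy of the indexer definition with parameters.batchSize set."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    updated = copy.deepcopy(indexer)
    params = updated.get("parameters") or {}
    params["batchSize"] = batch_size  # other existing parameters are kept as-is
    updated["parameters"] = params
    return updated


# Hypothetical indexer definition; the names are placeholders.
example = {
    "name": "example-indexer",
    "dataSourceName": "example-datasource",
    "targetIndexName": "example-index",
    "skillsetName": "skillset1719988983064",
    "parameters": {"configuration": {"imageAction": "generateNormalizedImages"}},
}

updated = with_batch_size(example, 1)
print(updated["parameters"]["batchSize"])  # 1
print(updated["parameters"]["configuration"])  # unchanged
```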

Although your SplitSkill handles text splitting, you might consider preprocessing large PDFs (especially scanned ones or image-heavy files) before upload, splitting them by page or section to reduce indexing load.

Bulk uploading many files at once may have created a processing queue internally. Even though deleted blobs don’t get indexed again, their metadata might still exist depending on the change detection policy. To avoid this, make sure change tracking is properly configured (or disabled if not needed). It also helps to pause uploads temporarily so the indexer can catch up.
Reference: Change detection policies

Hope this helps, let me know if you have any further queries.
