Azure Function Host Crashes When Parsing PDF Documents with Docling Library

David Tschirky 1 Reputation point
2025-05-02T09:04:59.7766667+00:00

We've started using this python library docling to parse documents. Locally, that works perfectly. On Azure though, we experience the issue that parsing PDFs fails and makes the function host crash. This results in a downtime for the whole function, until it has restarted. Also, I can't find any helpful information as to why parsing PDFs makes the runtime crash. Parsing Word and Powerpoint documents works without any issues on Azure.

As part of the investigation, I've added Application Insights OpenTelemetry instrumentation to the app.

Insights gathered while trying to figure out the core issue:

  1. Only PDFs are affected. From a docling perspective, this is reasonable because the processing pipeline for PDFs is quite different than the one for Word and PowerPoint.
  2. In order to parse PDFs, docling is downloading appropriate AI models on demand. These downloads are successful, as can be observed in Application Insights.
  3. It looks like the failure happens in the moment when we attempt the document conversion (having collected the documents and the necessary AI models to do so).
  4. Once the function makes the function host crash (silently, i.e. no exception visible), one can observe the startup logs from a new instance of the same function. As part of these startup logs, the following exception shows up (usually 3 times):
    1. Transport endpoint is not connected : '/home/site/wwwroot/host.json'
    2. Because of that exception, the startup of the function immediately after the crash is not successful and it will take one or two tries to successfully spin up the function again.

Here are some of our environment details:

  • We're using Azure functions for Python v2
  • We're using python 3.11
  • The app currently runs on a consumption plan.
  • Dependencies
      azure-functions
      requests
      azure-identity
      
      # to fix (another) deployment issue on azure
      cryptography==43.0.3
      
      # Search
      azure-search-documents
      
      # Ai Service
      openai
      pydantic
      
      # OCR
      docling==2.31.0
      
      # graph ql client
      gql[all]
      
      # Monitoring
      azure-core-tracing-opentelemetry
      azure-monitor-opentelemetry
    
Azure Functions
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
5,723 questions
{count} votes

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.