Azure AI Document Intelligence Layout API can transform your documents into rich Markdown, preserving their original structure and formatting. Just specify outputContentFormat=markdown in your request to receive semantically structured content that maintains paragraphs, headings, tables, and other document elements in their proper hierarchy.
This Markdown output captures the document's original organization in a standardized, easily consumable form for downstream applications. Because the semantic structure is preserved, document processing workflows keep the context and relationships between document elements.
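As a minimal sketch of where the outputContentFormat=markdown parameter goes, the following Python snippet builds an analyze URL for the layout model. The route, the example endpoint, and the api-version value are assumptions used for illustration and should be checked against the REST reference for your service version. The helper name build_layout_analyze_url is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def build_layout_analyze_url(endpoint: str, api_version: str) -> str:
    """Build an analyze URL that asks the layout model for Markdown output."""
    # Assumption: this route follows the public REST pattern for the
    # prebuilt layout model; verify it for your service version.
    base = endpoint.rstrip("/")
    query = urlencode({
        "api-version": api_version,          # placeholder value supplied by the caller
        "outputContentFormat": "markdown",   # the parameter named in this article
    })
    return f"{base}/documentintelligence/documentModels/prebuilt-layout:analyze?{query}"


# Placeholder endpoint and API version, for illustration only.
url = build_layout_analyze_url(
    "https://example.cognitiveservices.azure.com/", "2024-11-30"
)
print(url)
```

The document itself is sent in the body of a POST request to this URL, together with your resource key or token.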
Markdown elements supported in Layout Analysis
The following Markdown elements are included in Layout API responses:
- Paragraph
- Heading
- Table
- Figure
- Selection Mark
- Formula
- Barcode
- PageNumber/PageHeader/PageFooter
- PageBreak
- KeyValuePairs/Language/Style
- Spans and Content
Paragraph
Paragraphs represent cohesive blocks of text that belong together semantically. The Layout API maintains paragraph integrity by:
- Preserving paragraph boundaries with empty lines between separate paragraphs
- Using line breaks within paragraphs to maintain the visual structure of the original document
- Maintaining proper text flow that respects the original document's reading order
Here's an example:
This is paragraph 1.
This is still paragraph 1, even if in another Markdown line.

This is paragraph 2. There is a blank line between paragraph 1 and paragraph 2.
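Because paragraphs are separated by blank lines, a consumer can recover them with a simple split. The following is a minimal sketch, not part of the Layout API. The helper name split_paragraphs is hypothetical, and the sample string restates the example with the blank line written out explicitly.

```python
import re


def split_paragraphs(markdown: str) -> list[str]:
    """Split Markdown text into paragraphs on one or more blank lines."""
    blocks = re.split(r"\n\s*\n", markdown.strip())
    return [block for block in blocks if block.strip()]


sample = (
    "This is paragraph 1.\n"
    "This is still paragraph 1, even if in another Markdown line.\n"
    "\n"
    "This is paragraph 2. There is a blank line between paragraph 1 and paragraph 2.\n"
)
paragraphs = split_paragraphs(sample)
print(len(paragraphs))
```

This naive split treats every blank-line-separated block as a paragraph, so headings, tables, and figures need to be recognized separately before or after splitting.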
Heading
Headings organize document content into a hierarchical structure to make navigation and understanding easier. The Layout API has the following capabilities:
- Uses standard Markdown heading syntax with 1-6 hash symbols (#) corresponding to heading levels.
- Maintains proper spacing with two blank lines before each heading for improved readability.
Here's an example:
# This is a title
## This is heading 1
### This is heading 2
#### This is heading 3
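To rebuild a document outline from the hash-prefixed headings, a line-based regular expression is usually enough. This is a rough sketch under the assumption that heading lines start at column 0. The helper name extract_headings is hypothetical.

```python
import re

# One to six hash symbols, at least one space or tab, then the heading text.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) pairs for each Markdown heading line."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(markdown)]


sample = (
    "# This is a title\n\n\n"
    "## This is heading 1\n\n\n"
    "### This is heading 2\n\n\n"
    "#### This is heading 3\n"
)
headings = extract_headings(sample)
print(headings)
```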
Table
Tables preserve complex structured data in a visually organized format. The Layout API uses HTML table syntax for maximum fidelity and compatibility:
- Implements full HTML table markup (<table>, <tr>, <th>, <td>) rather than standard Markdown tables
- Preserves merged cells with HTML rowspan and colspan attributes
- Preserves table captions with the <caption> tag to maintain document context
- Handles complex table structures including headers, cells, and footers
- Maintains proper spacing with two blank lines before each table for improved readability
- Preserves table footnotes as a separate paragraph following the table
Here's an example:
<table>
<caption>Table 1. This is a demo table</caption>
<tr><th>Header</th><th>Header</th></tr>
<tr><td>Cell</td><td>Cell</td></tr>
<tr><td>Cell</td><td>Cell</td></tr>
<tr><td>Cell</td><td>Cell</td></tr>
<tr><td>Footer</td><td>Footer</td></tr>
</table>
This is the footnote of the table.
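Because tables arrive as HTML rather than pipe-delimited Markdown, an HTML parser is a safer way to read them than string splitting. The sketch below uses Python's standard html.parser module to collect the caption, the cells, and any rowspan or colspan values. The class name TableParser and the shortened sample (with a colspan added for illustration) are assumptions, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the caption and cell rows from HTML table markup."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.rows = []            # each row is a list of cell dicts
        self._cell = None         # cell currently being read, if any
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "caption":
            self._in_caption = True
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            attributes = dict(attrs)
            self._cell = {
                "header": tag == "th",
                "text": "",
                "rowspan": int(attributes.get("rowspan") or 1),
                "colspan": int(attributes.get("colspan") or 1),
            }

    def handle_data(self, data):
        if self._in_caption:
            self.caption += data
        elif self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag == "caption":
            self._in_caption = False
        elif tag in ("th", "td") and self._cell is not None:
            if not self.rows:     # tolerate a cell that appears without <tr>
                self.rows.append([])
            self.rows[-1].append(self._cell)
            self._cell = None


sample = (
    "<table>\n"
    "<caption>Table 1. This is a demo table</caption>\n"
    "<tr><th>Header</th><th>Header</th></tr>\n"
    '<tr><td colspan="2">Cell</td></tr>\n'
    "</table>\n"
)
parser = TableParser()
parser.feed(sample)
print(parser.caption, len(parser.rows))
```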
Figure
The Layout API preserves figure elements:
- Encapsulates figure content in <figure> tags to maintain semantic distinction from surrounding text
- Preserves figure captions with the <figcaption> tag to provide important context
- Preserves figure footnotes as separate paragraphs following the figure container
Here's an example:
<figure>
<figcaption>Figure 2 This is a figure</figcaption>
Values
300
200
100
0
Jan Feb Mar Apr May Jun Months
</figure>
This is the footnote of the figure, if it has one.
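One way to separate figure captions from the text recognized inside a figure is to match the figure container and then pull out the figcaption. The following is a minimal sketch that assumes figures are not nested. The helper name extract_figures is hypothetical.

```python
import re

FIGURE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
CAPTION = re.compile(r"<figcaption>(.*?)</figcaption>", re.DOTALL)


def extract_figures(markdown: str) -> list[dict]:
    """Return the caption and remaining content of each figure block."""
    figures = []
    for match in FIGURE.finditer(markdown):
        body = match.group(1)
        caption = CAPTION.search(body)
        figures.append({
            "caption": caption.group(1).strip() if caption else None,
            # Everything inside the figure except the caption element.
            "content": CAPTION.sub("", body).strip(),
        })
    return figures


sample = (
    "<figure>\n"
    "<figcaption>Figure 2 This is a figure</figcaption>\n"
    "Values\n300\n200\n"
    "</figure>\n"
)
figures = extract_figures(sample)
print(figures[0]["caption"])
```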
Selection Mark
Selection marks represent checkbox-like elements in forms and documents. The Layout API:
- Uses Unicode characters for visual clarity: ☒ (checked) and ☐ (unchecked)
- Filters out low-confidence checkbox detections (below 0.1 confidence) to improve reliability
- Maintains the semantic relationship between selection marks and their associated text
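The two Unicode characters make selection marks easy to find in the Markdown text. The sketch below assumes that the label for a mark follows it on the same line, which might not hold for every form layout. The helper name extract_selection_marks and the sample line are hypothetical.

```python
import re

# A selection mark, optional spaces, then text up to the next mark or line end.
MARK = re.compile(r"([☒☐])[ \t]*([^☒☐\n]*)")


def extract_selection_marks(markdown: str) -> list[tuple[bool, str]]:
    """Return (is_checked, label) pairs for each selection mark found."""
    return [
        (m.group(1) == "☒", m.group(2).strip())
        for m in MARK.finditer(markdown)
    ]


sample = "☒ Yes ☐ No\n☐ Maybe\n"
marks = extract_selection_marks(sample)
print(marks)
```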
Formula
Mathematical formulas are preserved with LaTeX-compatible syntax that allows for rendering of complex mathematical expressions:
- Inline formulas are enclosed in single dollar signs ($...$) to maintain text flow
- Block formulas use double dollar signs ($$...$$) for standalone display
- Multi-line formulas are represented as consecutive block formulas, preserving mathematical relationships
- Original spacing and formatting are maintained to ensure accurate representation
Here's an example of an inline formula, a single-line formula block, and a multi-line formula block:
The mass-energy equivalence formula $E = m c ^ { 2 }$ is an example of an inline formula
$$\frac { n ! } { k ! \left( n - k \right) ! } = \binom { n } { k }$$
$$\frac { p _ { j } } { p _ { 1 } } = \prod _ { k = 1 } ^ { j - 1 } e ^ { - \beta _ { k , k + 1 } \Delta E _ { k , k + 1 } }$$
$$= \exp \left[ - \sum _ { k = 1 } ^ { j - 1 } \beta _ { k , k + 1 } \Delta E _ { k , k + 1 } \right] .$$
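To pass formulas to a LaTeX renderer, a consumer can pull out the dollar-delimited spans. This is a rough sketch, not a full Markdown math parser: it assumes dollar signs appear only as formula delimiters, so currency amounts in the text would be misread. The helper name extract_formulas is hypothetical.

```python
import re

BLOCK = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# A single dollar sign that is not part of a double-dollar delimiter.
INLINE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def extract_formulas(markdown: str) -> dict[str, list[str]]:
    """Return the LaTeX source of inline and block formulas."""
    blocks = [m.group(1).strip() for m in BLOCK.finditer(markdown)]
    # Remove block formulas first so their delimiters are not read as inline ones.
    remainder = BLOCK.sub("", markdown)
    inline = [m.group(1).strip() for m in INLINE.finditer(remainder)]
    return {"inline": inline, "block": blocks}


sample = (
    r"The mass-energy equivalence formula $E = m c ^ { 2 }$ is an example of an inline formula"
    "\n\n"
    r"$$\frac { n ! } { k ! \left( n - k \right) ! } = \binom { n } { k }$$"
    "\n"
)
formulas = extract_formulas(sample)
print(formulas["inline"], len(formulas["block"]))
```

Consecutive block formulas, as in the multi-line example, come back as separate entries in the block list in document order.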
Barcode
Barcodes and QR codes are represented using Markdown image syntax with added semantic information:
- Uses standard image Markdown syntax with descriptive attributes
- Captures both the barcode type (QR code, barcode, etc.) and its encoded value
- Preserves the semantic relationship between barcodes and surrounding content
Here's an example:



PageNumber/PageHeader/PageFooter
Page metadata elements provide context about document pagination but aren't meant to be displayed inline with the main content:
- Enclosed in HTML comments to preserve the information while keeping it hidden from standard Markdown rendering
- Maintains original page structure information that might be valuable for document reconstruction
- Enables applications to understand document pagination without disrupting the content flow
Here's an example:
<!-- PageHeader="This is page header" -->
<!-- PageFooter="This is page footer" -->
<!-- PageNumber="1" -->
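Applications that need the page metadata can read it back out of the HTML comments, or strip the comments when only the body text matters. The following is a minimal sketch that assumes the comment shape shown in the example and that values contain no double quotes. The helper names are hypothetical.

```python
import re

PAGE_META = re.compile(
    r'<!--\s*(PageHeader|PageFooter|PageNumber)="(.*?)"\s*-->'
)


def extract_page_metadata(markdown: str) -> dict[str, list[str]]:
    """Group page header, footer, and number values by element name."""
    metadata: dict[str, list[str]] = {}
    for kind, value in PAGE_META.findall(markdown):
        metadata.setdefault(kind, []).append(value)
    return metadata


def strip_page_metadata(markdown: str) -> str:
    """Remove page metadata comments, leaving the main content."""
    return PAGE_META.sub("", markdown)


sample = (
    '<!-- PageHeader="This is page header" -->\n'
    "Body text.\n"
    '<!-- PageFooter="This is page footer" -->\n'
    '<!-- PageNumber="1" -->\n'
)
metadata = extract_page_metadata(sample)
print(metadata)
```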
PageBreak
To make it easy to determine which parts belong to which page based on the Markdown content alone, PageBreak is used as the delimiter between pages.
Here's an example:
<!-- PageBreak -->
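With the PageBreak delimiter in place, splitting the Markdown into per-page chunks is a one-line operation. This sketch assumes the comment appears exactly as shown, apart from optional whitespace. The helper name split_pages is hypothetical.

```python
import re

PAGE_BREAK = re.compile(r"<!--\s*PageBreak\s*-->")


def split_pages(markdown: str) -> list[str]:
    """Split Markdown content into one string per page."""
    return [page.strip() for page in PAGE_BREAK.split(markdown)]


sample = "Page one text.\n<!-- PageBreak -->\nPage two text.\n"
pages = split_pages(sample)
print(len(pages))
```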
KeyValuePairs/Language/Style
KeyValuePairs, Language, and Style results are mapped to the Analytics JSON body and aren't included in the Markdown content.
Note
For more information on the Markdown that is currently supported for user content on GitHub.com, see the GitHub Flavored Markdown Spec.
Conclusion
Document Intelligence's Markdown elements provide a powerful way to represent the structure and content of analyzed documents. By understanding and properly utilizing these Markdown elements, you can enhance your document processing workflows and build more sophisticated content extraction applications.
Next steps
Try processing your documents with Document Intelligence Studio.
Complete a Document Intelligence quickstart and get started creating a document processing app in the development language of your choice.