Document to Markdown

Convert any document or image to clean markdown with a single API call. Send one file and receive the markdown output — no schema, no extraction, just clean text ready for LLM pipelines or downstream processing.

Key Features

Multi-Format Support — Convert PDF, DOCX, XLSX, images, HTML, Markdown, CSV, JSON, plain text, and public website URLs.
Built-In OCR — Scanned PDFs and image files are processed through OCR automatically. No separate step required.
Image Descriptions — For image files and useful referenced images, OCR text and a plain-language visual description are included as structured image context.
Website Image Context — Public website URL inputs include useful referenced images in the markdown context while skipping logos, tracking pixels, decorative graphics, and semantically unhelpful images.
LLM-Grade Output — The markdown format is the same used internally by the Document Extraction API. Tables, structure, numbered link references, and layout are preserved for reliable LLM consumption.

Overview

The Document to Markdown API converts a document or public website page to clean markdown. You send one input, either a base64-encoded file or a URL that points to a file or public website page, and receive a JSON object with the result.

Endpoint: POST /document-to-markdown/v1/convert

Limits:

Max file size: 50 MB

Supported File Formats

Documents: PDF, DOCX, PPTX, ODT, EPUB, RTF
Spreadsheets: XLSX, XLS, ODS, CSV, TSV
Email: EML, MSG (headers, body, and attachment extraction)
Notebooks: Jupyter (.ipynb)
Academic & Publishing: LaTeX (.tex, .latex), BibTeX (.bib), Typst (.typst, .typ)
Markup & Text: HTML, Markdown, JSON, XML, YAML, TOML, RST, Org, Djot, MDX, TXT
Images: PNG, JPEG, GIF, WebP, AVIF, HEIF, BMP, TIFF, JP2, PNM/PBM/PGM/PPM, SVG

How It Works

Every conversion runs the same ingestion pipeline used by Document Extraction:

Parse — the file format is detected and validated.
Ingest — the file is converted to markdown using the appropriate processor:
- PDF — pages are rendered to images and run through OCR.
- Images (PNG, JPEG, GIF, WebP, AVIF, HEIF, BMP, TIFF, JP2, PNM) — OCR extracts text and a vision model generates a description of the visual content.
- Office documents (DOCX, PPTX, ODT, ODS, XLSX/XLS) — content is extracted and normalized to markdown with formatting, tables, lists, and footnotes preserved.
- EPUB — chapters are extracted and converted to markdown via the HTML pipeline.
- LaTeX — converted to markdown: headings, formatting, lists, tables, math equations, and code blocks.
- Jupyter Notebooks — code and markdown cells are extracted with outputs.
- RTF — converted to markdown with bold, italic, strikethrough, Unicode, and special characters.
- Email (EML, MSG) — headers and body are parsed into structured markdown. Attachments are extracted, ingested through the pipeline, and returned separately.
- CSV/TSV — converted to markdown tables with auto-detected delimiters.
- HTML and website pages — converted through the HTML ingestion path before markdown output, with navigation/footer boilerplate removed, links rendered as numbered references, and useful image references annotated with OCR and visual description context.
- Text and markup formats (Markdown, JSON, XML, YAML, TOML, RST, Org, Djot, MDX, BibTeX, Typst, TXT) — returned as-is for direct LLM consumption.
Return — the result is returned as a JSON object.

There is no LLM extraction step. The API stops after ingestion.

How Nested Files Work

Some file formats contain other files — emails have attachments, archives have entries. When the API encounters a container format, it extracts the nested files and ingests each one through the same pipeline.

Currently supported containers: EML and MSG (email attachments).

The response includes:

markdown — the container’s own content (email headers and body), plus an “Attachments” section listing filenames
nested_files — an array of ingested nested files, each with name and markdown (and description for image files)

Each nested file is billed as its own document. An email with a 3-page PDF attachment costs 1 credit (email) + 3 credits (PDF pages) = 4 credits total.
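
The arithmetic above can be sketched as a small helper. This is only an illustration under the pricing used in the example (one credit for the email, one credit per PDF page); `estimate_credits` and its parameters are hypothetical names, not part of the API or the SDKs.

```python
def estimate_credits(container_credits: int, attachment_page_counts: list[int]) -> int:
    """Add the container's own credits to one credit per attachment page.

    Assumption: every attachment is billed per page, as in the PDF example.
    """
    return container_credits + sum(attachment_page_counts)


# An email (1 credit) with a single 3-page PDF attachment.
print(estimate_credits(1, [3]))  # 4
```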

For Document Extraction, nested file markdown is appended to the container’s markdown so the LLM sees the full content — email body and all attachments — as one combined context.
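
If you want a similar combined context on the client side, a minimal sketch follows. It assumes the result object carries `markdown` and `nested_files` as described above; the `combine_markdown` name and the "Attachment" heading format are assumptions for illustration and may differ from what Document Extraction builds internally.

```python
def combine_markdown(result: dict) -> str:
    """Join the container markdown with each nested file's markdown."""
    parts = [result.get("markdown", "")]
    for nested in result.get("nested_files", []):
        # The heading format is an assumption, chosen only to keep sections apart.
        parts.append(f"## Attachment: {nested['name']}\n\n{nested.get('markdown', '')}")
    return "\n\n".join(parts)
```

Passing the `data` object of an email conversion returns a single string with the email body first and each attachment under its own heading.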

Request Format

Examples are provided in Bash, TypeScript, Python, and Go.

Request (Bash)

curl -X POST \
  https://api.iterationlayer.com/document-to-markdown/v1/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": {
      "type": "base64",
      "name": "invoice.pdf",
      "base64": "<base64-encoded-file>"
    }
  }'

Response

{
  "success": true,
  "data": {
    "name": "invoice.pdf",
    "mime_type": "application/pdf",
    "markdown": "# Invoice\n\n**Invoice Number:** INV-2024-0042\n\n**Date:** 2024-03-15\n\n| Description | Qty | Unit Price | Total |\n|---|---|---|---|\n| Consulting | 10h | $100.00 | $1,000.00 |\n| Support | 5h | $80.00 | $400.00 |\n\n**Total: $1,400.00**"
  }
}

Request (TypeScript)

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.convertDocumentToMarkdown({
  file: {
    type: "base64",
    name: "invoice.pdf",
    base64: new Uint8Array([/* file bytes */]),
  },
});

Request (Python)

from iterationlayer import IterationLayer
client = IterationLayer(api_key="YOUR_API_KEY")

result = client.convert_document_to_markdown(
    file={
        "type": "base64",
        "name": "invoice.pdf",
        "base64": b"...",
    }
)

Request (Go)

import il "github.com/iterationlayer/sdk-go"
client := il.NewClient("YOUR_API_KEY")

result, err := client.ConvertDocumentToMarkdown(il.ConvertDocumentToMarkdownRequest{
    File: il.FileInput{Type: "base64", Name: "invoice.pdf", Base64: []byte{ /* file bytes */ }},
})

Request Parameters

Parameter	Type	Required	Description
`file`	`FileInput`	Yes	The file to convert.
`webhook_url`	`string`	No	HTTPS URL to receive results asynchronously. If provided, returns 201 immediately. See Webhooks.

Async Mode

Add a webhook_url parameter to process the request in the background. The API returns a 201 response immediately and delivers the result to your webhook URL when processing completes. See Webhooks for payload format and retry behavior.
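
A request body for async mode could be assembled as in the sketch below. It only builds the JSON and sends nothing; `build_async_request` is a hypothetical helper, and the client-side HTTPS check simply mirrors the documented requirement that `webhook_url` is an HTTPS URL.

```python
import json
from urllib.parse import urlparse


def build_async_request(file_input: dict, webhook_url: str) -> str:
    """Return the JSON request body for a background conversion."""
    if urlparse(webhook_url).scheme != "https":
        raise ValueError("webhook_url must be an HTTPS URL")
    return json.dumps({"file": file_input, "webhook_url": webhook_url})


body = build_async_request(
    {"type": "url", "url": "https://example.com/report.pdf", "name": "report.pdf"},
    "https://example.com/hooks/markdown",  # placeholder webhook endpoint
)
```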

FileInput

The file is either a base64-encoded binary or a URL reference.

Parameter	Type	Required	Description
`type`	`"base64"` | `"url"`	Yes	Input method.
`name`	`string`	Required for `base64`, optional for `url`	File name including extension. URL inputs without a name may be resolved as website pages.
`base64`	`string`	When `type = "base64"`	Base64-encoded file content.
`url`	`string`	When `type = "url"`	Public HTTPS URL to fetch the file from. HTTP is not accepted.
`fetch_options`	`object`	No	Website retrieval options for public URL inputs.
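
The two input shapes in the table can be built with a few lines of standard-library Python. This is a sketch for assembling the raw JSON payload yourself; the helper names are hypothetical and the SDKs may handle encoding differently.

```python
import base64
from typing import Optional


def file_input_from_bytes(name: str, content: bytes) -> dict:
    """Build a base64 FileInput; `name` is required for this input type."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"type": "base64", "name": name, "base64": encoded}


def file_input_from_url(url: str, name: Optional[str] = None) -> dict:
    """Build a URL FileInput; omit `name` to let the URL resolve as a website page."""
    if not url.startswith("https://"):
        raise ValueError("only HTTPS URLs are accepted")
    file_input = {"type": "url", "url": url}
    if name is not None:
        file_input["name"] = name
    return file_input


print(file_input_from_bytes("note.txt", b"hi")["base64"])  # aGk=
```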

For websites, the API always checks the site’s robots.txt before fetching the page. If the URL is disallowed for the effective User-Agent, the request fails with 400 Bad Request and the page is not fetched. Sitemap entries in robots.txt are recorded as metadata only; they are not traversed.

Website URL fetches are rate-limited per destination host. The service automatically respects robots.txt crawl-delay hints, slows down after upstream 429 Too Many Requests responses, and uses upstream Retry-After or rate-limit reset headers when provided. This protects public sites and may delay later requests to the same host; it is not configurable through fetch_options.

If a website URL fetch looks like a WAF or security challenge, the API automatically retries the same URL in an isolated Chromium browser session. Detection covers common Cloudflare, Akamai, AWS WAF/CloudFront, Imperva/Incapsula, DataDome, PerimeterX, Sucuri, F5/BIG-IP, hCaptcha, reCAPTCHA, and generic verification pages. Successful browser fallback is recorded under metadata.security; failed fallback returns 400 Bad Request with a clear security-challenge message.

Fetch Options

fetch_options lets you control retrieval behavior for a URL input treated as a website page. Robots compliance and per-host rate limiting are mandatory and are not configurable.

Field	Type	Required	Description
`locale`	`string`	No	BCP 47 locale tag sent as `Accept-Language` (e.g., `"en-US"`, `"de-DE"`, `"fr"`).
`user_agent`	`string`	No	Custom `User-Agent` header string (1–500 characters).
`auth`	`object`	No	Authentication for website URL inputs. Supports bearer tokens, HTTP Basic auth, and a custom auth header. Secret values are not returned in metadata.
`headers`	`object`	No	Additional request headers for website URL inputs. Header names and values are validated; unsafe headers such as `Cookie`, `Set-Cookie`, `Host`, `Content-Length`, hop-by-hop headers, and browser-controlled `Sec-*` headers are rejected.
`timeout_ms`	`integer`	No	Website fetch timeout in milliseconds. Must be between `1000` and `60000`.
`should_render_javascript`	`boolean`	No	Use Chromium browser rendering before conversion. Default: `false`.
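
The limits in the table can be checked on the client before a request is sent. The sketch below is an assumption-laden illustration: `validate_fetch_options` is a hypothetical helper, and the hop-by-hop header names are the conventional HTTP set, which may not match the server's exact rejection list.

```python
# Headers named in the table plus the conventional hop-by-hop set (assumed list).
FORBIDDEN_HEADERS = {
    "cookie", "set-cookie", "host", "content-length",
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def validate_fetch_options(options: dict) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    user_agent = options.get("user_agent")
    if user_agent is not None and not 1 <= len(user_agent) <= 500:
        problems.append("user_agent must be 1-500 characters")
    timeout_ms = options.get("timeout_ms")
    if timeout_ms is not None and not (
        isinstance(timeout_ms, int) and 1000 <= timeout_ms <= 60000
    ):
        problems.append("timeout_ms must be an integer between 1000 and 60000")
    for header_name in options.get("headers", {}):
        lowered = header_name.lower()
        if lowered in FORBIDDEN_HEADERS or lowered.startswith("sec-"):
            problems.append(f"header not allowed: {header_name}")
    return problems
```

The server remains the authority on validation; a local check like this only catches obvious mistakes early.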

Custom Headers

Custom headers are sent on direct fetches and, when JavaScript rendering is enabled or security-challenge fallback is needed, as Chromium request headers where supported by the browser. Each Chromium fetch uses an isolated browser context that is disposed after the request; cookies and session state are not persisted across URLs or requests.

Fetch options control retrieval behavior, not billing.

Auth Examples

Use one auth shape per request.

Code

{ "auth": { "type": "bearer", "token": "..." } }
{ "auth": { "type": "basic", "username": "...", "password": "..." } }
{ "auth": { "type": "custom_header", "name": "x-api-key", "value": "..." } }
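
A quick shape check for these three objects might look like the sketch below. `check_auth` is a hypothetical helper, and treating extra or missing fields as an error is an assumption about how strictly the API validates the object.

```python
# Fields expected for each auth type, taken from the three examples above.
AUTH_FIELDS = {
    "bearer": {"token"},
    "basic": {"username", "password"},
    "custom_header": {"name", "value"},
}


def check_auth(auth: dict) -> None:
    """Raise ValueError unless `auth` matches exactly one documented shape."""
    auth_type = auth.get("type")
    if auth_type not in AUTH_FIELDS:
        raise ValueError(f"unknown auth type: {auth_type!r}")
    fields = set(auth) - {"type"}
    if fields != AUTH_FIELDS[auth_type]:
        raise ValueError(f"{auth_type} auth expects fields {sorted(AUTH_FIELDS[auth_type])}")
```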

Website URL Example

To convert a public website page to markdown, send a URL input without a filename. Add fetch_options only when the page needs explicit retrieval controls.

Code

{
  "file": {
    "type": "url",
    "url": "https://example.com/docs/api-reference",
    "fetch_options": {
      "should_render_javascript": true
    }
  }
}

Response Format

The response is a JSON object with the conversion result.

Field	Type	Description
`name`	`string`	File name from the request.
`mime_type`	`string`	Detected MIME type of the file.
`markdown`	`string`	Extracted markdown content. HTML and website inputs include numbered link references when links are present. Useful image references may be followed by structured `<image>` context blocks. Empty string if no text was found.
`description`	`string`	Plain-language description of the image content. Present only for image files (for example PNG, JPEG, GIF, WebP).

Image Files

For image files, the response includes both markdown (OCR output) and description (vision model output). The description field describes what the image depicts — suitable for use as alt text, for downstream search indexing, or as context in LLM prompts.

Code

{
  "name": "product-photo.png",
  "mime_type": "image/png",
  "markdown": "Sale — 30% off all items",
  "description": "A product photograph of a white ceramic mug on a wooden table. The mug has a minimalist design with no text or logo. Natural lighting from the left."
}

Image Context in Markdown

When a document, HTML file, website page, or container attachment includes a useful image, the markdown preserves the image reference and appends a structured context block directly after it. The block links the OCR text and visual description to the image URL used in the markdown.

Code

![](/assets/product-photo.png)
<image>
<url>/assets/product-photo.png</url>
<description>
A product photograph of a white ceramic mug on a wooden table. The mug has a minimalist design with no text or logo. Natural lighting from the left.
</description>
<contents>
Sale — 30% off all items
</contents>
</image>

If only OCR text or only a visual description is available, the unavailable section is omitted. If neither is available, only the markdown image reference is returned.
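
Downstream code can pull these context blocks back out of the markdown with a regular expression. This sketch assumes the block layout shown above, with `<description>` before `<contents>` and either section optional; `extract_image_context` is a hypothetical helper, not part of any SDK.

```python
import re

# Matches one <image> block; the description and contents sections are optional.
IMAGE_BLOCK = re.compile(
    r"<image>\s*<url>(?P<url>.*?)</url>"
    r"(?:\s*<description>\s*(?P<description>.*?)\s*</description>)?"
    r"(?:\s*<contents>\s*(?P<contents>.*?)\s*</contents>)?"
    r"\s*</image>",
    re.DOTALL,
)


def extract_image_context(markdown: str) -> list[dict]:
    """Return one dict per <image> block; omitted sections map to None."""
    return [match.groupdict() for match in IMAGE_BLOCK.finditer(markdown)]
```

Each returned dict has the keys `url`, `description`, and `contents`, so a caller can index images by URL or use the description as alt text.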

Recipes

For complete, runnable examples see the Recipes page.

Convert Invoice to Markdown — Convert a PDF invoice to structured markdown.
Convert Contract to Markdown — Extract contract text and clauses as clean markdown.
Convert Resume to Markdown — Convert a resume PDF to structured markdown for downstream processing.

Error Responses

Status	Description
400	Invalid request (missing or invalid `file` parameter)
401	Missing or invalid API key
422	Processing error (file could not be parsed or ingested)
429	Rate limit exceeded
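
A client might map these statuses to a next step as in the sketch below. The action labels and the `action_for_status` name are hypothetical; treating 429 as the only status worth retrying is a common convention and an assumption here, not documented behavior.

```python
# Suggested next step for each documented error status (labels are illustrative).
ERROR_ACTIONS = {
    400: "fix_request",    # missing or invalid `file` parameter
    401: "check_api_key",  # missing or invalid API key
    422: "inspect_file",   # file could not be parsed or ingested
    429: "retry_later",    # rate limit exceeded
}


def action_for_status(status: int) -> str:
    """Return "ok" for 2xx responses, otherwise the suggested next step."""
    if 200 <= status < 300:
        return "ok"
    return ERROR_ACTIONS.get(status, "unknown")
```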
