OCR Benchmark: Testing extraction accuracy on real-world documents
Scanned invoices, forms, receipts, tables, and charts feed into extraction, reporting, and automation workflows. How well the document converts to markdown decides everything downstream — garbage markdown in, garbage results out. This benchmark shows how the current pipeline performed on the document set we use for evaluation.
How we measured extraction quality
We ran 41 real workflow files — forms, invoices, scans, tables, charts, and photos — through each OCR pipeline, then had Gemini 2.5 Flash Lite judge every markdown output against the source image for text accuracy, layout, tables, and detail preservation.
41 real workflow inputs across 27 document categories
8 pipelines: the current API plus 7 reference models
Judge: Gemini 2.5 Flash Lite, with the same prompt for every model
0.93 average score: second place, 0.01 behind the top model
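The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark harness: the `evaluate` function, the `ModelResult` type, the toy pipelines, and the toy judge are all hypothetical, and `PASS_THRESHOLD` is an assumed value because the pass cutoff is not stated here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

PASS_THRESHOLD = 0.7  # assumption: the actual pass cutoff is not published here


@dataclass
class ModelResult:
    model: str
    avg_score: float
    passed: int
    total: int


def evaluate(
    files: list[str],
    pipelines: dict[str, Callable[[str], str]],
    judge: Callable[[str, str], float],
) -> list[ModelResult]:
    """Run every file through every pipeline and score each output with one judge."""
    results = []
    for name, convert in pipelines.items():
        # Same files and same judge for every pipeline, so scores are comparable.
        scores = [judge(path, convert(path)) for path in files]
        results.append(
            ModelResult(
                model=name,
                avg_score=round(mean(scores), 2),
                passed=sum(score >= PASS_THRESHOLD for score in scores),
                total=len(scores),
            )
        )
    return sorted(results, key=lambda r: r.avg_score, reverse=True)


# Toy stand-ins so the sketch runs without any OCR model or judge API.
def toy_pipeline_a(path: str) -> str:
    return f"# {path}\nfull text"


def toy_pipeline_b(path: str) -> str:
    return f"# {path}"


def toy_judge(path: str, markdown: str) -> float:
    return 0.9 if "full text" in markdown else 0.5


ranking = evaluate(
    ["invoice.png", "form.pdf"],
    {"a": toy_pipeline_a, "b": toy_pipeline_b},
    toy_judge,
)
```

In a real run, each pipeline callable would wrap one OCR model and the judge callable would wrap the Gemini 2.5 Flash Lite prompt, returning a score between 0.0 and 1.0.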
Results across all models
Same files, same judge, same prompt for every model. Iteration Layer OCR scored 0.93 — second place overall, 0.01 behind the best-scoring model in the suite.
Chart: average score by OCR pipeline, plotted on a fixed 0.0 to 1.0 scale so differences are visible without cropping the axis.
| Model | Avg score | Passed | Strengths | Weaknesses |
|---|---|---|---|---|
| Chandra-OCR-2 (5B Q4 local run) | 0.94 | 41/41 | Best overall text and layout preservation in this suite. | Occasionally over-formats simple pages where plain markdown is easier to use. |
| Iteration Layer OCR (current Document to Markdown API) | 0.93 | 41/41 | Strong across mixed business documents, forms, scans, and tables. | Very dense diagrams and charts can still need a human check. |
| Qwen3-VL-Instruct (Instruct Q4 local run) | 0.91 | 39/41 | Good general-purpose markdown on forms, receipts, and tables. | Missed several cases where exact detail preservation mattered. |
| Gemma 4 A4B (A4B MoE with vision budget fix) | 0.89 | 34/41 | Good markdown structure on many document layouts. | Missed important details in several cases. |
| MiniCPM-o 4.5 (Q4 local run) | 0.88 | 39/41 | Strong extraction quality on typical document pages. | Less consistent on edge cases than the top rows. |
| InternVL3.5 (Q4 local run) | 0.82 | 33/41 | Good formatting on many pages. | Lower accuracy on financial and form details. |
| GLM-OCR (0.9B Q8 local run) | 0.80 | 34/41 | Good for simple text-heavy pages where layout is secondary. | Lower layout and detail reliability than the top rows. |
| LightOnOCR-2 (1B Q8 local run) | 0.79 | 32/41 | Good markdown formatting for simple documents. | Weaker on dense layouts and missed more cases. |
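For readers who want to work with these numbers directly, the short sketch below recomputes each model's gap to the top score and its pass rate. The scores and pass counts are copied from the table; the `summarize` helper and its output format are illustrative only.

```python
# (model, average score, files passed) copied from the results table.
RESULTS = [
    ("Chandra-OCR-2", 0.94, 41),
    ("Iteration Layer OCR", 0.93, 41),
    ("Qwen3-VL-Instruct", 0.91, 39),
    ("Gemma 4 A4B", 0.89, 34),
    ("MiniCPM-o 4.5", 0.88, 39),
    ("InternVL3.5", 0.82, 33),
    ("GLM-OCR", 0.80, 34),
    ("LightOnOCR-2", 0.79, 32),
]
TOTAL_FILES = 41


def summarize(results, total_files):
    """Return rows of (rank, model, gap to the top score, pass rate in percent)."""
    ranked = sorted(results, key=lambda row: row[1], reverse=True)
    top_score = ranked[0][1]
    return [
        # Round the gap to two decimals to hide floating-point noise.
        (rank, model, round(top_score - score, 2), round(100 * passed / total_files, 1))
        for rank, (model, score, passed) in enumerate(ranked, start=1)
    ]


summary = summarize(RESULTS, TOTAL_FILES)
for rank, model, gap, pass_rate in summary:
    print(f"{rank}. {model}: {gap:.2f} behind the top score, {pass_rate}% of files passed")
```

Average score and pass count do not always rank models the same way: Gemma 4 A4B has a higher average than MiniCPM-o 4.5 (0.89 against 0.88) but passed fewer files (34 against 39).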
What this means for you
Higher extraction quality means fewer manual checks and more reliable workflow output. Conversion, extraction, and generation run on a single EU-hosted provider and share one API style and one credit pool, which keeps the pipeline simple.
Less human review
Fewer missing rows, changed numbers, or garbled fields means fewer routine documents need a person to inspect the markdown before the workflow continues.
More confidence in automation
Consistent conversion output means invoice, contract, and intake workflows run further before routing exceptions to a human.
Better downstream data
Cleaner markdown means extraction and generation APIs produce better output, whether the result goes to a spreadsheet, a report, or an MCP-connected agent.
Faster document turnaround
Files move from upload to extracted fields, generated reports, or spreadsheet exports with less manual correction between steps.
Fewer broken pipeline steps
Reliable OCR output reduces custom cleanup code between pipeline steps. With conversion, extraction, and generation on one platform, the integration points that remain are simpler too.
More trust in client deliverables
When the source markdown is accurate, the reports, summaries, and spreadsheets generated from it are accurate too. That means less time fixing deliverables before they reach the client.
OCR is just the first step
Most teams do not stop at markdown. They extract fields, generate reports, create images, or hand the result to an agent. These workflows show how document-to-markdown conversion connects to the rest of the platform.
Invoice workflow
Real-estate workflow
Agent workflow
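As a small illustration of one downstream step, the sketch below turns a table in converted markdown into CSV rows for a spreadsheet export. It uses only the Python standard library and does not call any platform API. The sample invoice markdown is hypothetical, and the parser is deliberately simple: it assumes pipe-delimited rows and does not handle escaped pipes inside cells.

```python
import csv
import io


def markdown_table_to_rows(markdown: str) -> list[list[str]]:
    """Collect the cells of every pipe-delimited table row, skipping separator rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # A separator row such as |---|:---:| contains only dashes and colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text with newline line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical converter output for an invoice; not taken from the benchmark files.
sample_markdown = """\
# Invoice 1001

| Item | Qty | Price |
|---|---|---|
| Widget | 2 | 9.50 |
| Cable | 1 | 4.00 |
"""

rows = markdown_table_to_rows(sample_markdown)
csv_text = rows_to_csv(rows)
```

A dropped row or a shifted column in the markdown would pass straight through this step into the spreadsheet, which is why conversion accuracy matters for everything that follows.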
Try with your own data
Upload your own documents and see the results. Conversion, extraction, and generation all share one credit pool.
Document to Markdown
Convert any document to clean markdown. 500 free markdown conversions included.
Document Extraction
Extract structured data from documents. 500 free document extractions included.
Website Extraction
Extract typed JSON from public web pages. 100 free website extractions included.