Iteration Layer

OCR Benchmark: Testing extraction accuracy on real-world documents

Scanned invoices, forms, receipts, tables, and charts feed into extraction, reporting, and automation workflows. How well the document converts to markdown decides everything downstream — garbage markdown in, garbage results out. This benchmark shows how the current pipeline performed on the document set we use for evaluation.

How we measured extraction quality

We ran 41 real workflow files — forms, invoices, scans, tables, charts, and photos — through each OCR pipeline, then had Gemini 2.5 Flash Lite judge every markdown output against the source image for text accuracy, layout, tables, and detail preservation.

Input: 41 real workflow files
Convert: OCR pipeline produces markdown
Judge: Gemini compares output to source image
Score: 0.0–1.0 per file
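
The loop itself is simple. Below is a minimal sketch of its general shape, assuming the conversion and judging steps are supplied as callables; the function names are placeholders, not the actual benchmark harness.

```python
# Minimal sketch of the scoring loop described above; the callables `convert`
# and `judge` are placeholders for the OCR pipeline under test and the
# Gemini 2.5 Flash Lite judge, not the actual benchmark harness.
from pathlib import Path
from typing import Callable

PASS_THRESHOLD = 0.70  # a score of 0.70 or higher counts as a pass

def evaluate(
    name: str,
    files: list[Path],
    convert: Callable[[Path], str],       # source document -> markdown
    judge: Callable[[Path, str], float],  # (source image, markdown) -> score 0.0-1.0
) -> dict:
    scores = [judge(f, convert(f)) for f in files]
    return {
        "model": name,
        "avg_score": round(sum(scores) / len(scores), 2),
        "passed": sum(s >= PASS_THRESHOLD for s in scores),
        "total": len(scores),
    }
```
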
Source files: 41 real workflow inputs across 27 document categories
Models tested: 8 (the current API plus 7 reference models)
Judge: Gemini 2.5 Flash Lite, with the same prompt for every model
Our score: 0.93 (second place, 0.01 behind the top model)

Tested file categories

Account statements
Bank checks
Charts and diagrams
Commercial leases
Credit card statements
Delivery notes
Government ID documents
Earnings reports
Equipment inspection forms
Government and tax-style forms
FUNSD-style forms
Glossaries and memos
Nutrition labels
Patents
Patient intake forms
Pay-in sheets and paystubs
Petition forms
Photo documents, receipts, and tables
Proxy voting documents
Quarterly reports
Real-estate documents
Scanned forms and scanned tables
Shift schedules
Shipping invoices
Slide screenshots
SROIE-style receipts
Plain tables

Results across all models

Same files, same judge, same prompt for every model. Iteration Layer OCR scored 0.93 — second place overall, 0.01 behind the best-scoring model in the suite.

Average score by OCR pipeline

Fixed 0.0 to 1.0 scale. Differences are visible without cropping the axis.

Chandra-OCR-2: 0.94
Iteration Layer OCR: 0.93
Qwen3-VL-Instruct: 0.91
Gemma 4 A4B: 0.89
MiniCPM-o 4.5: 0.88
InternVL3.5: 0.82
GLM-OCR: 0.80
LightOnOCR-2: 0.79
Model · Avg score · Passed · Strengths and weaknesses

Chandra-OCR-2 (5B, Q4 local run) · 0.94 · 41/41
Strength: Best overall text and layout preservation in this suite. Weakness: Occasionally over-formats simple pages where plain markdown is easier to use.

Iteration Layer OCR (current Document to Markdown API) · 0.93 · 41/41
Strength: Strong across mixed business documents, forms, scans, and tables. Weakness: Very dense diagrams and charts can still need a human check.

Qwen3-VL-Instruct (Q4 local run) · 0.91 · 39/41
Strength: Good general-purpose markdown on forms, receipts, and tables. Weakness: Missed several cases where exact detail preservation mattered.

Gemma 4 A4B (MoE with vision budget fix) · 0.89 · 34/41
Strength: Good markdown structure on many document layouts. Weakness: Missed important details in several cases.

MiniCPM-o 4.5 (Q4 local run) · 0.88 · 39/41
Strength: Strong extraction quality on typical document pages. Weakness: Less consistent on edge cases than the top rows.

InternVL3.5 (Q4 local run) · 0.82 · 33/41
Strength: Good formatting on many pages. Weakness: Lower accuracy on financial and form details.

GLM-OCR (0.9B, Q8 local run) · 0.80 · 34/41
Strength: Good for simple text-heavy pages where layout is secondary. Weakness: Lower layout and detail reliability than the top rows.

LightOnOCR-2 (1B, Q8 local run) · 0.79 · 32/41
Strength: Good markdown formatting for simple documents. Weakness: Weaker on dense layouts and missed more cases.

What this means for you

Higher extraction quality means fewer manual checks and more reliable workflow output. Single-provider EU-hosted conversion, extraction, and generation share one API style and one credit pool — keeping the pipeline simple and reliable.

Less human review

Fewer missing rows, changed numbers, or garbled fields means fewer routine documents need a person to inspect the markdown before the workflow continues.

More confidence in automation

Consistent conversion output means invoice, contract, and intake workflows run further before routing exceptions to a human.

Better downstream data

Cleaner markdown means extraction and generation APIs produce better output, whether the result goes to a spreadsheet, a report, or an MCP-connected agent.

Faster document turnaround

Files move from upload to extracted fields, generated reports, or spreadsheet exports with less manual correction between steps.

Fewer broken pipeline steps

Reliable OCR output reduces custom cleanup code between pipeline steps. With conversion, extraction, and generation on one platform, the integration points that remain are simpler too.

More trust in client deliverables

When the source markdown is accurate, the reports, summaries, and spreadsheets generated from it are too. Less time fixing deliverables before they reach the client.

OCR is just the first step

Most teams do not stop at markdown. They extract fields, generate reports, create images, or hand the result to an agent. These workflows show how document-to-markdown conversion connects to the rest of the platform.
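
As a rough sketch of that chain, the function below strings the three stages together. The callables are placeholders standing in for conversion, extraction, and generation, not the platform's documented API.

```python
# Illustrative chaining only: `convert`, `extract`, and `generate` are placeholder
# callables for the conversion, extraction, and generation steps, not the
# platform's documented API.
from pathlib import Path
from typing import Callable

def process_document(
    source: Path,
    convert: Callable[[Path], str],   # document -> markdown (the step benchmarked above)
    extract: Callable[[str], dict],   # markdown -> structured fields
    generate: Callable[[dict], str],  # fields -> report, spreadsheet rows, or agent input
) -> str:
    markdown = convert(source)
    fields = extract(markdown)   # extraction quality depends directly on the markdown
    return generate(fields)
```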

Try with your own data

Upload your own documents and see the results. Conversion, extraction, and generation all share one credit pool.

Frequently asked questions

How was this benchmark run?
We ran the same 41-document OCR evaluation suite against the current Iteration Layer OCR pipeline and one reference run per model family. The files cover forms, invoices, scans, receipts, tables, charts, photos, statements, reports, and similar workflow inputs. This is our evaluation suite, not a universal claim about every possible document.
What did the Gemini judge evaluate?
Gemini 2.5 Flash Lite received the original source image and the extracted markdown output. It scored each result from 0.0 to 1.0 using the same prompt for every pipeline, checking completeness, text accuracy, document structure, and whether the output added text that was not present in the image. A score of 0.70 or higher counted as a pass.
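
For readers who want the mechanics, a judge call along these lines can be made with the google-genai SDK. The prompt wording and the plain-float response parsing below are illustrative assumptions, not the benchmark's exact setup.

```python
# Illustrative judge call using the google-genai SDK. The prompt wording and the
# plain-float response parsing are assumptions, not the benchmark's exact setup.
from google import genai
from google.genai import types

JUDGE_PROMPT = (
    "Compare the extracted markdown to the source document image. Check completeness, "
    "text accuracy, document structure, and whether any text was added that is not in "
    "the image. Reply with a single score from 0.0 to 1.0."
)

def judge(client: genai.Client, image_bytes: bytes, markdown: str) -> float:
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            JUDGE_PROMPT + "\n\nExtracted markdown:\n" + markdown,
        ],
    )
    return float(response.text.strip())  # a score of 0.70 or higher counts as a pass
```
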
Why not publish the source documents?
The suite is built from realistic workflow documents, so we publish the document categories and methodology rather than the source files themselves. That lets readers understand the coverage without turning private, copyrighted, or sensitive-looking examples into public benchmark fixtures.
Are these results a guarantee for my documents?
No. The benchmark shows how the current pipeline performed on our 41-file evaluation suite. It is useful evidence for expected behavior on similar inputs, but your own files may differ in scan quality, layout, language, handwriting, image compression, or document conventions.
Can the ranking change over time?
Yes. OCR models, prompts, quantization settings, and serving conditions change. We disclose the evaluated pipeline names and run details so results can be challenged, repeated, or updated when better data becomes available.