Extract structured data from PDFs and business documents
Send any of 40+ file formats or a public website URL, define a schema, and get typed JSON back. Built for structured data extraction with confidence scores and source citations.
No credit card required — start with free trial credits
One output feeds the next
Document Extraction is part of a complete content pipeline. One key, one credit pool, and structured JSON responses designed to chain together.
Mix and match freely
Extract data from a document, generate visuals from the results, then compile everything into a finished report. Mix, match, and build your own pipeline.
Three steps to your first extraction
Define a schema
Describe the structured data you want returned using our schema format. Each field has a name, a type, and an optional description to guide extraction.
- Rich field types including text, currency, date, IBAN, and address
- Nested arrays for line items, tables, and repeating sections
- Optional descriptions to clarify ambiguous fields
Send your documents
Upload any of 40+ file formats, including PDFs, scans, Office files, emails, and images, or pass a public website URL. Send up to 20 files per request and combine them into one extraction result.
- 40+ formats plus public website URLs
- Up to 20 files combined into one structured result
- Built-in OCR for scanned pages and photos
Get structured data
Receive typed JSON with extracted fields, confidence scores, and source citations so you can automate downstream workflows and route uncertain results to review.
- Confidence scores between 0 and 1 for every field
- Source citations linking each value to its location in the document
- Missing fields return null with a confidence score of 0
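A minimal sketch of how the confidence scores described above could drive a review queue. The 0.9 threshold, the function name, and the sample values are illustrative assumptions, not part of the API; only the `value` and `confidence` keys and the null-with-confidence-0 convention come from this page.

```python
from typing import Any

REVIEW_THRESHOLD = 0.9  # hypothetical cut-off; tune it for your own workflow


def split_by_confidence(
    data: dict[str, dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> tuple[dict[str, Any], dict[str, dict[str, Any]], list[str]]:
    """Partition top-level fields into accepted values, fields needing review, and missing fields."""
    accepted: dict[str, Any] = {}
    review: dict[str, dict[str, Any]] = {}
    missing: list[str] = []
    for name, field in data.items():
        if field.get("value") is None:
            # Missing fields come back as null with a confidence of 0.
            missing.append(name)
        elif field.get("confidence", 0.0) >= threshold:
            accepted[name] = field["value"]
        else:
            review[name] = field
    return accepted, review, missing


# Illustrative data only; the scores below are made up for the example.
sample = {
    "invoice_number": {"value": "INV-2024-0042", "confidence": 0.97},
    "due_date": {"value": "2024-04-14", "confidence": 0.62},
    "po_number": {"value": None, "confidence": 0},
}
accepted, review, missing = split_by_confidence(sample)
print(accepted)
print(sorted(review))
print(missing)
```

Keeping missing fields separate from low-confidence ones lets a workflow distinguish "the document does not contain this" from "the value needs a second look".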
Intelligent Extraction
The API automatically selects the best extraction approach for your schema and documents, so you don't have to stitch together OCR, prompting, and post-processing logic yourself.
Schema-Driven Results
Define typed fields — dates, IBANs, currencies, addresses, nested arrays — and get structured JSON back. No prompt engineering, no output parsing.
Deep Content Understanding
Images and scanned documents aren't treated as pixel grids to OCR. The API understands what they depict — product photos, charts, handwritten notes — and extracts field values from that visual meaning.
Built-In Trust Scores
Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.
Multi-File Merge
Send up to 20 files per request and get one unified extraction across all of them. Mix formats freely — a PDF invoice, a DOCX contract, a JPEG receipt, and a public website URL in the same call.
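As a rough sketch, a multi-file request body could be assembled like this. The `files` and `schema` shapes mirror the request sample on this page; the helper name, the `example.com` URLs, and the client-side limit check are assumptions for illustration, and how non-URL uploads are expressed is not covered here.

```python
import json
import posixpath
from urllib.parse import urlparse

MAX_FILES_PER_REQUEST = 20  # limit stated on this page


def build_request(urls: list[str], fields: list[dict]) -> dict:
    """Build one extraction request body from several file URLs (hypothetical helper)."""
    if not urls:
        raise ValueError("at least one file is required")
    if len(urls) > MAX_FILES_PER_REQUEST:
        raise ValueError(f"at most {MAX_FILES_PER_REQUEST} files per request")
    files = [
        {
            "type": "url",
            # Derive a display name from the last path segment of the URL.
            "name": posixpath.basename(urlparse(url).path) or "document",
            "url": url,
        }
        for url in urls
    ]
    return {"files": files, "schema": {"fields": fields}}


request_body = build_request(
    [
        "https://example.com/docs/invoice.pdf",    # placeholder URL
        "https://example.com/docs/contract.docx",  # placeholder URL
    ],
    [{"name": "vendor", "type": "TEXT", "description": "The vendor legal name"}],
)
print(json.dumps(request_body, indent=2))
```

Checking the file count before sending avoids spending a round trip on a request that would exceed the documented limit.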
40+ File Formats
PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, LaTeX, EML, Jupyter notebooks, images, public website URLs, plus text and markup formats like YAML, TOML, RST, and Org — all in the same endpoint.
No Model Training
Your documents are never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.
Real-world pipelines, ready to ship
Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.
Extract Academic Paper Metadata
Extract title, authors, abstract, and citation info from academic papers.
Extract Article Text
Extract clean article content — title, author, date, and body text — from PDFs, Word docs, and web pages.
Extract Contract Clause Data
Extract parties, dates, and clauses from a contract into structured JSON for legal review workflows.
Extract Court Filing Data
Extract case numbers, parties, filing dates, court details, and relief sought from court filing documents and legal pleadings.
Extract Customs Declaration
Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.
Extract Delivery Note Data
Extract shipment details, item quantities, and delivery confirmation data from warehouse delivery notes and goods received notes.
Extract Fleet Vehicle Registration Data
Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.
Extract Invoice Data
Extract vendor name, line items, totals, and dates from invoice documents.
Extract KPI Data
Extract campaign or business KPIs from report documents — metrics, values, periods, and targets.
Extract KYC Onboarding Data
Extract client identity verification details, company information, and beneficial ownership data from KYC onboarding documents.
Extract Legal Invoice Data
Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.
Extract Medical Record
Extract patient details, diagnoses, and medications from a medical record into structured JSON for healthcare workflows.
Extract Multi-Invoice Data
Extract structured data from multiple invoice files in a single API call using an array schema.
Extract NDA Terms
Extract parties, obligations, restrictions, permitted disclosures, and expiry dates from non-disclosure agreements.
Extract Product Catalog Entry
Extract product name, SKU, price, and specifications from a catalog document into structured JSON for e-commerce workflows.
Extract Property Appraisal
Extract appraised value, property details, and comparable sales from a property appraisal report into structured JSON.
Extract Property Deed Data
Extract property ownership, legal descriptions, encumbrances, and recording details from property deeds and land registry documents.
Extract Purchase Order Data
Extract line items, quantities, unit prices, delivery dates, and supplier details from purchase order documents.
Extract Real Estate Listing
Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.
Extract Receipt Data
Extract merchant, date, line items, tax, and total from receipts.
Extract Rental Application
Extract applicant details, employment history, income, and references from a rental application form into structured JSON for tenant screening.
Extract Resume Data
Extract candidate name, contact details, work history, and skills from resumes.
Extract Supplier Invoice Data for ERP Import
Extract supplier invoice details structured for direct import into ERP systems like SAP, Oracle, or Microsoft Dynamics.
Extract Terms and Conditions
Extract clause types, obligations, limitations, and governing law from terms and conditions documents.
Extract Traffic Fine Data
Extract violation details, fine amounts, vehicle information, and payment deadlines from traffic fine notices.
One n8n node for your entire pipeline
Most n8n document workflows chain three or four separate services. The Iteration Layer community node covers extraction, transformation, and generation in a single install — wire up multi-step pipelines visually instead of writing glue code.
Start building right now
One API call, one credit deducted. Chains naturally with our other APIs — pipe the output of one into the next without glue code. You'll be up and running in minutes.
- Full OpenAPI 3.1 specification available for code generation and IDE integration.
- MCP server support for seamless integration with AI agents and tools.
- Comprehensive documentation with examples for every field type and edge case.
invoice_number: INV-2024-0042
vendor: Northwind Accounting Services GmbH
due_date: 2024-04-14
line_items:
- description: Month-end close automation workshop
  amount: USD 720.00
- description: Invoice schema rollout and testing
  amount: USD 480.00
- description: Vendor onboarding playbook update
  amount: USD 190.00
total_due: USD 1,390.00
name: Elena Vasquez
current_title: Senior Software Engineer
location: Berlin, Germany
email: elena.vasquez@email.com
github: github.com/elenavasquez
experience_years: 8
education: M.Sc. Computer Science, Technical University of Munich
skills: Elixir, Python, Go, Kubernetes, PostgreSQL, Kafka, Terraform, AWS
merchant: Custom Burger
phone: (415) 252-2634
order_datetime: Apr'28'09 03:09PM
order_number: 9007
address: 121 Seventh Street, San Francisco, CA 94103
wifi: SOMA250 / 59632
line_items:
- description: Veggie Burger
  amount: USD 5.99
- description: Bleu Cheese
  amount: USD 1.49
- description: 1 Bal 1/2
  amount: USD 3.79
- description: Cash
  amount: USD 13.00
subtotal: USD 11.27
tax: USD 1.07
payment: USD 12.34
change_due: USD 0.66
date: 2024-11-05
author: Operations Office
to: COO, CFO, VP Operations
subject: Q1 readiness, staffing posture, and warehouse expansion
action_items:
- Finalize racking vendor shortlist
- Publish revised hiring guardrails
- Send weekly margin dashboard to leadership
\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{booktabs}
\usepackage{hyperref}
\title{Document Processing Benchmark Notes}
\author{Nadia Keller}
\date{March 2026}
\begin{document}
\maketitle
\begin{abstract}
We compared structured extraction, markdown conversion, and spreadsheet generation workflows across invoices, scanned warehouse sheets, and compliance packets.
\end{abstract}
\section{Scope}
The benchmark covered 120 source files spanning PDF, DOCX, JPEG, and LaTeX inputs. We measured field accuracy, table retention, and handoff effort for downstream automation.
\section{Summary Table}
\begin{tabular}{lrr}
\toprule
Workflow & Accuracy & Median Runtime \\
\midrule
Invoice extraction & 98.4\% & 1.8s \\
Markdown conversion & 96.9\% & 1.2s \\
Sheet generation & 100.0\% & 0.7s \\
\bottomrule
\end{tabular}
\section{Key Findings}
Structured document APIs reduce glue code, preserve tabular content better than OCR-only pipelines, and shorten review time for finance operations.
\begin{itemize}
\item OCR-only pipelines lost row groupings in 17\% of warehouse tables.
\item Markdown output remained suitable for LLM ingestion without custom cleanup.
\item Spreadsheet generation removed a manual CSV reformatting step from the finance workflow.
\end{itemize}
\section{Next Steps}
Extend the benchmark to receipts, insurance packets, and multi-file extraction. Publish the evaluation harness after the April review.
\end{document}
title: Document Processing Benchmark Notes
author: Nadia Keller
abstract: We compared structured extraction, markdown conversion, and spreadsheet generation workflows across invoices, scanned warehouse sheets, and compliance packets.
benchmark_scope: 120 files across PDF, DOCX, JPEG, and LaTeX inputs
key_findings: Structured document APIs reduce glue code, preserve tabular content better than OCR-only pipelines, and shorten review time for finance operations
next_step: Extend the benchmark to receipts, insurance packets, and multi-file extraction. Publish the evaluation harness after the April review.
curl -X POST \
https://api.iterationlayer.com/document-extraction/v1/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"files": [{
"type": "url",
"name": "accounts-payable-invoice.pdf",
"url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf"
}],
"schema": {
"fields": [
{
"name": "invoice_number",
"type": "TEXT",
"description": "The invoice number"
},
{
"name": "vendor",
"type": "TEXT",
"description": "The vendor legal name"
},
{
"name": "due_date",
"type": "DATE",
"description": "The invoice due date"
},
{
"name": "line_items",
"type": "ARRAY",
"description": "Line items",
"fields": [
{
"name": "description",
"type": "TEXT",
"description": "Line item description"
},
{
"name": "amount",
"type": "CURRENCY_AMOUNT",
"description": "Line item amount"
}
]
},
{
"name": "total_due",
"type": "CURRENCY_AMOUNT",
"description": "The final amount due"
}
]
}
}'
{
"success": true,
"data": {
"invoice_number": {
"type": "TEXT",
"value": "INV-2024-0042",
"confidence": 0.97,
"citations": ["Invoice #INV-2024-0042"],
"source": "accounts-payable-invoice.pdf"
},
"vendor": {
"type": "TEXT",
"value": "Northwind Accounting Services GmbH",
"confidence": 0.98,
"citations": ["Northwind Accounting Services GmbH"],
"source": "accounts-payable-invoice.pdf"
},
"due_date": {
"type": "DATE",
"value": "2024-04-14",
"confidence": 0.96,
"citations": ["Due Date: 2024-04-14"],
"source": "accounts-payable-invoice.pdf"
},
"line_items": {
"type": "ARRAY",
"value": [
{
"description": {
"value": "Month-end close automation workshop",
"confidence": 0.98,
"citations": ["Month-end close automation workshop"]
},
"amount": {
"value": 720.00,
"confidence": 0.96,
"citations": ["USD 720.00"]
}
},
{
"description": {
"value": "Invoice schema rollout and testing",
"confidence": 0.97,
"citations": ["Invoice schema rollout and testing"]
},
"amount": {
"value": 480.00,
"confidence": 0.95,
"citations": ["USD 480.00"]
}
},
{
"description": {
"value": "Vendor onboarding playbook update",
"confidence": 0.95,
"citations": ["Vendor onboarding playbook update"]
},
"amount": {
"value": 190.00,
"confidence": 0.94,
"citations": ["USD 190.00"]
}
}
],
"confidence": 0.97,
"citations": [],
"source": "accounts-payable-invoice.pdf"
},
"total_due": {
"type": "CURRENCY_AMOUNT",
"value": 1390.00,
"confidence": 0.97,
"citations": ["Total Due: USD 1,390.00"],
"source": "accounts-payable-invoice.pdf"
}
}
}
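The response above wraps every value in metadata. A minimal sketch, assuming the response shape shown in this sample, of collapsing it into plain values for downstream code; the function names are illustrative, and the trimmed `response` literal keeps only the keys the sketch needs.

```python
from typing import Any


def plain_value(field: dict[str, Any]) -> Any:
    """Strip confidence and citation metadata from one field, recursing into array rows."""
    value = field.get("value")
    if isinstance(value, list):
        # ARRAY fields hold a list of rows; each row maps sub-field names to wrapped values.
        return [{key: plain_value(sub) for key, sub in row.items()} for row in value]
    return value


def flatten(response: dict[str, Any]) -> dict[str, Any]:
    """Return {field_name: plain value} for a successful extraction response."""
    if not response.get("success"):
        raise ValueError("extraction did not succeed")
    return {name: plain_value(field) for name, field in response["data"].items()}


# Trimmed copy of the sample response, keeping only the keys used here.
response = {
    "success": True,
    "data": {
        "invoice_number": {"type": "TEXT", "value": "INV-2024-0042", "confidence": 0.97},
        "line_items": {
            "type": "ARRAY",
            "value": [
                {"description": {"value": "Month-end close automation workshop"}, "amount": {"value": 720.00}},
                {"description": {"value": "Invoice schema rollout and testing"}, "amount": {"value": 480.00}},
                {"description": {"value": "Vendor onboarding playbook update"}, "amount": {"value": 190.00}},
            ],
        },
        "total_due": {"type": "CURRENCY_AMOUNT", "value": 1390.00, "confidence": 0.97},
    },
}

flat = flatten(response)
line_total = sum(item["amount"] for item in flat["line_items"])
print(flat["invoice_number"], line_total, flat["total_due"])
```

Comparing the summed line items against `total_due` is a cheap consistency check to run before handing the data to an accounting system.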
import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
apiKey: "YOUR_API_KEY",
});
const result = await client.extractDocument({
files: [{
type: "url",
name: "accounts-payable-invoice.pdf",
url: "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
}],
schema: {
fields: [
{
type: "TEXT",
name: "invoice_number",
description: "The invoice number",
},
{
type: "TEXT",
name: "vendor",
description: "The vendor legal name",
},
{
type: "DATE",
name: "due_date",
description: "The invoice due date",
},
{
type: "ARRAY",
name: "line_items",
description: "Line items",
fields: [
{ type: "TEXT", name: "description", description: "Line item description" },
{ type: "CURRENCY_AMOUNT", name: "amount", description: "Line item amount" },
],
},
{
type: "CURRENCY_AMOUNT",
name: "total_due",
description: "The final amount due",
},
],
},
});
from iterationlayer import IterationLayer
client = IterationLayer(
api_key="YOUR_API_KEY"
)
result = client.extract_document(
files=[{
"type": "url",
"name": "accounts-payable-invoice.pdf",
"url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
}],
schema={
"fields": [
{
"type": "TEXT",
"name": "invoice_number",
"description": "The invoice number",
},
{
"type": "TEXT",
"name": "vendor",
"description": "The vendor legal name",
},
{
"type": "DATE",
"name": "due_date",
"description": "The invoice due date",
},
{
"type": "ARRAY",
"name": "line_items",
"description": "Line items",
"fields": [
{"type": "TEXT", "name": "description", "description": "Line item description"},
{"type": "CURRENCY_AMOUNT", "name": "amount", "description": "Line item amount"},
],
},
{
"type": "CURRENCY_AMOUNT",
"name": "total_due",
"description": "The final amount due",
},
],
},
)
import il "github.com/iterationlayer/sdk-go"
client := il.NewClient("YOUR_API_KEY")
result, err := client.ExtractDocument(il.ExtractDocumentRequest{
Files: []il.FileInput{
il.NewFileFromURL(
"accounts-payable-invoice.pdf",
"https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
),
},
Schema: il.ExtractionSchema{
"invoice_number": il.NewTextFieldConfig(
"invoice_number",
"The invoice number",
),
"vendor": il.NewTextFieldConfig(
"vendor",
"The vendor legal name",
),
"due_date": il.NewDateFieldConfig(
"due_date",
"The invoice due date",
),
"line_items": il.NewArrayFieldConfig(
"line_items",
"Line items",
[]il.FieldConfig{
il.NewTextFieldConfig("description", "Line item description"),
il.NewCurrencyAmountFieldConfig("amount", "Line item amount"),
},
),
"total_due": il.NewCurrencyAmountFieldConfig(
"total_due",
"The final amount due",
),
},
})
Official SDKs for every major language
Install the SDK, set your API key, and start chaining requests. Full type safety, automatic retries, and idiomatic error handling included.
Your data stays in the EU
Your data is processed on EU servers and never stored beyond temporary logs. Zero retention, GDPR-compliant by design, with a Data Processing Agreement available for every customer. Learn more about our security practices.
No data storage, no model training
We don't store your files or processing results, and your data is never used to train or improve AI models. Logs are automatically deleted after 90 days.
EU-hosted infrastructure
All processing runs on servers located in the European Union. Your data never leaves the EU.
GDPR-compliant by design
Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.
Pricing
Start with free trial credits. No credit card required.
Developer
For individuals & small projects
Startup
Save 40%
For growing teams
Business
Save 47%
For high-volume workloads
Or pay as you go from $0.022/credit with automatic volume discounts.
Still evaluating?
Compare Iteration Layer against the biggest alternatives at a glance, then open the full head-to-head pages when you want the details.
| Feature | Iteration Layer | DocuPipe | Azure Document Intelligence | Google Document AI |
|---|---|---|---|---|
| Schema-defined extraction | Yes. Define extraction fields via a purpose-built schema with 17 typed field types | Yes. Zero-shot extraction with custom schema definitions, no training required | Model-dependent. Requires choosing a pre-built model or training a custom one on labelled documents | Processor-based. Requires choosing and configuring a specific processor type in the GCP console for each document type |
| Confidence scores | Per field. Confidence score between 0 and 1 for every extracted schema field | Per field. Confidence metrics provided per extracted field value | Per field. Confidence scores provided per extracted field value | Per entity. Confidence scores provided per extracted entity |
| Source citations | Yes. Verbatim source citation from the document for every extracted field | Visual only. Source highlighting available in the review UI but no verbatim text citations in the API response | No. No source citation linking extracted values back to document text | No. No source citation linking extracted values back to document text |
| Multi-file support | Up to 20 files. Process up to 20 files in a single API request with merged extraction results | 1 file. Each API request processes a single file | 1 file. Each API request processes a single file | 1 file. Each API request processes a single file |
See how we compare to our competitors
DocuPipe, Azure Document Intelligence, Google Document AI, Reducto, Nanonets, LlamaParse, Mistral OCR, AWS Textract, Kreuzberg, Regex & Templates
Frequently asked questions
How accurate is the extraction quality?
What file formats are supported?
How does schema-based extraction work?
What are confidence scores?
How many files can I send per request?
Does it handle scanned documents?
What happens when a field isn't found?
Built for how you work
Whether you're building pipelines in code, automating workflows, orchestrating AI agents, or shipping client projects — Iteration Layer fits your process.
Developers
One vendor, one credit pool — stop maintaining five libraries for document and image processing.
Operations Teams
Automate the manual document and image tasks that eat hours every week — no custom code required.
AI Agents
Give your AI agents a complete content processing toolkit via a single MCP server.
Agencies
One account, one credit pool — deploy the same processing pipeline across every client project.