Document Extraction

Extract structured data from any document

Send a PDF, image, or document — get structured JSON back. Define the fields you need, and the API extracts them with confidence scores.

No credit card required — start with free trial credits

Zero data retention · GDPR · Made & hosted in the EU · 25 free test requests · No credit card required · 14-day money-back guarantee

What's included

Schema-Driven Extraction

Define your fields using 17 field types (dates, IBANs, currencies, addresses, nested arrays, and more) and get structured JSON back. No prompt engineering, no output parsing.

Built-In Trust Scores

Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.

Multi-File Merge

Send up to 20 files per request — PDFs, images, spreadsheets, Word docs — and get one unified extraction across all of them.

MCP (Model Context Protocol)

Connect directly to AI agents. Use our APIs as tools in Claude, GPT, and other LLM-powered workflows.

Zapier

Connect to 5,000+ apps with Zapier integration.

Coming Soon

n8n

Build automated workflows with n8n integration.

Coming Soon

How it works

01

Define a schema

Describe the fields you want to extract using our schema format. Each field has a name, a type, and an optional description to guide the extraction.

  • 17 field types including text, currency, date, IBAN, and address
  • Nested arrays for line items, tables, and repeating sections
  • Optional descriptions to clarify ambiguous fields
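
The schema rules above (a name, a type, an optional description, nested fields for arrays) can be checked locally before a request is sent. The following is a minimal sketch, not part of any official SDK: `validate_schema` is a hypothetical helper, and `KNOWN_TYPES` lists only the three type identifiers that appear in this page's Quick Start examples. The remaining identifiers would have to come from the API reference.

```python
# Hypothetical local validator; not part of the official SDK.
# Only the type identifiers shown in the Quick Start examples are listed here.
KNOWN_TYPES = {"TEXT", "CURRENCY_AMOUNT", "ARRAY"}


def validate_schema(fields, path="fields"):
    """Return a list of human-readable problems found in a list of field definitions."""
    problems = []
    for index, field in enumerate(fields):
        where = f"{path}[{index}]"
        name = field.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"{where}: missing or empty 'name'")
        field_type = field.get("type")
        if field_type not in KNOWN_TYPES:
            problems.append(f"{where}: unknown type {field_type!r}")
        # The description is optional, but must be a string when present.
        if "description" in field and not isinstance(field["description"], str):
            problems.append(f"{where}: 'description' must be a string")
        # Arrays carry their own nested field list, validated recursively.
        if field_type == "ARRAY":
            nested = field.get("fields")
            if not nested:
                problems.append(f"{where}: ARRAY needs a non-empty 'fields' list")
            else:
                problems.extend(validate_schema(nested, f"{where}.fields"))
    return problems


invoice_schema = {
    "fields": [
        {"name": "invoice_number", "type": "TEXT", "description": "The invoice number"},
        {"name": "line_items", "type": "ARRAY", "description": "Line items", "fields": [
            {"name": "amount", "type": "CURRENCY_AMOUNT"},
        ]},
    ]
}

print(validate_schema(invoice_schema["fields"]))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every schema mistake in a single pass.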

02

Send your documents

Upload PDFs, images, or office documents via URL or base64. Send up to 20 files per request — they are combined into a single extraction result.

  • PDF, Word, Excel, images, and scanned documents
  • Up to 20 files combined into one structured result
  • Built-in OCR for scanned pages and photos
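
As a rough illustration of the two upload modes, the sketch below builds the `files` list for a request. The URL entry mirrors the shape used in the Quick Start examples. The base64 entry is an assumption: the `"base64"` type value and key name are guesses modelled on the URL entry, and the helper names are hypothetical, so check the API reference before relying on them.

```python
import base64

MAX_FILES_PER_REQUEST = 20  # limit stated on this page


def file_from_url(name, url):
    """Build a URL file entry, matching the shape used in the Quick Start examples."""
    return {"type": "url", "name": name, "url": url}


def file_from_bytes(name, data):
    """Build a base64 file entry.

    ASSUMPTION: the "base64" type value and key name are guesses modelled on
    the URL entry; the real field names may differ.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "base64", "name": name, "base64": encoded}


def build_files(entries):
    """Reject file lists that exceed the documented per-request limit."""
    entries = list(entries)
    if len(entries) > MAX_FILES_PER_REQUEST:
        raise ValueError(f"{len(entries)} files given, limit is {MAX_FILES_PER_REQUEST}")
    return entries


files = build_files([
    file_from_url("invoice.pdf", "https://example.com/invoice.pdf"),
    file_from_bytes("receipt.txt", b"Total: 12.50 EUR"),
])
print(len(files), files[1]["type"])  # prints: 2 base64
```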

03

Get structured data

Receive JSON with extracted fields, confidence scores, and source citations. Every field includes provenance so you know exactly where the value came from.

  • Confidence scores between 0 and 1 for every field
  • Source citations linking each value to its location in the document
  • Missing fields return null with a confidence score of 0
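
One way to act on these scores is to sort fields into three buckets: accepted, needs review, and missing. The sketch below assumes a response shape that this page does not show, namely a mapping from field name to an object with `value` and `confidence` keys. That shape, the `triage` helper, and the 0.8 threshold are all illustrative assumptions.

```python
REVIEW_THRESHOLD = 0.8  # illustrative cut-off; tune it for your own documents


def triage(fields, threshold=REVIEW_THRESHOLD):
    """Split extracted fields into accepted, needs-review, and missing buckets."""
    buckets = {"accepted": [], "review": [], "missing": []}
    for name, field in fields.items():
        if field["value"] is None:
            # Missing fields come back as null with a confidence of 0.
            buckets["missing"].append(name)
        elif field["confidence"] < threshold:
            buckets["review"].append(name)
        else:
            buckets["accepted"].append(name)
    return buckets


# Hand-written sample in the ASSUMED response shape, not real API output.
sample = {
    "invoice_number": {"value": "INV-1042", "confidence": 0.97},
    "total_amount": {"value": "118.00", "confidence": 0.62},
    "due_date": {"value": None, "confidence": 0},
}
print(triage(sample))
```

Keeping missing fields separate from low-confidence ones is a design choice: a missing value usually needs someone to look at the source document, while a low-confidence value only needs confirmation.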

Quick Start

One API call, one credit deducted. Chains naturally with our other APIs — pipe the output of one into the next without glue code. You'll be up and running in minutes.

  • Full OpenAPI 3.1 specification available for code generation and IDE integration.
  • MCP server support for seamless integration with AI agents and tools.
  • Comprehensive documentation with examples for every field type and edge case.

cURL

curl -X POST https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [{ "type": "url", "name": "invoice.pdf", "url": "https://example.com/invoice.pdf" }],
    "schema": {
      "fields": [
        { "name": "invoice_number", "type": "TEXT", "description": "The invoice number" },
        { "name": "total_amount", "type": "CURRENCY_AMOUNT", "description": "The total amount" },
        { "name": "line_items", "type": "ARRAY", "description": "Line items", "fields": [
          { "name": "description", "type": "TEXT", "description": "Item description" },
          { "name": "amount", "type": "CURRENCY_AMOUNT", "description": "Item amount" }
        ]}
      ]
    }
  }'

TypeScript

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.extract({
  files: [{ type: "url", name: "invoice.pdf", url: "https://example.com/invoice.pdf" }],
  schema: {
    fields: [
      { type: "TEXT", name: "invoice_number", description: "The invoice number" },
      { type: "CURRENCY_AMOUNT", name: "total_amount", description: "The total amount" },
      { type: "ARRAY", name: "line_items", description: "Line items", fields: [
        { type: "TEXT", name: "description", description: "Item description" },
        { type: "CURRENCY_AMOUNT", name: "amount", description: "Item amount" },
      ]},
    ],
  },
});

Python

from iterationlayer import IterationLayer

client = IterationLayer(api_key="YOUR_API_KEY")

result = client.extract(
    files=[{"type": "url", "name": "invoice.pdf", "url": "https://example.com/invoice.pdf"}],
    schema={
        "fields": [
            {"type": "TEXT", "name": "invoice_number", "description": "The invoice number"},
            {"type": "CURRENCY_AMOUNT", "name": "total_amount", "description": "The total amount"},
            {"type": "ARRAY", "name": "line_items", "description": "Line items", "fields": [
                {"type": "TEXT", "name": "description", "description": "Item description"},
                {"type": "CURRENCY_AMOUNT", "name": "amount", "description": "Item amount"},
            ]},
        ],
    },
)

Go

import il "github.com/iterationlayer/sdk-go"

client := il.NewClient("YOUR_API_KEY")

result, err := client.Extract(il.ExtractRequest{
    Files: []il.FileInput{
        il.NewFileFromURL("invoice.pdf", "https://example.com/invoice.pdf"),
    },
    Schema: il.ExtractionSchema{
        "invoice_number": il.NewTextFieldConfig("invoice_number", "The invoice number"),
        "total_amount":   il.NewCurrencyAmountFieldConfig("total_amount", "The total amount"),
    },
})

See it in action

Ready-to-use workflows for the most common data processing tasks.

Automate Invoice Processing

Extract line items, totals, and vendor details from invoices into structured JSON for accounting workflows.

Digitize Academic Papers

Extract titles, authors, abstracts, and citations from academic papers into structured JSON for research workflows.

Extract Contract Clauses

Extract parties, dates, and clauses from contracts into structured JSON for legal review workflows.

Extract Product Catalog Data

Extract product names, SKUs, prices, and specifications from catalog documents into structured JSON for e-commerce workflows.

Extract Real Estate Listings

Extract property addresses, prices, room counts, and features from listing documents into structured JSON for MLS and property platforms.

Extract Rental Application Data

Extract applicant details, employment history, income, and references from rental application forms into structured JSON for tenant screening.

Onboard Employees

Merge an employment contract, ID document, and tax form into a single employee onboarding record.

Onboard Suppliers

Merge a supplier application, bank details, and tax certificate into a single structured supplier profile.

Parse Receipts and Expenses

Extract merchant details, dates, and line items from receipts into structured JSON for expense tracking workflows.

Parse Resumes and CVs

Extract candidate details, skills, and work experience from resumes into structured JSON for recruiting workflows.

Process Customs Declarations

Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.

Process Medical Records

Extract patient details, diagnoses, and medications from medical records into structured JSON for healthcare workflows.

Scrape Structured Web Data

Extract page titles, headings, links, and content from web pages into structured JSON for data collection workflows.

Privacy by default

We built Iteration Layer with privacy by design. Your data is processed in the EU and never stored beyond temporary logs. Learn more about our security practices.

No data storage

We don't store your files or processing results. Logs are automatically deleted after 30 days.

EU-hosted infrastructure

All processing runs on servers located in the European Union. Your data never leaves the EU.

GDPR-compliant by design

Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.

Pricing

Start with free trial credits. No credit card required.

Developer

For individuals & small projects

$29.99 per month
  • 1,000 credits / month
    1,000 image transformations · 500 document generations · 500 image generations · 100 document extractions
  • All APIs included
  • Free trial credits per API
  • Email support
  • Budget caps per key

Most Popular

Startup

Save 40%

For growing teams

$119.99 per month
  • 5,000 credits / month
    5,000 image transformations · 2,500 document generations · 2,500 image generations · 500 document extractions
  • All APIs included
  • Free trial credits per API
  • Priority support
  • Budget caps per key

Business

Save 47%

For high-volume workloads

$319.99 per month
  • 15,000 credits / month
    15,000 image transformations · 7,500 document generations · 7,500 image generations · 1,500 document extractions
  • All APIs included
  • Free trial credits per API
  • Priority support
  • Budget caps per key

Frequently asked questions

What file formats are supported?

The API accepts PDF, DOCX, XLSX, CSV, TXT, HTML, PNG, JPEG, GIF, and WebP. Scanned documents are processed with built-in OCR.

How does schema-based extraction work?

You define a schema describing the fields you want (name, type, description). The API uses AI to locate and extract those fields from the document.

What are confidence scores?

Every extracted field includes a confidence score between 0 and 1, indicating how certain the API is about the result. Use these to build human review flows.

How many files can I send per request?

You can send up to 20 files per request. All files are combined into a single extraction result, with fields pulled from across all documents. Each file can be up to 50 MB, and the total request size is limited to 200 MB.
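
A pre-flight check against these size limits can catch oversized uploads before any bytes are sent. This is a minimal sketch under stated assumptions: `check_size_limits` is a hypothetical helper, and a megabyte is taken to be 1024 × 1024 bytes, which this page does not specify.

```python
MB = 1024 * 1024  # ASSUMPTION: binary megabytes; the page does not say which definition applies
MAX_FILE_BYTES = 50 * MB
MAX_TOTAL_BYTES = 200 * MB


def check_size_limits(sizes):
    """Return limit violations for a mapping of file name -> size in bytes."""
    violations = []
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            violations.append(f"{name}: exceeds the 50 MB per-file limit")
    if sum(sizes.values()) > MAX_TOTAL_BYTES:
        violations.append("request: exceeds the 200 MB total limit")
    return violations


print(check_size_limits({"contract.pdf": 12 * MB, "scan.png": 60 * MB}))
# prints ['scan.png: exceeds the 50 MB per-file limit']
```

Note that a request can pass every per-file check and still fail the total: five 45 MB files are each within the per-file limit but add up to 225 MB.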

Does it handle scanned documents?

Yes. The API includes built-in OCR for scanned documents and images. No separate OCR step is needed.

What happens when a field isn't found?

Missing fields return null with a confidence score of 0. You can use confidence thresholds to decide when to flag documents for manual review.