Document Extraction

Extract structured data from PDFs and business documents

Send any of 40+ file formats or a public website URL, define a schema, and get typed JSON back. Built for structured data extraction with confidence scores and source citations.

Zero data retention · GDPR · Made & hosted in the EU · $65 free trial credits · No credit card required · 14-day money-back guarantee

One output feeds the next

Document Extraction is part of a complete content pipeline. One key, one credit pool, and structured JSON responses designed to chain together.

Document Extraction
Document to Markdown
Website Extraction (New)
Audio Extraction (Coming Soon)

Fits into your existing stack

Native SDKs for Node, Python, and Go. OpenAPI spec for everything else. MCP server for AI agents and Claude Code skills. n8n community node for visual workflows.

Mix and match freely

Extract data from a document, generate visuals from the results, then compile everything into a finished report. Mix, match, and build your own pipeline.

Three steps to your first extraction

Define a schema

Describe the structured data you want returned using our schema format. Each field has a name, a type, and an optional description to guide extraction.

Rich field types including text, currency, date, IBAN, and address
Nested arrays for line items, tables, and repeating sections
Optional descriptions to clarify ambiguous fields
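
As a rough illustration of the schema shape described above, the sketch below builds the JSON structure with two small helpers. The helper names `field` and `array_field` are hypothetical and not part of any SDK; the type names are limited to the ones that appear in this page's request example (`TEXT`, `DATE`, `CURRENCY_AMOUNT`, `ARRAY`).

```python
from typing import Any


def field(name: str, type_: str, description: str = "") -> dict[str, Any]:
    """Build one schema field. The description is optional, as stated above."""
    spec: dict[str, Any] = {"name": name, "type": type_}
    if description:
        spec["description"] = description
    return spec


def array_field(name: str, description: str, fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Build an ARRAY field whose items are described by nested fields."""
    spec = field(name, "ARRAY", description)
    spec["fields"] = fields
    return spec


# Hypothetical invoice schema, mirroring the request example on this page.
invoice_schema = {
    "fields": [
        field("invoice_number", "TEXT", "The invoice number"),
        field("due_date", "DATE", "The invoice due date"),
        array_field("line_items", "Line items", [
            field("description", "TEXT", "Line item description"),
            field("amount", "CURRENCY_AMOUNT", "Line item amount"),
        ]),
        field("total_due", "CURRENCY_AMOUNT"),  # description omitted on purpose
    ]
}
```

The resulting dictionary can be serialized with `json.dumps` and sent as the `schema` part of a request body.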

Send your documents

Upload any of 40+ file formats including PDFs, scans, Office files, emails, images, public website URLs, and more. Send up to 20 files per request and combine them into one extraction result.

40+ formats plus public website URLs
Up to 20 files combined into one structured result
Built-in OCR for scanned pages and photos

Get structured data

Receive typed JSON with extracted fields, confidence scores, and source citations so you can automate downstream workflows and route uncertain results to review.

Confidence scores between 0 and 1 for every field
Source citations linking each value to its location in the document
Missing fields return null with a confidence score of 0
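
One possible way to act on these scores is sketched below: values at or above a threshold are accepted, and everything else, including missing fields, goes to review. The threshold of 0.9, the `po_number` field, and the sample confidence values are illustrative assumptions, not recommendations from the API.

```python
REVIEW_THRESHOLD = 0.9  # hypothetical cut-off; tune it for your own workflow


def route_fields(data: dict, threshold: float = REVIEW_THRESHOLD) -> tuple[dict, dict]:
    """Split extracted fields into accepted values and fields needing review."""
    accepted, needs_review = {}, {}
    for name, extracted in data.items():
        value = extracted.get("value")
        confidence = extracted.get("confidence", 0.0)
        # Missing fields arrive as null with confidence 0, so they land in review.
        if value is None or confidence < threshold:
            needs_review[name] = extracted
        else:
            accepted[name] = value
    return accepted, needs_review


# Illustrative data in the shape of the response shown on this page.
sample = {
    "invoice_number": {"value": "INV-2024-0042", "confidence": 0.97},
    "due_date": {"value": "2024-04-14", "confidence": 0.62},
    "po_number": {"value": None, "confidence": 0},
}
accepted, needs_review = route_fields(sample)
print(accepted)              # {'invoice_number': 'INV-2024-0042'}
print(sorted(needs_review))  # ['due_date', 'po_number']
```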

Intelligent Extraction

The API automatically selects the best extraction approach for your schema and documents, so you don't have to stitch together OCR, prompting, and post-processing logic yourself.

Top-tier extraction quality

Strong extraction accuracy across real workflow files: forms, invoices, scans, tables, charts, and photos. The API scored 0.93 in our benchmark, placing second overall.

Schema-Driven Results

Define typed fields — dates, IBANs, currencies, addresses, nested arrays — and get structured JSON back. No prompt engineering, no output parsing.

Deep Content Understanding

Images and scanned documents aren't treated as pixel grids to OCR. The API understands what they depict — product photos, charts, handwritten notes — and extracts field values from that visual meaning.

Built-In Trust Scores

Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.
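
Because citations are verbatim, they can in principle be checked against text you already hold for the document. The sketch below assumes you have that text available (for example from your own text layer) and that whitespace may differ between the citation and your copy; the function name is hypothetical.

```python
def unverified_citations(extracted: dict, document_text: str) -> list[str]:
    """Return citations that cannot be found verbatim in the document text.

    Whitespace is collapsed on both sides so line breaks do not cause misses.
    """
    normalized = " ".join(document_text.split())
    return [
        citation
        for citation in extracted.get("citations", [])
        if " ".join(citation.split()) not in normalized
    ]


# Illustrative text and fields; the text stands in for your own copy of the document.
text = "Invoice #INV-2024-0042\nDue Date:   2024-04-14\nTotal Due: USD 1,390.00"
ok_field = {"value": "2024-04-14", "citations": ["Due Date: 2024-04-14"]}
odd_field = {"value": "2024-05-01", "citations": ["Due Date: 2024-05-01"]}

print(unverified_citations(ok_field, text))   # []
print(unverified_citations(odd_field, text))  # ['Due Date: 2024-05-01']
```

A non-empty result is a reasonable extra signal for sending a field to human review, alongside a low confidence score.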

Multi-File Merge

Send up to 20 files per request and get one unified extraction across all of them. Mix formats freely — a PDF invoice, a DOCX contract, a JPEG receipt, and a public website URL in the same call.
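
In the sample response on this page, each top-level field carries a `source` entry naming the file it came from. Assuming a multi-file response uses that entry the same way, a merged result could be grouped by file as in this sketch. The file names and field names in the sample are hypothetical.

```python
from collections import defaultdict


def fields_by_source(data: dict) -> dict[str, list[str]]:
    """Group extracted field names by the file named in their `source` entry."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name, extracted in data.items():
        # Assumption: fields without a source are collected under "unknown".
        grouped[extracted.get("source", "unknown")].append(name)
    return dict(grouped)


# Illustrative merged result drawn from two hypothetical files.
merged = {
    "invoice_number": {"value": "INV-2024-0042", "source": "invoice.pdf"},
    "total_due": {"value": 1390.00, "source": "invoice.pdf"},
    "governing_law": {"value": "Germany", "source": "contract.docx"},
}
print(fields_by_source(merged))
# {'invoice.pdf': ['invoice_number', 'total_due'], 'contract.docx': ['governing_law']}
```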

40+ File Formats

PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, LaTeX, EML, Jupyter notebooks, images, public website URLs, plus text and markup formats like YAML, TOML, RST, and Org — all in the same endpoint.

No Model Training

Your documents are never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.

Real-world pipelines, ready to ship

Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.

Extract Academic Paper Metadata

Extract title, authors, abstract, and citation info from academic papers.

Extract Article Text

Extract clean article content — title, author, date, and body text — from PDFs, Word docs, and web pages.

Extract Contract Clause Data

Extract parties, dates, and clauses from a contract into structured JSON for legal review workflows.

Extract Court Filing Data

Extract case numbers, parties, filing dates, court details, and relief sought from court filing documents and legal pleadings.

Extract Customs Declaration

Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.

Extract Delivery Note Data

Extract shipment details, item quantities, and delivery confirmation data from warehouse delivery notes and goods received notes.

Extract Fleet Vehicle Registration Data

Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.

Extract Invoice Data

Extract vendor name, line items, totals, and dates from invoice documents.

Extract KPI Data

Extract campaign or business KPIs from report documents — metrics, values, periods, and targets.

Extract KYC Onboarding Data

Extract client identity verification details, company information, and beneficial ownership data from KYC onboarding documents.

Extract Legal Invoice Data

Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.

Extract Medical Record

Extract patient details, diagnoses, and medications from a medical record into structured JSON for healthcare workflows.

Extract Multi-Invoice Data

Extract structured data from multiple invoice files in a single API call using an array schema.

Extract NDA Terms

Extract parties, obligations, restrictions, permitted disclosures, and expiry dates from non-disclosure agreements.

Extract Product Catalog Entry

Extract product name, SKU, price, and specifications from a catalog document into structured JSON for e-commerce workflows.

Extract Property Appraisal

Extract appraised value, property details, and comparable sales from a property appraisal report into structured JSON.

Extract Property Deed Data

Extract property ownership, legal descriptions, encumbrances, and recording details from property deeds and land registry documents.

Extract Purchase Order Data

Extract line items, quantities, unit prices, delivery dates, and supplier details from purchase order documents.

Extract Real Estate Listing

Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.

Extract Receipt Data

Extract merchant, date, line items, tax, and total from receipts.

Extract Rental Application

Extract applicant details, employment history, income, and references from a rental application form into structured JSON for tenant screening.

Extract Resume Data

Extract candidate name, contact details, work history, and skills from resumes.

Extract Supplier Invoice Data for ERP Import

Extract supplier invoice details structured for direct import into ERP systems like SAP, Oracle, or Microsoft Dynamics.

Extract Terms and Conditions

Extract clause types, obligations, limitations, and governing law from terms and conditions documents.

Extract Traffic Fine Data

Extract violation details, fine amounts, vehicle information, and payment deadlines from traffic fine notices.

One n8n node for your entire pipeline

Most n8n document workflows chain three or four separate services. The Iteration Layer community node covers extraction, transformation, and generation in a single install — wire up multi-step pipelines visually instead of writing glue code.

Start building right now

Each API call deducts one credit. The API chains naturally with our other APIs: pipe the output of one into the next without glue code. You'll be up and running in minutes.

Full OpenAPI 3.1 specification available for code generation and IDE integration.
MCP server support for seamless integration with AI agents and tools.
Comprehensive documentation with examples for every field type and edge case.

Output Preview

invoice_number: INV-2024-0042
vendor: Northwind Accounting Services GmbH
due_date: 2024-04-14
line_items:
  description: Month-end close automation workshop
  amount: USD 720.00
  description: Invoice schema rollout and testing
  amount: USD 480.00
  description: Vendor onboarding playbook update
  amount: USD 190.00
total_due: USD 1,390.00

Output Preview

merchant: Custom Burger
phone: (415) 252-2634
order_datetime: Apr'28'09 03:09PM
order_number:
address: 121 Seventh Street, San Francisco, CA 94103
wifi: SOMA250 / 59632
line_items:
  description: Veggie Burger
  amount: USD 5.99
  description: Bleu Cheese
  amount: USD 1.49
  description: 1 Bal 1/2
  amount: USD 3.79
  description: Cash
  amount: USD 13.00
subtotal: USD 11.27
tax: USD 1.07
payment: USD 12.34
change_due: USD 0.66

Input Preview

\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{booktabs}
\usepackage{hyperref}
\title{Document Processing Benchmark Notes}
\author{Nadia Keller}
\date{March 2026}
\begin{document}
\maketitle

\begin{abstract}
We compared structured extraction, markdown conversion, and spreadsheet generation workflows across invoices, scanned warehouse sheets, and compliance packets.
\end{abstract}

\section{Scope}
The benchmark covered 120 source files spanning PDF, DOCX, JPEG, and LaTeX inputs. We measured field accuracy, table retention, and handoff effort for downstream automation.

\section{Summary Table}
\begin{tabular}{lrr}
\toprule
Workflow & Accuracy & Median Runtime \\
\midrule
Invoice extraction & 98.4\% & 1.8s \\
Markdown conversion & 96.9\% & 1.2s \\
Sheet generation & 100.0\% & 0.7s \\
\bottomrule
\end{tabular}

\section{Key Findings}
Structured document APIs reduce glue code, preserve tabular content better than OCR-only pipelines, and shorten review time for finance operations.

\begin{itemize}
  \item OCR-only pipelines lost row groupings in 17\% of warehouse tables.
  \item Markdown output remained suitable for LLM ingestion without custom cleanup.
  \item Spreadsheet generation removed a manual CSV reformatting step from the finance workflow.
\end{itemize}

\section{Next Steps}
Extend the benchmark to receipts, insurance packets, and multi-file extraction. Publish the evaluation harness after the April review.
\end{document}

Output Preview

title: Document Processing Benchmark Notes
author: Nadia Keller
abstract: We compared structured extraction, markdown conversion, and spreadsheet generation workflows across invoices, scanned warehouse sheets, and compliance packets.
benchmark_scope: 120 files across PDF, DOCX, JPEG, and LaTeX inputs
key_findings: Structured document APIs reduce glue code, preserve tabular content better than OCR-only pipelines, and shorten review time for finance operations
next_step: Extend the benchmark to receipts, insurance packets, and multi-file extraction. Publish the evaluation harness after the April review.

Request (Bash)

curl -X POST \
  https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "files": [{
    "type": "url",
    "name": "accounts-payable-invoice.pdf",
    "url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf"
  }],
  "schema": {
    "fields": [
      {
        "name": "invoice_number",
        "type": "TEXT",
        "description": "The invoice number"
      },
      {
        "name": "vendor",
        "type": "TEXT",
        "description": "The vendor legal name"
      },
      {
        "name": "due_date",
        "type": "DATE",
        "description": "The invoice due date"
      },
      {
        "name": "line_items",
        "type": "ARRAY",
        "description": "Line items",
        "fields": [
          {
            "name": "description",
            "type": "TEXT",
            "description": "Line item description"
          },
          {
            "name": "amount",
            "type": "CURRENCY_AMOUNT",
            "description": "Line item amount"
          }
        ]
      },
      {
        "name": "total_due",
        "type": "CURRENCY_AMOUNT",
        "description": "The final amount due"
      }
    ]
  }
}'

Response

{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "accounts-payable-invoice.pdf"
    },
    "vendor": {
      "type": "TEXT",
      "value": "Northwind Accounting Services GmbH",
      "confidence": 0.98,
      "citations": ["Northwind Accounting Services GmbH"],
      "source": "accounts-payable-invoice.pdf"
    },
    "due_date": {
      "type": "DATE",
      "value": "2024-04-14",
      "confidence": 0.96,
      "citations": ["Due Date: 2024-04-14"],
      "source": "accounts-payable-invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Month-end close automation workshop",
            "confidence": 0.98,
            "citations": ["Month-end close automation workshop"]
          },
          "amount": {
            "value": 720.00,
            "confidence": 0.96,
            "citations": ["USD 720.00"]
          }
        },
        {
          "description": {
            "value": "Invoice schema rollout and testing",
            "confidence": 0.97,
            "citations": ["Invoice schema rollout and testing"]
          },
          "amount": {
            "value": 480.00,
            "confidence": 0.95,
            "citations": ["USD 480.00"]
          }
        },
        {
          "description": {
            "value": "Vendor onboarding playbook update",
            "confidence": 0.95,
            "citations": ["Vendor onboarding playbook update"]
          },
          "amount": {
            "value": 190.00,
            "confidence": 0.94,
            "citations": ["USD 190.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "accounts-payable-invoice.pdf"
    },
    "total_due": {
      "type": "CURRENCY_AMOUNT",
      "value": 1390.00,
      "confidence": 0.97,
      "citations": ["Total Due: USD 1,390.00"],
      "source": "accounts-payable-invoice.pdf"
    }
  }
}
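
A response like the one above lends itself to a simple consistency check: the line item amounts should add up to the extracted total. The sketch below uses a trimmed copy of that response and `Decimal` to avoid floating-point drift. The function name is hypothetical, and whether such a check is appropriate depends on the document (taxes or discounts may legitimately break the equality).

```python
from decimal import Decimal


def line_item_total(data: dict) -> Decimal:
    """Sum the `amount` values of every entry in the `line_items` array."""
    items = data["line_items"]["value"]
    return sum(
        (Decimal(str(item["amount"]["value"])) for item in items),
        Decimal("0"),
    )


# Trimmed copy of the response above: only values needed for the check.
data = {
    "line_items": {
        "value": [
            {"amount": {"value": 720.00}},
            {"amount": {"value": 480.00}},
            {"amount": {"value": 190.00}},
        ]
    },
    "total_due": {"value": 1390.00},
}

computed = line_item_total(data)
matches = computed == Decimal(str(data["total_due"]["value"]))
print(computed, matches)  # 1390.0 True
```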

Request (TypeScript)

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extractDocument({
  files: [{
    type: "url",
    name: "accounts-payable-invoice.pdf",
    url: "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
  }],
  schema: {
    fields: [
      {
        type: "TEXT",
        name: "invoice_number",
        description: "The invoice number",
      },
      {
        type: "TEXT",
        name: "vendor",
        description: "The vendor legal name",
      },
      {
        type: "DATE",
        name: "due_date",
        description: "The invoice due date",
      },
      {
        type: "ARRAY",
        name: "line_items",
        description: "Line items",
        fields: [
          { type: "TEXT", name: "description", description: "Line item description" },
          { type: "CURRENCY_AMOUNT", name: "amount", description: "Line item amount" },
        ],
      },
      {
        type: "CURRENCY_AMOUNT",
        name: "total_due",
        description: "The final amount due",
      },
    ],
  },
});

Request (Python)

from iterationlayer import IterationLayer

client = IterationLayer(
    api_key="YOUR_API_KEY"
)

result = client.extract_document(
    files=[{
        "type": "url",
        "name": "accounts-payable-invoice.pdf",
        "url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
    }],
    schema={
        "fields": [
            {
                "type": "TEXT",
                "name": "invoice_number",
                "description": "The invoice number",
            },
            {
                "type": "TEXT",
                "name": "vendor",
                "description": "The vendor legal name",
            },
            {
                "type": "DATE",
                "name": "due_date",
                "description": "The invoice due date",
            },
            {
                "type": "ARRAY",
                "name": "line_items",
                "description": "Line items",
                "fields": [
                    {"type": "TEXT", "name": "description", "description": "Line item description"},
                    {"type": "CURRENCY_AMOUNT", "name": "amount", "description": "Line item amount"},
                ],
            },
            {
                "type": "CURRENCY_AMOUNT",
                "name": "total_due",
                "description": "The final amount due",
            },
        ],
    },
)

Request (Go)

package main

import (
  "fmt"
  "log"

  il "github.com/iterationlayer/sdk-go"
)

func main() {
  client := il.NewClient("YOUR_API_KEY")

  // Send one file by URL together with the extraction schema.
  result, err := client.ExtractDocument(il.ExtractDocumentRequest{
    Files: []il.FileInput{
      il.NewFileFromURL(
        "accounts-payable-invoice.pdf",
        "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
      ),
    },
    Schema: il.ExtractionSchema{
      "invoice_number": il.NewTextFieldConfig(
        "invoice_number",
        "The invoice number",
      ),
      "vendor": il.NewTextFieldConfig(
        "vendor",
        "The vendor legal name",
      ),
      "due_date": il.NewDateFieldConfig(
        "due_date",
        "The invoice due date",
      ),
      // Array fields describe repeating sections such as line items.
      "line_items": il.NewArrayFieldConfig(
        "line_items",
        "Line items",
        []il.FieldConfig{
          il.NewTextFieldConfig("description", "Line item description"),
          il.NewCurrencyAmountFieldConfig("amount", "Line item amount"),
        },
      ),
      "total_due": il.NewCurrencyAmountFieldConfig(
        "total_due",
        "The final amount due",
      ),
    },
  })
  if err != nil {
    log.Fatalf("extraction failed: %v", err)
  }

  // Print the extracted result for inspection.
  fmt.Printf("%+v\n", result)
}

Official SDKs for every major language

Install the SDK, set your API key, and start chaining requests. Full type safety, automatic retries, and idiomatic error handling included.

Your data stays in the EU

Your data is processed on EU servers and never stored beyond temporary logs. The service is zero-retention and GDPR-compliant by design, with a Data Processing Agreement available for every customer.

No data storage, no model training

We don't store your files or processing results, and your data is never used to train or improve AI models. Logs are automatically deleted after 90 days.

EU-hosted infrastructure

All processing runs on servers located in the European Union. Your data never leaves the EU.

GDPR-compliant by design

Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.

Pricing

Start with free trial credits. No credit card required.

Developer: For individuals & small projects. $29.99/month, 1,000 credits included.
Startup: For growing teams. $119.99/month, 5,000 credits included (save 40%).
Business: For high-volume workloads. $319.99/month, 15,000 credits included (save 47%).

Or pay as you go from $0.022/credit with automatic volume discounts.
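
To compare the plans with the pay-as-you-go rate, the effective price per included credit can be worked out from the figures above. This is plain arithmetic on the listed prices and does not account for volume discounts or overage billing.

```python
# Monthly price in USD and included credits, taken from the plan list above.
PLANS = {
    "Developer": (29.99, 1_000),
    "Startup": (119.99, 5_000),
    "Business": (319.99, 15_000),
}


def price_per_credit(monthly_price: float, credits: int) -> float:
    """Effective USD price of one included credit, rounded to 4 decimals."""
    return round(monthly_price / credits, 4)


for plan, (price, credits) in PLANS.items():
    print(f"{plan}: ${price_per_credit(price, credits):.4f} per credit")
# Developer: $0.0300 per credit
# Startup: $0.0240 per credit
# Business: $0.0213 per credit
```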

All APIs included · Free trial credits per API · Project-based budget caps · Auto overage billing

Still evaluating?

Compare Iteration Layer against the biggest alternatives at a glance, then open the full head-to-head pages when you want the details.

Feature	Iteration Layer	DocuPipe	Azure Document Intelligence	Google Document AI
Schema-defined extraction	Yes: define extraction fields via a purpose-built schema with 17 typed field types	Yes: zero-shot extraction with custom schema definitions, no training required	Model-dependent: requires choosing a pre-built model or training a custom one on labelled documents	Processor-based: requires choosing and configuring a specific processor type in the GCP console for each document type
Confidence scores	Per field: confidence score between 0 and 1 for every extracted schema field	Per field: confidence metrics provided per extracted field value	Per field: confidence scores provided per extracted field value	Per entity: confidence scores provided per extracted entity
Source citations	Yes: verbatim source citation from the document for every extracted field	Visual only: source highlighting available in the review UI but no verbatim text citations in the API response	No: no source citation linking extracted values back to document text	No: no source citation linking extracted values back to document text
Multi-file support	Up to 20 files: process up to 20 files in a single API request with merged extraction results	1 file: each API request processes a single file	1 file: each API request processes a single file	1 file: each API request processes a single file

See how we compare to our competitors

DocuPipe · Azure Document Intelligence · Google Document AI · Reducto · Nanonets · LlamaParse · Mistral OCR · AWS Textract · Kreuzberg · Regex & Templates

Frequently asked questions

How accurate is the extraction quality?

Our OCR benchmark shows strong extraction accuracy, reliability, and performance across 41 real workflow files, including forms, invoices, scans, tables, charts, and photos.

What file formats are supported?

The API accepts 40+ file formats including PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, CSV, TSV, HTML, LaTeX, EML, Jupyter notebooks, and all common image formats. Scanned documents are processed with built-in OCR.

How does schema-based extraction work?

You define the structured data schema you want returned by describing each field with a name, type, and optional description. The API then extracts those fields from the document and returns them as typed JSON.

What are confidence scores?

Every extracted field includes a confidence score between 0 and 1 so you can decide what to automate directly and what to send to human review.

How many files can I send per request?

You can send up to 20 files per request. All files are combined into a single extraction result, with fields pulled from across all documents. Each file can be up to 50 MB, and the total request size is limited to 200 MB.
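
These limits are easy to check client-side before sending a request. The sketch below is one way to do it; the function name is hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which this page does not specify.

```python
MAX_FILES = 20
MAX_FILE_BYTES = 50 * 1024 * 1024    # assumption: 1 MB = 1024 * 1024 bytes
MAX_TOTAL_BYTES = 200 * 1024 * 1024


def check_batch(sizes: dict[str, int]) -> list[str]:
    """Return human-readable problems for a batch of {file name: size in bytes}."""
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files exceeds the limit of {MAX_FILES}")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than 50 MB")
    if sum(sizes.values()) > MAX_TOTAL_BYTES:
        problems.append("total size is larger than 200 MB")
    return problems


mb = 1024 * 1024
print(check_batch({"invoice.pdf": 2 * mb, "scan.jpg": 60 * mb}))
# ['scan.jpg is larger than 50 MB']
```

An empty list means the batch is within the documented limits; file sizes can be read with `os.path.getsize` for local files.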

Does it handle scanned documents?

Yes. The API includes built-in OCR for scanned documents and images. No separate OCR step is needed.

What happens when a field isn't found?

Missing fields return null with a confidence score of 0. You can use confidence thresholds to decide when to flag documents for manual review.

Built for how you work

Whether you're building pipelines in code, automating workflows, orchestrating AI agents, or shipping client projects — Iteration Layer fits your process.

Developers

Operations Teams

AI Agents

Agencies