Iteration Layer
Document Extraction

Extract structured data from PDFs and business documents

Send any of 40+ file formats or a public website URL, define a schema, and get typed JSON back. Built for structured data extraction with confidence scores and source citations.

No credit card required — start with free trial credits

Zero data retention · GDPR · Made & hosted in the EU · $65 free trial credits · No credit card required · 14-day money-back guarantee

One output feeds the next

Document Extraction is part of a complete content pipeline. One key, one credit pool, and structured JSON responses designed to chain together.

Fits into your existing stack

Native SDKs for Node, Python, and Go. OpenAPI spec for everything else. MCP server for AI agents and Claude Code skills. n8n community node for visual workflows.

Mix and match freely

Extract data from a document, generate visuals from the results, then compile everything into a finished report. Mix, match, and build your own pipeline.

Three steps to your first extraction

01

Define a schema

Describe the structured data you want returned using our schema format. Each field has a name, a type, and an optional description to guide extraction.

  • Rich field types including text, currency, date, IBAN, and address
  • Nested arrays for line items, tables, and repeating sections
  • Optional descriptions to clarify ambiguous fields
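
As a minimal sketch, a schema like this can be assembled as a plain Python dictionary. The `name`, `type`, `description`, and `fields` keys and the `TEXT`, `DATE`, `ARRAY`, and `CURRENCY_AMOUNT` type names are taken from the request example on this page; the `field` and `array_field` helpers are hypothetical conveniences, not part of any SDK.

```python
import json


def field(name, type_, description=None):
    """Build one schema field; the description is optional."""
    entry = {"name": name, "type": type_}
    if description is not None:
        entry["description"] = description
    return entry


def array_field(name, description, fields):
    """Build an ARRAY field whose items each contain the given sub-fields."""
    entry = field(name, "ARRAY", description)
    entry["fields"] = fields
    return entry


schema = {
    "fields": [
        field("invoice_number", "TEXT", "The invoice number"),
        field("due_date", "DATE", "The invoice due date"),
        array_field("line_items", "Line items", [
            field("description", "TEXT", "Line item description"),
            field("amount", "CURRENCY_AMOUNT", "Line item amount"),
        ]),
    ]
}

print(json.dumps(schema, indent=2))
```

Building the schema from small helpers keeps repeated invoice, receipt, or contract schemas consistent across requests.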

02

Send your documents

Upload any of 40+ file formats including PDFs, scans, Office files, emails, images, public website URLs, and more. Send up to 20 files per request and combine them into one extraction result.

  • 40+ formats plus public website URLs
  • Up to 20 files combined into one structured result
  • Built-in OCR for scanned pages and photos

03

Get structured data

Receive typed JSON with extracted fields, confidence scores, and source citations so you can automate downstream workflows and route uncertain results to review.

  • Confidence scores between 0 and 1 for every field
  • Source citations linking each value to its location in the document
  • Missing fields return null with a confidence score of 0
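
Routing on confidence could look roughly like the sketch below. It assumes each extracted field is an object with `value` and `confidence` keys, as in the sample response on this page. The `0.9` threshold, the `route_fields` helper, the `po_number` field, and the low `due_date` score are illustrative assumptions, not values from the API.

```python
REVIEW_THRESHOLD = 0.9  # assumed cut-off; tune it for your own workflow


def route_fields(data, threshold=REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted, needs-review, and missing."""
    accepted, review, missing = {}, {}, []
    for name, field in data.items():
        if field["value"] is None:
            # Missing fields come back as null with a confidence of 0.
            missing.append(name)
        elif field["confidence"] >= threshold:
            accepted[name] = field["value"]
        else:
            review[name] = field["value"]
    return accepted, review, missing


# Illustrative data only: the scores below are made up for the example.
sample = {
    "invoice_number": {"value": "INV-2024-0042", "confidence": 0.97},
    "due_date": {"value": "2024-04-14", "confidence": 0.62},
    "po_number": {"value": None, "confidence": 0},
}

accepted, review, missing = route_fields(sample)
print(accepted)  # {'invoice_number': 'INV-2024-0042'}
print(review)    # {'due_date': '2024-04-14'}
print(missing)   # ['po_number']
```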

Intelligent Extraction

The API automatically selects the best extraction approach for your schema and documents, so you don't have to stitch together OCR, prompting, and post-processing logic yourself.

Schema-Driven Results

Define typed fields — dates, IBANs, currencies, addresses, nested arrays — and get structured JSON back. No prompt engineering, no output parsing.

Deep Content Understanding

Images and scanned documents aren't treated as pixel grids to OCR. The API understands what they depict — product photos, charts, handwritten notes — and extracts field values from that visual meaning.

Built-In Trust Scores

Every extracted value includes a confidence score and a verbatim source citation from the document. Route low-confidence results to human review.
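
If you also hold the document text yourself, a citation can be spot-checked against it. The sketch below assumes the `citations` list shape shown in the sample response. The `unverified_citations` helper and the whitespace normalization are assumptions of this example; the API's own matching rules are not described here.

```python
def normalize(text):
    """Collapse runs of whitespace so line breaks do not defeat matching."""
    return " ".join(text.split())


def unverified_citations(field, document_text):
    """Return the citations that do not occur verbatim in the document text."""
    haystack = normalize(document_text)
    return [c for c in field.get("citations", []) if normalize(c) not in haystack]


document_text = "Invoice #INV-2024-0042\nDue Date:   2024-04-14\nTotal Due: USD 1,390.00"
due_date = {"value": "2024-04-14", "confidence": 0.96, "citations": ["Due Date: 2024-04-14"]}
# Hypothetical field whose citation is absent from the text above.
payment_terms = {"value": "Net 30", "confidence": 0.5, "citations": ["Net 30"]}

print(unverified_citations(due_date, document_text))       # []
print(unverified_citations(payment_terms, document_text))  # ['Net 30']
```

A non-empty result is a reasonable signal to send the field to human review, alongside a low confidence score.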

Multi-File Merge

Send up to 20 files per request and get one unified extraction across all of them. Mix formats freely — a PDF invoice, a DOCX contract, a JPEG receipt, and a public website URL in the same call.
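
A mixed-format request body might be assembled as in the sketch below. The `type`/`name`/`url` file shape comes from the request example on this page, and the limit of 20 comes from the text above. The `example.com` URLs, the helper names, and the rule that at least one file is required are assumptions for illustration.

```python
MAX_FILES = 20  # per-request limit stated on this page


def url_file(name, url):
    """Describe one file by public URL, mirroring the request example's shape."""
    return {"type": "url", "name": name, "url": url}


def build_request(files, schema):
    """Assemble the JSON body for one merged extraction request."""
    if not files:
        raise ValueError("at least one file is required")  # assumed rule
    if len(files) > MAX_FILES:
        raise ValueError(f"{len(files)} files exceeds the limit of {MAX_FILES}")
    return {"files": files, "schema": schema}


# Placeholder URLs; swap in documents you actually host.
files = [
    url_file("invoice.pdf", "https://example.com/invoice.pdf"),
    url_file("contract.docx", "https://example.com/contract.docx"),
    url_file("receipt.jpg", "https://example.com/receipt.jpg"),
]
schema = {"fields": [{"name": "vendor", "type": "TEXT"}]}

body = build_request(files, schema)
print(len(body["files"]))  # 3
```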

40+ File Formats

PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, LaTeX, EML, Jupyter notebooks, images, public website URLs, plus text and markup formats like YAML, TOML, RST, and Org, all through the same endpoint.

No Model Training

Your documents are never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.

Real-world pipelines, ready to ship

Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.

Extract Academic Paper Metadata

Extract title, authors, abstract, and citation info from academic papers.

Extract Article Text

Extract clean article content — title, author, date, and body text — from PDFs, Word docs, and web pages.

Extract Contract Clause Data

Extract parties, dates, and clauses from a contract into structured JSON for legal review workflows.

Extract Court Filing Data

Extract case numbers, parties, filing dates, court details, and relief sought from court filing documents and legal pleadings.

Extract Customs Declaration

Merge a commercial invoice, packing list, and bill of lading into a unified customs declaration.

Extract Delivery Note Data

Extract shipment details, item quantities, and delivery confirmation data from warehouse delivery notes and goods received notes.

Extract Fleet Vehicle Registration Data

Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.

Extract Invoice Data

Extract vendor name, line items, totals, and dates from invoice documents.

Extract KPI Data

Extract campaign or business KPIs from report documents — metrics, values, periods, and targets.

Extract KYC Onboarding Data

Extract client identity verification details, company information, and beneficial ownership data from KYC onboarding documents.

Extract Legal Invoice Data

Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.

Extract Medical Record

Extract patient details, diagnoses, and medications from a medical record into structured JSON for healthcare workflows.

Extract Multi-Invoice Data

Extract structured data from multiple invoice files in a single API call using an array schema.

Extract NDA Terms

Extract parties, obligations, restrictions, permitted disclosures, and expiry dates from non-disclosure agreements.

Extract Product Catalog Entry

Extract product name, SKU, price, and specifications from a catalog document into structured JSON for e-commerce workflows.

Extract Property Appraisal

Extract appraised value, property details, and comparable sales from a property appraisal report into structured JSON.

Extract Property Deed Data

Extract property ownership, legal descriptions, encumbrances, and recording details from property deeds and land registry documents.

Extract Purchase Order Data

Extract line items, quantities, unit prices, delivery dates, and supplier details from purchase order documents.

Extract Real Estate Listing

Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.

Extract Receipt Data

Extract merchant, date, line items, tax, and total from receipts.

Extract Rental Application

Extract applicant details, employment history, income, and references from a rental application form into structured JSON for tenant screening.

Extract Resume Data

Extract candidate name, contact details, work history, and skills from resumes.

Extract Supplier Invoice Data for ERP Import

Extract supplier invoice details structured for direct import into ERP systems like SAP, Oracle, or Microsoft Dynamics.

Extract Terms and Conditions

Extract clause types, obligations, limitations, and governing law from terms and conditions documents.

Extract Traffic Fine Data

Extract violation details, fine amounts, vehicle information, and payment deadlines from traffic fine notices.
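
For a recipe such as Extract Multi-Invoice Data, an array schema might look like the sketch below. It reuses the `ARRAY` field shape from the request example on this page. The `invoices` field name, the `rows_from_array` helper, and the second invoice in the sample result are hypothetical, and the recipe's actual schema may differ.

```python
invoice_fields = [
    {"name": "invoice_number", "type": "TEXT", "description": "The invoice number"},
    {"name": "vendor", "type": "TEXT", "description": "The vendor legal name"},
    {"name": "total_due", "type": "CURRENCY_AMOUNT", "description": "The final amount due"},
]

multi_invoice_schema = {
    "fields": [
        {
            "name": "invoices",  # hypothetical field name
            "type": "ARRAY",
            "description": "One entry per invoice file in the request",
            "fields": invoice_fields,
        }
    ]
}


def rows_from_array(array_field):
    """Turn an ARRAY result into plain dict rows, one per item."""
    return [
        {name: cell["value"] for name, cell in item.items()}
        for item in array_field["value"]
    ]


# Sample result in the shape the response example uses for line_items.
# The second entry is invented for illustration.
sample_result = {
    "type": "ARRAY",
    "value": [
        {"invoice_number": {"value": "INV-2024-0042"}, "total_due": {"value": 1390.00}},
        {"invoice_number": {"value": "INV-2024-0043"}, "total_due": {"value": 250.00}},
    ],
}

for row in rows_from_array(sample_result):
    print(row)
```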

One n8n node for your entire pipeline

Most n8n document workflows chain three or four separate services. The Iteration Layer community node covers extraction, transformation, and generation in a single install — wire up multi-step pipelines visually instead of writing glue code.

Start building right now

One API call, one credit deducted. Chains naturally with our other APIs — pipe the output of one into the next without glue code. You'll be up and running in minutes.

  • Full OpenAPI 3.1 specification available for code generation and IDE integration.
  • MCP server support for seamless integration with AI agents and tools.
  • Comprehensive documentation with examples for every field type and edge case.
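
Without an SDK, the same call can be made with nothing but the Python standard library. This sketch builds the request without sending it. The endpoint, headers, and body shape are those of the cURL example on this page, while `build_extract_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

ENDPOINT = "https://api.iterationlayer.com/document-extraction/v1/extract"


def build_extract_request(api_key, files, schema):
    """Create (but do not send) the POST request for an extraction."""
    body = json.dumps({"files": files, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_extract_request(
    "YOUR_API_KEY",
    files=[{
        "type": "url",
        "name": "accounts-payable-invoice.pdf",
        "url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
    }],
    schema={"fields": [{"name": "invoice_number", "type": "TEXT"}]},
)
print(request.get_method(), request.full_url)
# result = send(request)  # uncomment once a real API key is in place
```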

Input Preview

Invoice INV-2024-0042

Output Preview

  • invoice_number: INV-2024-0042
  • vendor: Northwind Accounting Services GmbH
  • due_date: 2024-04-14
  • line_items:
      • description: Month-end close automation workshop, amount: USD 720.00
      • description: Invoice schema rollout and testing, amount: USD 480.00
      • description: Vendor onboarding playbook update, amount: USD 190.00
  • total_due: USD 1,390.00

Request (cURL)
curl -X POST \
  https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "files": [{
    "type": "url",
    "name": "accounts-payable-invoice.pdf",
    "url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf"
  }],
  "schema": {
    "fields": [
      {
        "name": "invoice_number",
        "type": "TEXT",
        "description": "The invoice number"
      },
      {
        "name": "vendor",
        "type": "TEXT",
        "description": "The vendor legal name"
      },
      {
        "name": "due_date",
        "type": "DATE",
        "description": "The invoice due date"
      },
      {
        "name": "line_items",
        "type": "ARRAY",
        "description": "Line items",
        "fields": [
          {
            "name": "description",
            "type": "TEXT",
            "description": "Line item description"
          },
          {
            "name": "amount",
            "type": "CURRENCY_AMOUNT",
            "description": "Line item amount"
          }
        ]
      },
      {
        "name": "total_due",
        "type": "CURRENCY_AMOUNT",
        "description": "The final amount due"
      }
    ]
  }
}'
Response
{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-0042",
      "confidence": 0.97,
      "citations": ["Invoice #INV-2024-0042"],
      "source": "accounts-payable-invoice.pdf"
    },
    "vendor": {
      "type": "TEXT",
      "value": "Northwind Accounting Services GmbH",
      "confidence": 0.98,
      "citations": ["Northwind Accounting Services GmbH"],
      "source": "accounts-payable-invoice.pdf"
    },
    "due_date": {
      "type": "DATE",
      "value": "2024-04-14",
      "confidence": 0.96,
      "citations": ["Due Date: 2024-04-14"],
      "source": "accounts-payable-invoice.pdf"
    },
    "line_items": {
      "type": "ARRAY",
      "value": [
        {
          "description": {
            "value": "Month-end close automation workshop",
            "confidence": 0.98,
            "citations": ["Month-end close automation workshop"]
          },
          "amount": {
            "value": 720.00,
            "confidence": 0.96,
            "citations": ["USD 720.00"]
          }
        },
        {
          "description": {
            "value": "Invoice schema rollout and testing",
            "confidence": 0.97,
            "citations": ["Invoice schema rollout and testing"]
          },
          "amount": {
            "value": 480.00,
            "confidence": 0.95,
            "citations": ["USD 480.00"]
          }
        },
        {
          "description": {
            "value": "Vendor onboarding playbook update",
            "confidence": 0.95,
            "citations": ["Vendor onboarding playbook update"]
          },
          "amount": {
            "value": 190.00,
            "confidence": 0.94,
            "citations": ["USD 190.00"]
          }
        }
      ],
      "confidence": 0.97,
      "citations": [],
      "source": "accounts-payable-invoice.pdf"
    },
    "total_due": {
      "type": "CURRENCY_AMOUNT",
      "value": 1390.00,
      "confidence": 0.97,
      "citations": ["Total Due: USD 1,390.00"],
      "source": "accounts-payable-invoice.pdf"
    }
  }
}
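
One simple downstream check on a response like this is that the line items add up to the total: 720.00 + 480.00 + 190.00 = 1,390.00. The sketch below runs that check on a trimmed copy of the response above. The `line_item_total` and `totals_match` helpers are hypothetical names, not SDK functions.

```python
from decimal import Decimal


def line_item_total(data):
    """Sum the extracted line item amounts using exact decimal arithmetic."""
    return sum(
        (Decimal(str(item["amount"]["value"])) for item in data["line_items"]["value"]),
        Decimal("0"),
    )


def totals_match(data):
    """Check that the line items add up to the extracted total."""
    return line_item_total(data) == Decimal(str(data["total_due"]["value"]))


# Trimmed copy of the response: only the keys this check reads.
data = {
    "line_items": {"value": [
        {"amount": {"value": 720.00}},
        {"amount": {"value": 480.00}},
        {"amount": {"value": 190.00}},
    ]},
    "total_due": {"value": 1390.00},
}

print(totals_match(data))  # True
```

A mismatch does not say which field is wrong, but it is a cheap reason to send the document to review even when every confidence score is high.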
Request (Node SDK)
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extractDocument({
  files: [{
    type: "url",
    name: "accounts-payable-invoice.pdf",
    url: "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
  }],
  schema: {
    fields: [
      {
        type: "TEXT",
        name: "invoice_number",
        description: "The invoice number",
      },
      {
        type: "TEXT",
        name: "vendor",
        description: "The vendor legal name",
      },
      {
        type: "DATE",
        name: "due_date",
        description: "The invoice due date",
      },
      {
        type: "ARRAY",
        name: "line_items",
        description: "Line items",
        fields: [
          { type: "TEXT", name: "description", description: "Line item description" },
          { type: "CURRENCY_AMOUNT", name: "amount", description: "Line item amount" },
        ],
      },
      {
        type: "CURRENCY_AMOUNT",
        name: "total_due",
        description: "The final amount due",
      },
    ],
  },
});
Request (Python SDK)
from iterationlayer import IterationLayer

client = IterationLayer(
    api_key="YOUR_API_KEY"
)

result = client.extract_document(
    files=[{
        "type": "url",
        "name": "accounts-payable-invoice.pdf",
        "url": "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
    }],
    schema={
        "fields": [
            {
                "type": "TEXT",
                "name": "invoice_number",
                "description": "The invoice number",
            },
            {
                "type": "TEXT",
                "name": "vendor",
                "description": "The vendor legal name",
            },
            {
                "type": "DATE",
                "name": "due_date",
                "description": "The invoice due date",
            },
            {
                "type": "ARRAY",
                "name": "line_items",
                "description": "Line items",
                "fields": [
                    {"type": "TEXT", "name": "description", "description": "Line item description"},
                    {"type": "CURRENCY_AMOUNT", "name": "amount", "description": "Line item amount"},
                ],
            },
            {
                "type": "CURRENCY_AMOUNT",
                "name": "total_due",
                "description": "The final amount due",
            },
        ],
    },
)
Request (Go SDK)
import il "github.com/iterationlayer/sdk-go"

client := il.NewClient("YOUR_API_KEY")

result, err := client.ExtractDocument(il.ExtractDocumentRequest{
  Files: []il.FileInput{
    il.NewFileFromURL(
      "accounts-payable-invoice.pdf",
      "https://iterationlayer.com/code-samples/accounts-payable-invoice.pdf",
    ),
  },
  Schema: il.ExtractionSchema{
    "invoice_number": il.NewTextFieldConfig(
      "invoice_number",
      "The invoice number",
    ),
    "vendor": il.NewTextFieldConfig(
      "vendor",
      "The vendor legal name",
    ),
    "due_date": il.NewDateFieldConfig(
      "due_date",
      "The invoice due date",
    ),
    "line_items": il.NewArrayFieldConfig(
      "line_items",
      "Line items",
      []il.FieldConfig{
        il.NewTextFieldConfig("description", "Line item description"),
        il.NewCurrencyAmountFieldConfig("amount", "Line item amount"),
      },
    ),
    "total_due": il.NewCurrencyAmountFieldConfig(
      "total_due",
      "The final amount due",
    ),
  },
})

Official SDKs for every major language

Install the SDK, set your API key, and start chaining requests. Full type safety, automatic retries, and idiomatic error handling included.
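
To make "automatic retries" concrete, here is a generic sketch of retrying with exponential backoff. It is not the SDKs' implementation, whose retry policy is not documented on this page. The `with_retries` helper, its defaults, and the set of retryable exceptions are all assumptions.

```python
import time


def with_retries(call, attempts=3, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that fails twice before succeeding.
outcomes = iter([ConnectionError("reset"), TimeoutError("slow"), {"success": True}])


def flaky_call():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_call, sleep=delays.append)
print(result, delays)  # {'success': True} [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.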

Your data stays in the EU

Your data is processed on EU servers and never stored beyond temporary logs. Zero retention, GDPR-compliant by design, with a Data Processing Agreement available for every customer. Learn more about our security practices.

No data storage, no model training

We don't store your files or processing results, and your data is never used to train or improve AI models. Logs are automatically deleted after 90 days.

EU-hosted infrastructure

All processing runs on servers located in the European Union. Your data never leaves the EU.

GDPR-compliant by design

Full compliance with EU data protection regulations. Data Processing Agreement available for all customers.

Pricing

Start with free trial credits. No credit card required.

Developer
For individuals & small projects
$29.99/month · 1,000 credits included

Startup (Most Popular)
For growing teams
$119.99/month · 5,000 credits included · Save 40%

Business
For high-volume workloads
$319.99/month · 15,000 credits included · Save 47%

Or pay as you go from $0.022/credit with automatic volume discounts.

All APIs included · Free trial credits per API · Project-based budget caps · Auto overage billing

Still evaluating?

Compare Iteration Layer against the biggest alternatives at a glance, then open the full head-to-head pages when you want the details.

Feature comparison: Iteration Layer, DocuPipe, Azure Document Intelligence, Google Document AI

Schema-defined extraction
  • Iteration Layer: Yes. Define extraction fields via a purpose-built schema with 17 typed field types.
  • DocuPipe: Yes. Zero-shot extraction with custom schema definitions; no training required.
  • Azure Document Intelligence: Model-dependent. Requires choosing a pre-built model or training a custom one on labelled documents.
  • Google Document AI: Processor-based. Requires choosing and configuring a specific processor type in the GCP console for each document type.

Confidence scores
  • Iteration Layer: Per field. Confidence score between 0 and 1 for every extracted schema field.
  • DocuPipe: Per field. Confidence metrics provided per extracted field value.
  • Azure Document Intelligence: Per field. Confidence scores provided per extracted field value.
  • Google Document AI: Per entity. Confidence scores provided per extracted entity.

Source citations
  • Iteration Layer: Yes. Verbatim source citation from the document for every extracted field.
  • DocuPipe: Visual only. Source highlighting available in the review UI but no verbatim text citations in the API response.
  • Azure Document Intelligence: No. No source citation linking extracted values back to document text.
  • Google Document AI: No. No source citation linking extracted values back to document text.

Multi-file support
  • Iteration Layer: Up to 20 files. Process up to 20 files in a single API request with merged extraction results.
  • DocuPipe: 1 file. Each API request processes a single file.
  • Azure Document Intelligence: 1 file. Each API request processes a single file.
  • Google Document AI: 1 file. Each API request processes a single file.

Frequently asked questions

How accurate is the extraction quality?
Our OCR benchmark shows strong extraction accuracy, reliability, and performance across 41 real workflow files, including forms, invoices, scans, tables, charts, and photos.

What file formats are supported?
The API accepts 40+ file formats including PDF, DOCX, PPTX, ODT, ODS, XLSX, EPUB, CSV, TSV, HTML, LaTeX, EML, Jupyter notebooks, and all common image formats. Scanned documents are processed with built-in OCR.

How does schema-based extraction work?
You define the structured data schema you want returned by describing each field with a name, type, and optional description. The API then extracts those fields from the document and returns them as typed JSON.

What are confidence scores?
Every extracted field includes a confidence score between 0 and 1 so you can decide what to automate directly and what to send to human review.

How many files can I send per request?
You can send up to 20 files per request. All files are combined into a single extraction result, with the API pulling fields from across all documents. The total size limit is 200 MB with 50 MB per file.

Does it handle scanned documents?
Yes. The API includes built-in OCR for scanned documents and images. No separate OCR step is needed.

What happens when a field isn't found?
Missing fields return null with a confidence score of 0. You can use confidence thresholds to decide when to flag documents for manual review.
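
The request limits mentioned above (20 files, 50 MB per file, 200 MB in total) can be checked client-side before uploading. This is a sketch under one assumption: it treats 1 MB as 1,000,000 bytes, the stricter reading, since the exact definition is not stated here. `check_limits` is a hypothetical helper.

```python
MAX_FILES = 20
MAX_FILE_BYTES = 50 * 1_000_000    # 50 MB per file (assuming 1 MB = 10**6 bytes)
MAX_TOTAL_BYTES = 200 * 1_000_000  # 200 MB per request


def check_limits(sizes):
    """Return a list of problems for a batch given {file name: size in bytes}."""
    problems = []
    if len(sizes) > MAX_FILES:
        problems.append(f"{len(sizes)} files is more than {MAX_FILES}")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is larger than 50 MB")
    if sum(sizes.values()) > MAX_TOTAL_BYTES:
        problems.append("total size is larger than 200 MB")
    return problems


# Hypothetical batch: one normal PDF and one oversized scan.
batch = {"invoice.pdf": 10_000_000, "scan.tiff": 60_000_000}
print(check_limits(batch))  # ['scan.tiff is larger than 50 MB']
```

In practice the sizes could come from `os.path.getsize` for local files; an empty result means the batch fits within the stated limits.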

Built for how you work

Whether you're building pipelines in code, automating workflows, orchestrating AI agents, or shipping client projects — Iteration Layer fits your process.