Document Extraction vs Nanonets: Schema-Based or Model-Based?


Train a Model or Define a Schema

There are two ways to get structured data out of documents. You can train a model that learns what an invoice looks like, then use that model to extract fields from future invoices. Or you can define a schema — the fields you want, the types they should be — and send it alongside the document. No training. No model. Just a description of what you need.

Nanonets follows the first approach. You create a model for each document type, upload training samples, label the fields you want extracted, train the model, then hit the API with new documents. Need to handle receipts too? New model. Contracts? New model. Each one needs its own labeled training data before it can extract anything.

Iteration Layer follows the second approach. One API endpoint. You define a schema, attach a document, and get structured JSON back. The schema works on any document type — invoices, contracts, employment agreements, medical forms — without training a single model.

Both approaches produce structured data from unstructured documents. The difference is everything that happens before the first extraction.

The Model Training Tax

With Nanonets, you don’t write code first. You train. Their workflow looks like this:

  • Create a new model for your document type
  • Upload 5–50+ sample documents
  • Label the fields you want to extract in each sample
  • Train the model (minutes to hours depending on complexity)
  • Test the model against held-out samples
  • Iterate on labels and retraining until accuracy is acceptable
  • Deploy the model and integrate the API

For a single document type, this is manageable. For ten document types, you’re managing ten models — each with its own training data, its own accuracy characteristics, its own retraining cycle when the document format changes.

And formats do change. A supplier updates their invoice layout. A government agency redesigns a form. A client switches HR systems and the pay stubs look different now. Each change potentially means relabeling samples and retraining the affected model.

With a schema-based approach, none of that exists. The extraction logic isn’t learned from labeled examples — it’s driven by the schema you provide at request time. If the document layout changes, you send the same schema and get the same fields. If you need a new field, you add it to the schema. No relabeling, no retraining, no waiting for a training job to finish.

What the Schema Gives You

A schema isn’t just a list of field names. Iteration Layer supports 17 field types, and the types do real work during extraction.

Define a field as currency_amount and you get a numeric value with proper decimal handling — not a string containing “$4,250.00” that you have to parse yourself. Define address and the API decomposes it into street, city, region, postal code, and country. Define iban and the extraction validates the format. Define date and you get ISO 8601, regardless of whether the document says “February 27, 2026” or “27/02/2026” or “2026-02-27”.

The full list: TEXT, TEXTAREA, INTEGER, DECIMAL, BOOLEAN, DATE, DATETIME, TIME, EMAIL, IBAN, COUNTRY, CURRENCY_CODE, CURRENCY_AMOUNT, ADDRESS, ARRAY, ENUM, and CALCULATED.

CALCULATED fields reference other extracted fields. Define totalCheck as unitPrice * quantity and the API computes it during extraction. Compare it to the invoiceTotal field and you’ve got built-in validation — without a single line of post-processing code.
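If you want to enforce that cross-check in your own code as well, it is only a few lines. A sketch, using the `totalCheck` and `invoiceTotal` field names from the example above (the tolerance value is an arbitrary assumption, not part of the API):

```typescript
// Hypothetical extracted invoice values. totalCheck is what a
// CALCULATED field (unitPrice * quantity) would compute during
// extraction; invoiceTotal is read off the document itself.
interface ExtractedInvoice {
  unitPrice: number;
  quantity: number;
  totalCheck: number;
  invoiceTotal: number;
}

// Flag invoices where the computed total and the printed total
// disagree beyond a small tolerance (threshold chosen arbitrarily).
function totalsMatch(inv: ExtractedInvoice, tolerance = 0.01): boolean {
  return Math.abs(inv.totalCheck - inv.invoiceTotal) <= tolerance;
}
```

Invoices that fail the check can go straight to a review queue instead of your accounting system.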

With Nanonets, the model defines what fields exist based on training. The output is whatever the model learned to extract. Type handling, normalization, validation, decomposition — that’s all your code.

Block-Based Pricing vs. Per-Page Pricing

Nanonets restructured their pricing in January 2025, moving from a per-page model to block-based billing. Each block in a workflow is priced separately. An OCR block costs one rate. A classification block costs another. An extraction block costs another. Chain three blocks together and you’re paying for three operations per document.

The problem isn’t the price per block — it’s predictability. When your workflow has multiple blocks, estimating the cost of processing 10,000 documents requires understanding exactly which blocks fire, how many pages each block processes, and whether any blocks run conditionally. That’s a spreadsheet exercise before you can forecast a monthly bill.
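That spreadsheet exercise can be sketched as code. The rates and firing probabilities below are made up for illustration; they are not Nanonets’ actual prices:

```typescript
// Hypothetical block-based cost model: each block has a per-page
// rate and a probability that it fires for a given document.
interface Block {
  name: string;
  ratePerPage: number;     // illustrative rate, not a real price
  fireProbability: number; // 1.0 = runs on every document
}

function estimateMonthlyCost(
  blocks: Block[],
  documents: number,
  avgPagesPerDocument: number
): number {
  return blocks.reduce(
    (sum, b) =>
      sum + b.ratePerPage * b.fireProbability * documents * avgPagesPerDocument,
    0
  );
}
```

Three blocks, a conditional firing rate, and an average page count later, you have a forecast — and every new block in the workflow changes the formula.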

This is a side effect of Nanonets’ architecture. Their product is built around workflows — visual pipelines where you chain blocks together. OCR block into classification block into extraction block into export block. Every block in the chain is a billing event.

Iteration Layer is API-first. One endpoint, one request, one response.

Workflows vs. API Calls

The workflow model is where the philosophical difference between the two products is sharpest.

Nanonets wants you to build inside their platform. You design workflows in their UI, connect blocks, configure triggers, set up export destinations. The platform handles orchestration, retries, and routing. For teams that want a no-code or low-code document processing pipeline, this makes sense.

But if you’re a developer building a product, you probably don’t want your document extraction logic living inside a third-party workflow builder. You want an API you can call from your backend, your serverless function, your data pipeline. You want the extraction step to be one line in your code, not a block in someone else’s DAG.

Iteration Layer is built for this. No workflow builder. No visual editor. No blocks. Just an API endpoint that takes a document and a schema and returns JSON.

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.extract({
  files: [{ url: "https://example.com/employment-contract.pdf" }],
  schema: {
    fields: [
      { name: "employee_name", type: "text" },
      { name: "employee_email", type: "email" },
      { name: "start_date", type: "date" },
      { name: "job_title", type: "text" },
      { name: "annual_salary", type: "currency_amount" },
      { name: "work_address", type: "address" },
      { name: "probation_period_in_months", type: "integer" },
    ],
  },
});

Seven fields, six different types, one request. The address field comes back decomposed. The currency_amount comes back as a number. The date comes back as ISO 8601. No training data, no workflow blocks, no model management.
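The response shape below is illustrative, not the documented wire format — but it shows what the typed output means in practice:

```typescript
// Illustrative response only: the real API's field layout may differ.
const result = {
  fields: {
    employee_name: { value: "Jane Doe", confidence: 0.97 },
    // currency_amount: a number, not the string "$85,000.00"
    annual_salary: { value: 85000.0, confidence: 0.95 },
    // date: normalized to ISO 8601 regardless of source formatting
    start_date: { value: "2026-02-27", confidence: 0.99 },
    // address: decomposed into components, not a single blob of text
    work_address: {
      value: {
        street: "Musterstrasse 1",
        city: "Frankfurt",
        region: "Hessen",
        postalCode: "60311",
        country: "DE",
      },
      confidence: 0.92,
    },
  },
};
```

Everything downstream — database inserts, validation, reporting — works with typed values instead of strings waiting to be parsed.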

Want to extract from a different document type tomorrow? Change the schema. That’s it.

File Format Support

Nanonets processes PDFs and images — scanned documents, photos of receipts, and similar inputs. Their strength is OCR: taking a picture of a document and turning it into machine-readable text.

Iteration Layer also handles PDFs and images, but adds native parsing for DOCX, XLSX, CSV, JSON, HTML, Markdown, and plain text. Native means an Excel file is parsed as structured data, not OCR’d as an image of a spreadsheet. A CSV is read as rows and columns, not as a photo of a table.

This matters because OCR is lossy. Even good OCR misreads characters — especially in dense numeric tables where a “1” becomes an “l” or a “0” becomes an “O”. If the source file is already structured, parsing it natively eliminates that entire error class.
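The difference is easy to see with a CSV. Parsed natively, the characters survive exactly as written — there is no recognition step to misread them. A minimal native parse (no quoting support, purely for illustration):

```typescript
// Minimal CSV parser for illustration: rows and columns come straight
// from the text, so a "1" can never be misread as "l" the way OCR
// might. (No support for quoted fields or embedded commas.)
function parseCsv(text: string): string[][] {
  return text
    .trim()
    .split("\n")
    .map((line) => line.split(",").map((cell) => cell.trim()));
}
```

Real-world CSV handling needs a proper parser, but the point stands: structured input stays structured.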

If your use case is purely scanned paper documents and photos, both tools handle it. If you also process digital documents — spreadsheets from accounting systems, Word documents from legal, CSV exports from SaaS tools — native parsing is a meaningful accuracy advantage.

Confidence Scores and Citations

Every field Iteration Layer extracts comes with a confidence score and a source citation. The confidence score tells you how certain the extraction is. The source citation shows the verbatim text from the document that produced the value.

This matters for automation boundaries. Set a threshold — say, 0.9 — and auto-process everything above it. Route everything below to a human review queue. The citation lets the reviewer verify the extraction without opening the original document. They see the extracted value, the confidence, and the exact text it came from. Quick decision, move on.
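A routing rule like that is a few lines of code. A sketch, assuming each extracted field carries the per-field `confidence` and `citation` described above (the 0.9 threshold and the field shape are from this example, not a documented API contract):

```typescript
interface ExtractedField {
  name: string;
  value: unknown;
  confidence: number; // 0..1, returned per field
  citation: string;   // verbatim source text, shown to the reviewer
}

// Partition extracted fields: auto-process high-confidence values,
// queue the rest for human review alongside their citations.
function routeByConfidence(fields: ExtractedField[], threshold = 0.9) {
  const autoProcess: ExtractedField[] = [];
  const humanReview: ExtractedField[] = [];
  for (const f of fields) {
    (f.confidence >= threshold ? autoProcess : humanReview).push(f);
  }
  return { autoProcess, humanReview };
}
```

The threshold becomes a tuning knob: raise it when accuracy matters more than throughput, lower it as you build trust in the results.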

Nanonets provides confidence scores through their model predictions. But the scores are tied to the trained model’s certainty, which depends on how representative your training data was. A model trained on 20 invoices from one supplier might be very confident on similar invoices and very uncertain on invoices from a new supplier — not because the document is ambiguous, but because the model hasn’t seen that layout before.

Schema-based extraction doesn’t have this problem. There’s no training distribution to overfit. The confidence reflects the extraction quality for this specific document against this specific schema.

Where Nanonets Has the Edge

To be fair about what Nanonets does well:

  • No-code workflows. If your team doesn’t have developers and needs to process documents through a visual pipeline builder, Nanonets’ workflow UI is genuinely useful. Drag blocks, connect them, configure exports. No code required.
  • Pre-built models. Nanonets ships trained models for common document types — invoices, receipts, bank statements, purchase orders. If your use case is a common one, you can skip the training step and use a pre-built model immediately.
  • Export integrations. Built-in exports to QuickBooks, Xero, SAP, Google Sheets, and other destinations. If you want to route extracted data directly into an accounting system without writing integration code, Nanonets has those connectors.
  • Classification. Their workflow blocks include document classification — routing different document types to different extraction models. If you receive a mixed stream of documents and need to sort them before extracting, that’s built into the platform.

These are real strengths for specific use cases. If you’re an operations team automating invoice processing into QuickBooks and nobody on the team writes code, Nanonets’ workflow approach might be exactly right.

When to Choose What

Pick Nanonets if:

  • You want a no-code workflow builder for document processing
  • Pre-built models for common document types (invoices, receipts) fit your needs
  • You need built-in export integrations to accounting and ERP systems
  • Document classification across a mixed document stream is a core requirement

Pick Iteration Layer if:

  • You’re a developer building document extraction into a product or pipeline
  • You process many document types and don’t want to train and manage a model for each one
  • You need typed field extraction — address decomposition, currency parsing, IBAN validation, calculated fields — without post-processing code
  • You process a mix of PDFs, spreadsheets, Word documents, CSVs, and other digital formats
  • Predictable, API-first billing matters more than visual workflow orchestration

Get Started

Check the docs for the full schema reference, all 17 field types, and SDK guides for TypeScript and Python. Define a schema, send a document, see what comes back.

And because Document Extraction is part of a composable API suite, the structured data it returns flows directly into Document Generation or Image Generation — same auth, same credit pool, no glue code.

Iteration Layer runs on EU infrastructure (Frankfurt), which matters if your data residency requirements rule out US-hosted services.

Sign up for a free account — no credit card required. Your first extraction is a few lines of code, not a training pipeline.
