Document Extraction vs AWS Textract: One API Call vs Five

Five APIs to Extract One Document

AWS Textract doesn’t have a document extraction API. It has five.

  • DetectDocumentText — raw OCR, returns lines and words
  • AnalyzeDocument — forms, tables, queries, and signatures, each billed as a separate feature
  • AnalyzeExpense — receipts and invoices
  • AnalyzeID — driver’s licenses and passports
  • AnalyzeLending — mortgage documents

Each API has its own endpoint, its own response format, and its own pricing tier. If you need to extract structured data from a general business document — say a supplier contract with tables, key-value pairs, and a few specific fields — you’re calling AnalyzeDocument with the Forms and Tables features enabled, plus Queries for the fields that don’t fit neatly into either category.

That’s three feature charges for one page. At Textract’s published rates, a single page analyzed for forms ($0.05), tables ($0.015), and queries ($0.015 per query) runs $0.08 or more. Process 10,000 pages a month at that rate and the bill is at least $800, before you’ve written a line of business logic.
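To make the arithmetic concrete, here is a minimal sketch that uses only the rates quoted above. The function and constant names are illustrative, amounts are held in tenths of a cent to avoid floating-point drift, and real AWS pricing varies by region and volume tier.

```typescript
// Rates as quoted in this article, in tenths of a cent (mills) per page.
// Illustrative only: check the current AWS pricing page before budgeting.
const RATE_MILLS = { forms: 50, tables: 15, query: 15 } as const;

// Cost of one page analyzed for forms and tables plus `queries` queries.
function pageCostMills(queries: number): number {
  return RATE_MILLS.forms + RATE_MILLS.tables + RATE_MILLS.query * queries;
}

// Monthly cost in dollars for a given page volume.
function monthlyCostDollars(pages: number, queriesPerPage: number): number {
  return (pageCostMills(queriesPerPage) * pages) / 1000;
}

console.log(monthlyCostDollars(10_000, 1)); // 800
```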

The Response You Have to Decode

Textract’s response format reflects its origins as an OCR tool. You get back a flat list of Block objects — each one a page, line, word, key-value pair, table, cell, or query result. Relationships between blocks are expressed as ID references. To reconstruct a table, you find the TABLE block, follow its CHILD relationships to CELL blocks, follow each cell’s CHILD relationships to WORD blocks, and concatenate the words.
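The table walk described above can be sketched roughly as follows. The Block interface here is a hand-written subset of the fields Textract returns, not the SDK's own type, and the sketch ignores merged cells and selection elements, so treat it as a starting point rather than a complete parser.

```typescript
// Hand-written subset of a Textract block, for illustration.
interface Relationship { Type: string; Ids: string[] }
interface Block {
  Id: string;
  BlockType: string;
  Text?: string;
  RowIndex?: number;    // 1-based, present on CELL blocks
  ColumnIndex?: number; // 1-based, present on CELL blocks
  Relationships?: Relationship[];
}

// IDs referenced by a block's CHILD relationships.
function childIds(block: Block): string[] {
  return (block.Relationships ?? [])
    .filter((r) => r.Type === "CHILD")
    .flatMap((r) => r.Ids);
}

// TABLE -> CELL -> WORD, concatenating the words of each cell.
function tableToRows(table: Block, blocks: Block[]): string[][] {
  const byId = new Map<string, Block>();
  for (const b of blocks) byId.set(b.Id, b);

  const rows: string[][] = [];
  for (const cellId of childIds(table)) {
    const cell = byId.get(cellId);
    if (!cell || cell.BlockType !== "CELL") continue;

    const words: string[] = [];
    for (const id of childIds(cell)) {
      const child = byId.get(id);
      if (child?.BlockType === "WORD" && child.Text) words.push(child.Text);
    }

    const r = (cell.RowIndex ?? 1) - 1;
    const c = (cell.ColumnIndex ?? 1) - 1;
    if (!rows[r]) rows[r] = [];
    rows[r][c] = words.join(" ");
  }
  return rows;
}

// Made-up sample data: one table with a single two-cell row.
const sample: Block[] = [
  { Id: "t1", BlockType: "TABLE", Relationships: [{ Type: "CHILD", Ids: ["c1", "c2"] }] },
  { Id: "c1", BlockType: "CELL", RowIndex: 1, ColumnIndex: 1, Relationships: [{ Type: "CHILD", Ids: ["w1"] }] },
  { Id: "c2", BlockType: "CELL", RowIndex: 1, ColumnIndex: 2, Relationships: [{ Type: "CHILD", Ids: ["w2", "w3"] }] },
  { Id: "w1", BlockType: "WORD", Text: "Item" },
  { Id: "w2", BlockType: "WORD", Text: "Unit" },
  { Id: "w3", BlockType: "WORD", Text: "price" },
];

console.log(tableToRows(sample[0], sample)); // [["Item", "Unit price"]]
```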

Here’s what a key-value pair looks like in Textract’s response:

{
  "BlockType": "KEY_VALUE_SET",
  "EntityTypes": ["KEY"],
  "Relationships": [
    { "Type": "VALUE", "Ids": ["value-block-id"] },
    { "Type": "CHILD", "Ids": ["word-block-id-1", "word-block-id-2"] }
  ]
}

You still need to traverse the block graph to figure out that this key is “Invoice Number” and its value is “INV-2024-0042”. Then you need to map “Invoice Number” to whatever field name your application expects. Textract finds text on a page. Turning that text into typed, structured data is your problem.
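A rough sketch of that traversal, assuming a hand-written subset of the block shape rather than the SDK's own types. The FIELD_MAP lookup at the end is hypothetical and stands in for whatever label-to-field mapping your application needs.

```typescript
// Hand-written subset of a Textract block, for illustration.
interface Relationship { Type: string; Ids: string[] }
interface Block {
  Id: string;
  BlockType: string;
  Text?: string;
  EntityTypes?: string[];
  Relationships?: Relationship[];
}

// IDs referenced by relationships of the given type (CHILD or VALUE).
function relatedIds(block: Block, type: string): string[] {
  return (block.Relationships ?? [])
    .filter((r) => r.Type === type)
    .flatMap((r) => r.Ids);
}

// Join the text of a block's CHILD blocks.
function textOf(block: Block, byId: Map<string, Block>): string {
  return relatedIds(block, "CHILD")
    .map((id) => byId.get(id)?.Text ?? "")
    .filter((t) => t.length > 0)
    .join(" ");
}

// For every KEY block, follow VALUE to the value block and read both texts.
function keyValuePairs(blocks: Block[]): Record<string, string> {
  const byId = new Map<string, Block>();
  for (const b of blocks) byId.set(b.Id, b);

  const pairs: Record<string, string> = {};
  for (const block of blocks) {
    const isKey =
      block.BlockType === "KEY_VALUE_SET" && (block.EntityTypes ?? []).includes("KEY");
    if (!isKey) continue;

    const valueTexts: string[] = [];
    for (const id of relatedIds(block, "VALUE")) {
      const valueBlock = byId.get(id);
      if (valueBlock) valueTexts.push(textOf(valueBlock, byId));
    }
    pairs[textOf(block, byId)] = valueTexts.join(" ");
  }
  return pairs;
}

// Hypothetical mapping from document labels to application field names.
const FIELD_MAP: Record<string, string> = { "Invoice Number": "invoiceNumber" };

// Made-up sample data matching the example in the text.
const sample: Block[] = [
  { Id: "k1", BlockType: "KEY_VALUE_SET", EntityTypes: ["KEY"], Relationships: [
    { Type: "VALUE", Ids: ["v1"] },
    { Type: "CHILD", Ids: ["w1", "w2"] },
  ]},
  { Id: "v1", BlockType: "KEY_VALUE_SET", EntityTypes: ["VALUE"], Relationships: [
    { Type: "CHILD", Ids: ["w3"] },
  ]},
  { Id: "w1", BlockType: "WORD", Text: "Invoice" },
  { Id: "w2", BlockType: "WORD", Text: "Number" },
  { Id: "w3", BlockType: "WORD", Text: "INV-2024-0042" },
];

const record: Record<string, string> = {};
for (const [label, value] of Object.entries(keyValuePairs(sample))) {
  const field = FIELD_MAP[label];
  if (field) record[field] = value;
}
console.log(record); // { invoiceNumber: "INV-2024-0042" }
```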

No Schema, No Types

This is the fundamental gap. Textract doesn’t let you define what you want to extract. You send a document, it sends back everything it finds, and you filter and transform the results yourself.

There’s no concept of “this field is a date” or “this field is a currency amount.” Every value comes back as a string. If Textract reads “1.234,56” from a European invoice, you get the string “1.234,56” and you handle the decimal convention yourself. If it reads “March 15, 2026” or “15/03/2026” or “2026-03-15,” you get a string each time. Parsing and normalizing is downstream work.
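As an illustration of that downstream work, here is a minimal normalization sketch. Both functions rest on assumptions you would have to confirm against your own documents: parseAmount treats the last separator as the decimal mark (so it misreads a bare "1.234" meant as one thousand two hundred thirty-four), and toIsoDate assumes slash dates are day-first.

```typescript
// Parse an amount whose decimal mark may be "," or ".".
// Heuristic: the last separator is the decimal mark, the rest are grouping.
function parseAmount(raw: string): number {
  const cleaned = raw.replace(/[^\d.,-]/g, "");
  const lastSep = Math.max(cleaned.lastIndexOf("."), cleaned.lastIndexOf(","));
  if (lastSep === -1) return Number(cleaned);
  const whole = cleaned.slice(0, lastSep).replace(/[.,]/g, "");
  const fraction = cleaned.slice(lastSep + 1);
  return Number(`${whole}.${fraction}`);
}

const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Normalize three common date layouts to ISO 8601 (YYYY-MM-DD).
// Returns null for anything it does not recognize.
function toIsoDate(raw: string): string | null {
  const s = raw.trim();
  const pad = (n: string | number): string => String(n).padStart(2, "0");

  // Already ISO: 2026-03-15
  if (/^\d{4}-\d{2}-\d{2}$/.test(s)) return s;

  // Slash date, assumed day-first: 15/03/2026
  const slash = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(s);
  if (slash) return `${slash[3]}-${pad(slash[2])}-${pad(slash[1])}`;

  // Long English form: March 15, 2026
  const long = /^([A-Za-z]+) (\d{1,2}), (\d{4})$/.exec(s);
  if (long) {
    const month = MONTHS.indexOf(long[1].toLowerCase()) + 1;
    if (month > 0) return `${long[3]}-${pad(month)}-${pad(long[2])}`;
  }
  return null;
}

console.log(parseAmount("1.234,56"));     // 1234.56
console.log(toIsoDate("March 15, 2026")); // "2026-03-15"
console.log(toIsoDate("15/03/2026"));     // "2026-03-15"
```

Every new supplier format or locale adds another branch to code like this, which is the maintenance cost the paragraph above describes.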

The Queries feature gets closer — you can ask “What is the invoice number?” in natural language and Textract returns its best guess. But the result is still an untyped string. And queries are billed per query per page, so asking ten questions about a one-page document is ten charges.

The AWS Ecosystem Tax

Textract doesn’t work in isolation. It works inside AWS.

For synchronous processing, you pass the document as raw bytes. That works for single-page documents under 10 MB. For multi-page PDFs or anything async, you upload to S3 first, then start an async job, then poll for completion or set up an SNS topic to get notified. Processing the notification means writing a Lambda function or running a service that subscribes to the topic.
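The polling variant of that flow looks roughly like this. The AnalysisApi interface is a hypothetical thin wrapper, introduced here so the sketch runs without AWS credentials; in a real pipeline its two methods would call Textract's StartDocumentAnalysis and GetDocumentAnalysis operations, with the document already uploaded to S3.

```typescript
type JobStatus = "IN_PROGRESS" | "SUCCEEDED" | "FAILED" | "PARTIAL_SUCCESS";

interface AnalysisPage {
  status: JobStatus;
  blocks: unknown[];
  nextToken?: string;
}

// Hypothetical wrapper around the two Textract operations.
interface AnalysisApi {
  start(bucket: string, key: string): Promise<string>; // resolves to a job ID
  get(jobId: string, nextToken?: string): Promise<AnalysisPage>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function analyzeFromS3(
  api: AnalysisApi,
  bucket: string,
  key: string,
  pollMs = 5000,
  maxPolls = 60,
): Promise<unknown[]> {
  const jobId = await api.start(bucket, key);

  // Poll until the job leaves IN_PROGRESS or we give up.
  let page = await api.get(jobId);
  for (let polls = 0; page.status === "IN_PROGRESS"; polls++) {
    if (polls >= maxPolls) throw new Error(`Job ${jobId} timed out`);
    await sleep(pollMs);
    page = await api.get(jobId);
  }
  // This sketch treats anything other than SUCCEEDED as a failure.
  if (page.status !== "SUCCEEDED") {
    throw new Error(`Job ${jobId} ended with status ${page.status}`);
  }

  // Results are paginated: keep fetching while a token is returned.
  const blocks = [...page.blocks];
  while (page.nextToken) {
    page = await api.get(jobId, page.nextToken);
    blocks.push(...page.blocks);
  }
  return blocks;
}
```

Swapping polling for SNS notifications removes the loop but adds the topic, the subscription, and a handler to deploy.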

Authentication goes through IAM: roles, policies, and trust relationships. Counting S3, SNS, Lambda, and IAM, your extraction pipeline now depends on four AWS services besides Textract before it does anything useful.

If you’re already deep in the AWS ecosystem, this might feel natural. If you’re not, it’s a lot of infrastructure for “get me the invoice number from this PDF.”

Four Formats, Take It or Leave It

Textract supports PDF, JPEG, PNG, and TIFF. That’s it.

No DOCX. No XLSX. No CSV. No HTML. If someone emails you a Word document with tabular data, you need to convert it to PDF before Textract can touch it. That conversion step — whether it’s LibreOffice in a Docker container or a third-party service — adds another moving part to your pipeline.

One Call, One Schema, Typed Results

The Iteration Layer Document Extraction API takes a different approach. You define a schema describing the fields you want, send the document, and get back typed, structured JSON — with confidence scores for every field.

Here’s a receipt extraction:

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

// One request: the document plus a schema naming each field and its type.
const result = await client.extract({
  files: [{ url: "https://example.com/receipt.jpg" }],
  schema: {
    fields: [
      { name: "merchantName", type: "text" },
      { name: "transactionDate", type: "date" },
      { name: "totalAmount", type: "currency_amount" },
      { name: "taxAmount", type: "currency_amount" },
      { name: "paymentMethod", type: "text" },
      // Line items: an array whose entries each have a name and a price.
      { name: "items", type: "array", fields: [
        { name: "name", type: "text" },
        { name: "price", type: "currency_amount" },
      ]},
    ],
  },
});

That’s the entire integration. No block graph traversal. No ID reference chasing. No string-to-type conversion. The totalAmount field comes back as a properly parsed currency value, not a string you have to interpret. The transactionDate comes back as a date, regardless of whether the receipt says “Feb 27, 2026” or “27/02/2026” or “2026-02-27.”

17 Typed Fields vs. Raw Strings

The schema supports 17 field types — text, textarea, integer, decimal, boolean, date, time, datetime, currency amount, currency code, email, country, address, IBAN, array, enum, and calculated. Each type has built-in parsing, normalization, and validation.

An address field returns a decomposed object with street, city, region, postal code, and country, not a single string you need to split. An IBAN field validates the check digits. A currency_amount field handles European and US decimal conventions without you writing a single line of normalization code.
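For a sense of what check-digit validation involves, here is a sketch of the standard mod-97 test for IBANs. It illustrates the general technique and makes no claim about how Iteration Layer implements it; it also skips country-specific length rules.

```typescript
// Standard IBAN check: move the first four characters to the end,
// map letters to numbers (A=10 ... Z=35), and require the result mod 97 to be 1.
function isValidIban(input: string): boolean {
  const iban = input.replace(/\s+/g, "").toUpperCase();

  // Two-letter country code, two check digits, then 11 to 30 alphanumerics.
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;

  const rearranged = iban.slice(4) + iban.slice(0, 4);
  const digits = rearranged.replace(/[A-Z]/g, (ch) => String(ch.charCodeAt(0) - 55));

  // The number is too long for a double, so reduce it digit by digit.
  let remainder = 0;
  for (let i = 0; i < digits.length; i++) {
    remainder = (remainder * 10 + Number(digits[i])) % 97;
  }
  return remainder === 1;
}

console.log(isValidIban("GB82 WEST 1234 5698 7654 32")); // true
console.log(isValidIban("GB82 WEST 1234 5698 7654 33")); // false
```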

Textract returns strings. What those strings mean is your downstream problem. Iteration Layer pushes that work into the extraction step, where the AI model has the full document context to get it right.

Confidence Scores and Citations

Every extracted field includes a confidence score between 0.0 and 1.0. You know immediately whether the parser is certain about a value or guessing.

Build your pipeline around thresholds: auto-accept above 0.90, flag for human review between 0.70 and 0.90, reject below 0.70. This turns extraction from a binary pass/fail into a graduated process where you catch problems before they hit production.
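A minimal sketch of that routing, using the thresholds above. The ExtractedField shape is an assumption made for illustration; check the API docs for the actual response format. Values of exactly 0.90 and 0.70 are sent to review here, which is one reasonable reading of the boundaries.

```typescript
type Decision = "accept" | "review" | "reject";

// Assumed shape, for illustration only.
interface ExtractedField {
  name: string;
  value: unknown;
  confidence: number; // 0.0 to 1.0
}

function decide(confidence: number): Decision {
  if (confidence > 0.9) return "accept";
  if (confidence >= 0.7) return "review";
  return "reject";
}

// Group fields by what should happen to them next.
function triage(fields: ExtractedField[]): Record<Decision, ExtractedField[]> {
  const out: Record<Decision, ExtractedField[]> = { accept: [], review: [], reject: [] };
  for (const field of fields) out[decide(field.confidence)].push(field);
  return out;
}

const buckets = triage([
  { name: "totalAmount", value: 42.5, confidence: 0.97 },
  { name: "transactionDate", value: "2026-02-27", confidence: 0.82 },
  { name: "paymentMethod", value: "card", confidence: 0.41 },
]);
console.log(buckets.review.map((f) => f.name)); // ["transactionDate"]
```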

Textract also returns confidence scores for its blocks, but those scores apply to OCR accuracy — did it read the characters correctly? They don’t tell you whether the value it found actually corresponds to the field you care about. Textract might read “INV-2024-0042” with 99% OCR confidence, but if it assigned that string to the wrong key-value pair, the confidence score doesn’t help.

Iteration Layer’s scores reflect extraction accuracy — did the parser find the right value for the field you defined? That distinction matters when you’re building automated pipelines.

Multi-File, Multi-Format

A single extraction request accepts up to 20 files. Mix formats freely — a PDF invoice, a JPEG receipt photo, a DOCX contract — same schema, same call, same response structure.

Textract processes one document per API call. Batch processing means managing a queue of async jobs, each producing its own set of blocks that you reconcile into a unified result. The orchestration code dwarfs the extraction logic.

Iteration Layer supports PDF, DOCX, XLSX, CSV, HTML, JPEG, PNG, GIF, and WebP. No conversion steps, no format-specific pipelines. Send the file, get the data.

Side-by-Side

| Capability | AWS Textract | Iteration Layer |
| --- | --- | --- |
| API calls for full extraction | 2–5, depending on features | 1 |
| Schema definition | None (Queries are close but limited) | Full schema with 17 typed fields |
| Response format | Block graph with ID references | Flat JSON matching your schema |
| Field types | Strings only | 17 types with built-in parsing |
| Confidence meaning | OCR character accuracy | Extraction correctness |
| Files per request | 1 | Up to 20 |
| Supported formats | PDF, JPEG, PNG, TIFF | PDF, DOCX, XLSX, CSV, HTML, JPEG, PNG, GIF, WebP |
| Infrastructure required | S3, SNS, Lambda, IAM | API key |
| Data residency | US regions (default) | EU-hosted (Frankfurt) |

When Textract Makes Sense

Textract isn’t the wrong choice for everyone. If you’re already running on AWS, your documents are all PDFs or images, you only need raw text extraction, and you have an engineering team that can build and maintain the integration layer — Textract is a solid OCR engine.

But if you need structured, typed data from documents — with a defined schema, confidence scores that reflect extraction quality, and support for formats beyond PDF and images — you’re going to spend weeks building the layer that turns Textract’s raw output into something your application can use. That layer is what Iteration Layer ships as the product.

Get Started

Check the docs for the full schema reference, field type definitions, and SDK guides. The TypeScript and Python SDKs handle authentication, file uploads, and response parsing — your integration is a few lines of code.

And because Document Extraction is part of a composable API suite, the structured data it returns flows directly into Document Generation or Image Generation — same auth, same credit pool, no glue code.

Sign up for a free account, no credit card required, and run your documents against a schema. Compare the results to what you’re getting from Textract today.
