Document Extraction vs Google Document AI: Schema-Based or Processor-Based?

7 min read · Document Extraction

The GCP Tax on Document Parsing

Before you extract a single field with Google Document AI, you need a GCP project. A billing account. A service account with the right IAM roles. You need to enable the Document AI API, pick a region (not all processors are available in all regions), and decide which processor type fits your document. OCR? Form Parser? Custom Extractor? Document Classifier?

That’s the setup. You haven’t parsed anything yet.

Google’s architecture is processor-based. You create a processor instance for a specific document type, send documents to it, and get results back. Different document types need different processors. A bank statement and an invoice need separate processor instances — or a custom extractor trained on labeled examples.

This makes sense if you’re Google. Processors map neatly to infrastructure that can be optimized, versioned, and billed per page. But if you’re a developer who just needs structured data from a PDF, processors are someone else’s abstraction leaking into your codebase.

The Training Data Problem

Google’s pre-built processors — OCR, Form Parser, Invoice Parser, Expense Parser — handle common document types reasonably well. But the moment your documents don’t fit a pre-built category, you’re building a custom processor.

Custom processors require labeled training data. The minimum is 10 documents, but Google’s own documentation recommends at least 50 for decent accuracy. You upload sample documents, manually annotate the fields you want to extract, train the processor, evaluate its performance, and iterate.

This is machine learning ops disguised as an API. You’re not calling an endpoint — you’re training a model. That means:

  • Collecting representative sample documents (which you may not have yet)
  • Manually labeling fields across dozens of documents in Google’s console
  • Waiting for training runs to complete
  • Evaluating precision and recall on a test set
  • Retraining when new document layouts appear

For teams with ML infrastructure and annotators on staff, this is manageable. For a developer building a SaaS app that needs to parse supplier invoices, it’s a detour measured in weeks.

Sync Limits and Bucket Dependencies

Google Document AI caps synchronous requests at 10 pages. If your document is longer — and plenty of real-world documents are — you need batch processing. Batch processing requires a Google Cloud Storage bucket for input and output. So now your document parsing pipeline has a storage dependency, and you’re managing GCS permissions alongside your Document AI permissions.

The workflow becomes: upload the document to GCS, trigger batch processing, poll for completion, download the results from GCS, parse the response. Five steps where one should suffice.
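The shape of that batch flow can be sketched as follows. The GCS and Document AI calls are stubbed out so the control flow runs without credentials; every function name here is hypothetical, not the real client API.

```typescript
// Steps are recorded so the order of operations is visible.
const steps: string[] = [];

async function uploadToGcs(doc: string): Promise<string> {
  steps.push("upload"); // 1. upload the document to the input bucket
  return "gs://input-bucket/" + doc;
}

async function triggerBatchProcess(gcsUri: string) {
  steps.push("trigger"); // 2. kick off the long-running batch operation
  let polls = 0;
  return { isDone: () => ++polls >= 3 };
}

async function downloadResults(): Promise<string> {
  steps.push("download"); // 4. fetch the JSON output from the output bucket
  return '{"entities": []}';
}

async function batchExtract(doc: string): Promise<unknown> {
  const uri = await uploadToGcs(doc);
  const op = await triggerBatchProcess(uri);
  while (!op.isDone()) steps.push("poll"); // 3. poll until the operation completes
  const raw = await downloadResults();
  return JSON.parse(raw); // 5. parse the response
}
```

Even with the I/O stubbed, the essential problem is visible: the pipeline has two storage round-trips and a polling loop before any parsing happens.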

For high-volume use cases, this is defensible engineering. For the common case — parse this 15-page contract and give me the parties, dates, and clauses — it’s overhead that exists because of infrastructure constraints, not because your problem is complex.

Format Gaps

Google Document AI supports PDF, TIFF, GIF, JPEG, PNG, BMP, and WebP. That covers image-based documents well. But it doesn’t support DOCX, XLSX, or CSV.

In practice, a lot of structured data lives in spreadsheets and Word documents. If a client sends you a vendor list as an Excel file or a contract as a Word document, you need a separate pipeline to handle those formats before Document AI can touch them. That’s more glue code, more failure modes, more things to maintain.

Iteration Layer’s Document Extraction handles PDFs, images, DOCX, XLSX, CSV, and more — the formats documents actually arrive in. One endpoint, regardless of format.

Per-Page Pricing at Scale

Google charges approximately $0.06 per page for OCR processing, with higher rates for specialized processors. That sounds cheap until you do the math on volume.

A 20-page contract at $0.06/page costs $1.20. Process 1,000 contracts a month and you’re at $1,200 — just for extraction, before any storage or compute costs for the GCS buckets, the processing pipeline, and the infrastructure to manage it all.
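The arithmetic above is simple enough to keep as a back-of-envelope function. The $0.06/page rate is the article's approximate OCR figure, not an official price list.

```typescript
// Extraction-only cost: documents per month × pages per document × per-page rate.
// Storage, egress, and pipeline compute are extra and not modeled here.
function monthlyOcrCost(docsPerMonth: number, pagesPerDoc: number, ratePerPage = 0.06): number {
  return docsPerMonth * pagesPerDoc * ratePerPage;
}

monthlyOcrCost(1, 20);    // one 20-page contract: about $1.20
monthlyOcrCost(1000, 20); // 1,000 contracts a month: about $1,200
```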

Custom processors cost more. The Form Parser and specialized extractors have their own pricing tiers. And because batch processing requires GCS, you’re also paying for storage and network egress on top of per-page fees.

Schema-Based Extraction: Define What You Want

Iteration Layer takes a different approach. There are no processors to create, no models to train, no GCP accounts to configure. You define a schema describing the fields you want, send your document, and get structured JSON back.

Here’s what extracting data from a bank statement looks like:

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.extract({
  files: [{ url: "https://example.com/bank-statement.pdf" }],
  schema: {
    fields: [
      { name: "account_holder", type: "text" },
      { name: "iban", type: "iban" },
      { name: "statement_period", type: "text" },
      { name: "opening_balance", type: "currency_amount" },
      { name: "closing_balance", type: "currency_amount" },
      { name: "transactions", type: "array", fields: [
        { name: "date", type: "date" },
        { name: "description", type: "text" },
        { name: "amount", type: "currency_amount" },
      ]},
    ],
  },
});

That’s it. No processor creation. No training data. No GCS buckets. The schema is the configuration — it tells the extraction engine what to look for and what types to expect.

Want to parse invoices tomorrow? Change the schema. Contracts? Change the schema. Medical records, shipping manifests, insurance claims? Same API, different schema. You don’t create a new processor for each document type — you describe the output you want.
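Switching document types really is just a new schema object. Here is what an invoice schema might look like; the field names are illustrative, not a prescribed invoice schema.

```typescript
// Same extract call as the bank statement example, different schema.
const invoiceSchema = {
  fields: [
    { name: "invoice_number", type: "text" },
    { name: "issue_date", type: "date" },
    { name: "supplier_name", type: "text" },
    { name: "total", type: "currency_amount" },
    { name: "line_items", type: "array", fields: [
      { name: "description", type: "text" },
      { name: "quantity", type: "integer" },
      { name: "amount", type: "currency_amount" },
    ]},
  ],
};

// await client.extract({ files, schema: invoiceSchema });
```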

17 Typed Fields With Confidence Scores

The schema isn’t just field names and “string.” Document Extraction supports 17 typed fields: text, textarea, integer, decimal, date, datetime, time, enum, boolean, email, iban, country, currency_code, currency_amount, address, array, and calculated. Each type carries validation semantics — an iban field returns a validated IBAN, a currency_amount returns a structured value with amount and currency code.

Every extracted field includes a confidence score between 0 and 1, plus source citations pointing back to the exact location in the document where the value was found. You know not just what was extracted, but how confident the engine is and where it found the data. That’s the difference between a black box and a system you can build reliable automation on.
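A plausible shape for one extracted field is sketched below. The article doesn't show the wire format, so the property names here are assumptions for illustration, not the documented response schema.

```typescript
// A currency_amount comes back as a structured value, not a raw string.
interface CurrencyAmount { amount: number; currency: string; }
interface Citation { page: number; }
interface ExtractedField<T> {
  value: T;
  confidence: number;    // between 0 and 1
  citations: Citation[]; // where in the document the value was found
}

const closingBalance: ExtractedField<CurrencyAmount> = {
  value: { amount: 10432.17, currency: "EUR" },
  confidence: 0.97,
  citations: [{ page: 3 }],
};
```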

Low confidence scores become quality gates. Route high-confidence extractions straight to your database. Flag low-confidence results for human review. You make the threshold decision — the API gives you the data to make it.
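That quality gate is a few lines of caller-side code. The 0.9 threshold below is a policy choice, not an API parameter.

```typescript
type Scored = { name: string; confidence: number };

// Split extracted fields into auto-accept and human-review buckets.
function partitionByConfidence<T extends Scored>(fields: T[], threshold = 0.9) {
  const accepted: T[] = [];
  const needsReview: T[] = [];
  for (const f of fields) (f.confidence >= threshold ? accepted : needsReview).push(f);
  return { accepted, needsReview };
}

const { accepted, needsReview } = partitionByConfidence([
  { name: "iban", confidence: 0.98 },
  { name: "opening_balance", confidence: 0.71 },
]);
// accepted holds "iban"; needsReview holds "opening_balance"
```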

Multi-File, No Extra Steps

Need to extract from multiple documents in a single request? Pass multiple files. The API handles them together, which matters when related information spans multiple documents — a cover letter and an attached invoice, for example.

There’s no batch mode with separate input/output buckets. No polling for completion on a separate endpoint. Send files, get JSON. The complexity stays on our side.
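A multi-file request is the same request shape with more entries in `files`; the URLs below are placeholders.

```typescript
// Related documents go in one request so they can be extracted together.
const request = {
  files: [
    { url: "https://example.com/cover-letter.pdf" },
    { url: "https://example.com/invoice.pdf" },
  ],
  schema: {
    fields: [{ name: "invoice_total", type: "currency_amount" }],
  },
};

// await client.extract(request); // one call, no buckets, no polling
```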

The Comparison, Condensed

  • Setup: Google Document AI needs a GCP project, service account, IAM roles, billing, and a region; Iteration Layer needs an API key.
  • Configuration: Google creates a processor per document type; Iteration Layer defines a schema per request.
  • Custom documents: Google trains a custom processor on 50+ labeled samples; Iteration Layer changes the schema.
  • Sync page limit: 10 pages on Google; no artificial limit on Iteration Layer.
  • File formats: Google supports PDF, TIFF, GIF, JPEG, PNG, BMP, and WebP; Iteration Layer supports PDF, DOCX, XLSX, CSV, images, and more.
  • Typed fields: Google returns key-value pairs, tables, and entities; Iteration Layer returns 17 typed fields with validation.
  • Confidence data: Google provides confidence scores; Iteration Layer provides confidence scores plus source citations.
  • Multi-file: separate requests or batch via GCS on Google; a single request with multiple files on Iteration Layer.
  • Data residency: configurable per GCP region on Google; EU-hosted (Frankfurt) on Iteration Layer.

When Google Document AI Makes Sense

Google Document AI is a strong choice if you’re already deep in the GCP ecosystem, your team has ML ops experience, and your document volumes justify the infrastructure investment. The pre-built processors for invoices and receipts work well for their specific use cases, and the custom processor pipeline — while heavy — gives you fine-grained control if you need it.

If you’re processing millions of pages of the same document type and you have annotators to label training data, the processor model pays off over time.

When a Schema Is Enough

For most document extraction use cases, you don’t need a trained model. You need to describe the output shape and get structured data back. The document types change. New clients send new formats. You don’t have 50 labeled samples — you have a deadline and a PDF.

Iteration Layer’s Document Extraction is built for that reality. Define the schema, send the document, get JSON with typed fields, confidence scores, and source citations. No GCP setup, no processor management, no training pipeline. The schema is your configuration, and changing it takes seconds.

Start Extracting

Check out the docs to see the full schema definition, all 17 field types, and working examples for common document types. An API key is all the setup you need.

And because Document Extraction is part of a composable API suite, the structured data it returns flows directly into Document Generation or Image Generation — same auth, same credit pool, no glue code.

Start building in minutes

Free trial included. No credit card required.