Document Extraction vs Azure AI Document Intelligence: Schema-First or Model-First?

7 min read · Document Extraction

Pick a Model, Any Model

Azure AI Document Intelligence — formerly Form Recognizer — organizes document processing around models. Pre-built models for invoices, receipts, ID documents, tax forms, health insurance cards, contracts. Custom models you train yourself. A layout model. A read model. A general document model.

Before you extract a single field, you’re making an architectural decision: which model handles this document type? If your document doesn’t fit a pre-built model, you’re training a custom one. If it sort of fits but you need fields the pre-built model doesn’t expose, you’re training a custom one anyway.

This model-first approach works when your documents are standard. An invoice is an invoice. A receipt is a receipt. But real-world document processing isn’t that clean. You get supplier agreements with non-standard layouts. Employee onboarding packets that combine forms, contracts, and ID scans. Property documents that don’t map to any pre-built category.

For every document type Azure doesn’t have a model for, you’re back to training.

The Training Tax

Custom models in Azure Document Intelligence aren’t optional extras. They’re the escape hatch for everything the pre-built models don’t cover. And they come with real cost.

You need labeled training data — a minimum of 5 samples per document type for the classifier, and a meaningful dataset for accurate extraction. You label fields in Azure’s Studio tool, mapping regions on the document to field names. Then you kick off a training run. Neural models can train for up to 10 hours on the free tier before Azure starts charging $3/hour.

Once trained, you manage model versions. You deploy models to specific resources. You track which model version is in production and which is in staging. When document layouts change — and they always change — you retrain.

This is an entire ML operations workflow for what should be a data extraction problem.

Iteration Layer doesn’t have models to select, train, or manage. You define a schema. You send a document. You get structured JSON. If the document layout changes, the schema still works — the extraction adapts because it’s driven by field definitions, not by patterns learned from training samples.

Pre-built Models Lock You In

Azure’s pre-built invoice model extracts a fixed set of fields: VendorName, VendorAddress, CustomerName, InvoiceTotal, DueDate, and about two dozen others. If your invoice workflow needs exactly those fields, it works well.

But what if you need the payment terms as a structured clause? The contract reference number on a purchase order that happens to be formatted as an invoice? The environmental compliance certification number your procurement team requires? Those fields don’t exist in the pre-built model.

Your options: train a custom model to handle the extra fields, use the Query Fields add-on at $10 per 1,000 pages on top of the pre-built model cost, or post-process the raw layout output with your own extraction logic.

Each option adds complexity. Each one is a workaround for a fundamental constraint — the model decides what fields exist, not you.
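Of the three workarounds, Query Fields is the lightest. At the REST level it amounts to a couple of extra query parameters on the analyze call. A sketch of the request shape, assuming the current GA API version; the field names here are illustrative, not from any real schema:

```typescript
// Sketch: requesting ad-hoc fields on top of the prebuilt invoice model
// via the Query Fields add-on. Field names are illustrative.
const analyzeRequest = {
  queryParameters: {
    features: ["queryFields"],
    queryFields: ["PaymentTerms", "ContractReference", "ComplianceCertNumber"],
  },
  contentType: "application/json",
  body: { urlSource: "https://example.com/invoice.pdf" },
};

// Passed to the analyze call, and billed at the prebuilt rate
// plus the add-on rate for the same pages:
// client.path("/documentModels/{modelId}:analyze", "prebuilt-invoice")
//   .post(analyzeRequest);
```

Note that the queried fields still come back in Azure's nested response structure, so the post-processing work doesn't go away.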

With Iteration Layer, the schema is yours. Add a field, remove a field, rename a field. The extraction adjusts. No retraining, no add-on charges, no post-processing.

Pricing That Requires a Spreadsheet

Azure Document Intelligence pricing varies by model type. At published rates: the Read model runs $1.50 per 1,000 pages. Pre-built models — invoices, receipts, IDs, contracts — cost $10 per 1,000 pages. Custom extraction jumps to $30 per 1,000 pages. Add-on capabilities like high resolution and query fields add $6–$10 per 1,000 pages on top.

These tiers interact. A custom model that uses the layout model for labeling gets billed for both. A pre-built model with query fields gets billed for both. Neural model training beyond the free 10 hours costs $3/hour.

Estimating your monthly cost means knowing which model types you’ll use, how many pages each model will process, whether you need add-ons, and how often you’ll retrain custom models. It’s a spreadsheet exercise before you’ve processed a single document.
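To make that concrete, here is a back-of-the-envelope estimate using only the published per-1,000-page rates quoted above. The monthly page volumes are invented for illustration:

```typescript
// Published per-1,000-page rates quoted above (USD).
const RATE_PER_1K = { read: 1.5, prebuilt: 10, customExtraction: 30, queryFields: 10 };

// Hypothetical monthly volumes for one pipeline.
const pages = { read: 50_000, prebuiltInvoice: 20_000, custom: 5_000 };

const monthlyCost =
  (pages.read / 1_000) * RATE_PER_1K.read +
  // Invoices run through the prebuilt model *plus* the Query Fields
  // add-on, so both rates apply to the same pages.
  (pages.prebuiltInvoice / 1_000) * (RATE_PER_1K.prebuilt + RATE_PER_1K.queryFields) +
  (pages.custom / 1_000) * RATE_PER_1K.customExtraction;

console.log(monthlyCost); // 75 + 400 + 150 = 625
```

And this still excludes retraining hours and commitment-tier math.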

Commitment tiers add another layer. You can prepay for volume — 20,000 pages, 100,000 pages, 500,000 pages — at discounted rates. But each commitment tier is per model type. Committing to pre-built model volume doesn’t cover your custom model usage.

The Azure Ecosystem

Azure Document Intelligence runs inside Azure. That means an Azure subscription, a resource group, a region selection, and either API key or Microsoft Entra ID authentication.

For the SDK, you need an endpoint URL and credentials. Here’s the TypeScript setup for analyzing an invoice:

import DocumentIntelligence, {
  getLongRunningPoller,
  isUnexpected,
} from "@azure-rest/ai-document-intelligence";
import { AzureKeyCredential } from "@azure/core-auth";

const client = DocumentIntelligence(
  "https://<resource>.cognitiveservices.azure.com",
  new AzureKeyCredential("<api-key>")
);

// Kick off the analysis; this returns immediately with an operation handle
const initialResponse = await client
  .path("/documentModels/{modelId}:analyze", "prebuilt-invoice")
  .post({
    contentType: "application/json",
    body: { urlSource: "https://example.com/invoice.pdf" },
  });

if (isUnexpected(initialResponse)) {
  throw initialResponse.body.error;
}

// Wrap the response in a poller and wait for the operation to finish
const poller = await getLongRunningPoller(client, initialResponse);
const result = await poller.pollUntilDone();

// Now traverse the nested result to find the fields you need
const invoiceFields = result.body.analyzeResult?.documents?.[0]?.fields;
const vendorName = invoiceFields?.VendorName?.valueString;
const invoiceTotal = invoiceFields?.InvoiceTotal?.valueCurrency?.amount;

Three things stand out. First, you’re choosing a model ID upfront — prebuilt-invoice — which means you need a different code path for receipts, IDs, or custom documents. Second, the analysis is asynchronous with polling. Third, the response requires traversal to get at the values, and those values are nested under model-specific field names.

Compare this with Iteration Layer’s approach — one API call, one code path, regardless of document type:

import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.extract({
  files: [{ url: "https://example.com/contract.pdf" }],
  schema: {
    fields: [
      { name: "party_a", type: "text", description: "First party to the contract" },
      { name: "party_b", type: "text", description: "Second party to the contract" },
      { name: "effective_date", type: "date" },
      { name: "termination_clause", type: "textarea" },
      { name: "total_value", type: "currency_amount" },
    ],
  },
});

No model selection. No polling. No field name translation. The response mirrors your schema — the fields you defined are the fields you get back, with typed values and confidence scores.

That contract example would require a custom model in Azure. You’d label training samples, train the model, deploy it, and write extraction code against its specific field structure. With Iteration Layer, you describe what you want and send the document.

Typed Fields vs. Model-Dependent Output

Azure’s pre-built models return typed values for their predefined fields — valueCurrency, valueDate, valueString. But those types are fixed to the model’s field definitions. The invoice model knows that InvoiceTotal is a currency. A custom model returns whatever types you assigned during labeling, and the accuracy depends on your training data quality.
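For reference, one extracted field in Azure's response looks roughly like this. The shape is abbreviated and the exact properties vary by model and API version, so treat it as a sketch rather than the precise contract:

```typescript
// Approximate shape of a single extracted field in Azure's response
// (abbreviated; consult the API reference for the exact contract).
const invoiceTotalField = {
  type: "currency",
  content: "€1.234,56",
  valueCurrency: { amount: 1234.56, currencyCode: "EUR" },
  confidence: 0.98,
};
```

The typed `valueCurrency` object only exists because the invoice model's schema declares `InvoiceTotal` as a currency; a custom model's fields carry whatever types you assigned at labeling time.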

Iteration Layer’s schema supports 17 field types — text, textarea, integer, decimal, boolean, date, time, datetime, currency amount, currency code, email, country, address, IBAN, array, enum, and calculated. Each type handles parsing and normalization automatically. A currency_amount field reads “1.234,56” from a European document and returns a proper numeric value. An address field returns a decomposed object with street, city, region, postal code, and country.

These types work the same on every document. You don’t need a pre-built model that happens to know about currency fields. You just declare a field as currency_amount and the parser handles the rest.
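As an illustration of what that normalization involves (this is not Iteration Layer's implementation, just a sketch of the core idea), parsing a European-formatted amount means treating “.” as a grouping separator and “,” as the decimal mark:

```typescript
// Sketch: normalize a European-formatted amount like "1.234,56" to a number.
// A real parser also needs locale detection, currency symbols, and negative
// formats; this shows only the core transformation.
function parseEuropeanAmount(raw: string): number {
  const cleaned = raw
    .replace(/\./g, "") // drop thousands separators
    .replace(",", "."); // decimal comma -> decimal point
  return Number(cleaned);
}

console.log(parseEuropeanAmount("1.234,56")); // 1234.56
```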

Confidence That Means Something

Both services return confidence scores, but they measure different things. Azure’s confidence on pre-built model fields reflects how certain the model is that it found and correctly extracted the predefined field. On custom models, confidence depends on training data quality and quantity — the more representative your training set, the better the scores.

Iteration Layer returns confidence scores on every field of every extraction — no model quality variance, no training data dependency. The score tells you how confident the parser is that the extracted value matches your field definition. Build thresholds into your pipeline: auto-accept high confidence, route low confidence to human review.
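A threshold pipeline over per-field confidence scores could be sketched like this. The field shape used here ({ name, value, confidence }) is an assumption for illustration, not taken from Iteration Layer's documented response format:

```typescript
// Sketch: route each extracted field by confidence. The field shape is
// an assumption for illustration, not the documented response format.
type ExtractedField = { name: string; value: unknown; confidence: number };

function routeFields(fields: ExtractedField[], threshold = 0.85) {
  const accepted: ExtractedField[] = [];
  const needsReview: ExtractedField[] = [];
  for (const f of fields) {
    (f.confidence >= threshold ? accepted : needsReview).push(f);
  }
  return { accepted, needsReview };
}

const { accepted, needsReview } = routeFields([
  { name: "party_a", value: "Acme GmbH", confidence: 0.97 },
  { name: "termination_clause", value: "(clause text)", confidence: 0.62 },
]);
// party_a is auto-accepted; termination_clause goes to human review
```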

Side-by-Side

Capability | Azure Document Intelligence | Iteration Layer
Extraction approach | Select a model or train a custom one | Define a schema
Custom document types | Requires labeled training data | Add fields to your schema
Field definitions | Fixed per model or defined during training | 17 typed fields, user-defined per request
Response format | Model-specific nested structure | Flat JSON matching your schema
Training required | Yes, for custom models | None
Model management | Versioning, deployment, retraining | None
Request limits | Up to 2,000 pages per analysis | Up to 20 files per request
Supported formats | PDF, JPEG, PNG, TIFF, BMP, HEIF, DOCX (Read/Layout only) | PDF, DOCX, XLSX, CSV, HTML, JPEG, PNG, GIF, WebP
Infrastructure | Azure subscription, resource group, region | API key
Data residency | Configurable per Azure region | EU-hosted (Frankfurt)

When Azure Document Intelligence Makes Sense

If you’re processing high volumes of standard document types — invoices, receipts, IDs — and you’re already in the Azure ecosystem, the pre-built models are fast and accurate. The commitment tiers make the per-page cost reasonable at scale. And if you have an ML team that can build and maintain custom models, the training workflow is well-documented.

But if your documents don’t fit neatly into pre-built categories, if you need flexibility to change field definitions without retraining, or if you’d rather not manage model versions alongside your application code — you’re paying the model-first tax on every new document type.

Iteration Layer’s schema-first approach means every document type costs the same amount of effort: write a schema. No training data collection, no labeling sessions, no model deployment. The extraction works from the schema definition alone.

Get Started

Check the docs for the full schema reference, all 17 field types, and working examples for common document types. The TypeScript and Python SDKs handle authentication, file uploads, and response typing — going from “I have a document” to “I have structured data” is a few lines of code.

And because Document Extraction is part of a composable API suite, the structured data it returns flows directly into Document Generation or Image Generation — same auth, same credit pool, no glue code.

Sign up for a free account, no credit card required, and compare the results against what Azure gives you today.
