The Problem with AI Extraction
AI-powered document parsing is impressive in demos. Upload a PDF, get structured JSON, applaud. But demos don’t ship to production. Production needs to answer a different question: how do I know when the extraction is wrong?
Traditional OCR and extraction tools give you data without context. A regex parser returns a match or nothing. A template parser returns a value from a bounding box. Neither tells you whether the result is reliable.
When the extraction is wrong — and at scale, it will be wrong sometimes — you find out downstream. A wrong invoice total breaks a payment run. A wrong contract date triggers incorrect compliance alerts. A wrong address sends a shipment to the wrong city.
The Document Extraction API returns a confidence score between 0.0 and 1.0 for every extracted field. This changes the architecture of your pipeline from “trust everything” to “trust proportionally.”
What Confidence Scores Look Like
Every field in the response includes a confidence value:
{
  "invoiceNumber": {
    "type": "TEXT",
    "value": "INV-2026-4521",
    "confidence": 0.97
  },
  "vendorName": {
    "type": "TEXT",
    "value": "Meridian Supply Co.",
    "confidence": 0.94
  },
  "totalAmount": {
    "type": "CURRENCY_AMOUNT",
    "value": 3847.50,
    "confidence": 0.96
  },
  "shippingAddress": {
    "type": "ADDRESS",
    "value": {
      "street": "42 Innovation Drive",
      "city": "Austin",
      "region": "TX",
      "postal_code": "78701",
      "country": "US"
    },
    "confidence": 0.82
  }
}
The invoice number and total have high confidence — the parser is very sure about these values. The shipping address is lower — maybe the scan quality was poor in that area, or the formatting was ambiguous. Your code can treat these differently.
Building a Threshold-Based Pipeline
The most common pattern is three tiers:
// Minimal shape of one extracted field in the API response
interface FieldResult {
  type: string;
  value: unknown;
  confidence: number;
}

const HIGH_CONFIDENCE_THRESHOLD = 0.92;
const LOW_CONFIDENCE_THRESHOLD = 0.70;

const processField = (fieldName: string, fieldResult: FieldResult) => {
  if (fieldResult.confidence >= HIGH_CONFIDENCE_THRESHOLD) {
    // Auto-accept: write directly to database
    return { action: "accept", field: fieldName, value: fieldResult.value };
  }
  if (fieldResult.confidence >= LOW_CONFIDENCE_THRESHOLD) {
    // Review: pre-fill the form, ask a human to confirm
    return { action: "review", field: fieldName, value: fieldResult.value };
  }
  // Reject: require manual entry
  return { action: "manual", field: fieldName, value: null };
};
- Above 0.92 — auto-accept. The extraction is reliable enough to write directly to your database.
- Between 0.70 and 0.92 — flag for review. Pre-fill the value in a review UI so the human only needs to confirm or correct.
- Below 0.70 — require manual entry. The parser couldn’t extract the field reliably.
The thresholds depend on your domain. Financial data might warrant a higher bar (0.95). A content aggregation pipeline might accept 0.80. The point is that you have the data to make that decision per field, per document.
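Run against the sample response above, the three tiers split the document field by field. A self-contained sketch (the `route` helper mirrors the thresholds, and the confidence values come from the earlier example):

```typescript
// Route each field of the sample invoice through the three tiers.
const route = (confidence: number): "accept" | "review" | "manual" =>
  confidence >= 0.92 ? "accept" : confidence >= 0.70 ? "review" : "manual";

// Confidence scores from the example response
const sampleConfidences: Record<string, number> = {
  invoiceNumber: 0.97,
  vendorName: 0.94,
  totalAmount: 0.96,
  shippingAddress: 0.82,
};

const routed = Object.fromEntries(
  Object.entries(sampleConfidences).map(([name, confidence]) => [
    name,
    route(confidence),
  ])
);
// Three fields are auto-accepted; only shippingAddress (0.82) lands
// in the review tier, so a human confirms just that one value.
```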
Per-Field, Not Per-Document
Confidence scores are per field, not per document. A single document might have five fields at 0.95+ and one field at 0.72. You auto-accept the five high-confidence fields and only route the one uncertain field for review.
This is dramatically more efficient than reviewing entire documents. A human reviewer sees one pre-filled field that needs confirmation instead of re-checking the whole document.
Practical Confidence Patterns
Invoice processing. Auto-accept invoice number, date, and vendor name (usually high confidence). Flag line item totals for review when they fall below your threshold. Use CALCULATED fields to cross-check: if the computed subtotal doesn’t match the extracted subtotal, both get flagged.
Resume screening. Auto-accept name and email (high confidence). Flag skills and experience summaries (often medium confidence due to varied formatting). Require manual review for contact details extracted from scanned documents.
Contract analysis. Auto-accept party names and effective dates. Flag clause summaries (TEXTAREA fields have more room for partial extraction). Flag boolean fields like “has non-compete” when confidence is below 0.85 — the business impact of a wrong answer is high.
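The cross-check from the invoice pattern can be sketched like this. The field names and the one-cent tolerance are illustrative assumptions, not part of the API:

```typescript
// Recompute the subtotal from extracted line items and compare it to
// the extracted subtotal. A mismatch flags both for review regardless
// of their individual confidence scores.
interface ExtractedLineItem {
  description: string;
  amount: number;
}

const subtotalMatches = (
  lineItems: ExtractedLineItem[],
  extractedSubtotal: number,
  toleranceCents = 1
): boolean => {
  // Compare in integer cents to avoid floating-point drift
  const computedCents = Math.round(
    lineItems.reduce((sum, item) => sum + item.amount, 0) * 100
  );
  const extractedCents = Math.round(extractedSubtotal * 100);
  return Math.abs(computedCents - extractedCents) <= toleranceCents;
};

const consistent = subtotalMatches(
  [
    { description: "Industrial widgets", amount: 2500.0 },
    { description: "Freight", amount: 1347.5 },
  ],
  3847.5
);
// consistent is true: the computed and extracted subtotals agree
```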
Monitoring Confidence Over Time
Track your confidence distributions. If average confidence drops, something changed — maybe a vendor updated their document format, or scan quality degraded. If average confidence is consistently above 0.95, you might be able to tighten your auto-accept threshold and reduce the review queue.
// Log confidence distributions for monitoring
const confidenceValues = Object.entries(data).map(([fieldName, result]) => ({
  field: fieldName,
  confidence: result.confidence,
  action:
    result.confidence >= 0.92 ? "accepted"
    : result.confidence >= 0.70 ? "review"
    : "manual",
}));
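One way to turn that log into a trend signal is to average confidence per field across a batch of documents. A sketch; the batch shape here is an assumption, not the API's response format:

```typescript
// Average confidence per field across a batch of extraction results.
// A drop for a single field often means one vendor changed a layout.
type ExtractionBatch = Array<Record<string, { confidence: number }>>;

const averageConfidenceByField = (
  batch: ExtractionBatch
): Record<string, number> => {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const doc of batch) {
    for (const [field, result] of Object.entries(doc)) {
      sums[field] ??= { total: 0, count: 0 };
      sums[field].total += result.confidence;
      sums[field].count += 1;
    }
  }
  return Object.fromEntries(
    Object.entries(sums).map(([field, s]) => [field, s.total / s.count])
  );
};

const averages = averageConfidenceByField([
  { totalAmount: { confidence: 0.96 }, vendorName: { confidence: 0.94 } },
  { totalAmount: { confidence: 0.90 }, vendorName: { confidence: 0.92 } },
]);
// averages.totalAmount is roughly 0.93; alert if it trends downward
```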
Per-Field Threshold Strategies
Not all fields deserve the same threshold. A wrong invoice number is annoying but correctable. A wrong payment amount triggers a wrong payment. Set thresholds based on the cost of getting it wrong:
const THRESHOLD_BY_FIELD: Record<string, number> = {
  invoiceNumber: 0.90,
  vendorName: 0.88,
  invoiceDate: 0.90,
  lineItems: 0.85,
  subtotal: 0.95,
  taxAmount: 0.95,
  totalDue: 0.95,
  currency: 0.90,
};

const processFields = (data: Record<string, FieldResult>) =>
  Object.entries(data).map(([fieldName, result]) => {
    const threshold = THRESHOLD_BY_FIELD[fieldName] ?? 0.90;
    if (result.confidence >= threshold) {
      return { action: "accept", field: fieldName, value: result.value };
    }
    return { action: "review", field: fieldName, value: result.value };
  });
Financial fields (subtotal, tax, total) get a higher bar. Descriptive fields (vendor name, line item descriptions) get a lower one. This reflects the real-world cost of errors — a wrong vendor name is a minor annoyance, a wrong total is a financial discrepancy.
Building a Review Queue
The review tier is where confidence scores pay off most. Instead of dumping uncertain documents into a general inbox, build a targeted review queue that shows reviewers exactly which fields need attention.
A review queue entry should include: the original document (as a link or thumbnail), the extracted values for all fields, the confidence score next to each value, and a visual indicator (green/yellow/red) based on your thresholds. The reviewer confirms or corrects only the flagged fields and approves the rest in one click.
This is dramatically faster than full manual review. A reviewer who needs to re-examine one field out of eight spends a fraction of the time compared to reviewing the entire document from scratch.
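One plausible shape for a queue entry. The field names and the traffic-light mapping are assumptions for illustration, not part of the API:

```typescript
type ReviewStatus = "green" | "yellow" | "red";

interface ReviewQueueEntry {
  documentUrl: string;              // link or thumbnail of the original
  fields: Array<{
    name: string;
    value: unknown;
    confidence: number;
    status: ReviewStatus;           // visual indicator for the reviewer
    needsAttention: boolean;        // only these require confirmation
  }>;
}

const statusFor = (confidence: number): ReviewStatus =>
  confidence >= 0.92 ? "green" : confidence >= 0.70 ? "yellow" : "red";

const toQueueEntry = (
  documentUrl: string,
  extracted: Record<string, { value: unknown; confidence: number }>
): ReviewQueueEntry => ({
  documentUrl,
  fields: Object.entries(extracted).map(([name, field]) => ({
    name,
    value: field.value,
    confidence: field.confidence,
    status: statusFor(field.confidence),
    needsAttention: statusFor(field.confidence) !== "green",
  })),
});
```

A review UI can then render green fields read-only and surface only the yellow and red ones for confirmation.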
Adjusting Thresholds Over Time
Start with conservative thresholds: a high auto-accept bar and a generous review window. Track what reviewers actually change. If 98% of fields in the review tier get confirmed without changes, your auto-accept threshold is too strict. Lower it and shrink the review queue.
If reviewers are frequently correcting fields that were auto-accepted, your threshold is too lax. Raise it to catch more questionable extractions before they hit your database.
The goal is to find the point where auto-accepted fields are correct often enough to be trusted, and the review queue is small enough that reviewers handle it without being overwhelmed.
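That feedback loop reduces to two rates. A sketch: the 98% confirmation figure follows the text, while the 1% error budget for auto-accepted fields is an assumption you would tune:

```typescript
// Decide whether to move the auto-accept threshold based on what
// reviewers actually did with the last batch of fields.
const suggestThresholdChange = (
  reviewConfirmRate: number, // share of review-tier fields confirmed unchanged
  acceptErrorRate: number    // share of auto-accepted fields later corrected
): "lower" | "raise" | "keep" => {
  if (acceptErrorRate > 0.01) return "raise";    // too many bad auto-accepts
  if (reviewConfirmRate >= 0.98) return "lower"; // review tier almost always right
  return "keep";
};
```

Checking the error rate first reflects the asymmetry: a bad auto-accept reaches your database, while an unnecessary review only costs reviewer time.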
The Business Case
Manual document processing costs time. Full automation without verification costs trust. Confidence-based automation gives you both — speed for the clear cases, human oversight for the ambiguous ones.
For a team processing 500 invoices per month: if 85% of fields are auto-accepted, 12% are pre-filled for quick review, and 3% require manual entry — that’s a fraction of the manual effort with higher accuracy than either full-manual or full-auto approaches.
Get Started
Check the docs to see how confidence scores work across all 17 field types. The TypeScript and Python SDKs return typed response objects with confidence scores on every field.
Sign up for a free account — no credit card required. Run a few documents through with your schema and check the confidence distributions before building your thresholding logic.