The Agent Promise vs. the Agent Reality
Every AI agent demo looks the same. The agent gets a prompt, calls a tool, returns a result. The audience applauds. Then someone tries to build a real workflow — one that handles messy PDFs, evaluates whether the extraction is trustworthy, generates a report when it is, and flags a human when it isn’t — and the demo falls apart.
The hard part of agent-based document processing is not calling an API. It is the logic between the calls. Deciding what to do with an extraction that came back at 0.72 confidence. Knowing when to generate a summary report and when to route to a review queue. Chaining extraction, evaluation, and generation into a pipeline that runs without babysitting.
This tutorial builds that pipeline end-to-end. You will build an agent workflow that extracts structured data from documents, evaluates confidence scores to decide the next step, and either generates a PDF report or flags the document for human review. The code is real, the confidence routing is real, and the pipeline works with any MCP-compatible client.
What You Are Building
The complete workflow has three stages:
- Extract — Parse a document (invoice, contract, report) into structured JSON with per-field confidence scores
- Evaluate — Check confidence scores against thresholds to decide whether the extraction is trustworthy
- Route — If confidence is high, generate a PDF summary report. If confidence is low, flag the document for manual review.
This is not a single API call. It is a multi-step pipeline where the output of one stage determines the behavior of the next. That is where agents earn their keep — not in making one call, but in making the right sequence of calls based on what the data tells them.
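Stripped of API details, the whole pipeline is a short piece of control flow. The sketch below stubs out the extraction and generation calls (covered in the following stages) to show just the routing skeleton; the field shapes mirror the API responses shown later:

```python
# Sketch of the extract -> evaluate -> route control flow.
# The real API calls are stubbed; only the routing logic is shown.

THRESHOLD = 0.90

def evaluate(extraction, threshold=THRESHOLD):
    """Return the names of fields whose confidence falls below the threshold."""
    return [name for name, field in extraction.items()
            if field["confidence"] < threshold]

def route(extraction):
    """Decide the next step from the extraction's confidence scores."""
    low = evaluate(extraction)
    if not low:
        return ("generate_report", [])   # Stage 3a: auto-generate the summary
    return ("flag_for_review", low)      # Stage 3b: route to a human queue

# A high-confidence extraction auto-generates the report...
action, fields = route({"vendor_name": {"value": "Acme", "confidence": 0.97}})
print(action)  # generate_report

# ...while a single weak field routes the whole document to review.
action, fields = route({
    "vendor_name": {"value": "Acme", "confidence": 0.97},
    "due_date": {"value": "2026-05-15", "confidence": 0.72},
})
print(action, fields)  # flag_for_review ['due_date']
```

Everything that follows fills in the stubs: Stage 1 is the extraction call, Stage 2 is `evaluate`, and Stage 3 is the two branches of `route`.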
Prerequisites
You need an Iteration Layer account (free tier works) and an MCP-compatible client. The examples below show both the MCP-based approach (for agents like Claude Code and Cursor) and the direct SDK approach (for building this into your own application code).
MCP Setup for Claude Code:
claude mcp add iterationlayer --transport http https://api.iterationlayer.com/mcp
MCP Setup for Cursor (add to .cursor/mcp.json):
{
"mcpServers": {
"iterationlayer": {
"type": "http",
"url": "https://api.iterationlayer.com/mcp"
}
}
}
Authentication uses OAuth 2.1 — a browser window opens on first use to authorize access. No API keys in config files.
For the SDK approach, install the SDK for your language:
# No installation needed for curl — it is available everywhere
npm install iterationlayer
pip install iterationlayer
go get github.com/iterationlayer/sdk-go
Stage 1: Extracting Structured Data from a Document
The first step is turning an unstructured document into structured JSON. The key difference between a basic extraction and an agent-ready extraction is the confidence scores. Every field in the response includes a confidence value between 0 and 1. This is what makes routing possible — without it, the agent has no signal for deciding what to trust.
Here is an extraction that pulls key fields from a supplier invoice — vendor name, invoice number, total amount, currency, due date, and line items:
curl -X POST \
https://api.iterationlayer.com/document-extraction/v1/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"files": [
{
"type": "url",
"name": "supplier-invoice-2026-0892.pdf",
"url": "https://example.com/invoices/supplier-invoice-2026-0892.pdf"
}
],
"schema": {
"fields": [
{
"name": "vendor_name",
"type": "TEXT",
"description": "Name of the company that issued the invoice"
},
{
"name": "invoice_number",
"type": "TEXT",
"description": "Invoice number or reference code"
},
{
"name": "total_amount",
"type": "CURRENCY_AMOUNT",
"description": "Total amount due on the invoice",
"decimal_points": 2
},
{
"name": "currency",
"type": "CURRENCY_CODE",
"description": "Currency of the invoice"
},
{
"name": "due_date",
"type": "DATE",
"description": "Payment due date"
},
{
"name": "line_items",
"type": "ARRAY",
"description": "Individual line items on the invoice",
"item_schema": {
"fields": [
{
"name": "description",
"type": "TEXT",
"description": "Line item description"
},
{
"name": "quantity",
"type": "INTEGER",
"description": "Number of units"
},
{
"name": "unit_price",
"type": "CURRENCY_AMOUNT",
"description": "Price per unit",
"decimal_points": 2
}
]
}
}
]
}
}'
TypeScript:
import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
apiKey: "YOUR_API_KEY",
});
const extraction = await client.extract({
files: [
{
type: "url",
name: "supplier-invoice-2026-0892.pdf",
url: "https://example.com/invoices/supplier-invoice-2026-0892.pdf",
},
],
schema: {
fields: [
{
name: "vendor_name",
type: "TEXT",
description: "Name of the company that issued the invoice",
},
{
name: "invoice_number",
type: "TEXT",
description: "Invoice number or reference code",
},
{
name: "total_amount",
type: "CURRENCY_AMOUNT",
description: "Total amount due on the invoice",
decimal_points: 2,
},
{
name: "currency",
type: "CURRENCY_CODE",
description: "Currency of the invoice",
},
{
name: "due_date",
type: "DATE",
description: "Payment due date",
},
{
name: "line_items",
type: "ARRAY",
description: "Individual line items on the invoice",
item_schema: {
fields: [
{
name: "description",
type: "TEXT",
description: "Line item description",
},
{
name: "quantity",
type: "INTEGER",
description: "Number of units",
},
{
name: "unit_price",
type: "CURRENCY_AMOUNT",
description: "Price per unit",
decimal_points: 2,
},
],
},
},
],
},
});
Python:
from iterationlayer import IterationLayer
client = IterationLayer(api_key="YOUR_API_KEY")
extraction = client.extract(
files=[
{
"type": "url",
"name": "supplier-invoice-2026-0892.pdf",
"url": "https://example.com/invoices/supplier-invoice-2026-0892.pdf",
}
],
schema={
"fields": [
{
"name": "vendor_name",
"type": "TEXT",
"description": "Name of the company that issued the invoice",
},
{
"name": "invoice_number",
"type": "TEXT",
"description": "Invoice number or reference code",
},
{
"name": "total_amount",
"type": "CURRENCY_AMOUNT",
"description": "Total amount due on the invoice",
"decimal_points": 2,
},
{
"name": "currency",
"type": "CURRENCY_CODE",
"description": "Currency of the invoice",
},
{
"name": "due_date",
"type": "DATE",
"description": "Payment due date",
},
{
"name": "line_items",
"type": "ARRAY",
"description": "Individual line items on the invoice",
"item_schema": {
"fields": [
{
"name": "description",
"type": "TEXT",
"description": "Line item description",
},
{
"name": "quantity",
"type": "INTEGER",
"description": "Number of units",
},
{
"name": "unit_price",
"type": "CURRENCY_AMOUNT",
"description": "Price per unit",
"decimal_points": 2,
},
]
},
},
]
},
)
Go:
package main
import (
"fmt"
"log"
il "github.com/iterationlayer/sdk-go"
)
func main() {
client := il.NewClient("YOUR_API_KEY")
extraction, err := client.Extract(il.ExtractRequest{
Files: []il.FileInput{
il.NewFileFromURL(
"supplier-invoice-2026-0892.pdf",
"https://example.com/invoices/supplier-invoice-2026-0892.pdf",
),
},
Schema: il.ExtractionSchema{
"vendor_name": il.NewTextFieldConfig("vendor_name", "Name of the company that issued the invoice"),
"invoice_number": il.NewTextFieldConfig("invoice_number", "Invoice number or reference code"),
"total_amount": il.NewCurrencyAmountFieldConfig("total_amount", "Total amount due on the invoice"),
"currency": il.NewCurrencyCodeFieldConfig("currency", "Currency of the invoice"),
"due_date": il.NewDateFieldConfig("due_date", "Payment due date"),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Vendor: %v (confidence: %.2f)\n",
(*extraction)["vendor_name"].Value,
(*extraction)["vendor_name"].Confidence,
)
}
The response includes per-field confidence scores and citations:
{
"vendor_name": {
"value": "Bergmann Elektronik GmbH",
"confidence": 0.97,
"citations": ["Bergmann Elektronik GmbH"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "TEXT"
},
"invoice_number": {
"value": "BE-2026-0892",
"confidence": 0.99,
"citations": ["Rechnung Nr. BE-2026-0892"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "TEXT"
},
"total_amount": {
"value": 12450.00,
"confidence": 0.95,
"citations": ["Gesamtbetrag: 12.450,00"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "CURRENCY_AMOUNT"
},
"currency": {
"value": "EUR",
"confidence": 0.98,
"citations": ["EUR"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "CURRENCY_CODE"
},
"due_date": {
"value": "2026-05-15",
"confidence": 0.93,
"citations": ["Zahlungsziel: 15.05.2026"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "DATE"
},
"line_items": {
"value": [
{
"description": {
"value": "Leiterplatten Typ A (200 Stk.)",
"confidence": 0.94,
"citations": ["Leiterplatten Typ A (200 Stk.)"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "TEXT"
},
"quantity": {
"value": 200,
"confidence": 0.96,
"citations": ["200"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "INTEGER"
},
"unit_price": {
"value": 42.50,
"confidence": 0.93,
"citations": ["42,50"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "CURRENCY_AMOUNT"
}
},
{
"description": {
"value": "Steckverbinder Set B (500 Stk.)",
"confidence": 0.95,
"citations": ["Steckverbinder Set B (500 Stk.)"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "TEXT"
},
"quantity": {
"value": 500,
"confidence": 0.97,
"citations": ["500"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "INTEGER"
},
"unit_price": {
"value": 7.90,
"confidence": 0.91,
"citations": ["7,90"],
"source": "supplier-invoice-2026-0892.pdf",
"type": "CURRENCY_AMOUNT"
}
}
],
"confidence": 0.93,
"citations": [],
"source": "supplier-invoice-2026-0892.pdf",
"type": "ARRAY"
}
}
Every field has a confidence score. Every field has citations showing the exact text from the source document that supports the extracted value. This is the foundation the agent needs to make routing decisions.
Stage 2: Confidence-Based Routing
Here is where the agent logic lives. You have structured data with confidence scores. Now you need rules.
A simple but effective routing strategy uses two thresholds:
- High confidence (all fields >= 0.90): Auto-generate the report. No human review needed.
- Low confidence (any field < 0.90): Flag the document. Route to a review queue with the specific fields that need attention.
This is not a magic number — 0.90 is a reasonable starting point for most document types. Adjust based on your tolerance for errors. Financial documents might use 0.95. Internal summaries might use 0.85.
# Confidence evaluation is application logic — not an API call.
# In a bash pipeline, you would use jq to check confidence scores:
EXTRACTION='{"vendor_name":{"value":"Bergmann Elektronik GmbH","confidence":0.97},"total_amount":{"value":12450.00,"confidence":0.95},"due_date":{"value":"2026-05-15","confidence":0.93}}'
CONFIDENCE_THRESHOLD=0.90
LOW_CONFIDENCE_FIELDS=$(echo "$EXTRACTION" | jq -r \
--argjson threshold "$CONFIDENCE_THRESHOLD" \
'[to_entries[] | select(.value.confidence < $threshold) | .key] | join(", ")')
if [ -z "$LOW_CONFIDENCE_FIELDS" ]; then
echo "All fields above threshold. Generating report..."
else
echo "Low confidence fields: $LOW_CONFIDENCE_FIELDS"
echo "Routing to human review."
fi
TypeScript:
const CONFIDENCE_THRESHOLD = 0.90;
const lowConfidenceFields = Object.entries(extraction)
.filter(([, field]) => field.confidence < CONFIDENCE_THRESHOLD)
.map(([name, field]) => ({
name,
confidence: field.confidence,
value: field.value,
}));
const isHighConfidence = lowConfidenceFields.length === 0;
if (isHighConfidence) {
console.log("All fields above threshold. Generating report...");
} else {
console.log("Low confidence fields:", lowConfidenceFields);
console.log("Routing to human review.");
}
Python:
CONFIDENCE_THRESHOLD = 0.90
low_confidence_fields = [
{"name": name, "confidence": field["confidence"], "value": field["value"]}
for name, field in extraction.items()
if field["confidence"] < CONFIDENCE_THRESHOLD
]
is_high_confidence = len(low_confidence_fields) == 0
if is_high_confidence:
print("All fields above threshold. Generating report...")
else:
print("Low confidence fields:", low_confidence_fields)
print("Routing to human review.")const confidenceThreshold = 0.90
type LowConfidenceField struct {
Name string
Confidence float64
Value any
}
var lowConfidenceFields []LowConfidenceField
for name, field := range *extraction {
if field.Confidence < confidenceThreshold {
lowConfidenceFields = append(lowConfidenceFields, LowConfidenceField{
Name: name,
Confidence: field.Confidence,
Value: field.Value,
})
}
}
isHighConfidence := len(lowConfidenceFields) == 0
if isHighConfidence {
fmt.Println("All fields above threshold. Generating report...")
} else {
fmt.Printf("Low confidence fields: %+v\n", lowConfidenceFields)
fmt.Println("Routing to human review.")
}
This evaluation step is pure application logic — no API call needed. The agent reads the confidence scores from the extraction response and makes a decision. In an MCP-based agent, this decision happens naturally in conversation: the agent sees the confidence scores and decides what to do next based on the instructions you give it.
Stage 3a: Generating a Report (High Confidence Path)
When all confidence scores pass the threshold, the agent generates a PDF summary report from the extracted data. This is where composability matters — the extraction output feeds directly into document generation. Same auth, same credit pool, no format conversion.
curl -X POST \
https://api.iterationlayer.com/document-generation/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"format": "pdf",
"document": {
"metadata": {
"title": "Invoice Summary: BE-2026-0892",
"author": "Automated Processing Pipeline"
},
"page": {
"size": {
"preset": "A4"
},
"margins": {
"top_in_pt": 60,
"right_in_pt": 50,
"bottom_in_pt": 60,
"left_in_pt": 50
}
},
"content": [
{
"type": "headline",
"level": "h1",
"text": "Invoice Summary"
},
{
"type": "paragraph",
"markdown": "**Vendor:** Bergmann Elektronik GmbH"
},
{
"type": "paragraph",
"markdown": "**Invoice:** BE-2026-0892 | **Due:** 2026-05-15 | **Total:** 12,450.00 EUR"
},
{
"type": "headline",
"level": "h2",
"text": "Line Items"
},
{
"type": "table",
"header": {
"cells": [
{"text": "Description"},
{"text": "Qty"},
{"text": "Unit Price"},
{"text": "Subtotal"}
]
},
"rows": [
{
"cells": [
{"text": "Leiterplatten Typ A (200 Stk.)"},
{"text": "200", "horizontal_alignment": "right"},
{"text": "42.50 EUR", "horizontal_alignment": "right"},
{"text": "8,500.00 EUR", "horizontal_alignment": "right"}
]
},
{
"cells": [
{"text": "Steckverbinder Set B (500 Stk.)"},
{"text": "500", "horizontal_alignment": "right"},
{"text": "7.90 EUR", "horizontal_alignment": "right"},
{"text": "3,950.00 EUR", "horizontal_alignment": "right"}
]
}
]
},
{
"type": "separator"
},
{
"type": "paragraph",
"markdown": "**Processing Status:** Automatically approved. All extraction confidence scores above 0.90 threshold."
}
]
}
}'
TypeScript:
const report = await client.generateDocument({
format: "pdf",
document: {
metadata: {
title: `Invoice Summary: ${extraction.invoice_number.value}`,
author: "Automated Processing Pipeline",
},
page: {
size: { preset: "A4" },
margins: {
top_in_pt: 60,
right_in_pt: 50,
bottom_in_pt: 60,
left_in_pt: 50,
},
},
content: [
{
type: "headline",
level: "h1",
text: "Invoice Summary",
},
{
type: "paragraph",
markdown: `**Vendor:** ${extraction.vendor_name.value}`,
},
{
type: "paragraph",
markdown: `**Invoice:** ${extraction.invoice_number.value} | **Due:** ${extraction.due_date.value} | **Total:** ${extraction.total_amount.value} ${extraction.currency.value}`,
},
{
type: "headline",
level: "h2",
text: "Line Items",
},
{
type: "table",
header: {
cells: [
{ text: "Description" },
{ text: "Qty" },
{ text: "Unit Price" },
{ text: "Subtotal" },
],
},
rows: (extraction.line_items.value as Array<Record<string, ExtractionFieldResult>>).map(
(item) => ({
cells: [
{ text: String(item.description.value) },
{ text: String(item.quantity.value), horizontal_alignment: "right" as const },
{
text: `${item.unit_price.value} ${extraction.currency.value}`,
horizontal_alignment: "right" as const,
},
{
text: `${(Number(item.quantity.value) * Number(item.unit_price.value)).toFixed(2)} ${extraction.currency.value}`,
horizontal_alignment: "right" as const,
},
],
}),
),
},
{ type: "separator" },
{
type: "paragraph",
markdown:
"**Processing Status:** Automatically approved. All extraction confidence scores above 0.90 threshold.",
},
],
},
});
// report.buffer contains the base64-encoded PDF
// report.mime_type is "application/pdf"
Python:
line_items = extraction["line_items"]["value"]
currency = extraction["currency"]["value"]
report = client.generate_document(
format="pdf",
document={
"metadata": {
"title": f"Invoice Summary: {extraction['invoice_number']['value']}",
"author": "Automated Processing Pipeline",
},
"page": {
"size": {"preset": "A4"},
"margins": {
"top_in_pt": 60,
"right_in_pt": 50,
"bottom_in_pt": 60,
"left_in_pt": 50,
},
},
"content": [
{
"type": "headline",
"level": "h1",
"text": "Invoice Summary",
},
{
"type": "paragraph",
"markdown": f"**Vendor:** {extraction['vendor_name']['value']}",
},
{
"type": "paragraph",
"markdown": (
f"**Invoice:** {extraction['invoice_number']['value']}"
f" | **Due:** {extraction['due_date']['value']}"
f" | **Total:** {extraction['total_amount']['value']} {currency}"
),
},
{
"type": "headline",
"level": "h2",
"text": "Line Items",
},
{
"type": "table",
"header": {
"cells": [
{"text": "Description"},
{"text": "Qty"},
{"text": "Unit Price"},
{"text": "Subtotal"},
]
},
"rows": [
{
"cells": [
{"text": str(item["description"]["value"])},
{
"text": str(item["quantity"]["value"]),
"horizontal_alignment": "right",
},
{
"text": f"{item['unit_price']['value']} {currency}",
"horizontal_alignment": "right",
},
{
"text": f"{item['quantity']['value'] * item['unit_price']['value']:.2f} {currency}",
"horizontal_alignment": "right",
},
]
}
for item in line_items
],
},
{"type": "separator"},
{
"type": "paragraph",
"markdown": "**Processing Status:** Automatically approved. All extraction confidence scores above 0.90 threshold.",
},
],
},
)
# report["buffer"] contains the base64-encoded PDF
# report["mime_type"] is "application/pdf"report, err := client.GenerateDocument(il.GenerateDocumentRequest{
Format: "pdf",
Document: il.DocumentDefinition{
Metadata: il.DocumentMetadata{
Title: fmt.Sprintf("Invoice Summary: %v", (*extraction)["invoice_number"].Value),
Author: "Automated Processing Pipeline",
},
Page: &il.DocumentPage{
Size: il.PageSize{Preset: "A4"},
Margins: il.Margins{TopInPt: 60, RightInPt: 50, BottomInPt: 60, LeftInPt: 50},
},
Content: []il.ContentBlock{
il.HeadlineBlock{Type: "headline", Level: "h1", Text: "Invoice Summary"},
il.ParagraphBlock{
Type: "paragraph",
Markdown: fmt.Sprintf("**Vendor:** %v", (*extraction)["vendor_name"].Value),
},
il.ParagraphBlock{
Type: "paragraph",
Markdown: fmt.Sprintf("**Invoice:** %v | **Due:** %v | **Total:** %v EUR",
(*extraction)["invoice_number"].Value,
(*extraction)["due_date"].Value,
(*extraction)["total_amount"].Value,
),
},
il.HeadlineBlock{Type: "headline", Level: "h2", Text: "Line Items"},
il.SeparatorBlock{Type: "separator"},
il.ParagraphBlock{
Type: "paragraph",
Markdown: "**Processing Status:** Automatically approved. All extraction confidence scores above 0.90 threshold.",
},
},
},
})
if err != nil {
log.Fatal(err)
}
// report.Buffer contains the base64-encoded PDF
// report.MimeType is "application/pdf"
Two API calls. One credit pool. The extraction result feeds directly into document generation — no intermediate file, no format conversion, no second vendor.
Stage 3b: Flagging for Review (Low Confidence Path)
When any field falls below the confidence threshold, the agent generates a different document — a review report that highlights exactly which fields need human attention, what the extracted values were, and what the confidence scores looked like. (The example below assumes a run where due_date came back at 0.87, below the 0.90 threshold, rather than the 0.93 shown in Stage 1.)
curl -X POST \
https://api.iterationlayer.com/document-generation/v1/generate \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"format": "pdf",
"document": {
"metadata": {
"title": "Review Required: BE-2026-0892",
"author": "Automated Processing Pipeline"
},
"page": {
"size": {
"preset": "A4"
},
"margins": {
"top_in_pt": 60,
"right_in_pt": 50,
"bottom_in_pt": 60,
"left_in_pt": 50
}
},
"content": [
{
"type": "headline",
"level": "h1",
"text": "Review Required"
},
{
"type": "paragraph",
"markdown": "The following fields were extracted with confidence below the 0.90 threshold and require manual verification."
},
{
"type": "table",
"header": {
"cells": [
{"text": "Field"},
{"text": "Extracted Value"},
{"text": "Confidence"}
]
},
"rows": [
{
"cells": [
{"text": "due_date"},
{"text": "2026-05-15"},
{"text": "0.87"}
]
}
]
},
{
"type": "separator"
},
{
"type": "headline",
"level": "h2",
"text": "All Extracted Fields"
},
{
"type": "table",
"header": {
"cells": [
{"text": "Field"},
{"text": "Value"},
{"text": "Confidence"},
{"text": "Status"}
]
},
"rows": [
{
"cells": [
{"text": "vendor_name"},
{"text": "Bergmann Elektronik GmbH"},
{"text": "0.97"},
{"text": "OK"}
]
},
{
"cells": [
{"text": "invoice_number"},
{"text": "BE-2026-0892"},
{"text": "0.99"},
{"text": "OK"}
]
},
{
"cells": [
{"text": "total_amount"},
{"text": "12450.00"},
{"text": "0.95"},
{"text": "OK"}
]
},
{
"cells": [
{"text": "due_date"},
{"text": "2026-05-15"},
{"text": "0.87"},
{"text": "REVIEW"}
]
}
]
},
{
"type": "paragraph",
"markdown": "**Source Document:** supplier-invoice-2026-0892.pdf"
}
]
}
}'
TypeScript:
const reviewReport = await client.generateDocument({
format: "pdf",
document: {
metadata: {
title: `Review Required: ${extraction.invoice_number.value}`,
author: "Automated Processing Pipeline",
},
page: {
size: { preset: "A4" },
margins: {
top_in_pt: 60,
right_in_pt: 50,
bottom_in_pt: 60,
left_in_pt: 50,
},
},
content: [
{
type: "headline",
level: "h1",
text: "Review Required",
},
{
type: "paragraph",
markdown: `The following fields were extracted with confidence below the ${CONFIDENCE_THRESHOLD} threshold and require manual verification.`,
},
{
type: "table",
header: {
cells: [
{ text: "Field" },
{ text: "Extracted Value" },
{ text: "Confidence" },
],
},
rows: lowConfidenceFields.map((field) => ({
cells: [
{ text: field.name },
{ text: String(field.value) },
{ text: field.confidence.toFixed(2) },
],
})),
},
{ type: "separator" },
{
type: "headline",
level: "h2",
text: "All Extracted Fields",
},
{
type: "table",
header: {
cells: [
{ text: "Field" },
{ text: "Value" },
{ text: "Confidence" },
{ text: "Status" },
],
},
rows: Object.entries(extraction).map(([name, field]) => ({
cells: [
{ text: name },
{ text: String(field.value) },
{ text: field.confidence.toFixed(2) },
{ text: field.confidence >= CONFIDENCE_THRESHOLD ? "OK" : "REVIEW" },
],
})),
},
{
type: "paragraph",
markdown: "**Source Document:** supplier-invoice-2026-0892.pdf",
},
],
},
});
Python:
review_report = client.generate_document(
format="pdf",
document={
"metadata": {
"title": f"Review Required: {extraction['invoice_number']['value']}",
"author": "Automated Processing Pipeline",
},
"page": {
"size": {"preset": "A4"},
"margins": {
"top_in_pt": 60,
"right_in_pt": 50,
"bottom_in_pt": 60,
"left_in_pt": 50,
},
},
"content": [
{
"type": "headline",
"level": "h1",
"text": "Review Required",
},
{
"type": "paragraph",
"markdown": (
f"The following fields were extracted with confidence below the"
f" {CONFIDENCE_THRESHOLD} threshold and require manual verification."
),
},
{
"type": "table",
"header": {
"cells": [
{"text": "Field"},
{"text": "Extracted Value"},
{"text": "Confidence"},
]
},
"rows": [
{
"cells": [
{"text": field["name"]},
{"text": str(field["value"])},
{"text": f"{field['confidence']:.2f}"},
]
}
for field in low_confidence_fields
],
},
{"type": "separator"},
{
"type": "headline",
"level": "h2",
"text": "All Extracted Fields",
},
{
"type": "table",
"header": {
"cells": [
{"text": "Field"},
{"text": "Value"},
{"text": "Confidence"},
{"text": "Status"},
]
},
"rows": [
{
"cells": [
{"text": name},
{"text": str(field["value"])},
{"text": f"{field['confidence']:.2f}"},
{
"text": "OK"
if field["confidence"] >= CONFIDENCE_THRESHOLD
else "REVIEW"
},
]
}
for name, field in extraction.items()
],
},
{
"type": "paragraph",
"markdown": "**Source Document:** supplier-invoice-2026-0892.pdf",
},
],
},
)
Go:
var reviewRows []il.TableRow
for name, field := range *extraction {
status := "OK"
if field.Confidence < confidenceThreshold {
status = "REVIEW"
}
reviewRows = append(reviewRows, il.TableRow{
Cells: []il.TableCell{
{Text: name},
{Text: fmt.Sprintf("%v", field.Value)},
{Text: fmt.Sprintf("%.2f", field.Confidence)},
{Text: status},
},
})
}
reviewReport, err := client.GenerateDocument(il.GenerateDocumentRequest{
Format: "pdf",
Document: il.DocumentDefinition{
Metadata: il.DocumentMetadata{
Title: fmt.Sprintf("Review Required: %v", (*extraction)["invoice_number"].Value),
Author: "Automated Processing Pipeline",
},
Page: &il.DocumentPage{
Size: il.PageSize{Preset: "A4"},
Margins: il.Margins{TopInPt: 60, RightInPt: 50, BottomInPt: 60, LeftInPt: 50},
},
Content: []il.ContentBlock{
il.HeadlineBlock{Type: "headline", Level: "h1", Text: "Review Required"},
il.ParagraphBlock{
Type: "paragraph",
Markdown: "The following fields require manual verification.",
},
il.SeparatorBlock{Type: "separator"},
il.HeadlineBlock{Type: "headline", Level: "h2", Text: "All Extracted Fields"},
il.TableBlock{
Type: "table",
Header: &il.TableRow{
Cells: []il.TableCell{
{Text: "Field"},
{Text: "Value"},
{Text: "Confidence"},
{Text: "Status"},
},
},
Rows: reviewRows,
},
il.ParagraphBlock{
Type: "paragraph",
Markdown: "**Source Document:** supplier-invoice-2026-0892.pdf",
},
},
},
})
if err != nil {
log.Fatal(err)
}
_ = reviewReport
The review report gives a human everything they need: which fields are below threshold, what the extracted values were, and the confidence scores for every field. No one has to re-read the original document to figure out what went wrong.
Putting It All Together: The MCP Agent Workflow
When you use this pipeline through an MCP agent (Claude Code, Cursor, Windsurf), the three stages happen in a single conversation. You describe the workflow once, and the agent orchestrates it.
Here is a prompt that captures the full pipeline:
I have a supplier invoice at https://example.com/invoices/supplier-invoice-2026-0892.pdf. Extract the vendor name, invoice number, total amount, currency, due date, and line items with descriptions, quantities, and unit prices. Then check the confidence scores: if all fields are at 0.90 or above, generate a PDF summary report with the invoice details and a line item table. If any field is below 0.90, generate a review report that lists which fields need manual verification, with their extracted values and confidence scores.
The agent calls extract_document, reads the confidence scores in the response, and decides whether to call generate_document with the summary template or the review template. No code, no scripting — just natural language describing a multi-step workflow with conditional routing.
This works because the MCP server exposes both tools (extract_document and generate_document) with full schema descriptions. The agent knows what parameters each tool expects and what the response looks like. The confidence scores in the extraction response give the agent a concrete signal for the routing decision.
Extending the Pipeline
The three-stage pattern — extract, evaluate, route — is a foundation. Here are four extensions that build on it.
Adding Image Processing to the Pipeline
If the source document includes a company logo or product images, you can add an image transformation step. After extraction, the agent calls transform_image to resize, crop, or convert images before they are embedded in the generated report.
The agent prompt extends naturally:
…and if the invoice has a supplier logo, resize it to 200x80 pixels and include it in the header of the summary report.
One more MCP tool call in the chain. Same credit pool, same auth.
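In SDK code, the same step is a conditional branch before generation. The sketch below is hypothetical: the tutorial names the transform_image MCP tool but not an SDK signature, so the call is stubbed here, and the supplier_logo field is an assumed addition to the extraction schema:

```python
# Hypothetical sketch: conditionally resize a logo before report generation.
# transform_image_stub stands in for the real image-transformation call,
# and "supplier_logo" is an assumed extraction field, not from the tutorial.

def transform_image_stub(image_url, width, height):
    # Stand-in for the transform_image tool; returns the resized asset.
    return {"url": image_url, "width": width, "height": height}

extraction = {"supplier_logo": {"value": "https://example.com/logo.png",
                                "confidence": 0.92}}

header_blocks = []
logo = extraction.get("supplier_logo")
if logo and logo["confidence"] >= 0.90:
    asset = transform_image_stub(logo["value"], width=200, height=80)
    # The resized asset would then be referenced from the report header.
    header_blocks.append({"type": "image", "url": asset["url"]})

print(len(header_blocks))  # 1
```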
Batch Processing with Confidence Aggregation
Process multiple documents and aggregate the results. Extract data from a stack of invoices, evaluate confidence across the batch, and generate a single summary spreadsheet with a column marking which rows need review.
The per-field confidence scores make this possible at scale. Instead of reviewing every document, a human reviews only the rows where confidence dropped below threshold. For a batch of 50 invoices where 45 extract cleanly, that is 45 documents that never touch a human queue.
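The aggregation loop is short. This sketch stubs the extraction call so the batch logic stands alone; in practice you would swap in the real client.extract call from Stage 1 and feed the rows to spreadsheet generation:

```python
# Batch sketch: extract each invoice, mark the rows that need review.
# fake_extract stands in for client.extract; the second URL simulates
# a document that extracts poorly.

THRESHOLD = 0.90

def fake_extract(url):
    # Stand-in for the real extraction call, returning per-field confidences.
    return {
        "vendor_name": {"value": "Acme GmbH", "confidence": 0.96},
        "total_amount": {"value": 100.0,
                         "confidence": 0.72 if "blurry" in url else 0.95},
    }

rows = []
for url in ["https://example.com/a.pdf", "https://example.com/blurry.pdf"]:
    extraction = fake_extract(url)
    low = [n for n, f in extraction.items() if f["confidence"] < THRESHOLD]
    rows.append({
        "document": url,
        "needs_review": bool(low),
        "low_confidence_fields": ", ".join(low),
    })

for row in rows:
    print(row["document"], "REVIEW" if row["needs_review"] else "OK")
```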
Webhook-Based Async Processing
For large documents or high-volume pipelines, use the async variants of each API. Pass a webhook_url and the API sends the result to your endpoint when processing completes:
curl -X POST \
https://api.iterationlayer.com/document-extraction/v1/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"files": [
{
"type": "url",
"name": "large-contract.pdf",
"url": "https://example.com/contracts/large-contract.pdf"
}
],
"schema": {
"fields": [
{
"name": "parties",
"type": "ARRAY",
"description": "Contract parties",
"item_schema": {
"fields": [
{
"name": "name",
"type": "TEXT",
"description": "Party name"
}
]
}
}
]
},
"webhook_url": "https://your-app.example.com/webhooks/extraction-complete"
}'
TypeScript:
const asyncResult = await client.extractAsync({

files: [
{
type: "url",
name: "large-contract.pdf",
url: "https://example.com/contracts/large-contract.pdf",
},
],
schema: {
fields: [
{
name: "parties",
type: "ARRAY",
description: "Contract parties",
item_schema: {
fields: [
{
name: "name",
type: "TEXT",
description: "Party name",
},
],
},
},
],
},
webhook_url: "https://your-app.example.com/webhooks/extraction-complete",
});
// asyncResult.message confirms the job was queued
Python:
async_result = client.extract_async(
files=[
{
"type": "url",
"name": "large-contract.pdf",
"url": "https://example.com/contracts/large-contract.pdf",
}
],
schema={
"fields": [
{
"name": "parties",
"type": "ARRAY",
"description": "Contract parties",
"item_schema": {
"fields": [
{
"name": "name",
"type": "TEXT",
"description": "Party name",
}
]
},
}
]
},
webhook_url="https://your-app.example.com/webhooks/extraction-complete",
)
# async_result["message"] confirms the job was queued
Go:
asyncResult, err := client.ExtractAsync(il.ExtractAsyncRequest{
Files: []il.FileInput{
il.NewFileFromURL(
"large-contract.pdf",
"https://example.com/contracts/large-contract.pdf",
),
},
Schema: il.ExtractionSchema{
"parties": il.NewArrayFieldConfig("parties", "Contract parties"),
},
WebhookURL: "https://your-app.example.com/webhooks/extraction-complete",
})
if err != nil {
log.Fatal(err)
}
// asyncResult.Message confirms the job was queued
Your webhook endpoint receives the extraction result, runs the confidence evaluation, and triggers the appropriate document generation call. The entire pipeline runs asynchronously without blocking your application.
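The dispatch logic in that endpoint can be a single function. The sketch below leaves the framework wiring (Flask, FastAPI, raw WSGI) to you and assumes the webhook body has the same per-field shape as the Stage 1 response:

```python
# Webhook dispatch sketch: decide which generation call to make when the
# async extraction result arrives. Framework routing is omitted.

THRESHOLD = 0.90

def handle_extraction_webhook(payload):
    """payload is the extraction result POSTed to your endpoint."""
    low = [name for name, field in payload.items()
           if field["confidence"] < THRESHOLD]
    if not low:
        # High confidence: trigger the summary report (Stage 3a).
        return {"next_call": "generate_document", "template": "summary"}
    # Low confidence: trigger the review report (Stage 3b).
    return {"next_call": "generate_document", "template": "review",
            "fields": low}

decision = handle_extraction_webhook({
    "parties": {"value": [], "confidence": 0.81},
})
print(decision["template"])  # review
```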
Per-Field Threshold Tuning
Not all fields deserve the same confidence threshold. A vendor name at 0.88 confidence is probably fine — there are only so many ways to spell a company name. A total amount at 0.88 confidence on a six-figure invoice deserves a closer look.
{
"vendor_name": 0.85,
"invoice_number": 0.90,
"total_amount": 0.95,
"currency": 0.90,
"due_date": 0.90,
"line_items": 0.90
}
Map field names to thresholds. Compare each field’s confidence against its specific threshold instead of using a single global number. Financial fields get tighter thresholds. Descriptive fields get looser ones.
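A minimal Python sketch of the per-field comparison, assuming the same extraction shape as Stage 1, with a default threshold for fields not in the map:

```python
# Per-field routing: each field is compared against its own threshold,
# falling back to a default for unlisted fields.

FIELD_THRESHOLDS = {
    "vendor_name": 0.85,
    "total_amount": 0.95,
}
DEFAULT_THRESHOLD = 0.90

def fields_needing_review(extraction):
    return [
        name for name, field in extraction.items()
        if field["confidence"] < FIELD_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
    ]

extraction = {
    "vendor_name": {"value": "Acme", "confidence": 0.88},    # passes its 0.85 bar
    "total_amount": {"value": 12450.0, "confidence": 0.93},  # fails its 0.95 bar
}
print(fields_needing_review(extraction))  # ['total_amount']
```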
Why This Pattern Works
The extract-evaluate-route pattern works for document processing the same way CI/CD pipelines work for code: automated when the checks pass, human review when they don’t.
The pieces that make this viable:
- Per-field confidence scores give the agent a concrete signal for routing. Without them, every document would need human review or blind trust.
- Structured I/O means the extraction output is already in the format document generation expects. No parsing, no transformation, no glue code between stages.
- Composable APIs mean extraction and generation share the same auth and credit pool. Adding a third step (image transformation, spreadsheet generation) does not mean adding a third vendor.
- MCP means the agent discovers and chains these tools without custom integration code. The same pipeline works in Claude Code, Cursor, Windsurf, or any MCP-compatible client.
The result is a document processing pipeline that handles the happy path automatically and only involves humans for the edge cases that actually need them.
Get Started
The full pipeline uses two APIs: Document Extraction for structured parsing with confidence scores, and Document Generation for producing PDF reports. Both are available through the MCP server or directly via the TypeScript, Python, and Go SDKs.
Sign up for a free account — no credit card required — and try the extraction with a document you already have. Check the confidence scores. If they are consistently above your threshold for your document type, you have a pipeline that runs itself.