Process Invoices in n8n Without Burning LLM Credits on Every Document

The LLM Tax on Every Invoice

The most common n8n invoice workflow looks like this: an email trigger catches the PDF, a Mistral OCR node converts it to text, and a GPT-4o node parses the text into structured fields. It works. It also bills you twice per document — once for OCR tokens, once for LLM tokens.

Run 100 invoices through that pipeline and the cost is different every month. A two-page invoice with a long line items table costs more than a one-page summary. A noisy scanned document consumes more OCR tokens than a clean digital PDF. You can’t predict the bill because it scales with content length, not document count.

And then there’s the hallucination problem. LLMs are probabilistic. Ask GPT-4o to extract line items from a dense table and it will occasionally invent a row, merge two rows, or misread a decimal. You won’t notice until the finance team catches a discrepancy three weeks later.

Schema-Based Extraction, Fixed Cost

Iteration Layer Document Extraction takes a different approach. You define a schema — the fields you want, their types, whether they’re required — and the extraction engine pulls those fields from the document. No OCR preprocessing step. No LLM token billing. Each extraction costs a fixed number of credits regardless of document length.

The extraction returns a confidence score for every field. Not a binary pass/fail — a number between 0.0 and 1.0 that tells you exactly how certain the engine is about each value. You can route low-confidence results to human review and auto-process everything else.

This means your per-invoice cost is predictable and your output is deterministic. Same document, same schema, same result. No temperature parameter quietly changing your line items.

The Workflow: Inbox to Spreadsheet to PDF

Here’s what we’re building in n8n: an automated pipeline that watches for invoice emails, extracts structured data, routes based on confidence, writes to a spreadsheet, and generates a PDF summary. Five nodes. No Function nodes. No LLM chain.

Step 1: Email Trigger (IMAP)

Open the n8n canvas and add a new node. Search for “Email Trigger (IMAP)” and add it.

In the node settings, enter your mail server credentials. Set the Mailbox Name to the folder where invoices arrive — typically INBOX or a subfolder like INBOX/Invoices. Under Options, set Download Attachments to true. This ensures the PDF comes through as n8n binary data, ready for the next node.

Set the Poll Times interval to match your processing cadence — every 5 minutes for near-realtime, or every hour for batch processing.

Step 2: Iteration Layer (Document Extraction)

Add an Iteration Layer node. In the Resource dropdown, select Document Extraction. Under File Input Mode, select Binary Data so the node picks up the PDF attachment from the email trigger.

Now define the extraction schema. You have two paths in the UI: the schema builder and raw JSON.

Using the UI builder: Click Add Field for each data point. For each field, enter the name, select the type from the dropdown, add a description, and toggle Required if the field must be present. This works well for flat schemas with a handful of fields.

Using raw JSON: For invoices with nested line items, switch to the JSON editor. Paste this schema:

{
  "fields": [
    {
      "name": "invoice_number",
      "type": "TEXT",
      "description": "The unique invoice identifier",
      "is_required": true
    },
    {
      "name": "vendor_name",
      "type": "TEXT",
      "description": "Name of the company that issued the invoice",
      "is_required": true
    },
    {
      "name": "invoice_date",
      "type": "DATE",
      "description": "Date the invoice was issued",
      "is_required": true
    },
    {
      "name": "line_items",
      "type": "ARRAY",
      "description": "Individual items or services billed",
      "item_schema": {
        "fields": [
          {
            "name": "description",
            "type": "TEXT",
            "description": "Item or service description"
          },
          {
            "name": "amount",
            "type": "CURRENCY_AMOUNT",
            "description": "Total amount for this line item"
          }
        ]
      }
    },
    {
      "name": "total",
      "type": "CURRENCY_AMOUNT",
      "description": "Total amount due including tax",
      "is_required": true
    }
  ]
}

The ARRAY type with a nested item_schema handles line items. Each value in each row gets its own confidence score, so you know which rows the engine is uncertain about, not just whether the overall extraction succeeded.
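Before pasting a hand-written schema into the JSON editor, a quick local check can catch structural slips. The sketch below is a minimal Python example that assumes only the conventions visible in the schema above: a top-level fields list, a name and type on every field, and an item_schema on ARRAY fields. The function check_schema is hypothetical. It is not the service's own validation, and the real rules may be stricter or looser.

```python
from typing import Any


def check_schema(schema: dict[str, Any], path: str = "schema") -> list[str]:
    """Return a list of structural problems; an empty list means none found.

    The rules are assumptions inferred from the example schema in this
    article, not the service's official validation rules.
    """
    fields = schema.get("fields")
    if not isinstance(fields, list) or not fields:
        return [f"{path}: 'fields' must be a non-empty list"]

    problems: list[str] = []
    seen: set[str] = set()
    for index, field in enumerate(fields):
        where = f"{path}.fields[{index}]"
        # Every field in the example carries a non-empty name and type.
        for key in ("name", "type"):
            if not isinstance(field.get(key), str) or not field[key]:
                problems.append(f"{where}: missing '{key}'")
        name = field.get("name")
        if isinstance(name, str):
            if name in seen:
                problems.append(f"{where}: duplicate name '{name}'")
            seen.add(name)
        if field.get("type") == "ARRAY":
            item_schema = field.get("item_schema")
            if isinstance(item_schema, dict):
                # Nested schemas follow the same rules, so recurse.
                problems.extend(check_schema(item_schema, f"{where}.item_schema"))
            else:
                problems.append(f"{where}: ARRAY field needs an 'item_schema'")
    return problems


good = {
    "fields": [
        {"name": "invoice_number", "type": "TEXT", "is_required": True},
        {
            "name": "line_items",
            "type": "ARRAY",
            "item_schema": {
                "fields": [
                    {"name": "description", "type": "TEXT"},
                    {"name": "amount", "type": "CURRENCY_AMOUNT"},
                ]
            },
        },
    ]
}
broken = {"fields": [{"name": "line_items", "type": "ARRAY"}]}

print(check_schema(good))    # []
print(check_schema(broken))  # one problem: the ARRAY field has no item_schema
```

The check deliberately does not whitelist type names, since this article shows only four of the available field types.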

A typical extraction response looks like this:

{
  "invoice_number": {
    "type": "TEXT",
    "value": "INV-2026-0391",
    "confidence": 0.98
  },
  "vendor_name": {
    "type": "TEXT",
    "value": "Nordic Office Supplies AB",
    "confidence": 0.96
  },
  "invoice_date": {
    "type": "DATE",
    "value": "2026-01-28",
    "confidence": 0.97
  },
  "line_items": {
    "type": "ARRAY",
    "value": [
      [
        {
          "value": "Ergonomic keyboard",
          "confidence": 0.95
        },
        {
          "value": 899.90,
          "confidence": 0.96
        }
      ],
      [
        {
          "value": "Standing desk mount",
          "confidence": 0.93
        },
        {
          "value": 249.00,
          "confidence": 0.97
        }
      ]
    ],
    "confidence": 0.95
  },
  "total": {
    "type": "CURRENCY_AMOUNT",
    "value": 1454.88,
    "confidence": 0.97
  }
}

Every field. Every line item. Every value has a confidence score. That is what separates a demo from something you can run in production.
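Because confidence lives at several levels of the response, it can help to reduce it to a single worst-case number before routing. The following is a minimal Python sketch, assuming the response shape shown above: top-level fields carry value and confidence, and an ARRAY value is a list of rows whose cells carry their own scores. The function name min_confidence is hypothetical and not part of any Iteration Layer SDK.

```python
from typing import Any


def min_confidence(response: dict[str, Any]) -> float:
    """Return the lowest confidence found anywhere in an extraction response.

    Assumes the shape shown in this article: each top-level field is a dict
    with "value" and "confidence"; an ARRAY field's value is a list of rows,
    and each row is a list of cells that carry their own confidence.
    """
    scores: list[float] = []
    for field in response.values():
        scores.append(field["confidence"])
        if field.get("type") == "ARRAY":
            for row in field["value"]:
                scores.extend(cell["confidence"] for cell in row)
    if not scores:
        raise ValueError("response contains no fields")
    return min(scores)


# A trimmed copy of the sample response above.
sample = {
    "invoice_number": {"type": "TEXT", "value": "INV-2026-0391", "confidence": 0.98},
    "line_items": {
        "type": "ARRAY",
        "value": [
            [{"value": "Ergonomic keyboard", "confidence": 0.95},
             {"value": 899.90, "confidence": 0.96}],
            [{"value": "Standing desk mount", "confidence": 0.93},
             {"value": 249.00, "confidence": 0.97}],
        ],
        "confidence": 0.95,
    },
    "total": {"type": "CURRENCY_AMOUNT", "value": 1454.88, "confidence": 0.97},
}

print(min_confidence(sample))  # 0.93
```

Here the weakest value is the second line item's description at 0.93, which the total's score of 0.97 alone would not reveal.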

Step 3: IF Node (Confidence Routing)

Add an IF node after the extraction step. This routes invoices based on how confident the extraction is.

In the IF node settings, set the Condition to Number. For the Value 1 field, enter the expression {{ $json.total.confidence }}. Set the Operation to Larger. Set Value 2 to 0.90.

This creates two output branches:

  • True branch (confidence > 0.90): Auto-process. The extraction is reliable enough to write directly to the spreadsheet.
  • False branch (confidence <= 0.90): Flag for review. Route to a separate sheet, a Slack notification, or a manual review queue.

For tighter control, chain a second IF node on the false branch to separate “needs a quick look” (0.70-0.90) from “needs manual processing” (below 0.70).
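The two chained IF nodes amount to a three-way decision. As a way to reason about the thresholds before building them on the canvas, here is a minimal Python sketch of the same logic. The function route and the branch labels are hypothetical names chosen for illustration; the 0.90 and 0.70 cut-offs come from the steps above.

```python
def route(confidence: float, auto: float = 0.90, review: float = 0.70) -> str:
    """Map a confidence score to one of three handling branches.

    Mirrors the IF nodes described above: strictly greater than `auto`
    is auto-processed, `review` up to and including `auto` gets a quick
    look, and anything below `review` goes to manual processing.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence > auto:
        return "auto_process"
    if confidence >= review:
        return "quick_review"
    return "manual_processing"


for score in (0.97, 0.90, 0.70, 0.42):
    print(score, route(score))
# 0.97 auto_process
# 0.9 quick_review
# 0.7 quick_review
# 0.42 manual_processing
```

Note the boundary: a score of exactly 0.90 lands in review, because the first IF node uses Larger rather than larger-or-equal.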

Step 4: Google Sheets (Append Row)

Connect the true branch of the IF node to a Google Sheets node. Select your spreadsheet and worksheet.

Map the extraction output to columns using n8n expressions:

  • Invoice Number: {{ $json.invoice_number.value }}
  • Vendor: {{ $json.vendor_name.value }}
  • Date: {{ $json.invoice_date.value }}
  • Total: {{ $json.total.value }}
  • Confidence: {{ $json.total.confidence }}

The Confidence column here carries the score for the total field only. Including it in the spreadsheet means the finance team can sort by confidence and spot-check the lowest-scoring extractions without reviewing every single document.
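To make the column mapping and the spot-check idea concrete, here is a small Python sketch of both. It assumes the response shape and the column names used above; to_sheet_row and spot_check are hypothetical helpers, and the second row in the example is invented purely to give the sort something to do.

```python
from typing import Any


def to_sheet_row(response: dict[str, Any]) -> dict[str, Any]:
    """Flatten an extraction response into the five spreadsheet columns."""
    return {
        "Invoice Number": response["invoice_number"]["value"],
        "Vendor": response["vendor_name"]["value"],
        "Date": response["invoice_date"]["value"],
        "Total": response["total"]["value"],
        "Confidence": response["total"]["confidence"],
    }


def spot_check(rows: list[dict[str, Any]], count: int) -> list[dict[str, Any]]:
    """Return the `count` rows with the lowest confidence, lowest first."""
    return sorted(rows, key=lambda row: row["Confidence"])[:count]


# Scalar fields from the sample response above.
sample = {
    "invoice_number": {"type": "TEXT", "value": "INV-2026-0391", "confidence": 0.98},
    "vendor_name": {"type": "TEXT", "value": "Nordic Office Supplies AB", "confidence": 0.96},
    "invoice_date": {"type": "DATE", "value": "2026-01-28", "confidence": 0.97},
    "total": {"type": "CURRENCY_AMOUNT", "value": 1454.88, "confidence": 0.97},
}

first = to_sheet_row(sample)
# Invented second row, for illustration only.
second = {**first, "Invoice Number": "EXAMPLE-2", "Confidence": 0.91}

print(spot_check([first, second], 1)[0]["Invoice Number"])  # EXAMPLE-2
```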

Step 5: Iteration Layer (Document Generation)

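Add a second Iteration Layer node on the true branch, after the Google Sheets node. In the Resource dropdown, select Document Generation. Set the Format to pdf. Keep in mind that $json refers to the output of the node immediately before this one. If the Google Sheets node does not pass the extraction fields through unchanged, reference the extraction node by name instead, for example {{ $('Iteration Layer').item.json.invoice_number.value }}, substituting whatever name your extraction node has on the canvas.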

In the Document JSON field, provide the template:

{
  "metadata": {
    "title": "Invoice Summary — {{ $json.invoice_number.value }}"
  },
  "content": [
    {
      "type": "headline",
      "text": "Invoice Summary",
      "level": 1
    },
    {
      "type": "table",
      "rows": [
        ["Invoice Number", "{{ $json.invoice_number.value }}"],
        ["Vendor", "{{ $json.vendor_name.value }}"],
        ["Date", "{{ $json.invoice_date.value }}"],
        ["Total Due", "{{ $json.total.value }}"]
      ]
    },
    {
      "type": "headline",
      "text": "Line Items",
      "level": 2
    },
    {
      "type": "paragraph",
      "text": "See attached spreadsheet for full line item detail."
    }
  ]
}

The generated PDF comes out as n8n binary data. Send it as an email attachment, archive it in Google Drive, or store it in S3 — whatever your downstream process needs.
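The template above leaves the line items as a placeholder paragraph. If you would rather put them in the PDF, the same table block can carry them. The Python sketch below builds such a document from an extraction response, using only the block types that appear in the template (headline and table) and assuming each line item row is a list of cells in schema order. The function build_summary is a hypothetical helper for seeing the resulting JSON shape, not part of the n8n node.

```python
import json
from typing import Any


def format_cell(value: Any) -> str:
    """Render numbers with two decimals and everything else as text."""
    if isinstance(value, (int, float)):
        return f"{value:.2f}"
    return str(value)


def build_summary(response: dict[str, Any]) -> dict[str, Any]:
    """Build a Document Generation payload that includes the line items."""
    item_rows = [
        [format_cell(cell["value"]) for cell in row]
        for row in response["line_items"]["value"]
    ]
    number = response["invoice_number"]["value"]
    return {
        "metadata": {"title": f"Invoice Summary: {number}"},
        "content": [
            {"type": "headline", "text": "Invoice Summary", "level": 1},
            {
                "type": "table",
                "rows": [
                    ["Invoice Number", number],
                    ["Total Due", format_cell(response["total"]["value"])],
                ],
            },
            {"type": "headline", "text": "Line Items", "level": 2},
            {"type": "table", "rows": item_rows},
        ],
    }


# A trimmed copy of the sample response from Step 2.
sample = {
    "invoice_number": {"type": "TEXT", "value": "INV-2026-0391", "confidence": 0.98},
    "line_items": {
        "type": "ARRAY",
        "value": [
            [{"value": "Ergonomic keyboard", "confidence": 0.95},
             {"value": 899.90, "confidence": 0.96}],
            [{"value": "Standing desk mount", "confidence": 0.93},
             {"value": 249.00, "confidence": 0.97}],
        ],
        "confidence": 0.95,
    },
    "total": {"type": "CURRENCY_AMOUNT", "value": 1454.88, "confidence": 0.97},
}

document = build_summary(sample)
print(json.dumps(document["content"][3], indent=2))
```

The printed block is a table whose rows are ["Ergonomic keyboard", "899.90"] and ["Standing desk mount", "249.00"].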

Cost Comparison

With the OCR + LLM approach, a single invoice costs Mistral OCR tokens (variable by page count and content density) plus GPT-4o tokens (variable by prompt length and output). A two-page invoice with 15 line items costs more than a one-page invoice with 3. Multiply by 100 invoices per month and you are estimating, not budgeting.

With Iteration Layer, each extraction costs a fixed number of credits: one credit per document, regardless of page count or line item count. 100 invoices cost the same this month as they did last month. The extraction is deterministic: same document and schema in, same structured data out.
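The difference between the two billing models is easy to see in a few lines of arithmetic. The Python sketch below compares two months of 100 invoices each. Every number in it is an invented placeholder: the token counts and per-token rates are not real Mistral or OpenAI figures, and only the one-credit-per-document rule comes from this article.

```python
def token_bill(invoices: list[tuple[int, int]], ocr_rate: int, llm_rate: int) -> int:
    """Sum a per-token bill over (ocr_tokens, llm_tokens) pairs.

    Rates are in arbitrary placeholder units per token, not real prices.
    """
    return sum(ocr * ocr_rate + llm * llm_rate for ocr, llm in invoices)


def credit_bill(document_count: int, credits_per_document: int = 1) -> int:
    """Fixed pricing: the bill depends only on how many documents you send."""
    return document_count * credits_per_document


# Month A: 100 short invoices. Month B: 60 short ones and 40 long ones.
short, long_ = (1000, 1500), (2400, 4000)  # invented token counts
month_a = [short] * 100
month_b = [short] * 60 + [long_] * 40

print(token_bill(month_a, ocr_rate=1, llm_rate=3))  # 550000
print(token_bill(month_b, ocr_rate=1, llm_rate=3))  # 906000
print(credit_bill(len(month_a)), credit_bill(len(month_b)))  # 100 100
```

Both months contain 100 documents, yet the token-based bill shifts with the mix of short and long invoices while the credit count stays at 100.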

No per-token surprises. No hallucinated line items. No temperature parameter to tune.

Get Started

Install the Iteration Layer community node from the n8n UI — search for n8n-nodes-iterationlayer under Settings > Community Nodes. The Document Extraction docs cover all 17 field types, nested array schemas, and confidence score handling. The n8n integration docs walk through every resource and parameter.

Start with one invoice. Define the schema, run the extraction, check the confidence scores. Once you trust the output, wire up the rest of the pipeline. Sign up to get your API key.
