Chain Document Extraction, Image Processing, and Generation in a Single Claude Code Session


One Conversation, Four APIs

You get a PDF invoice from a supplier. You need to extract the line items, resize the supplier’s logo, and generate a branded summary card for your internal dashboard. Three separate tasks that normally mean three separate scripts, three sets of API credentials, and an afternoon of glue code.

With MCP, all four Iteration Layer APIs are available as tools inside your AI assistant. Claude Code and Cursor can call them in sequence within a single conversation — passing the output of one API directly into the next. No intermediary scripts. No file juggling. Just describe the workflow, and the assistant chains the calls together.

Setting Up All Four Tools

The Iteration Layer MCP server exposes all APIs through a single endpoint. One setup gives you access to Document Extraction, Image Transformation, Image Generation, and Document Generation.

In Claude Code:

claude mcp add --transport http iterationlayer https://api.iterationlayer.com/mcp

In Cursor, add to .cursor/mcp.json:

{
  "mcpServers": {
    "iterationlayer": {
      "type": "streamablehttp",
      "url": "https://api.iterationlayer.com/mcp"
    }
  }
}

That’s it. One server, four tools: extract_document, transform_image, generate_image, generate_document. Authentication happens via OAuth in the browser on first use.

A Real Workflow: Invoice to Summary Card

Here’s what a multi-API chain looks like in practice. You have a supplier invoice PDF and need to produce a branded summary card for your internal system.

Step 1 — Extract the data.

“Extract the supplier name, invoice number, date, line items with descriptions and amounts, and the total from this invoice PDF.”

The assistant calls extract_document with a schema matching your request. Back comes structured JSON with every field and its confidence score:

{
  "supplierName": { "value": "Nordic Components AB", "confidence": 0.96 },
  "invoiceNumber": { "value": "NC-2026-1847", "confidence": 0.98 },
  "invoiceDate": { "value": "2026-02-28", "confidence": 0.97 },
  "lineItems": {
    "value": [
      [
        { "value": "Circuit boards (x200)", "confidence": 0.95 },
        { "value": 4800.00, "confidence": 0.97 }
      ],
      [
        { "value": "Connector assemblies (x50)", "confidence": 0.94 },
        { "value": 1250.00, "confidence": 0.96 }
      ]
    ],
    "confidence": 0.95
  },
  "total": { "value": 6050.00, "confidence": 0.98 }
}
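If a step like this later moves to code, the value/confidence shape is straightforward to consume. Here's a minimal Python sketch that unwraps the response above; the shape is inferred from this one example, so treat it as an assumption rather than the documented schema:

```python
# Every field in the extraction response is {"value": ..., "confidence": ...}.
extraction = {
    "supplierName": {"value": "Nordic Components AB", "confidence": 0.96},
    "invoiceNumber": {"value": "NC-2026-1847", "confidence": 0.98},
    "invoiceDate": {"value": "2026-02-28", "confidence": 0.97},
    "lineItems": {
        "value": [
            [{"value": "Circuit boards (x200)", "confidence": 0.95},
             {"value": 4800.00, "confidence": 0.97}],
            [{"value": "Connector assemblies (x50)", "confidence": 0.94},
             {"value": 1250.00, "confidence": 0.96}],
        ],
        "confidence": 0.95,
    },
    "total": {"value": 6050.00, "confidence": 0.98},
}

def unwrap(field):
    """Return the raw value, dropping the confidence score."""
    return field["value"]

# Rebuild line items as plain (description, amount) pairs.
items = [(unwrap(desc), unwrap(amount))
         for desc, amount in unwrap(extraction["lineItems"])]

# Sanity check: line items should sum to the extracted total.
assert sum(amount for _, amount in items) == unwrap(extraction["total"])
```

The assert doubles as a cheap cross-field validation: if extraction misread an amount, the sum and the total disagree and you catch it before the data flows downstream.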

Step 2 — Process the supplier logo.

“Now take the supplier’s logo from https://example.com/nordic-logo.png, resize it to 120x40, and convert it to PNG with a transparent background.”

The assistant calls transform_image with resize and convert operations. The logo comes back optimized for placement in the summary card.

Step 3 — Generate the summary card.

“Generate an 800x400 image for the dashboard. White background. Put the resized supplier logo at the top left. ‘NC-2026-1847’ as the title in dark text. Below that, a gray subtitle line with ‘Nordic Components AB — Feb 28, 2026’. At the bottom right, the total ‘$6,050.00’ in large bold text. Add a thin blue accent line across the top.”

The assistant calls generate_image with the extracted data and processed logo composed into a layer stack. The dashboard card is generated in one call — no manual data entry, no Figma, no template.

Three API calls, one conversation, zero glue code. The assistant passes data from each step to the next because it holds the full conversation context.

More Chaining Patterns

The invoice-to-card workflow is just one pattern. Here are a few more that combine multiple APIs in a single session.

Receipt to expense report PDF:

“Here are three receipt photos from my business trip. Extract the vendor name, date, amount, and category from each one. Then generate a PDF expense report with a table of all items, a total at the bottom, and my name ‘Jane Chen’ in the header.”

The assistant calls extract_document for each receipt, collects the structured data, then calls generate_document to produce a formatted PDF expense report. Two API types, one conversation. The receipts go in as photos and a finished PDF comes out.
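In code, the same aggregation is a few lines: fold each receipt's extraction into table rows plus a computed total. A sketch only, with block types and field names invented for illustration (the real Document Generation block schema is in the docs):

```python
def expense_report_blocks(receipts: list[dict], employee: str) -> list[dict]:
    """Build document content blocks from receipt extractions.

    Each receipt is assumed to use the {"value": ..., "confidence": ...}
    field shape; the block types below are hypothetical placeholders.
    """
    rows = [[r["vendor"]["value"], r["date"]["value"],
             r["category"]["value"], f"${r['amount']['value']:,.2f}"]
            for r in receipts]
    total = sum(r["amount"]["value"] for r in receipts)
    return [
        {"type": "header", "text": f"Expense Report: {employee}"},
        {"type": "table",
         "columns": ["Vendor", "Date", "Category", "Amount"],
         "rows": rows},
        {"type": "text", "text": f"Total: ${total:,.2f}"},
    ]
```

This is exactly the mapping the assistant performs conversationally: per-receipt extractions in, one structured block list out.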

Product photo to catalog page:

“Take this product photo and resize it to 600x400 with smart cropping. Then generate an A4 PDF product sheet with the cropped photo at the top, the product name ‘ErgoDesk Pro’ as a headline, a paragraph of description text, a table of specifications, and a QR code linking to the product page.”

The assistant calls transform_image for the crop, then generate_document to build the product sheet with the processed image, text blocks, a specs table, and a QR code — all in one document.

Contract to summary image:

“Extract the parties, effective date, term length, and total contract value from this PDF. Then generate a 1200x630 card with a dark background showing the key terms — party names at the top, dates and value below, styled like an OG image.”

extract_document pulls the key terms, generate_image composes them into a shareable summary card. Useful for Slack updates, internal dashboards, or stakeholder emails.

Screenshot to optimized banner:

“Take this screenshot, crop it to 16:9 aspect ratio with smart cropping, upscale it to 2x, then generate a marketing banner with the screenshot in the center, a gradient overlay, and ‘Now Available’ text on top.”

transform_image handles the crop and upscale, generate_image composes the final banner with the processed screenshot as a layer. The assistant knows to apply the transformations first and use the result in the composition.

Why This Works

The reason multi-API chaining works smoothly in conversation is that the assistant holds the full context. After extracting data from a document, it doesn’t forget the results — it uses them directly in the next tool call.

This is fundamentally different from writing pipeline code. In code, you parse the extraction response, map it to the generation request, handle error cases, and wire the data flow explicitly. In conversation, the assistant does that mapping implicitly. You say “use the supplier name from the invoice” and it knows which value you mean.
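For comparison, here is roughly what that explicit mapping looks like as pipeline code. The layer fields below are invented for illustration, not the documented Image Generation schema; the point is the wiring you'd otherwise write by hand:

```python
def card_layers(extraction: dict, logo_url: str) -> list[dict]:
    """Map an invoice extraction to a summary-card layer stack.

    Assumes the {"value": ..., "confidence": ...} response shape;
    the layer format is a hypothetical sketch, not the real API.
    """
    supplier = extraction["supplierName"]["value"]
    number = extraction["invoiceNumber"]["value"]
    date = extraction["invoiceDate"]["value"]
    total = extraction["total"]["value"]
    return [
        {"type": "image", "src": logo_url,
         "x": 20, "y": 20, "width": 120, "height": 40},
        {"type": "text", "text": number,
         "x": 20, "y": 80, "size": 32, "color": "#111111"},
        {"type": "text", "text": f"{supplier} — {date}",
         "x": 20, "y": 120, "size": 16, "color": "#777777"},
        {"type": "text", "text": f"${total:,.2f}",
         "x": 620, "y": 340, "size": 28, "weight": "bold"},
    ]
```

Every line of this function is work the assistant does implicitly when you say "use the supplier name from the invoice."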

The tradeoff is clear. Conversation-based chaining is ideal for:

  • One-off workflows — processing a single document or batch without building a script
  • Prototyping pipelines — figuring out the right sequence of API calls before writing production code
  • Ad-hoc data processing — extracting, transforming, and generating when the input varies every time

For production pipelines that run on a schedule or process hundreds of documents, you’d move to direct API calls. But the chain you prototyped in conversation maps directly to the API requests you’ll write — same endpoints, same parameters, same data structures.
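As a sketch of that translation, the invoice chain could be written down as an ordered list of requests before wiring up any HTTP client. Everything here is an assumption for illustration: the endpoint paths and payload fields are placeholders, and the real shapes live in the API docs:

```python
def invoice_chain_requests(pdf_url: str, logo_url: str) -> list[tuple[str, dict]]:
    """Return the (path, payload) pairs the prototyped chain maps to.

    Paths and payload fields are hypothetical placeholders used to show
    the one-to-one correspondence between conversation steps and requests.
    """
    return [
        ("/v1/extract",
         {"document": pdf_url,
          "fields": ["supplierName", "invoiceNumber", "invoiceDate",
                     "lineItems", "total"]}),
        ("/v1/transform",
         {"image": logo_url,
          "operations": [{"resize": {"width": 120, "height": 40}},
                         {"convert": {"format": "png", "transparent": True}}]}),
        ("/v1/generate-image",
         {"width": 800, "height": 400,
          "layers": ["<logo layer>", "<title layer>",
                     "<subtitle layer>", "<total layer>"]}),
    ]
```

Each tuple corresponds to one conversational step, which is why the prototype transfers so cleanly: you already know the sequence and the parameters before you write the first line of production code.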

Combining All Four APIs

The Iteration Layer MCP server gives you four tools that cover the full content processing lifecycle:

  • Document Extraction — structured data out of PDFs, images, Word files
  • Image Transformation — resize, crop, convert, upscale, sharpen, and 18 more operations
  • Image Generation — layer-based composition for cards, banners, graphics
  • Document Generation — PDFs, DOCX, EPUB, PPTX from structured content blocks

These tools compose in any order. Extract data from a document, use it to generate an image, transform that image, embed it in a new document. Or go the other direction — take a screenshot, transform it, and extract text from it. The assistant figures out the sequence from your description.

Tips for Effective Chaining

Be explicit about data flow. When you want the assistant to use output from a previous step, reference it directly: “Use the total from the invoice extraction” rather than “add the total.” The assistant handles the mapping, but clarity helps.

Break complex chains into steps. Instead of describing a five-step workflow in one message, walk through it step by step. This lets you verify intermediate results — check that the extraction looks right before using it to generate a report.

Check confidence scores. Document Extraction returns confidence scores for every field. If a score is low, ask the assistant to show you the raw value so you can verify it before it flows into a generated document or image.
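If you move that check into code, it is a one-liner over the field shape shown in the extraction example earlier (again assuming that value/confidence shape):

```python
def low_confidence_fields(extraction: dict, threshold: float = 0.9) -> list[str]:
    """Return the names of top-level fields below the confidence threshold."""
    return [name for name, field in extraction.items()
            if field["confidence"] < threshold]

extraction = {
    "supplierName": {"value": "Nordic Components AB", "confidence": 0.96},
    "total": {"value": 6050.00, "confidence": 0.82},  # worth a manual look
}
flagged = low_confidence_fields(extraction)  # ["total"]
```

Fields that come back flagged are the ones to verify by hand before they flow into a generated document or image.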

Iterate on individual steps. If the generated image looks right but the PDF layout needs work, you don’t need to re-run the whole chain. Just ask the assistant to regenerate the document with adjusted styling. The extracted data and processed images are still in the conversation context.

Get Started

Sign up for a free account at iterationlayer.com — no credit card required. Add the MCP server to Claude Code or Cursor with the configuration above, and start chaining APIs from your next conversation.

The docs cover every tool — Document Extraction schemas, Image Transformation operations, Image Generation layers, and Document Generation blocks. Start with a two-step chain — extract data from a document and generate an image from it. Once that clicks, the four-step workflows follow naturally.
