Document Generation vs Puppeteer PDF: Browser Engine or Content API?

The Default Answer

When a developer needs to generate a PDF, the first answer is almost always Puppeteer. Write some HTML, spin up headless Chrome, call page.pdf(). It’s free, it’s open source, and you already know HTML and CSS.

For a quick internal tool or a weekend project, that’s fine. The problems start when you move to production — and they compound when your requirements grow past “render this HTML to a file.”

Puppeteer works by running a full Chromium instance. Every PDF you generate goes through the same rendering pipeline a browser uses to display a web page: HTML parser, CSS layout engine, JavaScript runtime, compositor. All of that runs for every single document, whether it’s a one-page invoice or a 200-page report.

The Infrastructure Tax

Each Chromium instance consumes roughly 200-300MB of RAM. That’s not a bug — it’s a browser. When you need to generate 50 documents concurrently, you’re not scaling a PDF generator. You’re scaling 50 browsers.
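To put that in numbers, here is a back-of-the-envelope sketch. It assumes the 200-300MB per-instance range quoted above and ignores the host process, the OS, and any headroom. The function name is illustrative.

```typescript
// Rough memory budget for N concurrent headless Chromium instances.
// Assumes the 200-300MB-per-instance range quoted in the text; real usage
// varies with page complexity, embedded images, and Chromium version.
interface MemoryRangeMb {
  low: number;
  high: number;
}

function estimateChromiumMemoryMb(
  concurrentInstances: number,
  perInstanceMb: MemoryRangeMb = { low: 200, high: 300 },
): MemoryRangeMb {
  if (!Number.isInteger(concurrentInstances) || concurrentInstances < 0) {
    throw new RangeError("concurrentInstances must be a non-negative integer");
  }
  return {
    low: concurrentInstances * perInstanceMb.low,
    high: concurrentInstances * perInstanceMb.high,
  };
}

// 50 concurrent documents -> 10,000-15,000MB of RAM for the browsers alone.
const budget = estimateChromiumMemoryMb(50);
console.log(`${budget.low / 1000}-${budget.high / 1000} GB`); // prints "10-15 GB"
```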

The operational surface area is large:

Chromium versions need tracking and updating. Security patches don’t apply themselves.
Font installation is your problem. The fonts that render correctly on your macOS dev machine won’t exist in your Alpine Linux container unless you install them explicitly.
Concurrency means managing a pool of browser instances, handling crashes, cleaning up zombie processes.
Puppeteer talks to Chromium over a DevTools Protocol WebSocket that caps messages at 256MB, a hard ceiling if you’re generating documents with embedded images or large datasets.
Serverless deployment is painful. Chromium’s binary is too large for most Lambda layers, and cold starts measured in seconds make it impractical for on-demand generation.
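The concurrency item usually ends up as a bounded pool in front of the browser. Below is a minimal, library-free sketch of that pattern: at most `limit` render jobs run at once and the rest wait their turn. `renderOne` is a hypothetical stand-in for whatever opens a page and calls `page.pdf()`. Crash recovery and zombie-process cleanup, the harder parts, are deliberately left out.

```typescript
// Run async jobs with at most `limit` in flight at a time.
// Results come back in input order, regardless of completion order.
async function runWithConcurrencyLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each lane claims the next unprocessed item until none are left.
  // Claiming is safe without locks: JavaScript runs this loop on one thread,
  // and there is no await between the bounds check and the increment.
  async function lane(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Hypothetical render job: a real one would open a browser page, call
// page.pdf(), and close the page. Here it only simulates some work.
async function renderOne(invoiceId: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10));
  return `${invoiceId}.pdf`;
}
```

Usage: `await runWithConcurrencyLimit(["inv-1", "inv-2", "inv-3"], 2, renderOne)` renders two invoices at a time. In production, `limit` is effectively your memory budget divided by the per-instance footprint.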

Docker images balloon. Memory leaks creep in. The OOM killer visits at 3 AM. You’ve seen this movie.

The CSS Paged Media Gap

Here’s the part that catches people off guard. Chromium’s support for CSS Paged Media, the W3C specification designed specifically for paginated documents, is partial. Basic @page rules for page size and margins work, but compared with a dedicated print engine you run into gaps:

Little per-page styling beyond size and margins.
No native page numbers or running headers and footers, except through page margin boxes that only arrived in recent Chromium releases.
No page-break-inside: avoid that actually works reliably across nested elements.
Limited control over named pages for different sections of the same document.

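Puppeteer’s page.pdf() does accept headerTemplate and footerTemplate options. But these are HTML strings rendered into a narrow strip at the top or bottom of each page, in an isolated context: your page’s CSS doesn’t reach them, they can’t reference the document content, and they can’t flow with the layout. Chromium fills in a handful of values through special class names (pageNumber, totalPages, date, title, url), but anything beyond that, such as the current section title in the header or a cover page without a footer, needs JavaScript workarounds that break when the content length changes.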
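For reference, this is roughly what the header/footer route looks like. The sketch assumes Puppeteer’s documented `page.pdf()` options (`displayHeaderFooter`, `headerTemplate`, `footerTemplate`, `margin`) and the special `pageNumber` and `totalPages` classes. `buildPdfOptions` and the local interface are illustrative names, not part of Puppeteer.

```typescript
// Subset of the options Puppeteer's page.pdf() accepts, declared locally so
// this sketch compiles without installing puppeteer. In real code, import
// the PDFOptions type from "puppeteer" instead.
interface PdfHeaderFooterOptions {
  format: "A4" | "Letter";
  printBackground: boolean;
  displayHeaderFooter: boolean;
  headerTemplate: string;
  footerTemplate: string;
  margin: { top: string; right: string; bottom: string; left: string };
}

function buildPdfOptions(reportTitle: string): PdfHeaderFooterOptions {
  // Templates render in their own context, so page CSS does not apply and
  // styles must be inline. Chromium fills elements with the classes
  // pageNumber and totalPages at print time.
  // reportTitle is inserted as-is; escape it if it comes from user input.
  const style = "font-size:9px; width:100%; padding:0 40px;";
  return {
    format: "A4",
    printBackground: true,
    displayHeaderFooter: true,
    headerTemplate: `<div style="${style}">${reportTitle}</div>`,
    footerTemplate:
      `<div style="${style} text-align:right;">` +
      `Page <span class="pageNumber"></span> of <span class="totalPages"></span>` +
      `</div>`,
    // Margins must leave room for the header/footer strip, or it overlaps content.
    margin: { top: "60px", right: "40px", bottom: "60px", left: "40px" },
  };
}

// With a real Puppeteer page: await page.pdf(buildPdfOptions("Q3 Report"));
```

Note that the same header and footer apply to every page: there is no per-section variation inside these strings.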

For anything beyond the simplest layout — a report with a cover page, a table of contents, numbered pages, and section headers — you’re fighting the tool. The browser was built to render scrolling web pages, not paginated documents. Every workaround you write is a reminder of that mismatch.

PDF Only

Puppeteer generates PDFs. That’s it. If a client asks for a Word document, or your ebook pipeline needs EPUB, or your sales team wants a slide deck — you need entirely separate tooling for each format. Different libraries, different APIs, different failure modes, different maintenance burden.

This matters more than it seems up front. Document generation requirements expand. The contract that started as PDF-only eventually needs a DOCX version for legal review. The report that ships as PDF needs a PowerPoint summary for the board meeting. Each new format is a new integration project.

Structured Blocks, Not HTML

The Document Generation API takes a fundamentally different approach. Instead of rendering HTML through a browser engine, you describe your document as structured JSON blocks. The API handles layout, pagination, text flow, and rendering — then returns a finished file in your chosen format.

No browser. No Chromium binary. No Docker image bloat. No font installation. No concurrency management. You send a request, you get a document.

The content model includes block types like headlines, paragraphs with Markdown support, images, tables, lists, page breaks, table of contents, QR codes, barcodes, spacers, and dividers. Headers and footers are native — they repeat across pages automatically, with built-in page number tokens. No JavaScript injection, no HTML template strings.
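Because the input is data rather than markup, you can type and check it before sending anything. Here is a minimal sketch. The block shapes are inferred only from the field names in the EPUB example later in this article (headline, paragraph, image, list). These TypeScript types are hypothetical, not the SDK’s published definitions, and the validation rules are illustrative.

```typescript
// Hypothetical block types, modeled on the fields used in the EPUB example.
// Allowed values beyond those in the example are unknown, so level, fit,
// and variant are typed loosely as strings.
type HeadlineBlock = { type: "headline"; level: string; text: string };
type ParagraphBlock = { type: "paragraph"; markdown: string };
type ImageBlock = {
  type: "image";
  buffer: string; // base64-encoded image data
  width_in_pt: number;
  height_in_pt: number;
  fit: string;
};
type ListBlock = { type: "list"; variant: string; items: { text: string }[] };
type Block = HeadlineBlock | ParagraphBlock | ImageBlock | ListBlock;

// Collect problems instead of failing on the first one.
function validateBlocks(blocks: readonly Block[]): string[] {
  const problems: string[] = [];
  blocks.forEach((block, i) => {
    switch (block.type) {
      case "headline":
        if (block.text.trim() === "") problems.push(`block ${i}: empty headline`);
        break;
      case "paragraph":
        if (block.markdown.trim() === "") problems.push(`block ${i}: empty paragraph`);
        break;
      case "image":
        if (block.width_in_pt <= 0 || block.height_in_pt <= 0) {
          problems.push(`block ${i}: image needs positive width and height`);
        }
        break;
      case "list":
        if (block.items.length === 0) problems.push(`block ${i}: empty list`);
        break;
      default: {
        // Exhaustiveness check: a new block type that isn't handled above
        // becomes a compile-time error here.
        const unhandled: never = block;
        throw new Error(`unknown block: ${JSON.stringify(unhandled)}`);
      }
    }
  });
  return problems;
}
```

For example, `validateBlocks([{ type: "headline", level: "h1", text: "" }])` returns `["block 0: empty headline"]`. With an HTML template, the equivalent check would mean parsing markup.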

Page setup covers 19 size presets (A4, Letter, A5, and more), custom dimensions, per-side margins, and background colors. Custom fonts are supported at the document level — upload once, reference everywhere.
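Dimensions in the request are given in points (note the `_in_pt` suffixes in the code example later in this article), and a point is 1/72 inch. If your designs are specified in millimetres or inches, the conversion is one line. Here is a small sketch. The helper names are illustrative. The A4, A5, and Letter sizes are the standard ISO 216 and US Letter dimensions, not values read from the API.

```typescript
const POINTS_PER_INCH = 72;
const MM_PER_INCH = 25.4;

function mmToPt(mm: number): number {
  return (mm / MM_PER_INCH) * POINTS_PER_INCH;
}

function inToPt(inches: number): number {
  return inches * POINTS_PER_INCH;
}

// Standard paper sizes: A-series in millimetres (ISO 216), Letter in inches.
const A4_PT = { width: mmToPt(210), height: mmToPt(297) }; // ~595.28 x 841.89
const A5_PT = { width: mmToPt(148), height: mmToPt(210) }; // ~419.53 x 595.28
const LETTER_PT = { width: inToPt(8.5), height: inToPt(11) }; // 612 x 792

// The 54pt margins used in the example are exactly 0.75 inch (19.05mm).
const margin = inToPt(0.75); // 54
```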

And the output isn’t limited to PDF. The same structured input generates PDF, DOCX, EPUB, or PPTX. One content model, four formats. Change the format field in your request and the API handles the rest.
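Because the format is just a field, fanning one document out to several formats is a loop. The sketch below only builds the request objects. `OutputFormat`, `DocumentSpec`, and `requestsForFormats` are hypothetical names. The lowercase format strings other than `"epub"` are an assumption, since only `"epub"` appears in this article. Each request would be passed to `client.generateDocument()` as in the example later on. Its response shape isn’t covered here, so the sketch doesn’t assume one.

```typescript
// Assumed format identifiers; only "epub" is confirmed by the example.
type OutputFormat = "pdf" | "docx" | "epub" | "pptx";

// Hypothetical, loosely typed stand-in for the `document` field in the
// example request; the real SDK types may be stricter.
type DocumentSpec = Record<string, unknown>;

interface GenerateRequest {
  format: OutputFormat;
  document: DocumentSpec;
}

// One content model, several outputs: only the format field changes.
function requestsForFormats(
  document: DocumentSpec,
  formats: readonly OutputFormat[],
): GenerateRequest[] {
  return formats.map((format) => ({ format, document }));
}

const chapter: DocumentSpec = {
  metadata: { title: "The API Economy", author: "Jane Doe", language: "en" },
  content: [{ type: "headline", level: "h1", text: "Chapter 1: Why APIs Won" }],
};

const requests = requestsForFormats(chapter, ["pdf", "docx", "epub", "pptx"]);
// Each entry could then be sent with client.generateDocument(request).
```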

What This Looks Like

Here’s an ebook chapter generated as EPUB — the same content model could produce a PDF or DOCX by changing a single field:

```typescript
import { IterationLayer } from "iterationlayer";

// Replace with your real API key (ideally loaded from configuration, not hard-coded).
const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.generateDocument({
  format: "epub", // the output format; the same content can target PDF or DOCX instead
  document: {
    metadata: { title: "The API Economy", author: "Jane Doe", language: "en" },
    page: {
      size: { preset: "A5" },
      // All dimensions are in points (1pt = 1/72 inch); 54pt = 0.75 inch.
      margins: { top_in_pt: 54, right_in_pt: 54, bottom_in_pt: 54, left_in_pt: 54 },
    },
    styles: { /* ... */ },
    content: [
      { type: "headline", level: "h1", text: "Chapter 1: Why APIs Won" },
      { type: "paragraph", markdown: "The shift from monolithic software to **composable APIs** didn't happen overnight." },
      { type: "paragraph", markdown: "It started with a simple observation: developers would rather call an endpoint than manage a library." },
      // buffer holds the base64-encoded image data
      { type: "image", buffer: "base64...", width_in_pt: 400, height_in_pt: 250, fit: "contain" },
      { type: "list", variant: "ordered", items: [
        { text: "APIs reduce operational overhead" },
        { text: "APIs enable composability" },
        { text: "APIs abstract away complexity" },
      ]},
    ],
  },
});
// `result` holds the generated file; see the SDK docs for its exact shape.
```

No browser launched. No HTML parsed. No CSS debugged. The API takes the structured content and produces the file. The page, styles, and content fields give you full control over the output — without owning the rendering engine.

Where Puppeteer Still Makes Sense

Puppeteer is a good tool. It’s free, open source, and backed by a large community. If your requirements are narrow — a single PDF template, low volume, a team comfortable managing Chromium infrastructure — it gets the job done.

It also excels when you genuinely need browser rendering. If your document is an interactive web page that you’re printing to PDF, or if your layout depends on JavaScript execution, a headless browser is the right tool because you need an actual browser.

The tradeoff becomes painful when:

You need paginated documents with proper headers, footers, and page numbers.
You need multiple output formats from the same content.
You’re generating documents at scale and can’t afford 200-300MB per instance.
You’re running serverless and can’t tolerate multi-second cold starts.
You’d rather describe what the document contains than how it renders.

The Tradeoff, Plainly

Puppeteer gives you full browser rendering (the entire power of HTML, CSS, and JavaScript) at the cost of running and scaling a browser. You own the infrastructure, the fonts, the concurrency, the security patches. Output is PDF only. Paged media support is partial.

Iteration Layer gives you structured document generation — block types for every content need, native pagination, four output formats — at the cost of not having arbitrary HTML rendering. You describe the content, the API handles the layout. No infrastructure to manage.

If your document is a web page that happens to be printable, use a browser. If your document is structured content that needs to become a file, use a content API.

Get Started

Check out the Document Generation docs to see the full block type reference, page configuration options, and SDK examples in TypeScript and Python. The same content model that produces your PDF today produces DOCX, EPUB, and PPTX tomorrow — no new integration required.

Iteration Layer runs on EU infrastructure (Frankfurt), which matters if your data residency requirements rule out US-hosted services.
