Still Using Puppeteer to Generate Images? There's a Faster, Simpler Way

The Puppeteer Tax

You started with a reasonable idea. Build an HTML template, load it in headless Chrome, take a screenshot. It works on your machine. Ship it.

Then production happens.

Each Puppeteer instance spins up a full Chromium browser. That’s 200MB+ of RAM per instance. Your Docker image — which was 80MB before — balloons past 1GB because Chromium drags in half of the Debian package repository. Cold starts take seconds, not milliseconds. And when traffic spikes, you’re not scaling an API — you’re scaling a fleet of headless browsers.

The failure modes are creative too. Zombie processes that don’t clean up. Chrome instances that hang on malformed CSS. Memory leaks that slowly eat your container’s allocation until the OOM killer steps in. Fonts that render differently between your macOS dev machine and your Alpine Linux container. And every few months, a Chromium update breaks something subtle in your rendering pipeline.

If you’ve built image generation on Puppeteer or Playwright, you’ve paid this tax. The Image Generation API eliminates it entirely.

HTML Rendering vs. Layer Composition

Puppeteer generates images by rendering HTML. You write a template, inject data, load it in a browser, and screenshot the result. That means you’re running a full browser engine — HTML parser, CSS layout engine, JavaScript runtime, GPU compositor — to produce a static image.

The Image Generation API takes a different approach. Instead of rendering a web page, you compose layers. Each layer is a discrete visual element — a background color, a rectangle, a text block, an image — positioned explicitly on a canvas. No DOM, no CSS cascade, no layout engine. The API stacks the layers in order and renders the result.
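As a rough mental model, the stacking step can be sketched in a few lines of TypeScript. The `Layer` type and `paintOrder` helper below are hypothetical and exist only for illustration. The field names mirror the request examples later in this article, and the assumption that a lower `index` is painted first is inferred from those examples rather than taken from the API reference.

```typescript
// Hypothetical types for illustration only; field names follow the
// request examples shown later in this article.
type Layer =
  | { index: number; type: "solid-color-background"; hex_color: string }
  | { index: number; type: "text"; text: string; position: { x: number; y: number } };

// Return the layers in paint order: lowest index first, so later
// layers are drawn on top of earlier ones (an assumption, see above).
function paintOrder(layers: Layer[]): Layer[] {
  return [...layers].sort((a, b) => a.index - b.index);
}

const unordered: Layer[] = [
  { index: 1, type: "text", text: "Hello", position: { x: 60, y: 180 } },
  { index: 0, type: "solid-color-background", hex_color: "#1a1a2e" },
];

const ordered = paintOrder(unordered);
```

There is no layout pass anywhere in this model: every element already carries its own coordinates, so ordering is the only decision left to make.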

This isn’t a limitation. It’s the point. Most programmatically generated images — social cards, email banners, certificates, product graphics — are compositions of positioned elements. You don’t need a browser engine for that. You need a compositor.

Same Output, Different Approach

Here’s a social card built with Puppeteer:

import puppeteer from "puppeteer";

const browser = await puppeteer.launch();
const page = await browser.newPage();

await page.setViewport({ width: 1200, height: 630 });
await page.setContent(`
  <div style="width: 1200px; height: 630px; background: #1a1a2e; display: flex; flex-direction: column; justify-content: center; padding: 60px;">
    <h1 style="color: #e94560; font-family: Inter; font-size: 48px; margin: 0;">How We Reduced API Latency by 60%</h1>
    <p style="color: #ffffff; font-family: Inter; font-size: 24px; margin-top: 20px;">A deep dive into our optimization strategy</p>
    <p style="color: #888888; font-family: Inter; font-size: 18px; margin-top: auto;">iterationlayer.com</p>
  </div>
`);
await page.screenshot({ path: "social-card.png" });
await browser.close();

And the same card built with the API:

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const { data: { buffer: imageBase64 } } = await client.generateImage({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  layers: [
    {
      index: 0,
      type: "solid-color-background",
      hex_color: "#1a1a2e",
    },
    {
      index: 1,
      type: "text",
      text: "How We Reduced API Latency by 60%",
      font_name: "Inter",
      font_weight: "Bold",
      font_size_in_px: 48,
      text_color: "#e94560",
      text_align: "left",
      vertical_align: "top",
      position: { x: 60, y: 180 },
      dimensions: { width: 1080, height: 200 },
    },
    {
      index: 2,
      type: "text",
      text: "A deep dive into our optimization strategy",
      font_name: "Inter",
      font_weight: "Regular",
      font_size_in_px: 24,
      text_color: "#ffffff",
      text_align: "left",
      vertical_align: "top",
      position: { x: 60, y: 400 },
      dimensions: { width: 1080, height: 60 },
    },
    {
      index: 3,
      type: "text",
      text: "iterationlayer.com",
      font_name: "Inter",
      font_weight: "Regular",
      font_size_in_px: 18,
      text_color: "#888888",
      text_align: "left",
      vertical_align: "bottom",
      position: { x: 60, y: 540 },
      dimensions: { width: 1080, height: 40 },
    },
  ],
});

const imageBuffer = Buffer.from(imageBase64, "base64");

Close to the same visual result, with the flexbox layout replaced by explicit coordinates. No browser. No DOM. No CSS. No Chromium.
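The Puppeteer version wrote `social-card.png` to disk as part of `page.screenshot`. To get the same file from the API version, decode the base64 string and write it yourself. The sketch below uses Node's built-in `fs/promises` module; `saveBase64Image` is a hypothetical helper name, not part of the SDK.

```typescript
import { writeFile } from "node:fs/promises";

// Decode a base64-encoded image and write it to `path`.
// Returns the number of bytes written.
async function saveBase64Image(imageBase64: string, path: string): Promise<number> {
  const imageBuffer = Buffer.from(imageBase64, "base64");
  await writeFile(path, imageBuffer);
  return imageBuffer.length;
}
```

With the response from the example above, the call would be `await saveBase64Image(imageBase64, "social-card.png")`.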

What You Stop Maintaining

The Puppeteer approach carries hidden infrastructure:

Docker image size. Chromium needs system libraries: libnss3, libatk-bridge2.0-0, libcups2, libdrm2, and dozens more. A minimal Node.js image is ~150MB. Add Chromium and you’re past 1GB. With the API, your container doesn’t need any of that. Your image stays at 150MB.

Memory per request. Each Puppeteer page consumes 50-200MB depending on content complexity. Process 10 images concurrently and you need up to 2GB of RAM just for Chrome instances. The API is a single HTTP call, so memory usage is whatever your HTTP client needs.

Cold starts. Launching Chromium takes 1-3 seconds. In a serverless environment, that cold start happens on every invocation unless you keep instances warm (which costs money). The API responds in the time it takes to compose the layers — typically under a second.

Process management. Chrome processes can hang. They can crash. They can leak memory over time. You need health checks, restart logic, and timeout handling. With an API call, your failure mode is a network error — standard HTTP retry logic handles it.

Font consistency. Puppeteer renders text using the fonts installed in the container’s OS. Different base images have different fonts. Alpine has almost none. Ubuntu has some but not the ones you need. You end up copying font files into the container and configuring fontconfig. The API accepts fonts directly in the request — what you send is what renders.
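On the process-management point above, “standard HTTP retry logic” can be as small as the sketch below. `withRetry` is a hypothetical helper, not part of the SDK, and it retries on any thrown error. A production version would probably retry only on timeouts, connection errors, and 5xx or 429 responses.

```typescript
// Call `call` up to `maxAttempts` times, waiting with exponential
// backoff (baseDelayMs, 2x, 4x, ...) between failed attempts.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Any function that returns a promise can be wrapped this way, for example `withRetry(() => client.generateImage(request))`, where `request` stands for a payload like the one shown earlier.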

Scaling: Servers vs. API Calls

Scaling Puppeteer means scaling servers. More concurrent images means more Chrome instances means more CPU and RAM. You’re running a browser farm.

The math is straightforward. Say you need to generate 1,000 social cards for a product launch. With Puppeteer at 3 seconds per image and 200MB per instance, running 10 concurrent instances needs 2GB of RAM and takes ~5 minutes. Scale to 50 concurrent instances and you need 10GB of RAM.
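Here is the same arithmetic as a small function, so you can plug in your own numbers. `estimateBatch` is a hypothetical helper. It assumes every image takes the same time, that all instances stay busy, and that 1GB is 1,000MB, which matches the round figures above.

```typescript
interface BatchEstimate {
  totalSeconds: number;
  peakMemoryMb: number;
}

// Estimate wall-clock time and peak RAM for rendering `imageCount`
// images with `concurrency` browser instances running in parallel.
function estimateBatch(
  imageCount: number,
  secondsPerImage: number,
  memoryPerInstanceMb: number,
  concurrency: number,
): BatchEstimate {
  const rounds = Math.ceil(imageCount / concurrency);
  return {
    totalSeconds: rounds * secondsPerImage,
    peakMemoryMb: concurrency * memoryPerInstanceMb,
  };
}

// 100 rounds x 3s = 300s (5 minutes), 10 x 200MB = 2,000MB
const tenWide = estimateBatch(1000, 3, 200, 10);
// 20 rounds x 3s = 60s (1 minute), 50 x 200MB = 10,000MB
const fiftyWide = estimateBatch(1000, 3, 200, 50);
```

Going from 10 to 50 instances cuts the batch from five minutes to one, but only by provisioning five times the memory.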

With the API, you fire off 1,000 HTTP requests. Your server’s job is to make those requests and collect the responses. The compute happens on the API’s infrastructure. Your machine barely notices.
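One way to fire off those requests without opening 1,000 connections at once is a small concurrency limiter. This is a generic sketch, not part of the SDK: `mapWithConcurrency` is a hypothetical helper, and any limit you pick should respect whatever rate limits apply to your account.

```typescript
// Run `task` over every item with at most `limit` calls in flight.
// Results come back in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  // JavaScript is single-threaded, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Called as `mapWithConcurrency(titles, 20, (title) => generateCard(title, author))`, with `generateCard` being the SDK-based function from the Migration Path section, this keeps 20 requests in flight until the list is exhausted. The limit of 20 is an arbitrary illustration.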

This matters especially in serverless environments. Running Puppeteer in Lambda functions or Cloudflare Workers is a notorious pain point: the Chromium binary can exceed package and layer size limits, cold starts destroy latency budgets, and memory costs scale linearly with concurrency. An API call fits naturally into any serverless function.

When Puppeteer Still Makes Sense

Puppeteer isn’t wrong for everything. If you need to screenshot actual web pages — full websites with JavaScript interactions, complex CSS layouts, or third-party embeds — Puppeteer is the right tool. It’s a browser because it needs to be.

But if you’re using Puppeteer to compose positioned elements on a canvas — text here, image there, colored background — you’re using a browser engine as an image compositor. That’s like running a database to store a JSON file. It works, but you’re paying for capabilities you don’t need.

Migration Path

Moving from Puppeteer to the API is a template-by-template migration. Take one HTML template, decompose it into layers, and replace the Puppeteer call with an API call.

Start with the background. An HTML background-color becomes a solid-color-background layer. Positioned divs with background colors become rectangle layers. Text elements become text layers with explicit font, size, and position. Images become static-image layers.

The mental model shifts from “layout engine figures out where things go” to “I tell each element where to go.” For generated images — where you control the design — this is actually simpler. No CSS specificity battles, no flexbox debugging, no “why is this 1px off” mysteries.
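Telling each element where to go gets repetitive once a template has several text blocks, and a small helper can cut the noise. `textLayer` below is hypothetical. It only uses fields that appear in the request examples in this article, and its defaults (Inter, Regular) are assumptions chosen to match them.

```typescript
interface TextLayer {
  index: number;
  type: "text";
  text: string;
  font_name: string;
  font_weight: string;
  font_size_in_px: number;
  text_color: string;
  position: { x: number; y: number };
  dimensions: { width: number; height: number };
}

interface TextLayerOptions {
  x: number;
  y: number;
  width: number;
  height: number;
  fontSizeInPx: number;
  color: string;
  fontName?: string;   // defaults to "Inter" (assumption)
  fontWeight?: string; // defaults to "Regular" (assumption)
}

// Build one text layer from a compact description of where it goes.
function textLayer(index: number, text: string, options: TextLayerOptions): TextLayer {
  return {
    index,
    type: "text",
    text,
    font_name: options.fontName ?? "Inter",
    font_weight: options.fontWeight ?? "Regular",
    font_size_in_px: options.fontSizeInPx,
    text_color: options.color,
    position: { x: options.x, y: options.y },
    dimensions: { width: options.width, height: options.height },
  };
}

const titleLayer = textLayer(1, "How We Reduced API Latency by 60%", {
  x: 60, y: 180, width: 1080, height: 200,
  fontSizeInPx: 48, color: "#e94560", fontWeight: "Bold",
});
```

Each HTML text element in a template then maps to one `textLayer` call, with the coordinates you would otherwise have left to flexbox written out explicitly.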

// Before: Puppeteer template function
// (assumes a shared `browser` created once with puppeteer.launch())
const generateCard = async (title: string, author: string) => {
  const page = await browser.newPage();
  await page.setContent(`<div>...</div>`);
  const buffer = await page.screenshot();
  await page.close();

  return buffer;
};

// After: SDK call
const generateCard = async (title: string, author: string) => {
  const { data: { buffer } } = await client.generateImage({
    dimensions: { width: 1200, height: 630 },
    output_format: "png",
    layers: [
      { index: 0, type: "solid-color-background", hex_color: "#1a1a2e" },
      {
        index: 1,
        type: "text",
        text: title,
        font_name: "Inter",
        font_weight: "Bold",
        font_size_in_px: 48,
        text_color: "#e94560",
        position: { x: 60, y: 180 },
        dimensions: { width: 1080, height: 200 },
      },
      {
        index: 2,
        type: "text",
        text: `By ${author}`,
        font_name: "Inter",
        font_weight: "Regular",
        font_size_in_px: 20,
        text_color: "#cccccc",
        position: { x: 60, y: 420 },
        dimensions: { width: 1080, height: 40 },
      },
    ],
  });

  return Buffer.from(buffer, "base64");
};

The function signature stays the same. The callers don’t change. The Docker image loses a gigabyte.

Get Started

Check out the Image Generation API docs for the full layer reference. Sign up for a free account — no credit card required — and try replacing one Puppeteer template. Once you see the difference in deployment complexity and response times, the rest of the migration sells itself.
