HTML to Image in Seconds: The Developer's Complete API Guide

The Problem with HTML-to-Image

You need to generate images programmatically. Social cards, certificates, receipts, event tickets, promotional graphics. The common advice is: build it as HTML, screenshot it with a headless browser.

The tools are familiar. Puppeteer launches headless Chrome. Playwright does the same across browsers. wkhtmltoimage uses WebKit. They all follow the same pattern — render HTML, capture the viewport, save as an image.

And they all share the same problems.

Headless Browsers Are Heavy

Chrome needs memory. A single Puppeteer instance consumes 200-500 MB of RAM before it renders a single pixel. Spin up multiple instances for concurrency and you’re looking at gigabytes of memory for what should be a simple image generation task.
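To make the scaling concrete, here is a back-of-the-envelope sketch that multiplies the 200-500 MB per-instance range quoted above by a worker count. The function name is made up for illustration, and linear scaling is a simplifying assumption rather than a measurement.

```javascript
// Rough estimate only: assumes each headless browser instance uses
// 200-500 MB (the range quoted above) and that usage scales linearly.
const MB_PER_INSTANCE = { low: 200, high: 500 };

function estimateBrowserMemoryMb(instanceCount) {
  if (!Number.isInteger(instanceCount) || instanceCount < 0) {
    throw new RangeError("instanceCount must be a non-negative integer");
  }
  return {
    lowMb: instanceCount * MB_PER_INSTANCE.low,
    highMb: instanceCount * MB_PER_INSTANCE.high,
  };
}

// Eight concurrent instances land somewhere between 1600 MB and 4000 MB.
const estimate = estimateBrowserMemoryMb(8);
console.log(`${estimate.lowMb}-${estimate.highMb} MB`); // 1600-4000 MB
```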

Cold starts hurt too. Launching a browser instance takes 1-3 seconds. If you’re running in a serverless environment, every invocation pays that cost. AWS Lambda with a Chrome layer is technically possible but practically painful — the binary alone eats half your deployment package size limit.

CSS Rendering Is Unpredictable

Headless Chrome can render CSS differently than Chrome with a display. Font rendering varies between environments. Flexbox and Grid layouts can shift by a pixel or two depending on the platform. Custom fonts that load fine locally can go missing in headless runs when the screenshot is taken before the font has finished loading.

You end up with a test suite for your image templates. Not testing logic — testing whether the CSS renders the same way on your CI server as it does on your laptop. That’s not a problem you should have.
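The kind of check such a test suite ends up containing might look like the sketch below: a pixel comparison between a local render and a CI render. It assumes both images have already been decoded into raw RGBA bytes (decoding a PNG needs a library and is not shown), and the function name and tolerance parameter are illustrative.

```javascript
// Counts pixels that differ between two raw RGBA buffers of equal size.
// `tolerance` is the largest per-channel difference still treated as equal.
function countDifferingPixels(rgbaA, rgbaB, tolerance = 0) {
  if (rgbaA.length !== rgbaB.length || rgbaA.length % 4 !== 0) {
    throw new Error("Buffers must be the same length and hold RGBA pixels");
  }
  let differing = 0;
  for (let i = 0; i < rgbaA.length; i += 4) {
    for (let channel = 0; channel < 4; channel++) {
      if (Math.abs(rgbaA[i + channel] - rgbaB[i + channel]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing;
}

// Two 2-pixel images; only the second pixel's red channel differs (233 vs 231).
const localRender = Uint8Array.from([26, 26, 46, 255, 233, 69, 96, 255]);
const ciRender = Uint8Array.from([26, 26, 46, 255, 231, 69, 96, 255]);
console.log(countDifferingPixels(localRender, ciRender)); // 1
console.log(countDifferingPixels(localRender, ciRender, 2)); // 0
```

Picking the tolerance is the painful part: too strict and anti-aliasing noise fails the build, too loose and real layout shifts slip through.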

The Alternative: Layer-Based Composition

Instead of writing HTML and hoping a browser renders it correctly, describe the image as a stack of layers. That’s what the Image Generation API does.

No HTML. No CSS. No browser. You send a JSON request describing your layers — backgrounds, shapes, text, images — and get an image back.

Here’s what an HTML-to-image approach looks like versus the layer-based approach for the same output — a dark card with a title and accent bar:

The HTML approach:

<div style="width: 1200px; height: 630px; background: #1a1a2e; position: relative; font-family: 'Inter', sans-serif;">
  <div style="width: 100%; height: 4px; background: #e94560;"></div>
  <h1 style="color: #ffffff; font-size: 48px; font-weight: bold; position: absolute; top: 180px; left: 60px; width: 1080px;">
    Your Title Here
  </h1>
</div>

Then you need Puppeteer to render it:

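import puppeteer from "puppeteer";

// `html` holds the markup shown above as a string.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 630 });
await page.setContent(html);
// Wait for web fonts to finish loading before capturing the page.
await page.evaluateHandle("document.fonts.ready");
const screenshot = await page.screenshot({ type: "png" });
await browser.close();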

The layer-based approach:

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.generateImage({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  fonts: [
    {
      name: "Inter",
      weight: "Bold",
      style: "normal",
      file: {
        type: "url",
        name: "Inter-Bold.ttf",
        url: "https://example.com/fonts/Inter-Bold.ttf",
      },
    },
  ],
  layers: [
    {
      type: "solid-color-background",
      index: 0,
      hex_color: "#1a1a2e",
      opacity: 100,
    },
    {
      type: "rectangle",
      index: 1,
      hex_color: "#e94560",
      position: { x: 0, y: 0 },
      dimensions: { width: 1200, height: 4 },
      opacity: 100,
    },
    {
      type: "text",
      index: 2,
      text: "Your Title Here",
      font_name: "Inter",
      font_weight: "Bold",
      font_size_in_px: 48,
      text_color: "#ffffff",
      text_align: "left",
      position: { x: 60, y: 180 },
      dimensions: { width: 1080, height: 300 },
      is_splitting_lines: true,
      opacity: 100,
    },
  ],
});

const { data: { buffer: imageBase64 } } = result;
const imageBuffer = Buffer.from(imageBase64, "base64");

Same visual output. No browser, no DOM, no CSS, no font-loading race conditions.
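Before writing the decoded bytes anywhere, it can be worth a quick sanity check that they really are a PNG. The sketch below assumes Node.js (for the global `Buffer`) and checks the standard 8-byte PNG signature; the helper name is illustrative and not part of the SDK.

```javascript
// The 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decodes a base64 string and verifies that the bytes start like a PNG.
function decodePngBase64(imageBase64) {
  const bytes = Buffer.from(imageBase64, "base64");
  const header = bytes.subarray(0, PNG_SIGNATURE.length);
  if (!header.equals(PNG_SIGNATURE)) {
    throw new Error("Decoded data does not start with the PNG signature");
  }
  return bytes;
}

// Stand-in for a real response: the signature bytes alone, base64-encoded.
const sampleBase64 = PNG_SIGNATURE.toString("base64");
console.log(decodePngBase64(sampleBase64).length); // 8
```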

The Tradeoffs — Honestly

The layer-based approach is not a drop-in replacement for every HTML-to-image use case. Here’s where each approach fits:

Layer-based is better when:

You need predictable, deterministic output
Your templates are structured (cards, banners, certificates, social images)
You want to avoid browser infrastructure entirely
You’re generating images at scale and need consistent performance
You need the output to be identical across environments

HTML-to-image is better when:

Your template requires complex CSS layouts that would be tedious to express as layers
You’re rendering existing web pages as images (actual screenshots)
You need browser-specific rendering (SVG with CSS animations, complex gradients)

For the vast majority of programmatic image generation — OG cards, social templates, certificates, promotional graphics — the structured layer approach is simpler and faster.

Mapping HTML Concepts to Layers

If you’re used to thinking in HTML/CSS, here’s how common patterns translate:

| HTML/CSS | Layer equivalent |
| --- | --- |
| `<div>` with `background-color` | `solid-color-background` or `rectangle` layer |
| `<h1>` or `<p>` with styles | `text` layer with font_name, font_size_in_px, text_color |
| `<img src="...">` | `static-image` layer with position and dimensions |
| `border-top: 4px solid red` | `rectangle` layer positioned at y: 0, height: 4 |
| `background-image: url(...)` with opacity | `image-overlay` layer with opacity control |
| `font-weight: bold` inside text | Markdown `bold` in the text layer content |
| `font-style: italic` inside text | Markdown `italic` in the text layer content |
| Absolutely positioned elements | Each layer has explicit `position: { x, y }` |

The mental model shifts from “box model with cascading styles” to “positioned layers with explicit properties.” Less flexible in theory, more predictable in practice.
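As one example of that translation, a `border-top` on a full-width card can be sketched as a small helper that emits a rectangle layer. The field names follow the request shown earlier in this guide; the helper and its parameter names are hypothetical.

```javascript
// Translates `border-top: <thickness>px solid <color>` on a full-width card
// into a rectangle layer pinned to the top edge of the canvas.
function borderTopToRectangleLayer({ canvasWidth, thicknessInPx, hexColor, index }) {
  return {
    type: "rectangle",
    index,
    hex_color: hexColor,
    position: { x: 0, y: 0 },
    dimensions: { width: canvasWidth, height: thicknessInPx },
    opacity: 100,
  };
}

// Equivalent of `border-top: 4px solid #e94560` on a 1200px-wide card.
const accentBar = borderTopToRectangleLayer({
  canvasWidth: 1200,
  thicknessInPx: 4,
  hexColor: "#e94560",
  index: 1,
});
console.log(JSON.stringify(accentBar.dimensions)); // {"width":1200,"height":4}
```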

Performance Comparison

Headless Chrome: 1-3 seconds to launch a browser instance before the first render, 200-500 MB of RAM per instance, and the launch cost again on every serverless cold start.

The Image Generation API: an HTTP request. Your server sends JSON, receives JSON with a base64-encoded image. No browser to launch, no memory overhead beyond the HTTP client. Concurrency is limited by your HTTP connection pool, not by browser instances.

For batch generation — 100 blog post OG images, 500 event cards, 1000 certificates — the difference is meaningful. A headless browser pipeline needs careful concurrency management to avoid memory exhaustion. The API handles it with standard HTTP request parallelism.
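"Standard HTTP request parallelism" usually still means capping how many requests are in flight at once. A minimal sketch of that pattern follows; the helper name is made up, and the worker here is a stand-in where a real one would call the image API for each item.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the order of `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each lane pulls the next unclaimed item until none are left.
  async function runLane() {
    while (nextIndex < items.length) {
      const current = nextIndex++;
      results[current] = await worker(items[current], current);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}

// Stand-in worker: returns a file name instead of generating an image.
const titles = ["Post A", "Post B", "Post C", "Post D", "Post E"];
mapWithConcurrency(titles, 2, async (title) => `${title}.png`).then((files) => {
  console.log(files.length); // 5
});
```

This sketch fails fast: if one worker rejects, the returned promise rejects too. A production pipeline would likely want retries or per-item error capture instead.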

Text Handling

One of the biggest pain points with HTML-to-image is text. Fonts fail to load. Line heights vary. Text overflows its container because the headless browser calculates metrics differently than your local machine.

Text layers in the API have explicit properties:

font_size_in_px — exact pixel size, no CSS cascade or rem calculations
is_splitting_lines — auto-wraps text within the bounding box dimensions
text_align — left, center, or right alignment
vertical_align — top, center, or bottom within the bounding box
paragraph_spacing_in_px — explicit spacing between paragraphs
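Put together, a text layer that sets every one of these properties explicitly might be built like the sketch below. The property names come from the list above and the allowed alignment values from its descriptions; the helper, its parameter names, and the default values are assumptions for illustration.

```javascript
const TEXT_ALIGNMENTS = ["left", "center", "right"];
const VERTICAL_ALIGNMENTS = ["top", "center", "bottom"];

// Builds a wrapping text layer with every layout property stated explicitly.
function makeTextLayer(options) {
  const {
    index, text, fontName, fontWeight, fontSizeInPx, textColor,
    position, dimensions,
    textAlign = "left",
    verticalAlign = "top",
    paragraphSpacingInPx = 0,
  } = options;

  if (!TEXT_ALIGNMENTS.includes(textAlign)) {
    throw new Error(`Unsupported text_align: ${textAlign}`);
  }
  if (!VERTICAL_ALIGNMENTS.includes(verticalAlign)) {
    throw new Error(`Unsupported vertical_align: ${verticalAlign}`);
  }

  return {
    type: "text",
    index,
    text,
    font_name: fontName,
    font_weight: fontWeight,
    font_size_in_px: fontSizeInPx,
    text_color: textColor,
    text_align: textAlign,
    vertical_align: verticalAlign,
    paragraph_spacing_in_px: paragraphSpacingInPx,
    position,
    dimensions,
    is_splitting_lines: true,
    opacity: 100,
  };
}

const subtitle = makeTextLayer({
  index: 3,
  text: "Building Resilient UIs with Server Components",
  fontName: "Inter",
  fontWeight: "Regular",
  fontSizeInPx: 32,
  textColor: "#94a3b8",
  position: { x: 60, y: 260 },
  dimensions: { width: 800, height: 120 },
  verticalAlign: "center",
});
console.log(subtitle.vertical_align); // center
```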

Fonts are declared in the request itself as TTF, OTF, WOFF, or WOFF2 files; the examples in this guide reference each file by URL. There are no @font-face declarations and no browser font-loading race conditions. Because the font is part of the request, it’s always available when the text renders.

A Complete Example: Event Announcement Card

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.generateImage({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  fonts: [
    {
      name: "Inter",
      weight: "Regular",
      style: "normal",
      file: { type: "url", name: "Inter-Regular.ttf", url: "https://example.com/fonts/Inter-Regular.ttf" },
    },
    {
      name: "Inter",
      weight: "Bold",
      style: "normal",
      file: { type: "url", name: "Inter-Bold.ttf", url: "https://example.com/fonts/Inter-Bold.ttf" },
    },
  ],
  layers: [
    { type: "solid-color-background", index: 0, hex_color: "#0f172a", opacity: 100 },
    { type: "rectangle", index: 1, hex_color: "#3b82f6", position: { x: 0, y: 0 }, dimensions: { width: 1200, height: 6 }, opacity: 100 },
    { type: "text", index: 2, text: "React Conf 2026", font_name: "Inter", font_weight: "Bold", font_size_in_px: 56, text_color: "#ffffff", text_align: "left", position: { x: 60, y: 160 }, dimensions: { width: 800, height: 80 }, opacity: 100 },
    { type: "text", index: 3, text: "Building Resilient UIs with Server Components", font_name: "Inter", font_weight: "Regular", font_size_in_px: 32, text_color: "#94a3b8", text_align: "left", position: { x: 60, y: 260 }, dimensions: { width: 800, height: 120 }, is_splitting_lines: true, opacity: 100 },
    { type: "text", index: 4, text: "March 15, 2026  /  San Francisco", font_name: "Inter", font_weight: "Regular", font_size_in_px: 22, text_color: "#64748b", text_align: "left", position: { x: 60, y: 540 }, dimensions: { width: 600, height: 40 }, opacity: 100 },
    { type: "static-image", index: 5, file: { type: "url", name: "speaker.jpg", url: "https://example.com/photos/speaker.jpg" }, position: { x: 880, y: 160 }, dimensions: { width: 260, height: 260 }, should_use_smart_cropping: true, opacity: 100 },
  ],
});

Six layers, one request, one deterministic image. No browser needed.
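For more than one event, the request body is easier to manage as a function of the event data. The sketch below rebuilds the same layer stack from a plain object; the helper names and the shape of the event object are hypothetical, while the layer fields and values mirror the example above.

```javascript
// Font entry for one Inter weight, using the placeholder URLs from the example.
function interFont(weight) {
  const fileName = `Inter-${weight}.ttf`;
  return {
    name: "Inter",
    weight,
    style: "normal",
    file: { type: "url", name: fileName, url: `https://example.com/fonts/${fileName}` },
  };
}

// Left-aligned text layer at x = 60; wrapping is enabled only when requested.
function textLayer(index, text, style) {
  return {
    type: "text",
    index,
    text,
    font_name: "Inter",
    font_weight: style.weight,
    font_size_in_px: style.sizeInPx,
    text_color: style.color,
    text_align: "left",
    position: { x: 60, y: style.y },
    dimensions: { width: style.width, height: style.height },
    ...(style.wraps ? { is_splitting_lines: true } : {}),
    opacity: 100,
  };
}

// Returns the request body for one event card.
function buildEventCardRequest({ eventName, talkTitle, dateLine, speakerPhotoUrl }) {
  return {
    dimensions: { width: 1200, height: 630 },
    output_format: "png",
    fonts: [interFont("Regular"), interFont("Bold")],
    layers: [
      { type: "solid-color-background", index: 0, hex_color: "#0f172a", opacity: 100 },
      { type: "rectangle", index: 1, hex_color: "#3b82f6", position: { x: 0, y: 0 }, dimensions: { width: 1200, height: 6 }, opacity: 100 },
      textLayer(2, eventName, { weight: "Bold", sizeInPx: 56, color: "#ffffff", y: 160, width: 800, height: 80 }),
      textLayer(3, talkTitle, { weight: "Regular", sizeInPx: 32, color: "#94a3b8", y: 260, width: 800, height: 120, wraps: true }),
      textLayer(4, dateLine, { weight: "Regular", sizeInPx: 22, color: "#64748b", y: 540, width: 600, height: 40 }),
      { type: "static-image", index: 5, file: { type: "url", name: "speaker.jpg", url: speakerPhotoUrl }, position: { x: 880, y: 160 }, dimensions: { width: 260, height: 260 }, should_use_smart_cropping: true, opacity: 100 },
    ],
  };
}

const cardRequest = buildEventCardRequest({
  eventName: "React Conf 2026",
  talkTitle: "Building Resilient UIs with Server Components",
  dateLine: "March 15, 2026  /  San Francisco",
  speakerPhotoUrl: "https://example.com/photos/speaker.jpg",
});
console.log(cardRequest.layers.length); // 6
```

The returned object can be passed to `client.generateImage` in the same way as the literal request above.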

Get Started

Check the docs for the full layer reference, output formats (PNG, JPEG, WebP, AVIF), and code examples.

Sign up at iterationlayer.com for a free API key — no credit card required. Take your most complex HTML-to-image template, translate it to layers, and see how it compares.
