The Problem with HTML-to-Image
You need to generate images programmatically. Social cards, certificates, receipts, event tickets, promotional graphics. The common advice is: build it as HTML, screenshot it with a headless browser.
The tools are familiar. Puppeteer launches headless Chrome. Playwright does the same across browsers. wkhtmltoimage uses WebKit. They all follow the same pattern — render HTML, capture the viewport, save as an image.
And they all share the same problems.
Headless Browsers Are Heavy
Chrome needs memory. A single Puppeteer instance consumes 200-500 MB of RAM before it renders a single pixel. Spin up multiple instances for concurrency and you’re looking at gigabytes of memory for what should be a simple image generation task: at those figures, eight parallel instances need roughly 1.6-4 GB.
Cold starts hurt too. Launching a browser instance takes 1-3 seconds. If you’re running in a serverless environment, every invocation pays that cost. AWS Lambda with a Chrome layer is technically possible but practically painful — the binary alone eats half your deployment package size limit.
CSS Rendering Is Unpredictable
Headless Chrome can render CSS differently than Chrome with a display. Font rendering varies between environments. Flexbox and Grid layouts can shift by a pixel or two depending on the platform. Custom fonts that load fine locally can be missing from the screenshot in headless mode when the capture fires before the font has finished loading.
You end up with a test suite for your image templates. Not testing logic — testing whether the CSS renders the same way on your CI server as it does on your laptop. That’s not a problem you should have.
The Alternative: Layer-Based Composition
Instead of writing HTML and hoping a browser renders it correctly, describe the image as a stack of layers. That’s what the Image Generation API does.
No HTML. No CSS. No browser. You send a JSON request describing your layers — backgrounds, shapes, text, images — and get an image back.
Here’s what an HTML-to-image approach looks like versus the layer-based approach for the same output — a dark card with a title and accent bar:
The HTML approach:
<div style="width: 1200px; height: 630px; background: #1a1a2e; position: relative; font-family: 'Inter', sans-serif;">
  <div style="width: 100%; height: 4px; background: #e94560;"></div>
  <h1 style="color: #ffffff; font-size: 48px; font-weight: bold; position: absolute; top: 180px; left: 60px; width: 1080px;">
    Your Title Here
  </h1>
</div>
Then you need Puppeteer to render it:
import puppeteer from "puppeteer";

// `html` is the markup string shown above.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 630 });
await page.setContent(html);
// Wait until web fonts have finished loading before capturing.
await page.evaluateHandle("document.fonts.ready");
const screenshot = await page.screenshot({ type: "png" });
await browser.close();
The layer-based approach:
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.generateImage({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  fonts: [
    {
      name: "Inter",
      weight: "Bold",
      style: "normal",
      file: {
        type: "url",
        name: "Inter-Bold.ttf",
        url: "https://example.com/fonts/Inter-Bold.ttf",
      },
    },
  ],
  layers: [
    {
      type: "solid-color-background",
      index: 0,
      hex_color: "#1a1a2e",
      opacity: 100,
    },
    {
      type: "rectangle",
      index: 1,
      hex_color: "#e94560",
      position: { x: 0, y: 0 },
      dimensions: { width: 1200, height: 4 },
      opacity: 100,
    },
    {
      type: "text",
      index: 2,
      text: "Your Title Here",
      font_name: "Inter",
      font_weight: "Bold",
      font_size_in_px: 48,
      text_color: "#ffffff",
      text_align: "left",
      position: { x: 60, y: 180 },
      dimensions: { width: 1080, height: 300 },
      is_splitting_lines: true,
      opacity: 100,
    },
  ],
});

const { data: { buffer: imageBase64 } } = result;
const imageBuffer = Buffer.from(imageBase64, "base64");
Same visual output. No browser, no DOM, no CSS, no font-loading race conditions.
The Tradeoffs — Honestly
The layer-based approach is not a drop-in replacement for every HTML-to-image use case. Here’s where each approach fits:
Layer-based is better when:
- You need predictable, deterministic output
- Your templates are structured (cards, banners, certificates, social images)
- You want to avoid browser infrastructure entirely
- You’re generating images at scale and need consistent performance
- You need the output to be identical across environments
HTML-to-image is better when:
- Your template requires complex CSS layouts that would be tedious to express as layers
- You’re rendering existing web pages as images (actual screenshots)
- You need browser-specific rendering (SVG with CSS animations, complex gradients)
For the vast majority of programmatic image generation — OG cards, social templates, certificates, promotional graphics — the structured layer approach is simpler and faster.
Mapping HTML Concepts to Layers
If you’re used to thinking in HTML/CSS, here’s how common patterns translate:
| HTML/CSS | Layer equivalent |
|---|---|
| `<div>` with `background-color` | `solid-color-background` or `rectangle` layer |
| `<h1>` or `<p>` with styles | `text` layer with `font_name`, `font_size_in_px`, `text_color` |
| `<img src="...">` | `static-image` layer with `position` and `dimensions` |
| `border-top: 4px solid red` | `rectangle` layer positioned at `y: 0`, `height: 4` |
| `background-image: url(...)` with opacity | `image-overlay` layer with opacity control |
| `font-weight: bold` inside text | Markdown `**bold**` in the text layer content |
| `font-style: italic` inside text | Markdown `*italic*` in the text layer content |
| Absolutely positioned elements | Each layer has explicit `position: { x, y }` |
The mental model shifts from “box model with cascading styles” to “positioned layers with explicit properties.” Less flexible in theory, more predictable in practice.
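As one worked instance of that shift, here is a minimal sketch of the `border-top` translation. The helper name `borderTopLayer` and the `RectangleLayer` type are hypothetical; the field names are copied from the request examples in this article, not from a published schema.

```typescript
// Shape of a rectangle layer as it appears in this article's examples.
interface RectangleLayer {
  type: "rectangle";
  index: number;
  hex_color: string;
  position: { x: number; y: number };
  dimensions: { width: number; height: number };
  opacity: number;
}

// Rough equivalent of `border-top: <thickness>px solid <color>` on a
// full-width container: a thin rectangle pinned to the top edge.
function borderTopLayer(
  canvasWidth: number,
  thicknessInPx: number,
  hexColor: string,
  index: number,
): RectangleLayer {
  return {
    type: "rectangle",
    index,
    hex_color: hexColor,
    position: { x: 0, y: 0 },
    dimensions: { width: canvasWidth, height: thicknessInPx },
    opacity: 100,
  };
}

// The accent bar from the dark card earlier in the article.
const accentBar = borderTopLayer(1200, 4, "#e94560", 1);
console.log(JSON.stringify(accentBar));
```

Small helpers like this are optional, but they let a team keep its CSS vocabulary while emitting explicit layers.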
Performance Comparison
Headless Chrome: 1-3 seconds of browser launch time for every image rendered on a fresh instance, 200-500 MB of RAM per instance, and a cold start on every serverless invocation.
The Image Generation API: an HTTP request. Your server sends JSON, receives JSON with a base64-encoded image. No browser to launch, no memory overhead beyond the HTTP client. Concurrency is limited by your HTTP connection pool, not by browser instances.
For batch generation — 100 blog post OG images, 500 event cards, 1000 certificates — the difference is meaningful. A headless browser pipeline needs careful concurrency management to avoid memory exhaustion. The API handles it with standard HTTP request parallelism.
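In practice, request parallelism usually means capping how many requests are in flight at once. The sketch below shows one way to do that, and it assumes nothing about the API’s own rate limits. `mapWithConcurrency` and `fakeGenerate` are hypothetical names; `fakeGenerate` stands in for a real `client.generateImage(...)` call.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function runLane(): Promise<void> {
    while (nextIndex < items.length) {
      const current = nextIndex;
      nextIndex += 1;
      results[current] = await worker(items[current], current);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i += 1) {
    lanes.push(runLane());
  }
  await Promise.all(lanes);
  return results;
}

// Hypothetical stand-in for one image request.
async function fakeGenerate(title: string): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  return `image for ${title}`;
}

const titles = ["Post A", "Post B", "Post C", "Post D", "Post E"];
mapWithConcurrency(titles, 2, fakeGenerate).then((images) => {
  console.log(images.length);
});
```

If any single request rejects, the returned promise rejects as well, so a real batch job would likely wrap the worker in its own retry or error-collection logic.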
Text Handling
One of the biggest pain points with HTML-to-image is text. Fonts fail to load. Line heights vary. Text overflows its container because the headless browser calculates metrics differently than your local machine.
Text layers in the API have explicit properties:
- `font_size_in_px`: exact pixel size, no CSS cascade or rem calculations
- `is_splitting_lines`: auto-wraps text within the bounding box dimensions
- `text_align`: left, center, or right alignment
- `vertical_align`: top, center, or bottom within the bounding box
- `paragraph_spacing_in_px`: explicit spacing between paragraphs
Fonts are provided as files in the request — TTF, OTF, WOFF, or WOFF2. No @font-face declarations, no CDN loading, no race conditions. The font is part of the request payload, so it’s always available when the text renders.
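Because every text property is explicit, it can help to keep shared defaults in one place. The following is a sketch only: `makeTextLayer`, the `TextLayer` type, and the default values are hypothetical, and the string values for `vertical_align` are assumed from the list above rather than taken from the API reference.

```typescript
type TextAlign = "left" | "center" | "right";
type VerticalAlign = "top" | "center" | "bottom"; // assumed literal values

interface TextLayer {
  type: "text";
  index: number;
  text: string;
  font_name: string;
  font_weight: string;
  font_size_in_px: number;
  text_color: string;
  text_align: TextAlign;
  vertical_align: VerticalAlign;
  paragraph_spacing_in_px: number;
  position: { x: number; y: number };
  dimensions: { width: number; height: number };
  is_splitting_lines: boolean;
  opacity: number;
}

type RequiredTextFields = Pick<TextLayer, "index" | "text" | "position" | "dimensions">;

// Illustrative house defaults; nothing here is inherited or cascaded.
const textDefaults: Omit<TextLayer, keyof RequiredTextFields> = {
  type: "text",
  font_name: "Inter",
  font_weight: "Regular",
  font_size_in_px: 32,
  text_color: "#ffffff",
  text_align: "left",
  vertical_align: "top",
  paragraph_spacing_in_px: 16,
  is_splitting_lines: true,
  opacity: 100,
};

// Merges defaults, required fields, and per-layer overrides, in that order.
function makeTextLayer(
  required: RequiredTextFields,
  overrides: Partial<TextLayer> = {},
): TextLayer {
  return { ...textDefaults, ...required, ...overrides };
}

const subtitle = makeTextLayer(
  {
    index: 3,
    text: "Building Resilient UIs with Server Components",
    position: { x: 60, y: 260 },
    dimensions: { width: 800, height: 120 },
  },
  { text_align: "center", vertical_align: "center", font_size_in_px: 28 },
);
console.log(JSON.stringify(subtitle));
```

The resulting object is plain data, so it can be dropped straight into the `layers` array of a request.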
A Complete Example: Event Announcement Card
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.generateImage({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  fonts: [
    {
      name: "Inter",
      weight: "Regular",
      style: "normal",
      file: { type: "url", name: "Inter-Regular.ttf", url: "https://example.com/fonts/Inter-Regular.ttf" },
    },
    {
      name: "Inter",
      weight: "Bold",
      style: "normal",
      file: { type: "url", name: "Inter-Bold.ttf", url: "https://example.com/fonts/Inter-Bold.ttf" },
    },
  ],
  layers: [
    { type: "solid-color-background", index: 0, hex_color: "#0f172a", opacity: 100 },
    { type: "rectangle", index: 1, hex_color: "#3b82f6", position: { x: 0, y: 0 }, dimensions: { width: 1200, height: 6 }, opacity: 100 },
    { type: "text", index: 2, text: "React Conf 2026", font_name: "Inter", font_weight: "Bold", font_size_in_px: 56, text_color: "#ffffff", text_align: "left", position: { x: 60, y: 160 }, dimensions: { width: 800, height: 80 }, opacity: 100 },
    { type: "text", index: 3, text: "Building Resilient UIs with Server Components", font_name: "Inter", font_weight: "Regular", font_size_in_px: 32, text_color: "#94a3b8", text_align: "left", position: { x: 60, y: 260 }, dimensions: { width: 800, height: 120 }, is_splitting_lines: true, opacity: 100 },
    { type: "text", index: 4, text: "March 15, 2026 / San Francisco", font_name: "Inter", font_weight: "Regular", font_size_in_px: 22, text_color: "#64748b", text_align: "left", position: { x: 60, y: 540 }, dimensions: { width: 600, height: 40 }, opacity: 100 },
    { type: "static-image", index: 5, file: { type: "url", name: "speaker.jpg", url: "https://example.com/photos/speaker.jpg" }, position: { x: 880, y: 160 }, dimensions: { width: 260, height: 260 }, should_use_smart_cropping: true, opacity: 100 },
  ],
});
Six layers, one request, one deterministic image. No browser needed.
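The card above hard-codes its text and photo. A minimal sketch of turning it into a reusable template might look like the following. `EventCard` and `buildEventCardLayers` are hypothetical names, the second event is made-up sample data, and the layer fields are copied from the example rather than from the full layer reference.

```typescript
interface EventCard {
  conference: string;
  talkTitle: string;
  dateAndPlace: string;
  speakerPhotoUrl: string;
}

type Layer = Record<string, unknown>;

// Builds the same six-layer stack as the example, with the content injected.
function buildEventCardLayers(event: EventCard): Layer[] {
  // Properties shared by all three text layers.
  const textDefaults = { type: "text", font_name: "Inter", text_color: "#ffffff", text_align: "left", opacity: 100 };

  const layers: Layer[] = [
    { type: "solid-color-background", hex_color: "#0f172a", opacity: 100 },
    { type: "rectangle", hex_color: "#3b82f6", position: { x: 0, y: 0 }, dimensions: { width: 1200, height: 6 }, opacity: 100 },
    { ...textDefaults, text: event.conference, font_weight: "Bold", font_size_in_px: 56, position: { x: 60, y: 160 }, dimensions: { width: 800, height: 80 } },
    { ...textDefaults, text: event.talkTitle, font_weight: "Regular", font_size_in_px: 32, text_color: "#94a3b8", position: { x: 60, y: 260 }, dimensions: { width: 800, height: 120 }, is_splitting_lines: true },
    { ...textDefaults, text: event.dateAndPlace, font_weight: "Regular", font_size_in_px: 22, text_color: "#64748b", position: { x: 60, y: 540 }, dimensions: { width: 600, height: 40 } },
    { type: "static-image", file: { type: "url", name: "speaker.jpg", url: event.speakerPhotoUrl }, position: { x: 880, y: 160 }, dimensions: { width: 260, height: 260 }, should_use_smart_cropping: true, opacity: 100 },
  ];

  // Derive `index` from array position so layers can be reordered freely.
  return layers.map((layer, index) => ({ ...layer, index }));
}

const events: EventCard[] = [
  { conference: "React Conf 2026", talkTitle: "Building Resilient UIs with Server Components", dateAndPlace: "March 15, 2026 / San Francisco", speakerPhotoUrl: "https://example.com/photos/speaker.jpg" },
  // Made-up second entry, for illustration only.
  { conference: "Example Conf", talkTitle: "A Second Talk", dateAndPlace: "Date / City", speakerPhotoUrl: "https://example.com/photos/other.jpg" },
];

// One request body per event; the `fonts` array from the example is omitted for brevity.
const requests = events.map((event) => ({
  dimensions: { width: 1200, height: 630 },
  output_format: "png",
  layers: buildEventCardLayers(event),
}));
console.log(requests.length);
```

Each entry in `requests` could then be passed to `client.generateImage` once the `fonts` array is added back.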
Get Started
Check the docs for the full layer reference, output formats (PNG, JPEG, WebP, AVIF), and code examples.
Sign up at iterationlayer.com for a free API key — no credit card required. Take your most complex HTML-to-image template, translate it to layers, and see how it compares.