Iteration Layer
Website Extraction

Extract structured data from public websites

Send a public URL, define the fields you need, and get typed JSON with citations.

Zero data retention · Made & hosted in the EU · $65 free trial credits

See Website Extraction in action

Start from a real implementation pattern, not blank docs. See the input, runnable code, and structured output your workflow can use next.

Input Preview
https://www.linkedin.com/jobs/view/engineer-manager-people-innovations-labs-openai
Engineer Manager, People Innovations Labs — OpenAI
Output Preview

title: Engineer Manager, People Innovations Labs
company: OpenAI
location: San Francisco, CA
posted_ago: 7 minutes ago
applicant_note: Be among the first 25 applicants
department: People Organization

Request (cURL)
curl -X POST https://api.iterationlayer.com/website-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": {
      "type": "url",
      "url": "https://example.com/pricing"
    },
    "schema": {
      "fields": [
        {
          "name": "plan_name",
          "type": "TEXT",
          "description": "The name of the pricing plan"
        },
        {
          "name": "monthly_price",
          "type": "CURRENCY_AMOUNT",
          "description": "The advertised monthly price"
        },
        {
          "name": "features",
          "type": "ARRAY",
          "description": "Included plan features",
          "fields": [
            {
              "name": "feature",
              "type": "TEXT",
              "description": "Feature label"
            }
          ]
        }
      ]
    }
  }'
Response
{
  "success": true,
  "data": {
    "plan_name": {
      "type": "TEXT",
      "value": "Startup",
      "confidence": 0.97,
      "citations": ["Startup — $119 per month"],
      "source": "pricing.html"
    },
    "monthly_price": {
      "type": "CURRENCY_AMOUNT",
      "value": 119,
      "confidence": 0.95,
      "citations": ["$119 per month"],
      "source": "pricing.html"
    }
  }
}
Request (TypeScript)
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extractWebsite({
  file: {
    type: "url",
    url: "https://example.com/pricing",
  },
  schema: {
    fields: [
      { type: "TEXT", name: "plan_name", description: "The name of the pricing plan" },
      { type: "CURRENCY_AMOUNT", name: "monthly_price", description: "The advertised monthly price" },
      {
        type: "ARRAY",
        name: "features",
        description: "Included plan features",
        fields: [{ type: "TEXT", name: "feature", description: "Feature label" }],
      },
    ],
  },
});
Request (Python)
from iterationlayer import IterationLayer

client = IterationLayer(api_key="YOUR_API_KEY")

result = client.extract_website(
    file={
        "type": "url",
        "url": "https://example.com/pricing",
    },
    schema={
        "fields": [
            {"type": "TEXT", "name": "plan_name", "description": "The name of the pricing plan"},
            {"type": "CURRENCY_AMOUNT", "name": "monthly_price", "description": "The advertised monthly price"},
            {
                "type": "ARRAY",
                "name": "features",
                "description": "Included plan features",
                "fields": [{"type": "TEXT", "name": "feature", "description": "Feature label"}],
            },
        ]
    },
)
Request (Go)
package main

import (
  "fmt"
  "log"

  il "github.com/iterationlayer/sdk-go"
)

func main() {
  client := il.NewClient("YOUR_API_KEY")

  result, err := client.ExtractWebsite(il.ExtractWebsiteRequest{
    File: il.NewWebsiteFromURL("https://example.com/pricing"),
    Schema: il.ExtractionSchema{
      "plan_name": il.NewTextFieldConfig("plan_name", "The name of the pricing plan"),
      "monthly_price": il.NewCurrencyAmountFieldConfig("monthly_price", "The advertised monthly price"),
      "features": il.NewArrayFieldConfig("features", "Included plan features", []il.FieldConfig{
        il.NewTextFieldConfig("feature", "Feature label"),
      }),
    },
  })
  if err != nil {
    // Stop on a failed request instead of reading an empty result.
    log.Fatalf("website extraction failed: %v", err)
  }
  fmt.Printf("%+v\n", result)
}

Use the same workflow from code, agents, or n8n

When an automation moves from prototype to production, you should not have to rebuild it for every environment. Iteration Layer lets scripts, agents, and n8n workflows call the same European AI workflow runtime.

Input: 40+ file formats
Extraction: Documents, websites, and markdown
Generation: Documents, images, and sheets
Output: Structured format

Fits into your existing stack

Native SDKs for TypeScript, Python, and Go. OpenAPI spec for everything else. MCP server for AI agents and Claude Code skills. n8n integration for visual workflows.

EU AI workflow runtime

Run document, image, and file steps through one EU-hosted workflow layer with shared API conventions and billing.

Agent-ready by design

Expose the same document and image actions to MCP tools and Claude Code skills, then reuse the API contract when workflows graduate into scripts or automations.

Verified n8n node

Install the verified Iteration Layer node in n8n, then route documents and generated files through the same provider from visual workflows.

Three steps to your first extraction

01. Define a schema

Describe the structured data you want returned using our schema format. Each field has a name, a type, and an optional description to guide extraction.

02. Send a URL

Pass a public URL. The API fetches the page directly by default. Optional fetch options control locale, user agent, timeout, and whether the page should be rendered in the browser.

03. Get structured data

Receive typed JSON with extracted fields, confidence scores, and source citations so you can automate downstream workflows and route uncertain results to review.
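
The three steps above can be sketched with Python's standard library alone. The endpoint, headers, and payload shape are taken from the cURL example on this page; the helper names `field` and `build_extraction_request` are illustrative and not part of any SDK. The request is only constructed here, not sent.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Endpoint as shown in the cURL example on this page.
API_URL = "https://api.iterationlayer.com/website-extraction/v1/extract"


def field(name, type_, description=None, fields=None):
    """Step 1: build one schema field. Description and nested fields are optional."""
    spec = {"name": name, "type": type_}
    if description is not None:
        spec["description"] = description
    if fields is not None:
        spec["fields"] = fields
    return spec


def build_extraction_request(page_url, fields, api_key):
    """Step 2: wrap a schema and a public HTTPS URL in a POST request object."""
    # The FAQ states that only HTTPS URLs are accepted, so fail early on anything else.
    if urlparse(page_url).scheme != "https":
        raise ValueError("Website Extraction accepts public HTTPS URLs only")
    payload = {
        "file": {"type": "url", "url": page_url},
        "schema": {"fields": fields},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


schema_fields = [
    field("plan_name", "TEXT", "The name of the pricing plan"),
    field("monthly_price", "CURRENCY_AMOUNT", "The advertised monthly price"),
]
request = build_extraction_request(
    "https://example.com/pricing", schema_fields, "YOUR_API_KEY"
)

# Step 3 (not executed here): send the request and decode the JSON body, e.g.
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any credits are spent.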


Intelligent Extraction

The API automatically selects the best extraction approach for your schema and page content, so you don't have to stitch together scraping, prompting, and post-processing logic yourself.

Schema-Driven Results

Define typed fields — dates, IBANs, currencies, addresses, nested arrays — and get structured JSON back. No selectors, HTML parsing, or prompt post-processing required.
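
To work with the typed JSON in application code, one option is to map each field onto a small dataclass. The keys (`type`, `value`, `confidence`, `citations`, `source`) follow the sample response on this page; the `ExtractedField` class and `parse_fields` helper are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtractedField:
    """One extracted value, mirroring the per-field keys in the sample response."""
    type: str
    value: Any
    confidence: float
    citations: tuple
    source: str


def parse_fields(response: dict) -> dict:
    """Map each entry under "data" to an ExtractedField keyed by field name."""
    if not response.get("success"):
        raise ValueError("extraction did not succeed")
    return {
        name: ExtractedField(
            type=raw["type"],
            value=raw["value"],
            confidence=raw["confidence"],
            citations=tuple(raw.get("citations", [])),
            source=raw.get("source", ""),
        )
        for name, raw in response["data"].items()
    }


# Sample response copied from the example on this page.
sample = {
    "success": True,
    "data": {
        "plan_name": {
            "type": "TEXT",
            "value": "Startup",
            "confidence": 0.97,
            "citations": ["Startup \u2014 $119 per month"],
            "source": "pricing.html",
        },
        "monthly_price": {
            "type": "CURRENCY_AMOUNT",
            "value": 119,
            "confidence": 0.95,
            "citations": ["$119 per month"],
            "source": "pricing.html",
        },
    },
}

fields = parse_fields(sample)
print(fields["monthly_price"].value)  # prints 119
```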

Deep Content Understanding

Pages aren't parsed as raw text patterns. The API understands product pages, pricing tables, article content, job posts, and embedded tables, then extracts field values from that meaning.

Built-In Trust Scores

Every extracted value includes a confidence score and a verbatim source citation from the page content. Use confidence as a workflow gate and citations as review context, and write only approved values to downstream records.
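
A confidence gate of this kind might look like the sketch below. The threshold of 0.96 is an arbitrary illustration, not a recommended value, and `split_by_confidence` is a hypothetical helper; the field data is copied from the sample response on this page.

```python
REVIEW_THRESHOLD = 0.96  # hypothetical cut-off; tune it per field and workflow


def split_by_confidence(data, threshold=REVIEW_THRESHOLD):
    """Return (approved, needs_review) dicts based on per-field confidence."""
    approved, needs_review = {}, {}
    for name, extracted in data.items():
        # A value without a citation gives a reviewer nothing to check,
        # so it is held back regardless of its score.
        is_trusted = (
            extracted["confidence"] >= threshold
            and bool(extracted.get("citations"))
        )
        (approved if is_trusted else needs_review)[name] = extracted
    return approved, needs_review


data = {
    "plan_name": {
        "type": "TEXT",
        "value": "Startup",
        "confidence": 0.97,
        "citations": ["Startup \u2014 $119 per month"],
        "source": "pricing.html",
    },
    "monthly_price": {
        "type": "CURRENCY_AMOUNT",
        "value": 119,
        "confidence": 0.95,
        "citations": ["$119 per month"],
        "source": "pricing.html",
    },
}

approved, needs_review = split_by_confidence(data)
print(sorted(approved))      # prints ['plan_name']
print(sorted(needs_review))  # prints ['monthly_price']
```

Only the `approved` values would be written downstream; the `needs_review` entries carry their citations along so a reviewer can compare the value against the quoted page text.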

Public Website Fetching

Send a public URL. The API fetches pages directly first, falling back to browser-backed retrieval when you enable fetch options like JavaScript rendering, proxy, or geo-targeting.
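
As a rough sketch of how fetch options could be attached to a request body: only `should_render_javascript` is named on this page (in the FAQ). The container key `fetch_options` and the keys `locale`, `timeout`, and `user_agent` are assumptions made for illustration, so check the API reference for the real names before relying on them.

```python
def with_fetch_options(payload, *, render_javascript=False, locale=None,
                       timeout_seconds=None, user_agent=None):
    """Return a copy of payload that carries only the explicitly set fetch options."""
    options = {}
    if render_javascript:
        options["should_render_javascript"] = True  # name taken from the FAQ
    if locale is not None:
        options["locale"] = locale  # ASSUMED key name
    if timeout_seconds is not None:
        options["timeout"] = timeout_seconds  # ASSUMED key name and unit
    if user_agent is not None:
        options["user_agent"] = user_agent  # ASSUMED key name
    result = dict(payload)
    if options:
        result["fetch_options"] = options  # ASSUMED container key
    return result


base_payload = {
    "file": {"type": "url", "url": "https://example.com/pricing"},
    "schema": {"fields": [{"name": "plan_name", "type": "TEXT"}]},
}

# Static HTML fetching is the documented default, so no options are added here.
static_payload = with_fetch_options(base_payload)

# A single-page app that needs JavaScript rendering and a German locale.
spa_payload = with_fetch_options(
    base_payload, render_javascript=True, locale="de-DE", timeout_seconds=30
)
```

Leaving unset options out of the payload keeps the default direct fetch in place unless a page actually needs more.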

No Model Training

Page content is never used to train or improve AI models. This is guaranteed for all plans — not gated behind an enterprise contract.

Real-world pipelines, ready to ship

Each recipe chains multiple APIs into a complete workflow. Pick one, tweak it, and deploy — or use it as a starting point for your own pipeline.

European by design

Your data is processed on EU-hosted infrastructure and never stored beyond temporary logs. Zero data retention, GDPR-compliant workflows, and a Data Processing Agreement are available for every customer. Learn more about our security practices.

EU-hosted core processing

Application and processing infrastructure runs in Europe, with provider-scope ISO 27001 and BSI C5 evidence documented for procurement reviews.

Zero data retention

Customer files and processing results are not stored after the request. Usage logs are retained for 90 days and automatically deleted.

Clear answers for security teams

Give reviewers the answers they need up front: where files are processed, what is retained, which subprocessors are involved, and how AI inputs, outputs, review gates, and audit records move through each workflow.

Pricing

Start usage-based. Switch to a subscription when your volume becomes predictable.

Pay as you go

Usage-based

$0.033 to $0.022 / credit

Graduated pricing. Your effective rate decreases automatically as monthly usage grows.

  • No monthly commitment
  • Pay only for credits used
  • Automatic volume discounts as usage grows

Subscriptions

Predictable volume

From $29.99 /month

Fixed recurring credit packs with lower effective credit prices for steady usage.

  • Lower effective per-credit prices
  • Fixed recurring credit packs
  • Predictable monthly budget

All APIs included · Free trial credits per API · Project-based budget caps · Auto overage billing

Still evaluating?

Compare Iteration Layer against the biggest alternatives at a glance, then open the full head-to-head pages when you want the details.

Feature comparison: Iteration Layer vs. Firecrawl, Jina AI Reader, and Crawl4AI

Schema-defined extraction
  • Iteration Layer: Yes. Define fields in a schema and receive typed JSON results.
  • Firecrawl: Limited. Primarily returns page content, not schema-based fields.
  • Jina AI Reader: No. Returns page content as markdown, not schema-based fields.
  • Crawl4AI: Configurable. You assemble extraction strategy from crawler output.

Confidence scores
  • Iteration Layer: Per field. Confidence score between 0 and 1 for every extracted field.
  • Firecrawl: No. No per-field confidence scoring.
  • Jina AI Reader: No. No confidence scoring.
  • Crawl4AI: No. No built-in confidence scoring.

Source citations
  • Iteration Layer: Yes. Verbatim source citation from the page for every field.
  • Firecrawl: No. No source citations for extracted values.
  • Jina AI Reader: No. No source citations.
  • Crawl4AI: No. No source citations for extracted values.

EU hosting
  • Iteration Layer: EU only. All processing on EU-hosted servers.
  • Firecrawl: US-hosted. Primary infrastructure is US-based.
  • Jina AI Reader: EU endpoints available. Jina documents EU endpoints for some services.
  • Crawl4AI: Your choice. Depends on where you deploy.

Frequently asked questions

How accurate is the extraction?
Our OCR benchmark shows strong extraction accuracy, reliability, and performance across 41 real workflow files, including forms, invoices, scans, tables, charts, and photos.
Can I extract from private or login-protected pages?
No. Website Extraction is designed for public pages only. You remain responsible for site terms, robots rules, lawful basis, and personal-data handling. For broader compliance context, see how GDPR and the EU AI Act apply to automated document workflows.
When should I use Website Extraction instead of Document Extraction?
Use Website Extraction when you want structured fields from a single public web page. Use Document Extraction when you have uploaded files, multiple inputs, or mixed formats like PDFs and images in the same request.
What kind of pages can I send to Website Extraction?
Send one public HTTPS URL per request. HTTP URLs are not accepted. Website Extraction is meant for public pages, not login-protected content, private dashboards, or pages that require credentials.
How do fetch options work?
Fetch options give you control over how the page is retrieved. Set a locale to control the Accept-Language header, a timeout for slow pages, a custom user agent, or should_render_javascript: true for single-page apps. Static HTML fetching is the default.
Do I need CSS selectors or custom scraping rules?
No. You define the fields you want in a schema, and the API applies that schema to the fetched page content instead of requiring selectors for each page layout.
What does the response include for each extracted field?
Each field follows the same structure as Document Extraction: type, value, confidence, citations, and source. The response also includes the requested URL, the final URL, and rich page metadata for the fetched page.
Can I run Website Extraction asynchronously?
Yes. If you provide a webhook_url, the API returns 201 immediately and sends the result to your HTTPS webhook when processing finishes.
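
For the asynchronous flow described in the last answer, a receiving endpoint could be sketched with Python's standard library as below. Two things here are assumptions rather than documented behavior: that `webhook_url` sits at the top level of the request body, and that the webhook delivery carries the same JSON shape as the synchronous response. The URL `https://hooks.example.com/extraction` and all helper names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Request body for an async run. The top-level placement of webhook_url is ASSUMED.
async_payload = {
    "file": {"type": "url", "url": "https://example.com/pricing"},
    "schema": {"fields": [{"name": "plan_name", "type": "TEXT"}]},
    "webhook_url": "https://hooks.example.com/extraction",  # hypothetical endpoint
}

received_results = []  # in-memory stand-in for a queue or database


def handle_webhook_body(raw_body: bytes) -> dict:
    """Decode one webhook delivery and record it. Raises ValueError on bad input."""
    result = json.loads(raw_body.decode("utf-8"))
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    received_results.append(result)
    return result


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            handle_webhook_body(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # malformed delivery
        else:
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()


def make_server(port=8080):
    """Create, but do not start, the receiver. Call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Because the webhook must be reachable over HTTPS, a server like this would sit behind a TLS-terminating proxy in practice; `http.server` itself speaks plain HTTP and is meant for local testing.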

Build your first workflow in minutes

Chain our APIs into a workflow you can test with your own data. Free trial credits included.
