Legal Documents Are Structured, but Not for Machines
A contract has clear sections — parties, effective dates, terms, clauses, signatures. A human reads it and immediately finds the sections. A machine sees a wall of text with no reliable markers.
Legal tech teams spend months building document extraction tools that work for one contract format and break on the next. The clause numbering changes, the party definitions move, the amendment section appears in a different place. Template-based approaches don’t survive first contact with a law firm that formats things differently.
The Document Extraction API uses schema-based extraction instead. You define the fields you want — parties, dates, clauses, terms — and the parser finds them regardless of formatting. No templates to maintain, no layout rules to configure.
Schema for Contract Extraction
import { IterationLayer } from "iterationlayer";

const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const { data } = await client.extract({
  files: [
    { type: "url", name: "contract.pdf", url: "https://example.com/contract.pdf" }
  ],
  schema: {
    fields: [
      {
        name: "contract_title",
        type: "TEXT",
        description: "Title or name of the contract",
        is_required: true,
      },
      {
        name: "effective_date",
        type: "DATE",
        description: "Date the contract takes effect",
        is_required: true,
      },
      {
        name: "expiration_date",
        type: "DATE",
        description: "Date the contract expires or terminates",
      },
      {
        name: "parties",
        type: "ARRAY",
        description: "Organizations or individuals entering the contract",
        item_schema: {
          fields: [
            { name: "name", type: "TEXT", description: "Full legal name of the party" },
            { name: "role", type: "ENUM", description: "Role in the contract", values: ["buyer", "seller", "licensor", "licensee", "employer", "employee", "landlord", "tenant", "service_provider", "client", "other"] },
            { name: "address", type: "ADDRESS", description: "Registered address of the party" },
          ],
        },
      },
      {
        name: "clauses",
        type: "ARRAY",
        description: "Key contractual clauses and their summaries",
        item_schema: {
          fields: [
            { name: "clause_number", type: "TEXT", description: "Section or clause number" },
            { name: "title", type: "TEXT", description: "Clause heading or title" },
            { name: "summary", type: "TEXTAREA", description: "Brief summary of the clause content" },
          ],
        },
      },
      {
        name: "governing_law",
        type: "TEXT",
        description: "Jurisdiction or governing law specified in the contract",
      },
      {
        name: "total_contract_value",
        type: "CURRENCY_AMOUNT",
        description: "Total monetary value or consideration of the contract",
      },
      {
        name: "currency",
        type: "CURRENCY_CODE",
        description: "Currency of the contract value",
      },
      {
        name: "has_non_compete",
        type: "BOOLEAN",
        description: "Whether the contract includes a non-compete clause",
      },
      {
        name: "has_confidentiality",
        type: "BOOLEAN",
        description: "Whether the contract includes a confidentiality or NDA clause",
      },
      {
        name: "termination_notice_period",
        type: "TEXT",
        description: "Required notice period for termination (e.g., '30 days', '90 days')",
      },
    ],
  },
});
The Response: Structured Contract Data
{
  "success": true,
  "data": {
    "contractTitle": {
      "type": "TEXT",
      "value": "Master Services Agreement",
      "confidence": 0.97
    },
    "effectiveDate": {
      "type": "DATE",
      "value": "2026-01-01",
      "confidence": 0.96
    },
    "parties": {
      "type": "ARRAY",
      "value": [
        [
          { "value": "Northwind Technologies Inc.", "confidence": 0.95 },
          { "value": ["client"], "confidence": 0.92 },
          { "value": { "street": "200 Park Avenue", "city": "New York", "region": "NY", "postal_code": "10166", "country": "US" }, "confidence": 0.93 }
        ],
        [
          { "value": "Alpine Consulting GmbH", "confidence": 0.96 },
          { "value": ["service_provider"], "confidence": 0.94 },
          { "value": { "street": "Bahnhofstrasse 21", "city": "Zurich", "postal_code": "8001", "country": "CH" }, "confidence": 0.91 }
        ]
      ],
      "confidence": 0.93
    },
    "clauses": {
      "type": "ARRAY",
      "value": [
        [
          { "value": "3.1", "confidence": 0.94 },
          { "value": "Scope of Services", "confidence": 0.96 },
          { "value": "Consultant shall provide software development and technical advisory services as detailed in Exhibit A.", "confidence": 0.89 }
        ],
        [
          { "value": "7.2", "confidence": 0.93 },
          { "value": "Limitation of Liability", "confidence": 0.95 },
          { "value": "Neither party's aggregate liability shall exceed the total fees paid under this agreement in the preceding 12 months.", "confidence": 0.87 }
        ]
      ],
      "confidence": 0.91
    },
    "hasNonCompete": {
      "type": "BOOLEAN",
      "value": true,
      "confidence": 0.88
    },
    "hasConfidentiality": {
      "type": "BOOLEAN",
      "value": true,
      "confidence": 0.95
    }
  }
}
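In the sample response, each ARRAY item comes back as a positional list of values, not a keyed object. Here is a minimal sketch of turning those rows into keyed objects. It assumes the cells in each row follow the order of the fields in item_schema, which the sample suggests but which is worth confirming in the docs. The types and the rowsToObjects helper are hypothetical and not part of the SDK.

```typescript
// Hypothetical types modeling the sample response; not SDK types.
type Cell = { value: unknown; confidence: number };
type ArrayField = { type: "ARRAY"; value: Cell[][]; confidence: number };

// Assumption: cells in each row follow the order of item_schema.fields.
function rowsToObjects(field: ArrayField, fieldNames: string[]): Record<string, Cell>[] {
  return field.value.map((row) => {
    const item: Record<string, Cell> = {};
    fieldNames.forEach((name, index) => {
      const cell = row[index];
      if (cell !== undefined) item[name] = cell; // skip cells missing from a short row
    });
    return item;
  });
}

// First party row, copied from the sample response.
const parties: ArrayField = {
  type: "ARRAY",
  value: [
    [
      { value: "Northwind Technologies Inc.", confidence: 0.95 },
      { value: ["client"], confidence: 0.92 },
      { value: { street: "200 Park Avenue", city: "New York", region: "NY", postal_code: "10166", country: "US" }, confidence: 0.93 },
    ],
  ],
  confidence: 0.93,
};

const keyedParties = rowsToObjects(parties, ["name", "role", "address"]);
console.log(keyedParties[0]?.name?.value); // prints "Northwind Technologies Inc."
```

Keeping the field-name list next to the schema definition helps the two stay in sync when the schema changes.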
Why Schema-Based Extraction Works for Contracts
Contracts vary wildly. An employment agreement looks nothing like a SaaS license. A real estate lease has different sections than a vendor services agreement. Template-based parsers need a template for each, and they break when a law firm uses its own formatting.
Schema-based extraction adapts to the document. You describe what you want (“parties with names, roles, and addresses”) and the parser figures out where that information lives in each specific document. It works because the parser understands the content, not just the layout.
The same schema handles:
- Different section numbering styles (1.1, I.A, Article 1)
- Parties defined in a preamble, a definitions section, or inline
- Clauses organized by topic, by article, or by schedule
- Documents in multiple formats (PDF, DOCX, scanned images)
Boolean Fields for Quick Contract Screening
The BOOLEAN field type is useful for contract review workflows. Define yes/no questions about the contract:
- Does it include a non-compete clause?
- Is there an auto-renewal provision?
- Does it include an indemnification section?
- Is there an arbitration clause?
The parser reads the full document and answers each question with a boolean value and a confidence score. This lets you screen large volumes of contracts without reading every page.
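A screening pass over many results could look like the sketch below. It assumes each boolean field carries value and confidence as in the sample response. The 0.85 cutoff, the contract ids, and the screen helper are illustrative assumptions, not API features.

```typescript
// Hypothetical shapes based on the BOOLEAN fields in the sample response.
type BooleanField = { type: "BOOLEAN"; value: boolean; confidence: number };
type ScreenedContract = { id: string; hasNonCompete: BooleanField };
type Screening = "flagged" | "clear" | "uncertain";

// Illustrative cutoff: answers below it are treated as unanswered.
const MIN_CONFIDENCE = 0.85;

function screen(field: BooleanField, minConfidence: number = MIN_CONFIDENCE): Screening {
  if (field.confidence < minConfidence) return "uncertain";
  return field.value ? "flagged" : "clear";
}

// Made-up sample results for three contracts.
const contracts: ScreenedContract[] = [
  { id: "msa-001", hasNonCompete: { type: "BOOLEAN", value: true, confidence: 0.88 } },
  { id: "nda-002", hasNonCompete: { type: "BOOLEAN", value: false, confidence: 0.97 } },
  { id: "lease-003", hasNonCompete: { type: "BOOLEAN", value: false, confidence: 0.6 } },
];

// Anything not confidently "clear" goes to a human.
const needsAttention = contracts
  .filter((contract) => screen(contract.hasNonCompete) !== "clear")
  .map((contract) => contract.id);

console.log(needsAttention); // prints [ 'msa-001', 'lease-003' ]
```

Treating a low-confidence "no" as uncertain rather than clear keeps a missed clause from slipping through the screen.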
ADDRESS Fields Decompose Automatically
Party addresses come back decomposed into components — street, city, region, postal code, country. No address parsing library needed. The country field returns an ISO 3166-1 alpha-2 code, ready for your database.
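As a small illustration, the components can be joined back into a display string when you need one. This sketch assumes every component may be absent, since the Zurich address in the sample response has no region. The Address type and the formatAddress helper are hypothetical.

```typescript
// Hypothetical type mirroring the address components in the sample response.
type Address = {
  street?: string;
  city?: string;
  region?: string;
  postal_code?: string;
  country?: string; // ISO 3166-1 alpha-2 code, e.g. "CH"
};

// Join whichever components are present into a single display line.
function formatAddress(address: Address): string {
  return [address.street, address.city, address.region, address.postal_code, address.country]
    .filter((part): part is string => typeof part === "string" && part.length > 0)
    .join(", ");
}

// Address copied from the second party in the sample response (no region).
const zurich: Address = { street: "Bahnhofstrasse 21", city: "Zurich", postal_code: "8001", country: "CH" };

console.log(formatAddress(zurich)); // prints "Bahnhofstrasse 21, Zurich, 8001, CH"
```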
ENUM Fields for Clause Classification
The ENUM field type goes beyond simple extraction. Use it to classify contracts and clauses into predefined categories. The role field in the parties array above is one example. You can apply the same approach to classify the contract itself:
{
  name: "contract_type",
  type: "ENUM",
  description: "Type of legal agreement",
  values: [
    "master_services_agreement",
    "employment_agreement",
    "nda",
    "license_agreement",
    "lease",
    "purchase_agreement",
    "partnership_agreement",
    "other"
  ],
}
The parser reads the full document and selects the best match from your predefined values. This is useful for routing contracts to the right review team or applying different extraction schemas based on contract type.
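Routing on the classified type can be sketched as a lookup table. The queue names here are made up for illustration. The sketch also assumes an ENUM value may arrive either as a string or as a one-element array, since the sample response shows role values like ["client"].

```typescript
// Hypothetical shape; the sample response shows ENUM values wrapped in an array.
type EnumField = { type: "ENUM"; value: string | string[]; confidence: number };

// Normalize to a single string, falling back to "other" when nothing was selected.
function enumValue(field: EnumField): string {
  const first = Array.isArray(field.value) ? field.value[0] : field.value;
  return first ?? "other";
}

// Made-up review queues keyed by contract_type value.
const REVIEW_QUEUES: Record<string, string> = {
  employment_agreement: "hr-legal",
  nda: "fast-track",
  lease: "real-estate",
};

function routeContract(contractType: EnumField): string {
  return REVIEW_QUEUES[enumValue(contractType)] ?? "general-review";
}

console.log(routeContract({ type: "ENUM", value: ["nda"], confidence: 0.9 })); // prints "fast-track"
```

The same lookup pattern could select a type-specific extraction schema for a second pass instead of a review queue.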
You can also add ENUM fields inside the clauses ARRAY to categorize each clause:
{
  name: "clause_type",
  type: "ENUM",
  description: "Category of the clause",
  values: ["liability", "termination", "payment", "confidentiality", "ip_ownership", "indemnification", "force_majeure", "dispute_resolution", "other"],
}
This gives you a machine-readable index of clause types across all your contracts — useful for compliance audits, risk assessments, or quickly finding every indemnification clause in your contract library.
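One way to build such an index is to group clause records by type once they have been pulled out of each response. The ClauseRecord shape, the contract ids, and the sample records below are assumptions for illustration, not API output.

```typescript
// Hypothetical flattened record, one per extracted clause.
type ClauseRecord = { contractId: string; clauseNumber: string; clauseType: string };

// Group clauses by their clause_type value.
function indexByClauseType(clauses: ClauseRecord[]): Map<string, ClauseRecord[]> {
  const index = new Map<string, ClauseRecord[]>();
  for (const clause of clauses) {
    const bucket = index.get(clause.clauseType) ?? [];
    bucket.push(clause);
    index.set(clause.clauseType, bucket);
  }
  return index;
}

// Made-up records from two contracts.
const records: ClauseRecord[] = [
  { contractId: "msa-001", clauseNumber: "7.2", clauseType: "liability" },
  { contractId: "msa-001", clauseNumber: "9.1", clauseType: "indemnification" },
  { contractId: "lease-003", clauseNumber: "12", clauseType: "indemnification" },
];

const clauseIndex = indexByClauseType(records);
const indemnification = clauseIndex.get("indemnification") ?? [];

console.log(indemnification.map((clause) => `${clause.contractId} §${clause.clauseNumber}`));
// prints [ 'msa-001 §9.1', 'lease-003 §12' ]
```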
Multi-File Contracts
Real contracts rarely come as a single document. A master agreement references exhibits. An amendment modifies specific sections. A statement of work adds scope. The parser handles multi-file contracts by accepting up to 20 files in a single request.
Send the main agreement, its amendments, and attached exhibits together. The parser applies the same schema to all files and returns results for each. Your code can then merge the results: for example, the amendment's effective date overrides the original, and the exhibit's scope of work supplements the main agreement's clause summaries.
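A merge step might look like the sketch below. The per-file result shape is an assumption made for illustration, so check the multi-file documentation for the real one. It also assumes the files are listed with the main agreement first and later documents after it, so a later effective date wins.

```typescript
// Hypothetical per-file result shape; the real multi-file response may differ.
type DateField = { value: string; confidence: number };
type ClauseSummary = { clauseNumber: string; summary: string };
type FileResult = { fileName: string; effectiveDate?: DateField; clauses: ClauseSummary[] };
type MergedContract = {
  effectiveDate?: DateField;
  clauses: (ClauseSummary & { source: string })[];
};

// Later files override the effective date; clauses accumulate with their source file.
function mergeResults(results: FileResult[]): MergedContract {
  const merged: MergedContract = { clauses: [] };
  for (const result of results) {
    if (result.effectiveDate) merged.effectiveDate = result.effectiveDate;
    for (const clause of result.clauses) {
      merged.clauses.push({ ...clause, source: result.fileName });
    }
  }
  return merged;
}

// Made-up results: main agreement, one amendment, one exhibit.
const merged = mergeResults([
  {
    fileName: "msa.pdf",
    effectiveDate: { value: "2026-01-01", confidence: 0.96 },
    clauses: [{ clauseNumber: "3.1", summary: "Scope of Services" }],
  },
  { fileName: "amendment-1.pdf", effectiveDate: { value: "2026-06-01", confidence: 0.94 }, clauses: [] },
  { fileName: "exhibit-a.pdf", clauses: [{ clauseNumber: "A.1", summary: "Detailed scope of work" }] },
]);

console.log(merged.effectiveDate?.value); // prints "2026-06-01"
console.log(merged.clauses.map((clause) => clause.source)); // prints [ 'msa.pdf', 'exhibit-a.pdf' ]
```

Recording the source file on each clause keeps the merged view traceable back to the document a reviewer needs to open.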
Confidence-Based Review Workflows
Contract data extraction is high-stakes. A wrong effective date or a missed termination clause can be expensive. The confidence scores let you build appropriate review workflows:
- High confidence fields (above 0.92) — auto-populate your contract management system
- Medium confidence fields (0.75 to 0.92) — pre-fill but flag for legal review
- Low confidence fields (below 0.75) — require manual verification
This is the difference between a tool lawyers ignore and a tool lawyers trust.
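The three tiers above can be sketched as a small triage function. The thresholds come from the list above. The action names and the triage helper are hypothetical.

```typescript
type ReviewAction = "auto_populate" | "flag_for_review" | "manual_verification";

// Map a confidence score to a review tier.
function triage(confidence: number): ReviewAction {
  if (confidence > 0.92) return "auto_populate";    // high: above 0.92
  if (confidence >= 0.75) return "flag_for_review"; // medium: 0.75 to 0.92
  return "manual_verification";                     // low: below 0.75
}

// Confidence scores taken from the sample response.
const fieldConfidences: Record<string, number> = {
  contractTitle: 0.97,
  effectiveDate: 0.96,
  hasNonCompete: 0.88,
};

const reviewPlan = Object.fromEntries(
  Object.entries(fieldConfidences).map(([name, confidence]) => [name, triage(confidence)]),
);

console.log(reviewPlan.hasNonCompete); // prints "flag_for_review"
```

Triaging per field rather than per document means one uncertain clause summary does not block the auto-populated title and dates.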
What’s Next
Extracted clauses feed directly into Document Generation for contract summary PDFs — same auth, same credit pool.
Get Started
Check the docs for the complete field type reference and multi-file extraction documentation. The TypeScript and Python SDKs are available for server-side integration.
Sign up for a free account — no credit card required. Start with one contract and see how the schema-based approach handles your specific document formats.