Extract IBANs, Currencies, and Addresses from Financial Documents — Validated, Not Just Strings

8 min read Document Extraction

Regex Doesn’t Understand Financial Data

You need to extract an IBAN from a bank statement. So you write a regex. It matches something that looks like an IBAN — starts with two letters, followed by two digits, followed by up to 30 alphanumeric characters. The pattern matches. You store the result.

Except the regex can’t validate the IBAN. It doesn’t check the country-specific length. It doesn’t verify the check digits. Once spaces are stripped, it matches “DE89 3704 0044 0532 0130 00” and “DE00 0000 0000 0000 0000 00” with equal confidence. One is a valid IBAN, the other is garbage.
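A minimal sketch of that naive pattern makes the problem concrete. This is our own illustration, not code from any library. Once the spaces are stripped, both strings pass:

```typescript
// The naive "IBAN-shaped" pattern: 2 letters, 2 digits, then up to 30 alphanumerics.
const naiveIbanPattern = /^[A-Z]{2}\d{2}[A-Z0-9]{1,30}$/;

const candidates = [
  "DE89 3704 0044 0532 0130 00", // passes the MOD-97 check digit test
  "DE00 0000 0000 0000 0000 00", // fails the MOD-97 check digit test
];

for (const candidate of candidates) {
  // Printed IBANs are grouped with spaces; remove them before matching.
  const compact = candidate.replace(/\s+/g, "");
  console.log(compact, naiveIbanPattern.test(compact)); // logs true for both
}
```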

Currency amounts are worse. Is “1.234,56” one thousand two hundred thirty-four euros and fifty-six cents? Or is it one point two three four with some trailing nonsense? Depends on the locale. A regex can’t know.

The Document Extraction API has purpose-built field types for financial data — IBAN, CURRENCY_AMOUNT, CURRENCY_CODE, and ADDRESS. They don’t just extract text. They validate, normalize, and structure the results.

IBAN Extraction with Validation

The IBAN field type extracts International Bank Account Numbers and validates them:

const schema = {
  fields: [
    {
      name: "beneficiary_iban",
      type: "IBAN",
      description: "IBAN of the payment recipient",
      is_required: true,
    },
    {
      name: "beneficiary_name",
      type: "TEXT",
      description: "Name of the payment recipient",
    },
  ],
};

The response returns a validated IBAN string:

{
  "beneficiaryIban": {
    "type": "IBAN",
    "value": "DE89370400440532013000",
    "confidence": 0.96
  }
}

This isn’t a regex match. The parser identifies the IBAN in the document, validates its format against the country-specific rules, and returns it in the compact electronic format, without spaces. If the document contains something that looks like an IBAN but doesn’t validate, the confidence score reflects that.

IBAN Validation Edge Cases

IBAN formats vary by country. A German IBAN is 22 characters. A Norwegian IBAN is 15. A Maltese IBAN is 31. A regex that accepts “2 letters + 2 digits + up to 30 alphanumeric characters” matches all of them — and also matches strings that aren’t IBANs at all.

The IBAN field type validates several things that a regex cannot. It checks the country-specific length (DE must be exactly 22 characters, not 20 or 24). It verifies the check digits using the MOD 97-10 algorithm from ISO/IEC 7064, as ISO 13616 prescribes. It ensures the BBAN (Basic Bank Account Number) portion follows the country’s structure: for Germany, that’s an 8-digit bank code followed by a 10-digit account number.
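The length and check-digit rules are simple enough to sketch. The following is a minimal TypeScript illustration of the MOD 97-10 check, written for this article rather than taken from the API. The length table is deliberately partial (only the three countries mentioned above), and the country-specific BBAN structure check is omitted:

```typescript
// Expected lengths for the countries discussed above. Deliberately partial:
// a real validator needs every entry from the IBAN registry.
const IBAN_LENGTHS: Record<string, number> = { DE: 22, NO: 15, MT: 31 };

// Sketch: format, country length, and MOD 97-10 check digits (no BBAN structure check).
function isValidIban(input: string): boolean {
  // Drop the spaces used in printed IBANs and normalize case.
  const iban = input.replace(/\s+/g, "").toUpperCase();

  // Country code, two check digits, then an alphanumeric BBAN.
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]+$/.test(iban)) return false;

  const expectedLength = IBAN_LENGTHS[iban.slice(0, 2)];
  if (expectedLength === undefined || iban.length !== expectedLength) return false;

  // Move the country code and check digits to the end.
  const rearranged = iban.slice(4) + iban.slice(0, 4);

  // Reduce mod 97 one character at a time so the number stays small.
  // Letters expand to two digits: A = 10 ... Z = 35 (base-36 value).
  let remainder = 0;
  for (const ch of rearranged) {
    const value = parseInt(ch, 36);
    remainder = value < 10 ? (remainder * 10 + value) % 97 : (remainder * 100 + value) % 97;
  }

  // A valid IBAN leaves a remainder of exactly 1.
  return remainder === 1;
}

console.log(isValidIban("DE89 3704 0044 0532 0130 00")); // true
console.log(isValidIban("DE00 0000 0000 0000 0000 00")); // false
```

Processing digit by digit avoids converting a 30-plus-digit string into a number, which would exceed JavaScript's safe integer range.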

Documents often contain multiple strings that look like IBANs. Account statements might list the account holder’s IBAN, the sender’s IBAN, a reference number that happens to start with two letters, and a transaction ID with similar formatting. The field description helps the parser identify which one you want. “IBAN of the payment recipient” is more specific than “IBAN” — and that specificity matters when a document has three valid IBANs on the same page.
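For example, a statement schema can request two IBANs and rely on the descriptions alone to tell them apart. This sketch uses the schema format shown above; the field names and description wording are our own illustration:

```typescript
// Two IBAN fields on the same statement: only the descriptions distinguish them.
const schema = {
  fields: [
    {
      name: "account_holder_iban",
      type: "IBAN",
      description: "IBAN of the account holder, printed in the statement header",
      is_required: true,
    },
    {
      name: "counterparty_iban",
      type: "IBAN",
      description: "IBAN of the sender of the incoming transfer",
    },
  ],
};
```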

Currency Amounts Across Locales

The CURRENCY_AMOUNT field type handles the formatting chaos of international financial documents:

  • “$1,234.56” — US format with comma thousands separator and period decimal
  • “1.234,56 €” — European format with period thousands separator and comma decimal
  • “CHF 1’234.56” — Swiss format with apostrophe thousands separator
  • “¥123,456” — no decimal places
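In these formats, the same characters play opposite roles. A minimal client-side sketch of parsing them by hand (the hypothetical `parseAmount` helper below is not part of the API) only works if the caller already knows each string's convention. Given the wrong one, it silently returns a wrong number:

```typescript
// Separator conventions the caller must supply (hypothetical, for illustration).
interface NumberConvention {
  group: string;   // thousands separator
  decimal: string; // decimal separator
}

const US: NumberConvention = { group: ",", decimal: "." };
const EUROPE: NumberConvention = { group: ".", decimal: "," };
const SWISS: NumberConvention = { group: "’", decimal: "." };

// Parse an amount under a known convention; NaN if nothing numeric remains.
function parseAmount(text: string, convention: NumberConvention): number {
  // Keep digits and the two separators; drop currency symbols, codes, and spaces.
  let kept = "";
  for (const ch of text) {
    if (/[0-9]/.test(ch) || ch === convention.group || ch === convention.decimal) kept += ch;
  }
  // Remove grouping separators, then turn the decimal separator into a period.
  const normalized = kept.split(convention.group).join("").replace(convention.decimal, ".");
  return normalized === "" ? NaN : Number(normalized);
}

console.log(parseAmount("$1,234.56", US));       // 1234.56
console.log(parseAmount("1.234,56 €", EUROPE));  // 1234.56
console.log(parseAmount("CHF 1’234.56", SWISS)); // 1234.56
console.log(parseAmount("¥123,456", US));        // 123456
console.log(parseAmount("1.234,56", US));        // 1.23456 (wrong convention, wrong number)
```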

A regex parser needs a different pattern for each locale. The CURRENCY_AMOUNT field type handles all of them and returns a normalized numeric value:

const schema = {
  fields: [
    {
      name: "invoice_total",
      type: "CURRENCY_AMOUNT",
      description: "Total invoice amount",
      is_required: true,
    },
    {
      name: "currency",
      type: "CURRENCY_CODE",
      description: "Currency of the invoice (ISO 4217 code)",
    },
  ],
};

A response looks like this:

{
  "invoiceTotal": {
    "type": "CURRENCY_AMOUNT",
    "value": 1234.56,
    "confidence": 0.95
  },
  "currency": {
    "type": "CURRENCY_CODE",
    "value": "EUR",
    "confidence": 0.94
  }
}

The amount comes back as a number — not a string with locale-specific formatting. The currency comes back as an ISO 4217 code — not a symbol that could mean multiple currencies ($ is used by USD, CAD, AUD, and dozens of others).
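Because the code follows ISO 4217, it plugs directly into standard tooling. For example, JavaScript's built-in `Intl.NumberFormat` can render the normalized amount in any locale and knows how many decimal places each currency uses. This sketch assumes the values from the example response above:

```typescript
// Values as they appear in the example response above.
const amount = 1234.56;
const currency = "EUR"; // an ISO 4217 code, so Intl accepts it directly

const german = new Intl.NumberFormat("de-DE", { style: "currency", currency }).format(amount);
const american = new Intl.NumberFormat("en-US", { style: "currency", currency }).format(amount);

// Intl also knows each currency's minor units: JPY has none.
const yenDigits = new Intl.NumberFormat("en-US", { style: "currency", currency: "JPY" })
  .resolvedOptions().maximumFractionDigits;

console.log(german);    // 1.234,56 € (Intl may use a non-breaking space before €)
console.log(american);  // €1,234.56
console.log(yenDigits); // 0
```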

Currency Disambiguation

Currency symbols are ambiguous. The $ sign is used by the US dollar, Canadian dollar, Australian dollar, Hong Kong dollar, Singapore dollar, and at least 20 other currencies. The kr symbol could be Swedish krona, Norwegian krone, Danish krone, or Icelandic króna. Even “Fr.” depends on context: it could be the Swiss franc, another currency named franc, or, on older documents, the obsolete French franc.

The CURRENCY_CODE field type returns an ISO 4217 three-letter code — USD, CAD, AUD, SEK, NOK. No ambiguity. The parser uses context from the document to determine the correct currency: the issuing bank’s country, the document language, other addresses on the page. A Swiss bank statement showing “Fr. 1’234.56” returns CHF, not some generic “franc” designation.

When a document uses multiple currencies — a foreign exchange confirmation, for example — define separate CURRENCY_CODE fields for each. “Source currency of the exchange” and “target currency of the exchange” give the parser enough context to distinguish them.
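A foreign exchange confirmation might use a schema like the one below, written in the same format as the examples above. The field names and the pairing of each code with a CURRENCY_AMOUNT are our own illustrative choices:

```typescript
// One CURRENCY_CODE (and one CURRENCY_AMOUNT) per side of the exchange.
const schema = {
  fields: [
    { name: "source_currency", type: "CURRENCY_CODE", description: "Source currency of the exchange" },
    { name: "source_amount", type: "CURRENCY_AMOUNT", description: "Amount debited in the source currency" },
    { name: "target_currency", type: "CURRENCY_CODE", description: "Target currency of the exchange" },
    { name: "target_amount", type: "CURRENCY_AMOUNT", description: "Amount credited in the target currency" },
  ],
};
```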

ADDRESS Decomposition

Financial documents are full of addresses — billing addresses, beneficiary addresses, registered office addresses. The ADDRESS field type doesn’t just extract the address as a text blob. It decomposes it into structured components:

const schema = {
  fields: [
    {
      name: "billing_address",
      type: "ADDRESS",
      description: "Billing address of the customer",
    },
  ],
};

A response looks like this:

{
  "billingAddress": {
    "type": "ADDRESS",
    "value": {
      "street": "Kurfürstendamm 194",
      "city": "Berlin",
      "region": "Berlin",
      "postal_code": "10707",
      "country": "DE"
    },
    "confidence": 0.93
  }
}

The country is an ISO 3166-1 alpha-2 code. The components are split out and ready for your database — no address parsing library needed.

International Address Formats

Addresses are surprisingly hard to decompose. A US address has a street, city, state, and ZIP code in a predictable order. A Japanese address starts with the prefecture and works down to the building number — the opposite of Western conventions. A UK address might include a county, or it might not. German addresses put the postal code before the city. Brazilian addresses include a neighborhood (bairro) as a standard component.

The ADDRESS field type handles these variations and normalizes the output into consistent components: street, city, region, postal_code, and country. The region field maps to whatever the country calls its primary subdivision — state in the US, Bundesland in Germany, prefecture in Japan, province in Canada.

Not every address has every component. A rural address might not have a street name. A city-state like Singapore might not have a region. The parser returns the components it can identify and omits the rest. Your code should handle optional fields rather than expecting every component to be present.
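On the consuming side, this means modeling every component as optional. Below is a minimal TypeScript sketch that assumes the component names from the example response above (the SDK's own types may differ). `formatAddressLine` is a hypothetical helper that builds a one-line label from whatever components were returned:

```typescript
// Component names taken from the example response; any of them may be absent.
interface Address {
  street?: string;
  city?: string;
  region?: string;
  postal_code?: string;
  country?: string; // ISO 3166-1 alpha-2, e.g. "DE"
}

// Join the components that are present into a single line, skipping the rest.
function formatAddressLine(address: Address): string {
  // Postal code and city share one segment ("10707 Berlin") when both exist.
  const cityPart = [address.postal_code, address.city].filter(Boolean).join(" ");
  // Skip the region when it only repeats the city, as with Berlin.
  const region = address.region !== address.city ? address.region : undefined;
  return [address.street, cityPart, region, address.country]
    .filter((part): part is string => Boolean(part))
    .join(", ");
}

const berlin: Address = {
  street: "Kurfürstendamm 194",
  city: "Berlin",
  region: "Berlin",
  postal_code: "10707",
  country: "DE",
};
// Hypothetical city-state result with no region and no separate city.
const cityState: Address = { street: "10 Example Road", postal_code: "123456", country: "SG" };

console.log(formatAddressLine(berlin));    // "Kurfürstendamm 194, 10707 Berlin, DE"
console.log(formatAddressLine(cityState)); // "10 Example Road, 123456, SG"
```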

A Complete Financial Document Schema

Here’s a schema for a bank transfer confirmation that uses all the financial field types:

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({ apiKey: "YOUR_API_KEY" });

const result = await client.extract({
  files: [
    { type: "url", name: "transfer.pdf", url: "https://example.com/transfer.pdf" }
  ],
  schema: {
    fields: [
      { name: "transaction_date", type: "DATE", description: "Date of the transfer" },
      { name: "sender_name", type: "TEXT", description: "Name of the sending party" },
      { name: "sender_iban", type: "IBAN", description: "IBAN of the sender" },
      { name: "sender_address", type: "ADDRESS", description: "Address of the sender" },
      { name: "recipient_name", type: "TEXT", description: "Name of the receiving party" },
      { name: "recipient_iban", type: "IBAN", description: "IBAN of the recipient" },
      { name: "recipient_address", type: "ADDRESS", description: "Address of the recipient" },
      { name: "transfer_amount", type: "CURRENCY_AMOUNT", description: "Amount transferred" },
      { name: "currency", type: "CURRENCY_CODE", description: "Currency of the transfer" },
      { name: "reference", type: "TEXT", description: "Payment reference or description" },
    ],
  },
});

One API call. Two validated IBANs. Two decomposed addresses. A normalized currency amount with its ISO code. All with confidence scores.

Why Purpose-Built Types Matter for Fintech

Generic text extraction followed by post-processing is fragile. Every post-processing step — IBAN validation, currency parsing, address decomposition — is a point of failure. And each step needs its own test suite, error handling, and locale awareness.

Purpose-built field types push this complexity into the extraction layer, where it’s handled once and consistently. Your fintech application receives clean, validated, structured data. The IBAN is valid or it’s flagged. The currency amount is a number or it’s flagged. The address is decomposed or it’s flagged.

No regex library for IBANs. No locale-aware currency parser. No address normalization service. One API with types designed for financial data.
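In practice, "flagged" means inspecting each field's confidence score. Here is a minimal sketch that assumes the `{ type, value, confidence }` shape from the example responses above and a hypothetical 0.9 threshold (the article does not prescribe one). It returns the fields that should go to human review:

```typescript
// Field shape taken from the example responses: { type, value, confidence }.
interface FieldResult {
  type: string;
  value: unknown;
  confidence: number;
}

// Hypothetical cut-off; the right value depends on your documents and risk tolerance.
const REVIEW_THRESHOLD = 0.9;

// Names of required fields that are missing, plus any field below the threshold.
function fieldsNeedingReview(
  fields: Record<string, FieldResult | undefined>,
  requiredNames: string[],
  threshold: number = REVIEW_THRESHOLD,
): string[] {
  const flagged = new Set<string>();
  for (const name of requiredNames) {
    if (fields[name] === undefined) flagged.add(name);
  }
  for (const [name, field] of Object.entries(fields)) {
    if (field !== undefined && field.confidence < threshold) flagged.add(name);
  }
  return Array.from(flagged);
}

// Hypothetical result: the total has low confidence and the currency is missing.
const extracted: Record<string, FieldResult | undefined> = {
  beneficiaryIban: { type: "IBAN", value: "DE89370400440532013000", confidence: 0.96 },
  invoiceTotal: { type: "CURRENCY_AMOUNT", value: 1234.56, confidence: 0.72 },
};

console.log(fieldsNeedingReview(extracted, ["beneficiaryIban", "invoiceTotal", "currency"]));
// ["currency", "invoiceTotal"]
```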

Combining Financial Types with CALCULATED Fields

Financial documents often contain values that should add up. An invoice has a subtotal, a tax amount, and a total. A bank statement has an opening balance, transactions, and a closing balance. Purpose-built types and CALCULATED fields work together to verify these relationships.

const schema = {
  fields: [
    { name: "subtotal", type: "CURRENCY_AMOUNT", description: "Invoice subtotal before tax", is_required: true },
    { name: "vat_amount", type: "CURRENCY_AMOUNT", description: "VAT amount", is_required: true },
    { name: "printed_total", type: "CURRENCY_AMOUNT", description: "Total as printed on the invoice", is_required: true },
    { name: "computed_total", type: "CALCULATED", description: "Subtotal plus VAT", operation: "sum", source_field_names: ["subtotal", "vat_amount"] },
    { name: "currency", type: "CURRENCY_CODE", description: "Invoice currency" },
    { name: "supplier_iban", type: "IBAN", description: "IBAN of the invoice issuer" },
    { name: "supplier_address", type: "ADDRESS", description: "Address of the invoice issuer" },
  ],
};

One API call gives you validated financial identifiers, normalized currency amounts, a structured address, and a computed total you can check against the printed one. The alternative is five separate components: an IBAN validator, a currency parser, an address decomposer, arithmetic checks, and the extraction logic itself.
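The CALCULATED field computes the sum, but comparing it with `printed_total` is up to your code. Here is a minimal sketch, assuming both values arrive as plain numbers like the CURRENCY_AMOUNT example earlier. The helpers are hypothetical, and they compare in integer minor units so that floating-point noise does not trigger false alarms:

```typescript
// Convert to integer minor units (cents for EUR or USD) before comparing.
// Use fractionDigits = 0 for currencies such as JPY that have no minor unit.
function toMinorUnits(amount: number, fractionDigits = 2): number {
  return Math.round(amount * 10 ** fractionDigits);
}

// True when the CALCULATED total agrees with the printed total.
function totalsMatch(computedTotal: number, printedTotal: number, fractionDigits = 2): boolean {
  return toMinorUnits(computedTotal, fractionDigits) === toMinorUnits(printedTotal, fractionDigits);
}

// 0.1 + 0.2 is 0.30000000000000004 in floating point, but the cent values agree.
console.log(totalsMatch(0.1 + 0.2, 0.3));      // true
console.log(totalsMatch(1000 + 190, 1234.56)); // false: route the invoice to review
```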

What’s Next

Extracted financial data flows directly into Document Generation for formatted reports — same auth, same credit pool.

Get Started

Check the docs for all 17 field types, including the financial types covered here. The TypeScript and Python SDKs return typed responses, so your editor knows the shape of every field result.

Sign up for a free account — no credit card required. Try extracting financial data from your own documents and compare the structured output to what your current regex or template approach produces.
