Skip to main content
Finance

KYC Document Verification Pipeline

This case study describes a real engagement. Client identity, proprietary details, and specific metrics are anonymized or approximated under NDA.

91%Auto-Verification Rate
6 minAvg Processing Time
99.2%Extraction Accuracy
The Challenge

What needed
solving

Manual KYC verification taking 48+ hours per application. Compliance team reviewing 200+ documents daily with no structured tooling — document classification, data extraction, and sanctions screening were all done manually in sequence.

Identity document processing is technically demanding because the documents span multiple countries with different layouts, security features, fonts, and languages. OCR accuracy on passports with machine-readable zones (MRZ) is high, but accuracy on handwritten national ID cards and poorly scanned proof-of-address documents is significantly lower. The extraction pipeline needed per-document-type handling rather than a single general approach. Sanctions screening required handling name transliterations, common spelling variants, and partial name matches — fuzzy matching that is both too loose (false positives that waste human review time) and too tight (false negatives that create compliance exposure) if not tuned carefully. Regulatory requirements meant the pipeline had to produce a full audit trail for every decision, including the specific extracted values, confidence scores, and screening match results that led to either auto-verification or human review routing.

Approach

How we
built it

  1. 01

    Mapped the full manual KYC workflow — document classification, data extraction, sanctions screening, and human review triggers — and designed the automated system to mirror the compliance decision tree rather than replace it.

  2. 02

    Built a multi-document extraction pipeline handling the full range of identity documents submitted by customers: national IDs, passports, utility bills, and bank statements across multiple formats and jurisdictions.

  3. 03

    Integrated real-time sanctions and PEP screening via third-party API into the extraction pipeline, rather than as a separate downstream step, to reduce total processing latency.

  4. 04

    Designed the human review queue to contain only genuinely ambiguous cases — documents with low extraction confidence or positive screening hits — rather than routing all cases through manual review by default.

This engagement automated the intake and pre-screening stages of a KYC verification workflow, reducing the manual workload per application from approximately 45 minutes to under 6 minutes of human review time. The system processes identity documents (passports, national IDs, driver's licenses), proof of address documents, and source-of-funds documentation across 12 document types and 8 countries of issue. Sanctions screening runs against three lists simultaneously (OFAC, UN, and a proprietary watchlist) on every extracted name. The pipeline is designed with a hard-fail on low-confidence extractions: when extraction confidence falls below threshold, the application is routed to the human review queue with the low-confidence fields flagged rather than passed through with potentially incorrect data.

Solution

What we
delivered

OCR and NLP pipeline for document classification, structured data extraction, and automated compliance checks including sanctions list screening. Human review retained for edge cases and final approval; the system handles the mechanical extraction and screening layer.

Results

Measurable
outcomes

  • 91% of standard KYC applications now complete automated verification without human review, processing in an average of 6 minutes.
  • Data extraction accuracy reached 99.2% on the primary document types, meeting the compliance team's threshold for automated routing.
  • Compliance team headcount was reallocated from routine document review to the genuinely complex cases that require human judgement — sanctions investigations, politically exposed persons, and document fraud indicators.
Tech Stack
PythonTesseract OCRAnthropic ClaudePostgreSQLFastAPIDocker
Timeline
12 weeks
Team Size
3 engineers

The 48-hour manual KYC process was a genuine bottleneck for onboarding. Getting it to 6 minutes for the majority of standard cases changed the conversation with prospective clients about time-to-account.

Chief Compliance Officer, Digital Lending Platform

Ready to build
something like this?

Tell us what you are building. We will scope it, price it honestly, and give you a clear plan.

Start a Conversation

Free 30-minute scoping call. No obligation.