This case study describes a real engagement. Client identity, proprietary details, and specific metrics are anonymized or approximated under NDA.
Intelligent Document Routing for Government Services
The Problem
Paper-based application processing with 15-day average turnaround from submission to department assignment. Documents misrouted 18% of the time, creating rework cycles. No digital record of document status or processing history.
The Solution
OCR and classification pipeline that digitizes incoming applications, classifies them by document type and destination department, and routes them to the correct workflow queue automatically. Paper documents are digitized at intake; all downstream processing is digital.
This engagement digitized and automated the intake layer of a citizen services application processing operation. The system handles incoming applications in paper and scanned-image form, runs them through an OCR and classification pipeline, extracts applicant identifiers and application metadata, determines the correct destination department and priority level, and inserts them into the appropriate digital workflow queue. The destination departments retain their existing case management systems; the routing layer integrates via API adapters, with a MinIO object store holding the digitized document images and a PostgreSQL database holding the routing records and processing history. Average turnaround from document receipt to department queue entry is 3.2 days, down from 15 days.
The Challenge
Government application forms are among the most structurally varied documents in any domain. Applications span land registration, business licensing, social welfare, health certifications, and infrastructure permits — each with different form layouts, terminology, and processing requirements. Many incoming documents are third-party-submitted forms with handwritten sections, non-standard layouts, or poor scan quality. The misrouting problem was driven partly by human inconsistency and partly by ambiguous cases where the correct department assignment depended on fine-grained content analysis rather than document type alone. The integration constraint was significant: each of the 14 destination departments used a different case management system with a different API or file-drop intake mechanism, requiring 14 separate integration adapters. Audit trail requirements were stricter than in commercial deployments: every routing decision had to be fully documented and reversible.
How We Built It
Document taxonomy and intake process mapping (Weeks 1–3): We catalogued 60+ application types across 14 departments, mapping each to its routing rules, required fields, and known edge cases that caused misrouting. We also audited 6 months of misrouted application records with department staff to understand the patterns — approximately 60% of misrouting occurred across just 8 application type pairs that were systematically confused. This taxonomy became the classification target specification and allowed us to prioritize OCR accuracy on the fields that drove routing decisions.
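The audit step above amounts to ranking confused type pairs by frequency until a coverage target is hit. A minimal sketch of that analysis, with entirely illustrative application type names (not the client's actual taxonomy) and a toy sample of misroute records:

```python
from collections import Counter

# Hypothetical misroute records from the audit: (assigned_type, correct_type).
misroutes = [
    ("land_registration", "land_transfer"),
    ("land_registration", "land_transfer"),
    ("business_license_new", "business_license_renewal"),
    ("welfare_housing", "welfare_income_support"),
    ("land_registration", "land_transfer"),
    ("health_cert_food", "health_cert_premises"),
]

# Count confusions per unordered type pair, then take pairs in descending
# frequency until they cover 60% of all misroutes.
pair_counts = Counter(frozenset(pair) for pair in misroutes)
total = sum(pair_counts.values())

cumulative = 0
priority_pairs = []
for pair, count in pair_counts.most_common():
    cumulative += count
    priority_pairs.append(sorted(pair))
    if cumulative / total >= 0.6:  # coverage target reached
        break

print(priority_pairs)
```

On this toy sample, the land registration/transfer pair alone covers half the misroutes, so only one more pair is needed to pass the 60% mark. The real audit did the same ranking over 6 months of records and 60+ application types.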
OCR pipeline and document digitization (Weeks 4–7): The digitization pipeline handles incoming batches via a scanning station upload, applying image pre-processing (deskew, contrast normalization, resolution upscaling, margin detection) before OCR. Tesseract handles printed text extraction; an Anthropic Claude vision pass handles handwritten fields and low-quality scans that Tesseract misreads. Field extraction uses a form-boundary detection layer to segment structured forms into field regions before OCR, improving accuracy on structured documents compared to full-page OCR followed by extraction. Digitized document images are stored in MinIO; extracted field data and confidence scores are stored in PostgreSQL.
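The engine-selection decision — Tesseract for clean printed text, the vision model for handwriting and low-quality scans — can be sketched as a per-field escalation rule. The threshold value and field names below are assumptions for illustration; the deployed system tuned thresholds per form type:

```python
from dataclasses import dataclass

# Assumed threshold; real values were tuned per form type.
TESSERACT_CONF_THRESHOLD = 0.80

@dataclass
class FieldRegion:
    name: str
    text: str          # Tesseract output for this form-field region
    confidence: float  # mean word confidence for the region, 0.0-1.0
    handwritten: bool  # flag set by the form-boundary detection layer

def needs_vision_pass(f: FieldRegion) -> bool:
    """Escalate a field to the vision-model pass when Tesseract is
    unreliable: handwritten content, or printed text below threshold."""
    return f.handwritten or f.confidence < TESSERACT_CONF_THRESHOLD

fields = [
    FieldRegion("form_type_code", "LR-204", 0.97, handwritten=False),
    FieldRegion("applicant_name", "J?hn Sm1th", 0.41, handwritten=True),
]
escalated = [f.name for f in fields if needs_vision_pass(f)]
print(escalated)
```

Escalating only low-confidence or handwritten regions keeps vision-model calls (and cost) proportional to document quality rather than document volume.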
Classification and routing logic (Weeks 8–11): Document classification uses extracted field values — form type identifiers, department codes, service category keywords, and applicant type flags — to assign each application to the correct department and workflow queue. The classifier handles the 8 high-confusion application type pairs with specific disambiguation rules derived from the misrouting audit. Routing decisions include a confidence score; low-confidence classifications route to a human review queue with the ambiguous fields highlighted. All routing decisions are logged with the specific field values and classifier signals that drove the decision, making each decision fully auditable and reversible.
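One of the disambiguation rules can be sketched as follows. The form codes, department names, field names, and thresholds are all hypothetical stand-ins; the shape to note is that every decision carries a confidence score, the signals that drove it, and a review flag:

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.75  # assumed; below this, route to human review

@dataclass
class Application:
    form_type: str
    fields: dict

@dataclass
class RoutingDecision:
    department: str
    confidence: float
    signals: dict        # field values that drove the decision (audit log)
    needs_review: bool

def classify(app: Application) -> RoutingDecision:
    signals = {"form_type": app.form_type}
    if app.form_type == "LR":
        # Illustrative disambiguation rule for a confused pair: a land
        # transfer names a second party; a plain registration does not.
        if app.fields.get("transferee_name"):
            dept, conf = "land_transfer_office", 0.92
            signals["transferee_name"] = app.fields["transferee_name"]
        else:
            dept, conf = "land_registry", 0.88
    else:
        dept, conf = "general_intake", 0.50  # unknown type: low confidence
    return RoutingDecision(dept, conf, signals,
                           needs_review=conf < REVIEW_THRESHOLD)

decision = classify(Application("LR", {"transferee_name": "A. Citizen"}))
print(decision.department, decision.needs_review)
```

Persisting the `signals` dict alongside each decision is what makes routing auditable and reversible: a reviewer can see exactly which field values produced the assignment.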
Department integrations and operations tooling (Weeks 12–14): We built 14 integration adapters, each translating the standardized routing record format into the target department's intake API or file-drop format. A central operations dashboard shows processing queue status, routing volumes by department, classification confidence distribution, and flagged items requiring human review. The dashboard also provides a correction interface for human review queue items, where operators see the document image alongside the extracted fields and can correct the routing with a single interaction. Corrections are logged to the training dataset for future model improvements.
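The adapter layer follows a standard pattern: one small class per department translates the common routing record into that department's intake format. A minimal sketch with two assumed adapter types (a JSON API and a CSV file drop) and an illustrative record schema:

```python
import json
from abc import ABC, abstractmethod

# Standardized routing record produced by the classifier (illustrative schema).
record = {
    "application_id": "APP-2024-00183",
    "department": "land_registry",
    "priority": "normal",
    "document_uri": "minio://intake/APP-2024-00183.tiff",
}

class DepartmentAdapter(ABC):
    """One adapter per destination department translates the standard
    routing record into that department's intake payload."""
    @abstractmethod
    def to_payload(self, record: dict) -> str: ...

class JsonApiAdapter(DepartmentAdapter):
    # For departments exposing a REST intake endpoint.
    def to_payload(self, record: dict) -> str:
        return json.dumps({"case_ref": record["application_id"],
                           "doc": record["document_uri"],
                           "urgency": record["priority"]})

class CsvFileDropAdapter(DepartmentAdapter):
    # For departments that only accept files dropped into a shared folder.
    def to_payload(self, record: dict) -> str:
        return ",".join([record["application_id"], record["priority"],
                         record["document_uri"]])

adapters = {"api_dept": JsonApiAdapter(), "filedrop_dept": CsvFileDropAdapter()}
payloads = {name: a.to_payload(record) for name, a in adapters.items()}
print(payloads["filedrop_dept"])
```

Because the routing record format is fixed, adding a fifteenth department would mean writing one new adapter, not touching the classifier or pipeline.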
What We Delivered
Average turnaround from document receipt to department queue entry dropped from 15 days to 3.2 days — a 79% reduction. The reduction is driven by the elimination of the manual document sorting and internal distribution process, which previously occupied a 6-person team working through physical paper batches. That team's work has shifted from physical document handling to managing the human review queue and handling applicant queries.
Auto-classification rate reached 87% of incoming applications in the first month of operation, with 13% routed to the human review queue for confirmation. Misroute rate dropped from 18% to 2.1% — the remaining misroutes occur primarily on application types that were not in the original training taxonomy and have since been added through the correction feedback mechanism. The near-elimination of misrouting-driven rework cycles has cleared the backlog that had accumulated under the previous manual process.
The digitization pipeline has created a document record system that did not exist before the engagement. For the first time, department staff can query application processing history by applicant identifier, application type, and date range. This has reduced the time spent handling applicant status inquiries — which previously required manual search through physical records — to a database lookup.
Ready to build something like this?
Tell us what you are building. We will scope it, price it honestly, and give you a clear plan.
Start a Conversation
Free 30-minute scoping call. No obligation.