Accessible Document AI
Making documents accessible to everyone through AI-powered processing
45+ Pages/min
94% Accuracy
<2s Latency
Problem Statement
Traditional document processing ignores accessibility, leaving visually impaired users unable to access crucial information embedded in images, charts, and complex layouts.
Overview
An intelligent document processing pipeline that combines Google Vision OCR with Gemini Flash to extract text and generate accessible descriptions for images, charts, and diagrams. The system creates embeddings over the results to support semantic search and question answering (QA).
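The pipeline described above can be sketched as a simple merge step: OCR text from each page is interleaved with generated alt-text for embedded figures, producing accessible chunks ready for embedding. The function names and return values below are illustrative stand-ins, not the actual Google Vision or Gemini client APIs.

```python
def ocr_page(page_image):
    # Stand-in for a Google Vision OCR call (hypothetical: the real client
    # returns structured annotations; here we return plain text).
    return "extracted page text"

def describe_image(image_bytes):
    # Stand-in for a Gemini Flash call that generates alt text for an
    # embedded figure (hypothetical signature and output).
    return "a bar chart comparing quarterly revenue"

def process_document(pages):
    """Merge OCR text with generated image descriptions into accessible
    chunks ready for embedding and semantic search.

    `pages` is a list of (page_image, embedded_figures) tuples.
    """
    chunks = []
    for page_image, figures in pages:
        parts = [ocr_page(page_image)]
        for fig in figures:
            parts.append("[Image description: " + describe_image(fig) + "]")
        chunks.append("\n".join(parts))
    return chunks
```

Inlining image descriptions into the same chunk as the surrounding text lets screen readers and the QA system see figures in context.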
My Role & Contributions
Full Stack Developer & ML Engineer - Designed the architecture, implemented the OCR pipeline, integrated LLM for descriptions, and built the vector search system.
Tech Stack
Challenges & Solutions
Challenge
Scaling OCR processing to handle 100+ concurrent document uploads without GPU bottlenecks
Solution
Built distributed processing with Celery workers, Redis queue, and rate-limited batch API calls with exponential backoff
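The exponential-backoff portion of that solution can be shown as a small retry helper. This is a generic sketch of the pattern, not the project's actual Celery task code; the injectable `sleep` parameter and the use of `RuntimeError` as a stand-in for a rate-limit (HTTP 429) error are assumptions for illustration.

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Retry fn with exponential backoff plus jitter.

    Delay doubles each attempt (base, 2*base, 4*base, ...) with random
    jitter added to avoid synchronized retry storms across workers.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a rate-limit (429) error
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In a Celery deployment the same effect is typically achieved with task-level retry options; the helper above makes the backoff schedule explicit and easy to unit-test.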
Challenge
Implementing reliable vector search, reaching 94% retrieval accuracy despite semantic drift across 100K+ document chunks
Solution
Designed hybrid search combining dense embeddings (Azure OpenAI) with BM25 for precision-recall balance
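A minimal sketch of that hybrid ranking: a small BM25 scorer over tokenized chunks, blended with normalized dense-embedding scores via a weighted sum. The `alpha` weight and min-max normalization are illustrative choices; the project's actual fusion parameters are not specified here.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against query_terms with BM25
    (minimal illustration, not a production implementation)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()  # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def hybrid_rank(dense, sparse, alpha=0.6):
    """Blend min-max-normalized dense and sparse scores; return doc
    indices sorted best-first. alpha weights the dense component."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    d, s = norm(dense), norm(sparse)
    blended = [alpha * a + (1 - alpha) * b for a, b in zip(d, s)]
    return sorted(range(len(blended)), key=lambda i: -blended[i])
```

The sparse BM25 side catches exact keyword matches that dense embeddings may miss, while the dense side handles paraphrase and drift, which is the precision-recall balance described above.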
Challenge
Architecting fault-tolerant async pipeline with retry logic and dead-letter queues for API rate limits
Solution
Implemented circuit breaker pattern with Cosmos DB change feed for real-time indexing and failure recovery
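The circuit-breaker half of that solution can be sketched as a small state machine: after a run of consecutive failures the breaker "opens" and rejects calls immediately, then allows a probe call once a timeout elapses. The thresholds and the injectable `clock` are illustrative assumptions; the Cosmos DB change feed side depends on the Azure SDK and is not shown.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, rejects calls until `reset_timeout` elapses, then lets
    one probe call through (half-open)."""

    def __init__(self, threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open")  # fail fast, no API call
            self.opened_at = None  # half-open: allow one probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit fully
        return result
```

Failing fast while the upstream API is rate-limiting prevents workers from piling retries onto an already saturated endpoint; rejected work can be parked on a dead-letter queue for later replay.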