AI · Featured Project

Accessible Document AI

Making documents accessible to everyone through AI-powered processing

Tech Stack

Python, FastAPI, Google Vision OCR, Google Gemini, Azure OpenAI Embeddings, MongoDB, Cosmos DB Vector Search

45+ pages/min

94% accuracy

<2s latency

Problem Statement

Traditional document processing ignores accessibility, leaving visually impaired users unable to access crucial information embedded in images, charts, and complex layouts.

Overview

An intelligent document processing pipeline that combines Google Vision OCR with Gemini Flash to extract text and generate accessible descriptions of images, charts, and diagrams. The system then creates embeddings of the extracted content to support semantic search and question answering (QA).
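The flow described above (OCR, then image description, then embedding) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Page`, `Pipeline`, and the three injected callables are hypothetical names, and the real Google Vision, Gemini, and Azure OpenAI clients are deliberately left out. Embedding the text together with the descriptions is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    """Hypothetical container for one processed page."""
    text: str = ""
    image_descriptions: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    # Each stage is injected, so real OCR / LLM / embedding clients
    # can be swapped in without changing the orchestration.
    ocr: Callable[[bytes], str]
    describe: Callable[[bytes], str]
    embed: Callable[[str], list[float]]

    def process(self, page_image: bytes,
                figures: list[bytes]) -> tuple[Page, list[float]]:
        page = Page(text=self.ocr(page_image))
        page.image_descriptions = [self.describe(f) for f in figures]
        # Assumption: text and descriptions are embedded together so that
        # semantic search also covers the visual content.
        combined = "\n".join([page.text, *page.image_descriptions])
        return page, self.embed(combined)
```

Keeping the stages as plain callables makes the pipeline easy to test with stubs before any external API is wired in.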

My Role & Contributions

Full Stack Developer & ML Engineer: designed the architecture, implemented the OCR pipeline, integrated an LLM to generate image descriptions, and built the vector search system.

Challenges & Solutions

1

Challenge

Scaling OCR processing to handle 100+ concurrent document uploads without GPU bottlenecks

Solution

Built distributed processing with Celery workers and a Redis queue, issuing rate-limited batch API calls with exponential backoff
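As a rough illustration of the batching and backoff idea, here is a minimal standard-library sketch. It leaves out Celery and Redis entirely; `RateLimitError`, `backoff_delays`, `call_with_backoff`, and `batched` are hypothetical names, and the delay parameters and the use of full jitter are assumptions rather than the project's real settings.

```python
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimitError(Exception):
    """Hypothetical stand-in for an API's 'too many requests' error."""


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   max_delay: float = 30.0, retries: int = 5) -> list[float]:
    """Un-jittered delay before each retry: base * factor**n, capped."""
    return [min(max_delay, base * factor ** n) for n in range(retries)]


def call_with_backoff(fn: Callable[[], R], retries: int = 5,
                      base: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> R:
    """Call fn, retrying on RateLimitError with exponential backoff."""
    delays = backoff_delays(base=base, retries=retries)
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise  # retries exhausted; the caller can dead-letter the job
            # Full jitter: sleep a random time in [0, delay] so that
            # many workers do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    raise RuntimeError("unreachable")


def batched(items: Sequence[T], size: int) -> list[Sequence[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

In a worker, each batch from `batched(pages, size)` would be sent through `call_with_backoff`, so a rate-limit response slows that worker down instead of failing the upload.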

2

Challenge

Implementing reliable vector search that holds 94% accuracy across 100K+ document chunks despite semantic drift

Solution

Designed hybrid search combining dense embeddings (Azure OpenAI) with BM25 keyword scoring to balance precision and recall
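One common way to combine a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how such a hybrid could work, not the fusion method used in this project. The dense ranking is taken as a precomputed list of document indices, and `bm25_scores`, `rank`, and `rrf_fuse` are illustrative names. It assumes a non-empty, pre-tokenized corpus.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized doc against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Doc indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: sum 1 / (k + rank) across rankings."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=lambda d: fused[d], reverse=True)
```

A hybrid query would then call `rrf_fuse([dense_ranking, rank(bm25_scores(query, docs))])`. RRF only needs ranks, so the two scoring scales never have to be normalized against each other.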

3

Challenge

Architecting a fault-tolerant async pipeline with retry logic and dead-letter queues to cope with API rate limits

Solution

Implemented the circuit breaker pattern, using the Cosmos DB change feed for real-time indexing and failure recovery
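The circuit breaker pattern itself can be sketched in a few lines of standard-library Python. This is an illustrative version only: the `CircuitBreaker` class, its thresholds, and the injected `clock` are assumptions, and the Cosmos DB change feed integration is not shown.

```python
import time
from typing import Callable, Optional, TypeVar

R = TypeVar("R")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests need not wait
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow one probe call through
        return "open"

    def call(self, fn: Callable[[], R]) -> R:
        if self.state == "open":
            raise CircuitOpenError("downstream API is cooling off")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip on reaching the threshold, or at once if a
            # half-open probe fails.
            if (self.failures >= self.failure_threshold
                    or self.opened_at is not None):
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, calls fail fast with `CircuitOpenError`, which gives a rate-limited API time to recover instead of being hit by further retries.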