Offline AI Voice Assistant
Privacy-first voice assistant with complete offline operation
3 Stars · <2s Response · 100% Privacy
Problem Statement
Cloud-based voice assistants require constant internet connectivity, raise privacy concerns by sending audio to external servers, and are inaccessible in secure or remote environments without internet access.
Overview
A fully offline, CPU-optimized voice assistant that provides complete privacy without compromising functionality. Built to run on standard laptops without GPU requirements, making AI accessible for secure, remote, or resource-constrained environments with end-to-end voice-to-voice interaction.
My Role & Contributions
Full Stack Developer: designed the complete voice pipeline, optimized CPU-based STT with faster-whisper, integrated Ollama with Llama 3.2, implemented multi-engine TTS, and built the asynchronous audio-processing layer.
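The voice-to-voice loop above can be sketched as three asynchronous stages chained together (stub stages shown; the project slots faster-whisper, Ollama/Llama 3.2, and a TTS engine into these placeholders):

```python
import asyncio

# Sketch of one end-to-end voice turn with stub stages; the real
# pipeline replaces these placeholders with faster-whisper (STT),
# Ollama serving Llama 3.2 (LLM), and a TTS engine.

async def stt(audio: bytes) -> str:
    return "what's the weather"             # placeholder transcription

async def llm(prompt: str) -> str:
    return "I can't check online, sorry."   # placeholder reply (fully offline)

async def tts(text: str) -> bytes:
    return text.encode()                    # placeholder synthesis

async def voice_turn(audio: bytes) -> bytes:
    text = await stt(audio)     # speech -> text
    reply = await llm(text)     # text  -> reply
    return await tts(reply)     # reply -> speech

print(asyncio.run(voice_turn(b"mic-frames")))
```

Each stage is awaitable, so capture, transcription, and synthesis can overlap across turns instead of blocking the event loop.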
Tech Stack
faster-whisper (CTranslate2 backend), Ollama with Llama 3.2 (GGUF Q4_K_M), WebRTC VAD, multi-engine TTS
Challenges & Solutions
Challenge
Achieving <2s end-to-end latency (STT + LLM + TTS) on CPU-only hardware with 8GB RAM and no GPU acceleration
Solution
Leveraged faster-whisper's CTranslate2 backend (4x faster than vanilla Whisper) with int8 quantization, and served Llama 3.2 through Ollama using GGUF Q4_K_M quantization
Challenge
Optimizing memory footprint to run Whisper (1.5GB) + Llama 3.2-3B quantized (2.1GB) + TTS models concurrently without OOM crashes
Solution
Implemented a streaming pipeline: VAD triggers asynchronous STT, and LLM tokens stream to TTS sentence by sentence for perceived <2s latency, combined with aggressive GC tuning to keep memory in check
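The sentence-by-sentence hand-off can be sketched with a plain regex splitter (the real pipeline streams tokens from Ollama into a generator like this, so TTS starts speaking after the first sentence rather than after the full reply):

```python
import re

# End-of-sentence: ., !, or ? optionally followed by a closing quote/bracket.
SENTENCE_END = re.compile(r'[.!?]["\')\]]?\s*$')

def sentences_from_tokens(tokens):
    """Group a streamed token iterator into sentences so TTS can begin
    synthesis while the LLM is still generating (perceived-latency win)."""
    buffer = ""
    for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():            # flush any trailing partial sentence
        yield buffer.strip()

# Tokens as an LLM might stream them:
tokens = ["The time", " is noon.", " Anything", " else?"]
print(list(sentences_from_tokens(tokens)))
# → ['The time is noon.', 'Anything else?']
```

Because each completed sentence is released immediately and its buffer dropped, only one sentence of text is ever held in memory at a time.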
Challenge
Building robust VAD (Voice Activity Detection) to handle background noise, overlapping speech, and silence detection edge cases
Solution
Built a custom VAD layer on WebRTC VAD with an energy threshold that adapts to ambient noise, a 300ms pre-speech buffer, and a 500ms silence timeout with false-trigger suppression
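The timing logic can be sketched as a small state machine. This is a simplified stand-in: a plain adaptive RMS gate replaces WebRTC VAD's per-frame decision, the 30ms frame size and 300ms/500ms windows match the values quoted above, and the noise-floor adaptation rate is an assumption:

```python
import math

FRAME_MS = 30             # WebRTC VAD operates on 10/20/30 ms frames
SPEECH_BUFFER_MS = 300    # audio kept from before speech onset
SILENCE_TIMEOUT_MS = 500  # continuous silence needed to close an utterance

class EnergyVAD:
    """State machine around a per-frame speech/no-speech decision.
    Here the decision is a simple adaptive RMS gate; the project wraps
    WebRTC VAD's classifier in the same kind of state machine."""

    def __init__(self, threshold_ratio=3.0):
        self.noise_floor = None            # learned from the first frame
        self.threshold_ratio = threshold_ratio
        self.silence_ms = 0
        self.in_speech = False
        self.utterance = []
        self.prebuffer = []

    def _is_speech(self, frame):
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if self.noise_floor is None:
            self.noise_floor = rms         # assumes capture starts in silence
        speech = rms > self.noise_floor * self.threshold_ratio
        if not speech:                     # track ambient noise during silence
            self.noise_floor = 0.95 * self.noise_floor + 0.05 * rms
        return speech

    def feed(self, frame):
        """Feed one frame; returns a complete utterance (frames) or None."""
        if self._is_speech(frame):
            if not self.in_speech:
                self.in_speech = True
                # prepend the pre-speech buffer so word onsets aren't clipped
                self.utterance = list(self.prebuffer)
            self.utterance.append(frame)
            self.silence_ms = 0
        elif self.in_speech:
            self.silence_ms += FRAME_MS
            self.utterance.append(frame)
            if self.silence_ms >= SILENCE_TIMEOUT_MS:
                done, self.utterance = self.utterance, []
                self.in_speech = False
                return done
        # rolling 300 ms pre-speech buffer
        self.prebuffer.append(frame)
        self.prebuffer = self.prebuffer[-(SPEECH_BUFFER_MS // FRAME_MS):]
        return None
```

Requiring sustained silence before closing, and sustained energy above the adaptive floor before opening, is what suppresses false triggers from brief noise spikes and short pauses mid-sentence.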