Featured Project

Offline AI Voice Assistant

Privacy-first voice assistant with complete offline operation

Tech Stack

Python, Faster-Whisper, Ollama (Llama 3.2), Coqui TTS, pyttsx3, PortAudio, FFmpeg

3 Stars · <2s Response · 100% Privacy

Problem Statement

Cloud-based voice assistants require constant internet connectivity, raise privacy concerns by sending audio to external servers, and are unusable in secure or remote environments where internet access is restricted or unavailable.

Overview

A fully offline, CPU-optimized voice assistant that provides complete privacy without compromising functionality. It runs on standard laptops with no GPU requirement and delivers end-to-end voice-to-voice interaction, making local AI practical for secure, remote, or resource-constrained environments.

My Role & Contributions

Full Stack Developer: designed the complete voice pipeline, optimized CPU-based STT with faster-whisper, integrated Ollama with Llama 3.2, implemented multi-engine TTS (Coqui TTS and pyttsx3), and built asynchronous audio processing.
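
As a rough sketch of that pipeline, the loop below wires faster-whisper, the Ollama Python client, and pyttsx3 into a single voice-to-voice turn. The model names ("base.en", "llama3.2:3b") and the WAV path are illustrative assumptions, not the project's exact configuration.

```python
# Minimal sketch of one voice-to-voice turn: STT (faster-whisper) -> LLM (Ollama) -> TTS (pyttsx3).
from faster_whisper import WhisperModel
import ollama
import pyttsx3

stt = WhisperModel("base.en", device="cpu", compute_type="int8")  # int8 keeps the CPU footprint small
tts = pyttsx3.init()

def respond(wav_path: str) -> None:
    # 1. Speech-to-text on the recorded utterance
    segments, _ = stt.transcribe(wav_path)
    user_text = " ".join(segment.text for segment in segments).strip()

    # 2. Generate a reply with the local Ollama server (no network round trip)
    reply = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": user_text}],
    )["message"]["content"]

    # 3. Speak the reply fully offline
    tts.say(reply)
    tts.runAndWait()

respond("recorded_question.wav")
```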


Challenges & Solutions

Challenge 1

Achieving <2s end-to-end latency (STT + LLM + TTS) on CPU-only hardware with 8 GB RAM and no GPU acceleration.

Solution

Leveraged faster-whisper with its CTranslate2 backend (roughly 4x faster than vanilla Whisper) and int8 quantization, and served Llama 3.2 through Ollama with a GGUF Q4_K_M-quantized build.
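
The sketch below illustrates that configuration: faster-whisper on its CTranslate2 backend with int8 weights, and a Q4_K_M-quantized GGUF build of Llama 3.2 served by Ollama. The exact model tags and thread counts are assumptions for illustration, not measured settings from the project.

```python
from faster_whisper import WhisperModel
import ollama

# int8 weights on the CTranslate2 backend: the combination behind the CPU speedup cited above
stt = WhisperModel(
    "small.en",
    device="cpu",
    compute_type="int8",
    cpu_threads=4,  # roughly match the physical cores of the target laptop
)

# Pull and query a Q4_K_M-quantized GGUF build of Llama 3.2 (tag assumed here)
ollama.pull("llama3.2:3b-instruct-q4_K_M")
reply = ollama.chat(
    model="llama3.2:3b-instruct-q4_K_M",
    messages=[{"role": "user", "content": "Reply in one short sentence."}],
    options={"num_thread": 4},  # keep the LLM on the same CPU budget
)
print(reply["message"]["content"])
```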

Challenge 2

Optimizing the memory footprint to run Whisper (1.5 GB) + quantized Llama 3.2-3B (2.1 GB) + TTS models concurrently without OOM crashes.

Solution

Implemented a streaming pipeline: VAD triggers asynchronous STT, and tokens stream from the LLM to TTS sentence by sentence for a perceived latency under 2s; paired this with aggressive garbage-collection tuning to keep peak memory in check.
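
A sketch of that sentence-by-sentence hand-off, assuming the Ollama Python client's streaming mode and a simple regex splitter (both illustrative; the project's actual chunking logic may differ):

```python
import re

import ollama
import pyttsx3

tts = pyttsx3.init()

def stream_and_speak(prompt: str) -> None:
    buffer = ""
    stream = ollama.chat(
        model="llama3.2:3b",  # assumed tag
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yields tokens as they are generated
    )
    for chunk in stream:
        buffer += chunk["message"]["content"]
        # Flush complete sentences to TTS as soon as they appear,
        # so speech starts well before the full reply is finished.
        while (match := re.search(r"[.!?]\s", buffer)):
            sentence, buffer = buffer[: match.end()], buffer[match.end():]
            tts.say(sentence)
            tts.runAndWait()
    if buffer.strip():  # speak any trailing fragment
        tts.say(buffer)
        tts.runAndWait()

stream_and_speak("Explain voice activity detection in two sentences.")
```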

Challenge 3

Building robust VAD (Voice Activity Detection) that handles background noise, overlapping speech, and silence-detection edge cases.

Solution

Built a custom layer on WebRTC VAD with an energy threshold that adapts to ambient noise, a 300 ms speech buffer, and a 500 ms silence timeout with false-trigger suppression.
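
The endpointing portion of that logic can be sketched with the webrtcvad package, using a ~300 ms pre-speech buffer and a 500 ms silence timeout; the adaptive energy threshold is omitted for brevity, and the frame size and aggressiveness level are assumptions.

```python
import collections

import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM

vad = webrtcvad.Vad(2)  # aggressiveness 0 (lenient) to 3 (strict)

def segment_utterance(frames):
    """Collect one utterance (bytes) from an iterator of 30 ms PCM frames."""
    pre_roll = collections.deque(maxlen=300 // FRAME_MS)  # ~300 ms speech buffer
    voiced = bytearray()
    silence_ms = 0
    triggered = False

    for frame in frames:
        assert len(frame) == FRAME_BYTES, "expects 30 ms of 16-bit mono PCM"
        is_speech = vad.is_speech(frame, SAMPLE_RATE)
        if not triggered:
            pre_roll.append(frame)
            if is_speech:
                triggered = True
                voiced.extend(b"".join(pre_roll))  # include the buffered onset
        else:
            voiced.extend(frame)
            silence_ms = 0 if is_speech else silence_ms + FRAME_MS
            if silence_ms >= 500:  # 500 ms silence timeout ends the utterance
                break
    return bytes(voiced)
```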