Local ChatGPT
Self-hosted ChatGPT with Ollama and multi-model support
<3s response time · 100% privacy · 1 star
Problem Statement
Organizations need AI chat capabilities but cannot use cloud services due to data privacy regulations, security concerns, and prohibitive API costs for high-volume usage.
Overview
A privacy-focused, self-hosted chat interface powered by local LLMs through Ollama. Provides ChatGPT-like conversational AI without sending data to external servers, supporting multiple open-source models like Llama and Mistral with zero per-query costs.
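At its core, every chat turn is a request to the local Ollama server. The sketch below shows the minimal shape of such a call against Ollama's standard HTTP API on its default port; the model name and prompt are illustrative.

```python
import requests

# Minimal chat request to a local Ollama server (default port 11434).
# "llama3.2" is illustrative; any model pulled into Ollama works here.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Explain WebSockets in one line."}],
        "stream": False,  # set True to receive tokens incrementally
    },
    timeout=60,
)
print(resp.json()["message"]["content"])
```

Because the server runs on localhost, the request and response never leave the machine, which is what makes the zero-data-exfiltration claim above hold.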
My Role & Contributions
Full Stack Developer - Built the chat interface with real-time WebSocket communication, integrated the Ollama backend, implemented conversation memory management, and added async processing to keep the UI responsive.
Tech Stack
Python · FastAPI · WebSockets · Ollama · Redis · Celery
Challenges & Solutions
Challenge
Managing conversation context within Llama's 4096-token limit across multi-turn dialogues while maintaining coherence and relevance
Solution
Implemented a sliding window with summarization: older messages are compressed with distilbart, keeping the last 10 turns plus a running summary as context (~2048 tokens on average)
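A minimal sketch of this sliding-window scheme, assuming the Hugging Face transformers summarization pipeline with a distilbart checkpoint (the checkpoint name, turn format, and helper function are assumptions, not the project's exact code):

```python
from transformers import pipeline

# Summarizer for compressing old turns; the distilbart checkpoint below
# is an assumption mirroring the model family named above.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

MAX_TURNS = 10  # keep the most recent 10 turns verbatim

def build_context(history: list[dict]) -> list[dict]:
    """Return a compact context: summary of old turns + last MAX_TURNS verbatim."""
    if len(history) <= MAX_TURNS:
        return history
    old, recent = history[:-MAX_TURNS], history[-MAX_TURNS:]
    text = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarizer(text, max_length=150, min_length=30, do_sample=False)[0]["summary_text"]
    # Prepend the compressed history as a system message.
    return [{"role": "system", "content": f"Conversation so far: {summary}"}] + recent
```

Summarizing the old turns rather than dropping them is what preserves coherence: names and decisions from early in the dialogue survive as part of the compressed prefix.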
Challenge
Achieving response quality and speed comparable to cloud LLMs (GPT-3.5) using quantized open-source models on consumer hardware
Solution
Benchmarked five models (including Llama 3.2, Mistral 7B, and Mixtral 8x7B); routed queries to models by complexity; enabled KV caching and batched inference with Ollama
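The routing step might look like the heuristic sketch below; the thresholds, keywords, and model tags are illustrative assumptions rather than the project's actual rules (KV caching and batching happen inside Ollama and need no application code):

```python
# Illustrative router: pick a model tag by a crude complexity heuristic.
def route_model(prompt: str) -> str:
    words = len(prompt.split())
    code_like = any(tok in prompt for tok in ("def ", "class ", "```", "SELECT "))
    if code_like or words > 200:
        return "mixtral:8x7b"   # heaviest model for code or long, complex queries
    if words > 50:
        return "mistral:7b"     # mid-tier for moderate prompts
    return "llama3.2"           # fast default for short chat turns
```

Routing most short turns to the smallest model is where the speed win comes from: the large model only pays its latency cost on the queries that actually need it.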
Challenge
Handling 50+ concurrent WebSocket connections with real-time streaming without blocking or degrading response time
Solution
Built an async FastAPI service with a background task queue (Redis + Celery), token streaming over WebSockets with backpressure handling, and connection pooling (max 100)
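A condensed sketch of the streaming path, assuming httpx for the async call to Ollama: awaiting each send_text gives natural backpressure, since a slow client pauses the upstream read loop (the Celery queue and pool limits are omitted for brevity).

```python
import json

import httpx
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            prompt = await ws.receive_text()
            async with httpx.AsyncClient(timeout=None) as client:
                # Stream newline-delimited JSON chunks from Ollama and relay
                # each token to the client as soon as it arrives.
                async with client.stream(
                    "POST",
                    "http://localhost:11434/api/generate",
                    json={"model": "llama3.2", "prompt": prompt, "stream": True},
                ) as resp:
                    async for line in resp.aiter_lines():
                        if line:
                            chunk = json.loads(line)
                            # Awaiting the send applies backpressure: a slow
                            # client pauses the upstream read.
                            await ws.send_text(chunk.get("response", ""))
    except WebSocketDisconnect:
        pass
```

Because every await yields control to the event loop, one process can interleave dozens of such streams without any connection blocking the others.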