Featured AI Project

Local ChatGPT

A self-hosted, ChatGPT-style assistant powered by Ollama, with multi-model support

Tech Stack

Python · FastAPI · Ollama · Llama/Mistral Models · WebSockets · SQLite

<3s Response Time · 100% Privacy · 1 GitHub Star

Problem Statement

Organizations need AI chat capabilities but cannot use cloud services due to data privacy regulations, security concerns, and prohibitive API costs for high-volume usage.

Overview

A privacy-focused, self-hosted chat interface powered by local LLMs through Ollama. Provides ChatGPT-like conversational AI without sending data to external servers, supporting multiple open-source models like Llama and Mistral with zero per-query costs.
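
Because Ollama serves its chat API on localhost, a single HTTP round-trip is enough to query a model. A minimal sketch, assuming Ollama is running on its default port (11434) with the llama3.2 model already pulled:

```python
import requests

def ask_local_llm(prompt: str, model: str = "llama3.2") -> str:
    """One round-trip to the local Ollama chat API; nothing leaves the machine."""
    resp = requests.post(
        "http://localhost:11434/api/chat",   # Ollama's default local endpoint
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,                 # single JSON body instead of a token stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask_local_llm("Why do teams self-host LLMs?"))
```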

My Role & Contributions

Full Stack Developer: built the chat interface with real-time WebSocket communication, integrated the Ollama backend, implemented conversation memory management, and added async processing to keep responses snappy.
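
As a minimal sketch of the conversation-memory side, here is the kind of SQLite persistence layer the stack implies; the schema and helper names are illustrative assumptions, not the project's actual code:

```python
import sqlite3

def init_db(path: str = "chat.db") -> sqlite3.Connection:
    """Create the messages table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id              INTEGER PRIMARY KEY AUTOINCREMENT,
            conversation_id TEXT NOT NULL,
            role            TEXT NOT NULL,   -- 'user' or 'assistant'
            content         TEXT NOT NULL,
            created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )""")
    conn.commit()
    return conn

def save_turn(conn: sqlite3.Connection, conversation_id: str,
              role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content))
    conn.commit()

def load_history(conn: sqlite3.Connection, conversation_id: str) -> list[dict]:
    """Rebuild the message list in the shape the chat API expects."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,)).fetchall()
    return [{"role": r, "content": c} for r, c in rows]
```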

Challenges & Solutions

Challenge 1

Managing conversation context within Llama's 4,096-token limit across multi-turn dialogues while maintaining coherence and relevance

Solution

Implemented a sliding window with summarization: older messages are compressed with distilbart, and the context sent to the model is the last 10 turns plus a running summary (about 2,048 tokens on average).
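
A minimal sketch of that sliding-window memory, assuming the Hugging Face sshleifer/distilbart-cnn-12-6 checkpoint for summarization; the window size matches the description above, but the function shape is illustrative:

```python
from transformers import pipeline

# distilbart summarizer used to compress old turns (assumed checkpoint).
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

WINDOW = 10  # recent turns kept verbatim, per the description above

def build_context(history: list[dict]) -> list[dict]:
    """Return a summary of old turns plus the last WINDOW turns verbatim."""
    if len(history) <= WINDOW:
        return history                      # short dialogue: no compression needed
    old, recent = history[:-WINDOW], history[-WINDOW:]
    old_text = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarizer(old_text, max_length=150, min_length=30,
                         truncation=True)[0]["summary_text"]
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```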

Challenge 2

Achieving response quality and speed comparable to cloud LLMs (GPT-3.5) using quantized open-source models on consumer hardware

Solution

Benchmarked five models (including Llama 3.2, Mistral 7B, and Mixtral 8x7B), routed queries to models based on their complexity, and enabled KV caching and batched inference in Ollama.
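
A sketch of complexity-based routing against the local Ollama server; the heuristic, threshold, and model tags are illustrative assumptions rather than the project's actual scoring rules:

```python
import requests

FAST_MODEL = "llama3.2"        # small, low-latency model for simple queries
STRONG_MODEL = "mixtral:8x7b"  # larger model reserved for complex prompts

def pick_model(prompt: str) -> str:
    """Crude complexity score: long prompts or reasoning cues go to the big model."""
    cues = ("explain", "compare", "step by step", "why", "```")
    is_complex = len(prompt.split()) > 80 or any(c in prompt.lower() for c in cues)
    return STRONG_MODEL if is_complex else FAST_MODEL

def chat(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": pick_model(prompt),
              "messages": [{"role": "user", "content": prompt}],
              "stream": False},
        timeout=300)
    resp.raise_for_status()
    return resp.json()["message"]["content"]
```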

Challenge 3

Handling 50+ concurrent WebSocket connections with real-time streaming without blocking or degrading response time

Solution

Built an async FastAPI service with a background task queue (Redis + Celery), WebSocket token streaming with backpressure handling, and a connection cap of 100.
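
A sketch of the backpressure pattern described above: a bounded queue sits between the token producer and the socket so a slow client naturally throttles generation, while a semaphore enforces the 100-connection cap. The producer here is a stand-in, not the project's implementation:

```python
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
MAX_CONNECTIONS = 100                         # cap stated above
slots = asyncio.Semaphore(MAX_CONNECTIONS)

@app.websocket("/ws/stream")
async def stream(ws: WebSocket):
    if slots.locked():                        # all slots busy: ask the client to retry
        await ws.close(code=1013)             # 1013 = "try again later"
        return
    async with slots:
        await ws.accept()
        queue: asyncio.Queue = asyncio.Queue(maxsize=32)  # bounded = backpressure

        async def produce(prompt: str) -> None:
            # Placeholder producer; the real one would stream tokens from Ollama.
            for token in prompt.split():
                await queue.put(token)        # blocks while the queue is full
            await queue.put(None)             # end-of-stream sentinel

        try:
            while True:
                prompt = await ws.receive_text()
                task = asyncio.create_task(produce(prompt))
                while (token := await queue.get()) is not None:
                    await ws.send_text(token) # consumer pace throttles the producer
                await task
        except WebSocketDisconnect:
            pass                              # client went away; semaphore auto-releases
```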