OpenChat: Self-Hosted AI Chat Platform with Streaming, Multi-Provider Routing, and Persistent Sessions
OpenChat is a comprehensive, production-style AI chat product built as a monorepo, designed for self-hosting. The architecture combines a Next.js frontend, a FastAPI backend, PostgreSQL for persistence, and Docker Compose for deployment.
Users interact with local or remote LLMs through a polished, dark-first interface that renders markdown and syntax-highlighted code. Crucially, responses stream token-by-token over HTTP, providing a superior user experience compared to traditional buffered completions.
The backend implements a sophisticated provider router capable of dispatching requests to various backends, including Ollama and any OpenAI-compatible API. It normalizes streaming chunks, manages session persistence via async SQLAlchemy, and handles complex multi-turn memory management. The frontend leverages Zustand for state management and utilizes Next.js BFF API routes to proxy all backend communication, ensuring the browser never talks directly to the core API.
Engineering quality was paramount: the project includes comprehensive testing suites (pytest, Vitest, Playwright) and CI/CD gates via GitHub Actions, ensuring reliability across every push and PR. Furthermore, the design incorporates hooks for future RAG and agent tool integration, future-proofing the core chat pipeline.
Features
- Real-time
- token-by-token streaming chat experience
- Multi-provider routing supporting Ollama and OpenAI APIs
- Durable session management with history persistence and session control
- Thinking mode displaying collapsible reasoning blocks
- Support for image and file attachments with vision capabilities
- Next.js BFF API proxy for enhanced security
- Full CI/CD pipeline with unit
- integration
- and E2E testing
Challenges
The primary challenges included achieving true end-to-end streaming from the LLM provider through the FastAPI layer to the browser's ReadableStream, while maintaining a responsive UI. Additionally, the system needed to abstract multiple, disparate model backends behind a single, consistent API contract, all while ensuring session consistency in PostgreSQL during asynchronous chunk arrival.
Solutions
We solved the streaming challenge by implementing async streaming from the provider through FastAPI's StreamingResponse to the browser's ReadableStream, updating the UI state incrementally with Zustand. A dedicated ProviderRouter was introduced to normalize requests for various backends. For persistence, we utilized layered backend design with async SQLAlchemy and Alembic for robust, session-scoped memory management. The frontend was secured by routing all traffic through Next.js BFF routes, and the entire stack was hardened with comprehensive testing and CI/CD gates.