Web Development

OpenChat: Self-Hosted AI Chat Platform with Streaming, Multi-Provider Routing, and Persistent Sessions

OpenChat is a production-style, self-hosted AI chat application organized as a monorepo. It combines a Next.js frontend, a FastAPI backend, PostgreSQL for persistence, and Docker Compose for deployment.

Users interact with local or remote LLMs through a dark-first interface that renders markdown and syntax-highlighted code. Responses stream token by token over HTTP, so text appears as it is generated instead of after the full completion has been buffered.

The backend implements a provider router that dispatches requests to different backends, including Ollama and any OpenAI-compatible API. It normalizes each provider's streaming chunks, persists sessions through async SQLAlchemy, and manages multi-turn conversation memory. The frontend uses Zustand for state management and proxies all backend communication through Next.js BFF (backend-for-frontend) API routes, so the browser never talks directly to the core API.
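As a rough illustration of what chunk normalization can involve, the sketch below maps one streamed line from either provider family to a plain text delta. It is not OpenChat's actual router: the function name `normalize_chunk` and the provider labels are hypothetical. It assumes the commonly documented wire formats, namely newline-delimited JSON with a `message.content` field for Ollama's chat endpoint, and server-sent `data:` lines carrying `choices[0].delta.content` for OpenAI-compatible APIs.

```python
import json
from typing import Optional


def normalize_chunk(provider: str, raw_line: str) -> Optional[str]:
    """Return the text delta carried by one streamed line, or None if it has none.

    Hypothetical helper for illustration; provider labels are assumptions.
    """
    line = raw_line.strip()
    if not line:
        return None  # blank separator lines carry no text

    if provider == "ollama":
        # Assumed format: one JSON object per line, text in message.content.
        payload = json.loads(line)
        return (payload.get("message") or {}).get("content") or None

    if provider == "openai":
        # Assumed format: server-sent events, "data: {...}" or "data: [DONE]".
        if not line.startswith("data:"):
            return None
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return None  # end-of-stream sentinel
        choices = json.loads(data).get("choices") or []
        if not choices:
            return None
        return (choices[0].get("delta") or {}).get("content") or None

    raise ValueError(f"unknown provider: {provider}")


if __name__ == "__main__":
    print(normalize_chunk("ollama", '{"message": {"content": "Hi"}, "done": false}'))
    print(normalize_chunk("openai", 'data: {"choices": [{"delta": {"content": "Hi"}}]}'))
```

With a function of this shape, everything downstream of the router (persistence, the HTTP response, the UI) only ever sees plain text deltas, whichever backend produced them.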

The project is tested with pytest, Vitest, and Playwright, and GitHub Actions runs these suites as CI/CD gates on every push and pull request. The design also includes hooks for future RAG and agent tool integration in the core chat pipeline.

Features

Real-time, token-by-token streaming chat experience
Multi-provider routing supporting Ollama and OpenAI APIs
Durable session management with history persistence and session control
Thinking mode displaying collapsible reasoning blocks
Support for image and file attachments with vision capabilities
Next.js BFF API proxy for enhanced security
Full CI/CD pipeline with unit, integration, and E2E testing

Challenges

The primary challenges included achieving true end-to-end streaming from the LLM provider through the FastAPI layer to the browser's ReadableStream, while maintaining a responsive UI. Additionally, the system needed to abstract multiple, disparate model backends behind a single, consistent API contract, all while ensuring session consistency in PostgreSQL during asynchronous chunk arrival.

Solutions

Streaming is asynchronous end to end: tokens flow from the provider through FastAPI's StreamingResponse to the browser's ReadableStream, and Zustand updates the UI state incrementally as chunks arrive. A dedicated ProviderRouter normalizes requests across backends. For persistence, the backend uses a layered design with async SQLAlchemy and Alembic to provide session-scoped memory management. All browser traffic is routed through Next.js BFF routes so the core API is not exposed directly, and the stack is covered by automated tests and CI/CD gates.
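One way to keep a session consistent while chunks arrive asynchronously is to forward each token immediately but write the assistant message once, after the stream ends. The sketch below shows that pattern with the standard library only. It is an assumption about the approach, not OpenChat's code: `fake_provider`, `stream_and_persist`, and the `save` callback are hypothetical stand-ins for the provider stream and the async SQLAlchemy write.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple


async def fake_provider() -> AsyncIterator[str]:
    """Hypothetical stand-in for a normalized provider token stream."""
    for token in ["Hel", "lo", ", ", "world"]:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield token


async def stream_and_persist(
    tokens: AsyncIterator[str],
    save: Callable[[str], Awaitable[None]],
) -> AsyncIterator[str]:
    """Forward tokens as they arrive, then persist the joined reply once."""
    parts: List[str] = []
    try:
        async for token in tokens:
            parts.append(token)
            yield token  # the caller can send this to the client right away
    finally:
        # Single write per reply; also runs if the generator is closed early,
        # in which case whatever text arrived so far is saved.
        if parts:
            await save("".join(parts))


async def main() -> Tuple[List[str], List[str]]:
    saved: List[str] = []

    async def save(text: str) -> None:
        saved.append(text)  # a real app would commit to the database here

    received = [t async for t in stream_and_persist(fake_provider(), save)]
    return received, saved


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a FastAPI handler, an async generator like `stream_and_persist` could be passed to `StreamingResponse`, which accepts async iterators. Writing once per reply avoids a database round trip per token, at the cost of losing the in-flight reply if the process dies mid-stream.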

Technologies

Next.js, React, TypeScript, Tailwind CSS, FastAPI, Python, PostgreSQL, Zustand, Docker, Docker Compose, Ollama

Client: Personal Project

Duration: 3 months

Published: Jun 2026
