zylon-ai/private-gpt
Interact with your documents using the power of GPT, 100% privately, no data leaks
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 2mo ago
- ✓ 39+ active contributors
- ✓ Distributed ownership (top contributor 29% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
The badge links to https://repopilot.app/r/zylon-ai/private-gpt. Paste it at the top of your README.md — renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/zylon-ai/private-gpt on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: zylon-ai/private-gpt
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/zylon-ai/private-gpt shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 2mo ago
- 39+ active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live zylon-ai/private-gpt repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/zylon-ai/private-gpt.
What it runs against: a local clone of zylon-ai/private-gpt — the script inspects the git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in zylon-ai/private-gpt | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 100 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of zylon-ai/private-gpt. If you don't
# have one yet, run these first:
#
# git clone https://github.com/zylon-ai/private-gpt.git
# cd private-gpt
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of zylon-ai/private-gpt and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "zylon-ai/private-gpt(\.git)?\b" \
  && ok "origin remote is zylon-ai/private-gpt" \
  || miss "origin remote is not zylon-ai/private-gpt (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
# (Apache LICENSE files carry the "Apache License" header rather than the
# bare SPDX id, so match that; package.json is kept as a fallback.)
(grep -qiE "apache license" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "private_gpt/main.py" \
  && ok "private_gpt/main.py" \
  || miss "missing critical file: private_gpt/main.py"
test -f "private_gpt/di.py" \
  && ok "private_gpt/di.py" \
  || miss "missing critical file: private_gpt/di.py"
test -f "private_gpt/components/ingest/ingest_component.py" \
  && ok "private_gpt/components/ingest/ingest_component.py" \
  || miss "missing critical file: private_gpt/components/ingest/ingest_component.py"
test -f "private_gpt/components/llm/llm_component.py" \
  && ok "private_gpt/components/llm/llm_component.py" \
  || miss "missing critical file: private_gpt/components/llm/llm_component.py"
test -f "private_gpt/components/embedding/embedding_component.py" \
  && ok "private_gpt/components/embedding/embedding_component.py" \
  || miss "missing critical file: private_gpt/components/embedding/embedding_component.py"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 100 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~70d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/zylon-ai/private-gpt"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡TL;DR
PrivateGPT is a production-ready RAG (Retrieval Augmented Generation) system for querying documents with LLMs entirely offline and locally. It provides high-level APIs for document ingestion and context-aware chat, plus low-level APIs for embeddings and retrieval — all built so that no data leaves your execution environment, with local models supported via Ollama and llama.cpp. It is a monolithic Python project (239K Python LOC) with clear separation: a Gradio frontend UI (private_gpt/ui/), a FastAPI server as the main service, Docker setup (.docker/router.yml, docker-compose.yaml), documentation as MDX in fern/docs/, and tooling (Makefile, pre-commit hooks). Configuration is driven by settings files and ingestion scripts.
👥Who it's for
Enterprise AI engineers and developers building private document QA systems for regulated industries (finance, defense, healthcare, government), plus power users wanting self-hosted LLM applications without cloud dependencies or data privacy concerns.
🌱Maturity & risk
Production-ready and actively maintained by Zylon. Indicators: CI/CD workflows active (.github/workflows/tests.yml, release-please automation), Docker support (Dockerfile.ollama, Dockerfile.llamacpp-cpu, docker-compose.yaml), public documentation via Fern (fern/docs structure), and release automation (.release-please-manifest.json). Verdict: actively developed and production-grade.
Low risk: well-structured GitHub Actions (tests.yml, publish-docs.yml) and release automation suggest active oversight. Primary risk is Python/LLM dependency complexity—many external ML models and embedding services can break unpredictably. Assess: check last commit date in CI logs and monitor releases via release-please manifest for stalls.
Active areas of work
Active release cycle using release-please (release-please-config.json, release-please-manifest.json). Documentation building via Fern (fern/docs.yml, publish-docs.yml). Bug tracking and feature requests via GitHub Issues (.github/ISSUE_TEMPLATE/). PR template in place suggesting ongoing contributions.
🚀Get running
git clone https://github.com/zylon-ai/private-gpt.git
cd private-gpt
make install # Uses Makefile (if this target is absent in your checkout, `poetry install` is the underlying step)
make run # Or docker-compose up
Daily commands:
# Via docker-compose (preferred for completeness)
docker-compose up
# Or via Makefile
make run
# For development with live reload
make dev
UI and API are served by the same FastAPI process; docker-compose maps port 8001 (see Security observations below), so localhost:8001 is the usual address. Gradio's 7860 default applies only when the UI runs standalone (inferred).
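Once the server is up, a quick sanity check is to point the official openai Python client at the local endpoint. A minimal sketch, assuming port 8001 and OpenAI-compatible routes; adjust base_url to your setup:

```python
# Smoke-test sketch for the OpenAI-compatible endpoint.
# Assumes the server is reachable on localhost:8001; adjust as needed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8001/v1",
    api_key="not-needed-for-local",  # a local server typically ignores the key
)

resp = client.chat.completions.create(
    model="private-gpt",  # routed locally; the value may not matter
    messages=[{"role": "user", "content": "What documents have been ingested?"}],
)
print(resp.choices[0].message.content)
```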
🗺️Map of the codebase
- private_gpt/main.py — Primary entry point orchestrating the entire PrivateGPT application lifecycle and initialization
- private_gpt/di.py — Dependency injection container that wires together all core components (LLM, embedding, vector store, node store)
- private_gpt/components/ingest/ingest_component.py — Core document ingestion pipeline handling document parsing, chunking, embedding, and storage
- private_gpt/components/llm/llm_component.py — LLM component abstraction managing language model initialization and prompt configuration
- private_gpt/components/embedding/embedding_component.py — Embedding component handling vector representation generation for documents and queries
- private_gpt/components/vector_store/vector_store_component.py — Vector database abstraction providing semantic search and document retrieval capabilities
- private_gpt/open_ai/openai_models.py — OpenAI-compatible API models and response formatting for chat completions and query answering
🛠️How to make changes
Add a New LLM Provider
- Create a new LLM integration by extending the LLM component factory in the DI container (private_gpt/di.py)
- Configure the LLM component with provider-specific settings and initialization logic (private_gpt/components/llm/llm_component.py)
- Add custom LLM integration code to the custom subdirectory following the SageMaker pattern (private_gpt/components/llm/custom/sagemaker.py); see the sketch below for the rough shape
- Update settings and configuration to expose provider-specific parameters (private_gpt/constants.py)
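For orientation, a minimal sketch of the wrapper shape the third step calls for, assuming LlamaIndex's CustomLLM base class (which the existing components wrap). MyProviderLLM and _call_my_provider are hypothetical names, not repo code; mirror private_gpt/components/llm/custom/sagemaker.py for the real pattern.

```python
from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


def _call_my_provider(prompt: str) -> str:
    """Stub: replace with your provider's SDK or HTTP call."""
    return "stub response for: " + prompt


class MyProviderLLM(CustomLLM):
    context_window: int = 4096
    num_output: int = 256
    model_name: str = "my-provider-model"

    @property
    def metadata(self) -> LLMMetadata:
        # LlamaIndex uses this to size prompts and outputs.
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=_call_my_provider(prompt))

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Naive fallback: emit the full completion as a single chunk.
        yield self.complete(prompt, **kwargs)
```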
Add a New Embedding Model
- Create the embedding integration in the custom embeddings directory (private_gpt/components/embedding/custom/sagemaker.py is the existing pattern to mirror)
- Register the new embedding provider in the dependency injection container (private_gpt/di.py)
- Configure the embedding component with model-specific parameters and dimensions (private_gpt/components/embedding/embedding_component.py); a sketch of the wrapper shape follows this list
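A minimal sketch of that wrapper shape, assuming LlamaIndex's BaseEmbedding interface. MyProviderEmbedding is a hypothetical name, and the vector size must match whatever dimension the vector store was configured with.

```python
from typing import List

from llama_index.core.embeddings import BaseEmbedding


class MyProviderEmbedding(BaseEmbedding):
    def _embed(self, text: str) -> List[float]:
        # Stub: call your embedding service here; must return a
        # fixed-size vector matching the vector store's dimension.
        return [0.0] * 384

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._embed(text)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        # Sync fallback; swap in a real async client if available.
        return self._embed(query)
```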
Add Support for a New Vector Database
- Create the new vector store implementation in the vector_store components directory (private_gpt/components/vector_store/vector_store_component.py)
- Implement the vector store interface with similarity-search and insert methods (private_gpt/components/vector_store/batched_chroma.py shows an existing implementation)
- Register the vector store in the dependency injection container with initialization logic (private_gpt/di.py); the registry sketch below shows the general pattern
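A minimal sketch of the registration pattern, kept generic on purpose: MyStoreClient, VECTOR_STORE_FACTORIES, and build_vector_store are hypothetical names, not identifiers from di.py or vector_store_component.py.

```python
from typing import Callable, Dict


class MyStoreClient:
    """Stub standing in for your vector database client."""

    def __init__(self, collection: str) -> None:
        self.collection = collection


# Map a settings value (e.g. "database: mystore") to a factory.
VECTOR_STORE_FACTORIES: Dict[str, Callable[[str], object]] = {
    "mystore": lambda collection: MyStoreClient(collection),
}


def build_vector_store(database: str, collection: str = "private_gpt"):
    try:
        return VECTOR_STORE_FACTORIES[database](collection)
    except KeyError:
        raise ValueError(f"Unsupported vector store backend: {database!r}")


store = build_vector_store("mystore")
print(type(store).__name__)  # MyStoreClient
```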
Extend the Document Ingestion Pipeline
- Add custom document parsers or processing logic to the ingest helper utilities (private_gpt/components/ingest/ingest_helper.py); a reader sketch follows this list
- Integrate new document types or processing steps into the main ingestion component (private_gpt/components/ingest/ingest_component.py)
- Configure node storage and metadata handling for the new document format (private_gpt/components/node_store/node_store_component.py)
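A sketch of what a custom parser might look like, assuming the LlamaIndex BaseReader interface that the ingest helpers map file extensions onto. MyLogReader is hypothetical, not repo code.

```python
from pathlib import Path
from typing import List, Optional

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class MyLogReader(BaseReader):
    """Parses .log files: one Document per file, leading timestamps stripped."""

    def load_data(self, file: Path, extra_info: Optional[dict] = None) -> List[Document]:
        text = Path(file).read_text(errors="ignore")
        # Example transform: drop the first space-delimited token
        # (an assumed timestamp) from each line.
        lines = [line.split(" ", 1)[-1] for line in text.splitlines()]
        return [Document(text="\n".join(lines), metadata=extra_info or {})]
```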
🔧Why these technologies
- LlamaIndex/LangChain — Provides abstractions for LLM orchestration, document indexing, and RAG workflows without vendor lock-in
- Chroma Vector Database — Lightweight, embeddable vector store ideal for local-first deployments requiring no external infrastructure
- FastAPI/Gradio UI — FastAPI enables OpenAI-compatible REST API; Gradio provides zero-config web UI for document Q&A
- Ollama/LlamaCPP — Enables running quantized LLMs locally on consumer hardware with minimal VRAM requirements
- Docker Compose — Orchestrates multi-service deployments (router, workers, Ollama) for production-ready scaling
⚖️Trade-offs already made
- Local-first, no cloud dependencies
  - Why: Privacy and compliance requirements mandate that all processing stays within the user's environment
  - Consequence: LLMs must run locally on limited hardware, resulting in slower inference; models also demand substantial local storage
- OpenAI API compatibility layer
  - Why: Allows drop-in replacement of OpenAI with local models; existing integrations and clients work unchanged
  - Consequence: Constrains the API surface to OpenAI's spec; some advanced local-specific features must be squeezed into existing fields
- Single-machine vector store (Chroma) by default
  - Why: Eliminates external infrastructure for quickstart; meets MVP scalability for <10M documents
  - Consequence: Limited horizontal scaling; Chroma lacks distributed clustering; production deployments may require migration to Weaviate/Qdrant
- Document node storage abstraction
  - Why: Decouples raw document metadata from vector embeddings for retrieval flexibility
  - Consequence: Adds an extra layer requiring sync between vector store and node store; potential consistency issues on ingestion failure
🚫Non-goals (don't propose these)
- Real-time streaming ingestion of large document feeds
- Multi-user authentication and role-based access control
- Distributed, horizontally-scaled deployments out-of-the-box
- Support for proprietary enterprise search engines or databases
- Fine-tuning or training of LLMs
- Automatic document format conversion or OCR for scanned PDFs
🪤Traps & gotchas
- Model downloads: Local LLM models (Ollama, llama.cpp) must be pre-downloaded or the app will fail on first run — check fern/docs/pages/manual/llms.mdx for model management.
- Environment variables: Settings likely require .env configuration (API keys for cloud fallbacks, model paths) — look for .env.example or settings.mdx for required vars.
- Ingestion state: Document ingestion state is persisted — resetting requires explicit cleanup commands (fern/docs/pages/manual/ingestion-reset.mdx).
- Embedding model CPU/GPU: Embedding generation can be slow without a GPU; the default model choice is critical for performance.
- Router config: The Docker setup requires understanding the .docker/router.yml YAML syntax for service discovery.
🏗️Architecture
💡Concepts to learn
- Retrieval Augmented Generation (RAG) — The entire architecture (ingestion pipeline, embedding storage, context retrieval) exists to implement RAG—understanding this pattern is core to modifying how documents inform LLM responses (a toy version is sketched after this list)
- Embedding Models & Vector Search — PrivateGPT requires a local embedding model to vectorize documents and queries; choosing/optimizing the embedding model directly impacts retrieval quality and performance
- Prompt Engineering & Context Window Management — High-level chat API abstracts context retrieval and prompt injection—understanding how contexts are selected and formatted is critical for debugging response quality
- Streaming Responses (Server-Sent Events) — PrivateGPT supports streaming responses per the README; understanding chunked response handling matters for UI integration and real-time feedback
- Document Parsing & Chunking Strategies — Ingestion pipeline must parse diverse document formats and split text into context windows—chunking strategy affects retrieval precision and LLM token usage
- Container Orchestration with Docker Compose — Multi-service deployment (API, LLM service, router) coordinated via docker-compose.yaml—modifying this requires understanding service networking and environment injection
- OpenAI API Specification Compatibility — PrivateGPT extends OpenAI's API standard for drop-in compatibility—knowing what endpoints/parameters matter helps you extend or customize the API surface
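To ground the RAG concept, here is a deliberately toy sketch of the retrieve-then-generate loop: fake bag-of-words embeddings stand in for the embedding component, a Python list stands in for the vector store, and the final prompt shows context injection. Nothing here is repo code.

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    # Toy "embedding": word counts instead of a learned vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chunks = [
    "PrivateGPT ingests documents and stores embeddings locally.",
    "The API is compatible with the OpenAI specification.",
    "Ollama and llama.cpp provide local model inference.",
]
index = [(c, embed(c)) for c in chunks]  # stand-in for the vector store

query = "how are documents stored?"
top = max(index, key=lambda pair: cosine(embed(query), pair[1]))[0]  # retrieval

# Context injection: the retrieved chunk becomes part of the LLM prompt.
prompt = f"Answer using only this context:\n{top}\n\nQuestion: {query}"
print(prompt)
```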
🔗Related repos
- langchain-ai/langchain — Core framework likely used for RAG pipeline orchestration, document splitting, and embedding integrations
- llm-utils-dev/llm — Sibling ecosystem for local LLM management and inference; complements PrivateGPT's model provider abstraction
- ollama/ollama — Primary local LLM runtime supported by PrivateGPT (Dockerfile.ollama); essential for 'no internet' deployments
- ggerganov/llama.cpp — Alternative local LLM runtime (Dockerfile.llamacpp-cpu); drop-in replacement for Ollama in PrivateGPT
- openai/openai-python — API specification this project extends; PrivateGPT maintains OpenAI API compatibility for client library reuse
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for document ingestion pipeline
The repo has comprehensive documentation on ingestion (fern/docs/pages/manual/ingestion.mdx) and ingestion reset (fern/docs/pages/manual/ingestion-reset.mdx), but the test workflows (.github/workflows/tests.yml) likely lack integration tests covering the ingestion lifecycle. This is critical for a document QA system, where ingestion reliability directly impacts user trust and core functionality. A test skeleton is sketched after the checklist below.
- [ ] Review current tests.yml workflow to identify gaps in ingestion test coverage
- [ ] Create integration test suite covering: document upload, parsing, embedding, and reset operations
- [ ] Add tests for edge cases (large files, unsupported formats, corrupted documents)
- [ ] Integrate tests into CI/CD pipeline in .github/workflows/tests.yml
- [ ] Document test setup requirements in fern/docs/pages/installation/troubleshooting.mdx
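A skeleton of what the lifecycle test could look like. The app import path and the /v1/ingest routes are assumptions modeled on the docs; confirm them against private_gpt/main.py and the ingest router before use.

```python
import io

from fastapi.testclient import TestClient

from private_gpt.main import app  # assumption: FastAPI instance lives here

client = TestClient(app)


def test_ingest_then_list_roundtrip():
    fake_doc = io.BytesIO(b"PrivateGPT keeps your data local.")
    resp = client.post(
        "/v1/ingest/file",  # assumed route, modeled on the docs
        files={"file": ("note.txt", fake_doc, "text/plain")},
    )
    assert resp.status_code == 200
    ingested = resp.json()["data"]  # assumed response shape
    assert len(ingested) >= 1

    listed = client.get("/v1/ingest/list")
    assert listed.status_code == 200
    doc_ids = {d["doc_id"] for d in listed.json()["data"]}
    assert all(d["doc_id"] in doc_ids for d in ingested)
```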
Add comprehensive GitHub Actions workflow for Docker image validation
The repo has multiple Dockerfiles (Dockerfile.llamacpp-cpu, Dockerfile.ollama) and a docker-compose.yaml, but lacks a dedicated CI workflow validating that these images build correctly and pass basic smoke tests. Without it, Docker builds can silently regress until users try to deploy. A smoke-test sketch follows the checklist below.
- [ ] Create new workflow .github/workflows/docker-build.yml
- [ ] Add build steps for both Dockerfile.llamacpp-cpu and Dockerfile.ollama
- [ ] Implement basic smoke tests (container starts, health checks, API responds)
- [ ] Test docker-compose.yaml orchestration and service connectivity
- [ ] Configure workflow to run on Dockerfile changes and schedule periodic builds
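A sketch of the "API responds" step the workflow could invoke after docker compose up -d. The /health route and port 8001 are assumptions taken from the compose discussion above; verify both before wiring this into CI.

```python
import sys
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8001/health", timeout: int = 120) -> bool:
    """Poll the assumed health endpoint until it answers 200 or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # container still starting; retry
        time.sleep(3)
    return False


if __name__ == "__main__":
    healthy = wait_for_health()
    print("healthy" if healthy else "unhealthy")
    sys.exit(0 if healthy else 1)
```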
Document and add tests for vectordb and reranker configuration options
The repo has manual documentation pages for vectordb (fern/docs/pages/manual/vectordb.mdx) and reranker (fern/docs/pages/manual/reranker.mdx), but likely lacks automated tests validating that different backend configurations work correctly. This matters because these components directly impact RAG quality, and users need confidence when switching providers. A parametrized test sketch follows the checklist below.
- [ ] Audit existing settings documentation (fern/docs/pages/manual/settings.mdx) for all supported vectordb/reranker backends
- [ ] Create parametrized unit tests for each supported vectordb backend (Milvus, Pinecone, Weaviate, etc.)
- [ ] Create parametrized unit tests for each supported reranker (local, API-based, etc.)
- [ ] Add configuration validation tests to prevent misconfiguration at startup
- [ ] Update .github/workflows/tests.yml to run these configuration tests
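A sketch of the parametrized shape these tests could take. The backend list and the validate_vectordb helper are stand-ins; the real tests should import the repo's settings loader instead.

```python
import pytest

KNOWN_VECTORDBS = ["chroma", "qdrant", "postgres"]  # assumed supported set


def validate_vectordb(name: str) -> None:
    """Stand-in for the repo's settings validation; raises on unknown backends."""
    if name not in KNOWN_VECTORDBS:
        raise ValueError(f"unknown vectordb backend: {name}")


@pytest.mark.parametrize("backend", KNOWN_VECTORDBS)
def test_supported_vectordb_accepted(backend):
    validate_vectordb(backend)  # should not raise


def test_unknown_vectordb_rejected():
    with pytest.raises(ValueError):
        validate_vectordb("definitely-not-a-backend")
```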
🌿Good first issues
- Add unit tests for the document ingestion pipeline (check whether tests/ covers private_gpt/components/ingest/) — many splitting and parsing edge cases are likely uncovered.
- Expand troubleshooting.mdx with common setup failures (missing Ollama, model download timeouts, port conflicts)—gaps visible in fern/docs/pages/installation/troubleshooting.mdx.
- Document reranker configuration with examples in fern/docs/pages/manual/reranker.mdx—currently sparse, needs concrete model and usage examples.
⭐Top contributors
Click to expand
- @jaluma — 29 commits
- @imartinez — 14 commits
- @github-actions[bot] — 6 commits
- @dbzoo — 6 commits
- @icsy7867 — 5 commits
📝Recent commits
Click to expand
- 7effd57 — Update README.md (imartinez)
- 010211f — Update README.md (imartinez)
- b7ee437 — Update README.md (imartinez)
- 940bdd4 — fix: 503 when private gpt gets ollama service (#2104) (meng-hui)
- 5851b02 — feat: update llama-index + dependencies (#2092) (jaluma)
- 5fbb402 — fix: Sanitize null bytes before ingestion (#2090) (laoqiu233)
- fa3c306 — fix: Add default mode option to settings (#2078) (basicbloke)
- f9182b3 — feat: Adding MistralAI mode (#2065) (itsliamdowd)
- 8c12c68 — fix: docker permissions (#2059) (jaluma)
- 77461b9 — feat: add retry connection to ollama (#2084) (jaluma)
🔒Security observations
The PrivateGPT codebase has moderate security concerns, summarized below by severity.
- High · Container running as root — docker-compose.yaml, private-gpt-ollama service. The service explicitly sets 'user: root', granting the container elevated privileges; this violates the principle of least privilege and widens the attack surface if the application is compromised. Fix: create a non-root user in the Docker image and make it the default; change 'user: root' to a non-privileged account such as 'worker', and ensure application files have appropriate permissions for that user.
- High · Sensitive environment variable exposure — docker-compose.yaml, HF_TOKEN. The HF_TOKEN (Hugging Face token) is passed as an environment variable with a default empty value; it could leak via logs, process listings, or docker inspect. Fix: use Docker secrets or an external secret manager (AWS Secrets Manager, HashiCorp Vault, etc.) instead of environment variables, and never commit real tokens to version control.
- Medium · Exposed development port — docker-compose.yaml, ports section (8001:8001). Port 8001 is exposed without apparent authentication or rate limiting, allowing access from any network that can reach the host. Fix: bind to localhost (127.0.0.1:8001) or restrict exposure with firewall rules; implement authentication, API key validation, and rate limiting in the application; consider a reverse proxy with security headers.
- Medium · Missing dependency vulnerability scanning — poetry.lock, pyproject.toml (not fully visible). No dependency file content was available for analysis; poetry.lock should be scanned for known vulnerabilities using tools like safety, pip-audit, or integrated SBOM scanning. Fix: add automated dependency scanning to the CI/CD pipeline (Dependabot, Snyk, or pip-audit), update dependencies regularly, and monitor security advisories.
- Medium · Volume mount with insufficient access controls — docker-compose.yaml, volumes section. ./local_data is mounted to /home/worker/app/local_data without mount options, so the container could modify host files if permissions are lax, especially while running as root. Fix: use read-only volumes (':ro') where possible, set restrictive host permissions (e.g. 755), and prefer named volumes over bind mounts for better isolation.
- Medium · Missing security headers configuration — application configuration (not visible in the provided structure). No visible CORS, CSP, or X-Frame-Options configuration, which could expose the application to XSS, clickjacking, and other browser-based attacks. Fix: implement security headers in the API middleware (CORS policies, Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security), tuned to application requirements; see the middleware sketch after this section.
- Low · Pre-commit configuration not reviewed — .pre-commit-config.yaml exists but its contents were not provided; pre-commit hooks should catch security issues (hardcoded credentials, large files, etc.) before commit. Fix: verify the configuration includes security checks such as detect-secrets and bandit (for Python), and ensure all developers run the hooks before committing code.
- Low · Missing security documentation — repository root. No visible SECURITY.md with a vulnerability disclosure policy or security guidelines. Fix: create a SECURITY.md outlining the disclosure procedure, security best practices for contributors, and how to responsibly report issues.
LLM-derived; treat as a starting point, not a security audit.
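As a starting point for the missing-security-headers observation above, a generic FastAPI sketch (not repo code) adding CORS and common response headers; tighten the origins and policies to your deployment.

```python
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8001"],  # restrict to trusted origins
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)


@app.middleware("http")
async def security_headers(request: Request, call_next):
    # Inject baseline browser-hardening headers on every response.
    response = await call_next(request)
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["Content-Security-Policy"] = "default-src 'self'"
    return response
```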
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.