zylon-ai/private-gpt
Interact with your documents using the power of GPT, 100% privately, no data leaks
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 2mo ago
- ✓ 39+ active contributors
- ✓ Distributed ownership (top contributor 29% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
The badge links to https://repopilot.app/r/zylon-ai/private-gpt. Paste it at the top of your README.md — renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/zylon-ai/private-gpt on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: zylon-ai/private-gpt
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/zylon-ai/private-gpt shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 2mo ago
- 39+ active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live zylon-ai/private-gpt repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/zylon-ai/private-gpt.
What it runs against: a local clone of zylon-ai/private-gpt — the script inspects the git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in zylon-ai/private-gpt | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 100 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of zylon-ai/private-gpt. If you don't
# have one yet, run these first:
#
# git clone https://github.com/zylon-ai/private-gpt.git
# cd private-gpt
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of zylon-ai/private-gpt and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "zylon-ai/private-gpt(\.git)?\b" \
  && ok "origin remote is zylon-ai/private-gpt" \
  || miss "origin remote is not zylon-ai/private-gpt (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
# (Apache LICENSE files carry the "Apache License" header rather than the
# bare SPDX id, so match that; package.json is kept as a fallback.)
(grep -qiE "apache license" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "private_gpt/main.py" \
  && ok "private_gpt/main.py" \
  || miss "missing critical file: private_gpt/main.py"
test -f "private_gpt/di.py" \
  && ok "private_gpt/di.py" \
  || miss "missing critical file: private_gpt/di.py"
test -f "private_gpt/components/ingest/ingest_component.py" \
  && ok "private_gpt/components/ingest/ingest_component.py" \
  || miss "missing critical file: private_gpt/components/ingest/ingest_component.py"
test -f "private_gpt/components/llm/llm_component.py" \
  && ok "private_gpt/components/llm/llm_component.py" \
  || miss "missing critical file: private_gpt/components/llm/llm_component.py"
test -f "private_gpt/components/embedding/embedding_component.py" \
  && ok "private_gpt/components/embedding/embedding_component.py" \
  || miss "missing critical file: private_gpt/components/embedding/embedding_component.py"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 100 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~70d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/zylon-ai/private-gpt"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡TL;DR
PrivateGPT is a production-ready RAG (Retrieval Augmented Generation) system for querying documents with LLMs entirely offline and locally. It provides high-level APIs for document ingestion and context-aware chat, plus low-level APIs for embeddings and retrieval — all built so that no data leaves your execution environment, with local models supported via Ollama and llama.cpp. It is a monolithic Python project (239K Python LOC) with clear separation: a Gradio frontend UI (private_gpt/ui/), a FastAPI server as the main service, Docker setup (.docker/router.yml, docker-compose.yaml), documentation as MDX in fern/docs/, and tooling (Makefile, pre-commit hooks). Configuration is driven by settings files and ingestion scripts.
👥Who it's for
Enterprise AI engineers and developers building private document QA systems for regulated industries (finance, defense, healthcare, government), plus power users wanting self-hosted LLM applications without cloud dependencies or data privacy concerns.
🌱Maturity & risk
Production-ready and actively maintained by Zylon. Indicators: CI/CD workflows active (.github/workflows/tests.yml, release-please automation), Docker support (Dockerfile.ollama, Dockerfile.llamacpp-cpu, docker-compose.yaml), public documentation via Fern (fern/docs structure), and release automation (.release-please-manifest.json). Verdict: actively developed and production-grade.
Low risk: well-structured GitHub Actions (tests.yml, publish-docs.yml) and release automation suggest active oversight. Primary risk is Python/LLM dependency complexity—many external ML models and embedding services can break unpredictably. Assess: check last commit date in CI logs and monitor releases via release-please manifest for stalls.
Active areas of work
Active release cycle using release-please (release-please-config.json, release-please-manifest.json). Documentation building via Fern (fern/docs.yml, publish-docs.yml). Bug tracking and feature requests via GitHub Issues (.github/ISSUE_TEMPLATE/). PR template in place suggesting ongoing contributions.
🚀Get running
git clone https://github.com/zylon-ai/private-gpt.git
cd private-gpt
make install # Uses Makefile (if this target is absent in your checkout, `poetry install` is the underlying step)
make run # Or docker-compose up
Daily commands:
# Via docker-compose (preferred for completeness)
docker-compose up
# Or via Makefile
make run
# For development with live reload
make dev
UI and API are served by the same FastAPI process; docker-compose maps port 8001 (see Security observations below), so localhost:8001 is the usual address. Gradio's 7860 default applies only when the UI runs standalone (inferred).
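Once the server is up, a quick sanity check is to point the official openai Python client at the local endpoint. A minimal sketch, assuming port 8001 and OpenAI-compatible routes; adjust base_url to your setup:

```python
# Smoke-test sketch for the OpenAI-compatible endpoint.
# Assumes the server is reachable on localhost:8001; adjust as needed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8001/v1",
    api_key="not-needed-for-local",  # a local server typically ignores the key
)

resp = client.chat.completions.create(
    model="private-gpt",  # routed locally; the value may not matter
    messages=[{"role": "user", "content": "What documents have been ingested?"}],
)
print(resp.choices[0].message.content)
```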
🗺️Map of the codebase
- private_gpt/main.py — Primary entry point orchestrating the entire PrivateGPT application lifecycle and initialization
- private_gpt/di.py — Dependency injection container that wires together all core components (LLM, embedding, vector store, node store)
- private_gpt/components/ingest/ingest_component.py — Core document ingestion pipeline handling document parsing, chunking, embedding, and storage
- private_gpt/components/llm/llm_component.py — LLM component abstraction managing language model initialization and prompt configuration
- private_gpt/components/embedding/embedding_component.py — Embedding component handling vector representation generation for documents and queries
- private_gpt/components/vector_store/vector_store_component.py — Vector database abstraction providing semantic search and document retrieval capabilities
- private_gpt/open_ai/openai_models.py — OpenAI-compatible API models and response formatting for chat completions and query answering
🛠️How to make changes
Add a New LLM Provider
- Create a new LLM integration by extending the LLM component factory in the DI container (private_gpt/di.py)
- Configure the LLM component with provider-specific settings and initialization logic (private_gpt/components/llm/llm_component.py)
- Add custom LLM integration code to the custom subdirectory following the SageMaker pattern (private_gpt/components/llm/custom/sagemaker.py); see the sketch below for the rough shape
- Update settings and configuration to expose provider-specific parameters (private_gpt/constants.py)
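For orientation, a minimal sketch of the wrapper shape the third step calls for, assuming LlamaIndex's CustomLLM base class (which the existing components wrap). MyProviderLLM and _call_my_provider are hypothetical names, not repo code; mirror private_gpt/components/llm/custom/sagemaker.py for the real pattern.

```python
from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


def _call_my_provider(prompt: str) -> str:
    """Stub: replace with your provider's SDK or HTTP call."""
    return "stub response for: " + prompt


class MyProviderLLM(CustomLLM):
    context_window: int = 4096
    num_output: int = 256
    model_name: str = "my-provider-model"

    @property
    def metadata(self) -> LLMMetadata:
        # LlamaIndex uses this to size prompts and outputs.
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        return CompletionResponse(text=_call_my_provider(prompt))

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Naive fallback: emit the full completion as a single chunk.
        yield self.complete(prompt, **kwargs)
```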
Add a New Embedding Model
- Create the embedding integration in the custom embeddings directory (private_gpt/components/embedding/custom/sagemaker.py is the existing pattern to mirror)
- Register the new embedding provider in the dependency injection container (private_gpt/di.py)
- Configure the embedding component with model-specific parameters and dimensions (private_gpt/components/embedding/embedding_component.py); a sketch of the wrapper shape follows this list
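A minimal sketch of that wrapper shape, assuming LlamaIndex's BaseEmbedding interface. MyProviderEmbedding is a hypothetical name, and the vector size must match whatever dimension the vector store was configured with.

```python
from typing import List

from llama_index.core.embeddings import BaseEmbedding


class MyProviderEmbedding(BaseEmbedding):
    def _embed(self, text: str) -> List[float]:
        # Stub: call your embedding service here; must return a
        # fixed-size vector matching the vector store's dimension.
        return [0.0] * 384

    def _get_query_embedding(self, query: str) -> List[float]:
        return self._embed(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        return self._embed(text)

    async def _aget_query_embedding(self, query: str) -> List[float]:
        # Sync fallback; swap in a real async client if available.
        return self._embed(query)
```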
Add Support for a New Vector Database
- Create the new vector store implementation in the vector_store components directory (private_gpt/components/vector_store/vector_store_component.py)
- Implement the vector store interface with similarity-search and insert methods (private_gpt/components/vector_store/batched_chroma.py shows an existing implementation)
- Register the vector store in the dependency injection container with initialization logic (private_gpt/di.py); the registry sketch below shows the general pattern
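A minimal sketch of the registration pattern, kept generic on purpose: MyStoreClient, VECTOR_STORE_FACTORIES, and build_vector_store are hypothetical names, not identifiers from di.py or vector_store_component.py.

```python
from typing import Callable, Dict


class MyStoreClient:
    """Stub standing in for your vector database client."""

    def __init__(self, collection: str) -> None:
        self.collection = collection


# Map a settings value (e.g. "database: mystore") to a factory.
VECTOR_STORE_FACTORIES: Dict[str, Callable[[str], object]] = {
    "mystore": lambda collection: MyStoreClient(collection),
}


def build_vector_store(database: str, collection: str = "private_gpt"):
    try:
        return VECTOR_STORE_FACTORIES[database](collection)
    except KeyError:
        raise ValueError(f"Unsupported vector store backend: {database!r}")


store = build_vector_store("mystore")
print(type(store).__name__)  # MyStoreClient
```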
Extend the Document Ingestion Pipeline
- Add custom document parsers or processing logic to the ingest helper utilities (private_gpt/components/ingest/ingest_helper.py); a reader sketch follows this list
- Integrate new document types or processing steps into the main ingestion component (private_gpt/components/ingest/ingest_component.py)
- Configure node storage and metadata handling for the new document format (private_gpt/components/node_store/node_store_component.py)
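A sketch of what a custom parser might look like, assuming the LlamaIndex BaseReader interface that the ingest helpers map file extensions onto. MyLogReader is hypothetical, not repo code.

```python
from pathlib import Path
from typing import List, Optional

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class MyLogReader(BaseReader):
    """Parses .log files: one Document per file, leading timestamps stripped."""

    def load_data(self, file: Path, extra_info: Optional[dict] = None) -> List[Document]:
        text = Path(file).read_text(errors="ignore")
        # Example transform: drop the first space-delimited token
        # (an assumed timestamp) from each line.
        lines = [line.split(" ", 1)[-1] for line in text.splitlines()]
        return [Document(text="\n".join(lines), metadata=extra_info or {})]
```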
🔧Why these technologies
- LlamaIndex/LangChain — Provides abstractions for LLM orchestration, document indexing, and RAG workflows without vendor lock-in
- Chroma Vector Database — Lightweight, embeddable vector store ideal for local-first deployments requiring no external infrastructure
- FastAPI/Gradio UI — FastAPI enables OpenAI-compatible REST API; Gradio provides zero-config web UI for document Q&A
- Ollama/LlamaCPP — Enables running quantized LLMs locally on consumer hardware with minimal VRAM requirements
- Docker Compose — Orchestrates multi-service deployments (router, workers, Ollama) for production-ready scaling
⚖️Trade-offs already made
- Local-first, no cloud dependencies
  - Why: Privacy and compliance requirements mandate that all processing stays within the user's environment
  - Consequence: LLMs must run locally on limited hardware, resulting in slower inference; models also demand substantial local storage
- OpenAI API compatibility layer
  - Why: Allows drop-in replacement of OpenAI with local models; existing integrations and clients work unchanged
  - Consequence: Constrains the API surface to OpenAI's spec; some advanced local-specific features must be squeezed into existing fields
- Single-machine vector store (Chroma) by default
  - Why: Eliminates external infrastructure for quickstart; meets MVP scalability for <10M documents
  - Consequence: Limited horizontal scaling; Chroma lacks distributed clustering; production deployments may require migration to Weaviate/Qdrant
- Document node storage abstraction
  - Why: Decouples raw document metadata from vector embeddings for retrieval flexibility
  - Consequence: Adds an extra layer requiring sync between vector store and node store; potential consistency issues on ingestion failure
🚫Non-goals (don't propose these)
- Real-time streaming ingestion of large document feeds
- Multi-user authentication and role-based access control
- Distributed, horizontally-scaled deployments out-of-the-box
- Support for proprietary enterprise search engines or databases
- Fine-tuning or training of LLMs
- Automatic document format conversion or OCR for scanned PDFs
🪤Traps & gotchas
- Model downloads: Local LLM models (Ollama, llama.cpp) must be pre-downloaded or the app will fail on first run — check fern/docs/pages/manual/llms.mdx for model management.
- Environment variables: Settings likely require .env configuration (API keys for cloud fallbacks, model paths) — look for .env.example or settings.mdx for required vars.
- Ingestion state: Document ingestion state is persisted — resetting requires explicit cleanup commands (fern/docs/pages/manual/ingestion-reset.mdx).
- Embedding model CPU/GPU: Embedding generation can be slow without a GPU; the default model choice is critical for performance.
- Router config: The Docker setup requires understanding the .docker/router.yml YAML syntax for service discovery.
🏗️Architecture
💡Concepts to learn
- Retrieval Augmented Generation (RAG) — The entire architecture (ingestion pipeline, embedding storage, context retrieval) exists to implement RAG—understanding this pattern is core to modifying how documents inform LLM responses (a toy version is sketched after this list)
- Embedding Models & Vector Search — PrivateGPT requires a local embedding model to vectorize documents and queries; choosing/optimizing the embedding model directly impacts retrieval quality and performance
- Prompt Engineering & Context Window Management — High-level chat API abstracts context retrieval and prompt injection—understanding how contexts are selected and formatted is critical for debugging response quality
- Streaming Responses (Server-Sent Events) — PrivateGPT supports streaming responses per the README; understanding chunked response handling matters for UI integration and real-time feedback
- Document Parsing & Chunking Strategies — Ingestion pipeline must parse diverse document formats and split text into context windows—chunking strategy affects retrieval precision and LLM token usage
- Container Orchestration with Docker Compose — Multi-service deployment (API, LLM service, router) coordinated via docker-compose.yaml—modifying this requires understanding service networking and environment injection
- OpenAI API Specification Compatibility — PrivateGPT extends OpenAI's API standard for drop-in compatibility—knowing what endpoints/parameters matter helps you extend or customize the API surface
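To ground the RAG concept, here is a deliberately toy sketch of the retrieve-then-generate loop: fake bag-of-words embeddings stand in for the embedding component, a Python list stands in for the vector store, and the final prompt shows context injection. Nothing here is repo code.

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    # Toy "embedding": word counts instead of a learned vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chunks = [
    "PrivateGPT ingests documents and stores embeddings locally.",
    "The API is compatible with the OpenAI specification.",
    "Ollama and llama.cpp provide local model inference.",
]
index = [(c, embed(c)) for c in chunks]  # stand-in for the vector store

query = "how are documents stored?"
top = max(index, key=lambda pair: cosine(embed(query), pair[1]))[0]  # retrieval

# Context injection: the retrieved chunk becomes part of the LLM prompt.
prompt = f"Answer using only this context:\n{top}\n\nQuestion: {query}"
print(prompt)
```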
🔗Related repos
- langchain-ai/langchain — Core framework likely used for RAG pipeline orchestration, document splitting, and embedding integrations
- llm-utils-dev/llm — Sibling ecosystem for local LLM management and inference; complements PrivateGPT's model provider abstraction
- ollama/ollama — Primary local LLM runtime supported by PrivateGPT (Dockerfile.ollama); essential for 'no internet' deployments
- ggerganov/llama.cpp — Alternative local LLM runtime (Dockerfile.llamacpp-cpu); drop-in replacement for Ollama in PrivateGPT
- openai/openai-python — API specification this project extends; PrivateGPT maintains OpenAI API compatibility for client library reuse
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for document ingestion pipeline
The repo has comprehensive documentation on ingestion (fern/docs/pages/manual/ingestion.mdx) and ingestion reset (fern/docs/pages/manual/ingestion-reset.mdx), but the test workflows (.github/workflows/tests.yml) likely lack integration tests covering the ingestion lifecycle. This is critical for a document QA system, where ingestion reliability directly impacts user trust and core functionality. A test skeleton is sketched after the checklist below.
- [ ] Review current tests.yml workflow to identify gaps in ingestion test coverage
- [ ] Create integration test suite covering: document upload, parsing, embedding, and reset operations
- [ ] Add tests for edge cases (large files, unsupported formats, corrupted documents)
- [ ] Integrate tests into CI/CD pipeline in .github/workflows/tests.yml
- [ ] Document test setup requirements in fern/docs/pages/installation/troubleshooting.mdx
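A skeleton of what the lifecycle test could look like. The app import path and the /v1/ingest routes are assumptions modeled on the docs; confirm them against private_gpt/main.py and the ingest router before use.

```python
import io

from fastapi.testclient import TestClient

from private_gpt.main import app  # assumption: FastAPI instance lives here

client = TestClient(app)


def test_ingest_then_list_roundtrip():
    fake_doc = io.BytesIO(b"PrivateGPT keeps your data local.")
    resp = client.post(
        "/v1/ingest/file",  # assumed route, modeled on the docs
        files={"file": ("note.txt", fake_doc, "text/plain")},
    )
    assert resp.status_code == 200
    ingested = resp.json()["data"]  # assumed response shape
    assert len(ingested) >= 1

    listed = client.get("/v1/ingest/list")
    assert listed.status_code == 200
    doc_ids = {d["doc_id"] for d in listed.json()["data"]}
    assert all(d["doc_id"] in doc_ids for d in ingested)
```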
Add comprehensive GitHub Actions workflow for Docker image validation
The repo has multiple Dockerfiles (Dockerfile.llamacpp-cpu, Dockerfile.ollama) and a docker-compose.yaml, but lacks a dedicated CI workflow validating that these images build correctly and pass basic smoke tests. Without it, Docker builds can silently regress until users try to deploy. A smoke-test sketch follows the checklist below.
- [ ] Create new workflow .github/workflows/docker-build.yml
- [ ] Add build steps for both Dockerfile.llamacpp-cpu and Dockerfile.ollama
- [ ] Implement basic smoke tests (container starts, health checks, API responds)
- [ ] Test docker-compose.yaml orchestration and service connectivity
- [ ] Configure workflow to run on Dockerfile changes and schedule periodic builds
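A sketch of the "API responds" step the workflow could invoke after docker compose up -d. The /health route and port 8001 are assumptions taken from the compose discussion above; verify both before wiring this into CI.

```python
import sys
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8001/health", timeout: int = 120) -> bool:
    """Poll the assumed health endpoint until it answers 200 or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # container still starting; retry
        time.sleep(3)
    return False


if __name__ == "__main__":
    healthy = wait_for_health()
    print("healthy" if healthy else "unhealthy")
    sys.exit(0 if healthy else 1)
```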
Document and add tests for vectordb and reranker configuration options
The repo has manual documentation pages for vectordb (fern/docs/pages/manual/vectordb.mdx) and reranker (fern/docs/pages/manual/reranker.mdx), but likely lacks automated tests validating that different backend configurations work correctly. This matters because these components directly impact RAG quality, and users need confidence when switching providers. A parametrized test sketch follows the checklist below.
- [ ] Audit existing settings documentation (fern/docs/pages/manual/settings.mdx) for all supported vectordb/reranker backends
- [ ] Create parametrized unit tests for each supported vectordb backend (Milvus, Pinecone, Weaviate, etc.)
- [ ] Create parametrized unit tests for each supported reranker (local, API-based, etc.)
- [ ] Add configuration validation tests to prevent misconfiguration at startup
- [ ] Update .github/workflows/tests.yml to run these configuration tests
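A sketch of the parametrized shape these tests could take. The backend list and the validate_vectordb helper are stand-ins; the real tests should import the repo's settings loader instead.

```python
import pytest

KNOWN_VECTORDBS = ["chroma", "qdrant", "postgres"]  # assumed supported set


def validate_vectordb(name: str) -> None:
    """Stand-in for the repo's settings validation; raises on unknown backends."""
    if name not in KNOWN_VECTORDBS:
        raise ValueError(f"unknown vectordb backend: {name}")


@pytest.mark.parametrize("backend", KNOWN_VECTORDBS)
def test_supported_vectordb_accepted(backend):
    validate_vectordb(backend)  # should not raise


def test_unknown_vectordb_rejected():
    with pytest.raises(ValueError):
        validate_vectordb("definitely-not-a-backend")
```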
🌿Good first issues
- Add unit tests for the document ingestion pipeline (check whether tests/ covers private_gpt/components/ingest/) — many splitting and parsing edge cases are likely uncovered.
- Expand troubleshooting.mdx with common setup failures (missing Ollama, model download timeouts, port conflicts)—gaps visible in fern/docs/pages/installation/troubleshooting.mdx.
- Document reranker configuration with examples in fern/docs/pages/manual/reranker.mdx—currently sparse, needs concrete model and usage examples.
⭐Top contributors
Click to expand
- @jaluma — 29 commits
- @imartinez — 14 commits
- @github-actions[bot] — 6 commits
- @dbzoo — 6 commits
- @icsy7867 — 5 commits
📝Recent commits
Click to expand
- 7effd57 — Update README.md (imartinez)
- 010211f — Update README.md (imartinez)
- b7ee437 — Update README.md (imartinez)
- 940bdd4 — fix: 503 when private gpt gets ollama service (#2104) (meng-hui)
- 5851b02 — feat: update llama-index + dependencies (#2092) (jaluma)
- 5fbb402 — fix: Sanitize null bytes before ingestion (#2090) (laoqiu233)
- fa3c306 — fix: Add default mode option to settings (#2078) (basicbloke)
- f9182b3 — feat: Adding MistralAI mode (#2065) (itsliamdowd)
- 8c12c68 — fix: docker permissions (#2059) (jaluma)
- 77461b9 — feat: add retry connection to ollama (#2084) (jaluma)
🔒Security observations
The PrivateGPT codebase has moderate security concerns, summarized below by severity.
- High · Container running as root — docker-compose.yaml, private-gpt-ollama service. The service explicitly sets 'user: root', granting the container elevated privileges; this violates the principle of least privilege and widens the attack surface if the application is compromised. Fix: create a non-root user in the Docker image and make it the default; change 'user: root' to a non-privileged account such as 'worker', and ensure application files have appropriate permissions for that user.
- High · Sensitive environment variable exposure — docker-compose.yaml, HF_TOKEN. The HF_TOKEN (Hugging Face token) is passed as an environment variable with a default empty value; it could leak via logs, process listings, or docker inspect. Fix: use Docker secrets or an external secret manager (AWS Secrets Manager, HashiCorp Vault, etc.) instead of environment variables, and never commit real tokens to version control.
- Medium · Exposed development port — docker-compose.yaml, ports section (8001:8001). Port 8001 is exposed without apparent authentication or rate limiting, allowing access from any network that can reach the host. Fix: bind to localhost (127.0.0.1:8001) or restrict exposure with firewall rules; implement authentication, API key validation, and rate limiting in the application; consider a reverse proxy with security headers.
- Medium · Missing dependency vulnerability scanning — poetry.lock, pyproject.toml (not fully visible). No dependency file content was available for analysis; poetry.lock should be scanned for known vulnerabilities using tools like safety, pip-audit, or integrated SBOM scanning. Fix: add automated dependency scanning to the CI/CD pipeline (Dependabot, Snyk, or pip-audit), update dependencies regularly, and monitor security advisories.
- Medium · Volume mount with insufficient access controls — docker-compose.yaml, volumes section. ./local_data is mounted to /home/worker/app/local_data without mount options, so the container could modify host files if permissions are lax, especially while running as root. Fix: use read-only volumes (':ro') where possible, set restrictive host permissions (e.g. 755), and prefer named volumes over bind mounts for better isolation.
- Medium · Missing security headers configuration — application configuration (not visible in the provided structure). No visible CORS, CSP, or X-Frame-Options configuration, which could expose the application to XSS, clickjacking, and other browser-based attacks. Fix: implement security headers in the API middleware (CORS policies, Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security), tuned to application requirements; see the middleware sketch after this section.
- Low · Pre-commit configuration not reviewed — .pre-commit-config.yaml exists but its contents were not provided; pre-commit hooks should catch security issues (hardcoded credentials, large files, etc.) before commit. Fix: verify the configuration includes security checks such as detect-secrets and bandit (for Python), and ensure all developers run the hooks before committing code.
- Low · Missing security documentation — repository root. No visible SECURITY.md with a vulnerability disclosure policy or security guidelines. Fix: create a SECURITY.md outlining the disclosure procedure, security best practices for contributors, and how to responsibly report issues.
LLM-derived; treat as a starting point, not a security audit.
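As a starting point for the missing-security-headers observation above, a generic FastAPI sketch (not repo code) adding CORS and common response headers; tighten the origins and policies to your deployment.

```python
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8001"],  # restrict to trusted origins
    allow_methods=["GET", "POST"],
    allow_headers=["Authorization", "Content-Type"],
)


@app.middleware("http")
async def security_headers(request: Request, call_next):
    # Inject baseline browser-hardening headers on every response.
    response = await call_next(request)
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["Content-Security-Policy"] = "default-src 'self'"
    return response
```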
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.