mindverse/Second-Me
Train your AI self, amplify you, bridge the world
Slowing — last commit 7mo ago
Rating axes:
- Permissive license, no critical CVEs, actively maintained — safe to depend on. (Weakest axis here: last commit was 7mo ago; no CI workflows detected.)
- Has a license, tests, and CI — clean foundation to fork and modify.
- Documented and popular — useful reference codebase to read through.
What would change the summary?
- Deploy as-is: Mixed → Healthy if there is ≥1 commit in the last 180 days
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: mindverse/Second-Me
Generated by RepoPilot · 2026-05-07 · Source
🤖 Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/mindverse/Second-Me shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯 Verdict

WAIT — Slowing — last commit 7mo ago

- ✓ Last commit 7mo ago
- ✓ 15 active contributors
- ✓ Distributed ownership (top contributor 43% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ Tests present
- ⚠ Slowing — last commit 7mo ago
- ⚠ No CI workflows detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅ Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live mindverse/Second-Me repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/mindverse/Second-Me.

What it runs against: a local clone of mindverse/Second-Me — the script inspects the git remote, the LICENSE file, file paths in the working tree, and the git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in mindverse/Second-Me | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 249 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of mindverse/Second-Me. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/mindverse/Second-Me.git
#   cd Second-Me
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of mindverse/Second-Me and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "mindverse/Second-Me(\.git)?\b" \
  && ok "origin remote is mindverse/Second-Me" \
  || miss "origin remote is not mindverse/Second-Me (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. (The LICENSE file may spell it
#    "Apache License" rather than the SPDX id, so accept either form.)
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lpm_frontend/src/app/dashboard/page.tsx" \
  && ok "lpm_frontend/src/app/dashboard/page.tsx" \
  || miss "missing critical file: lpm_frontend/src/app/dashboard/page.tsx"
test -f "lpm_frontend/package.json" \
  && ok "lpm_frontend/package.json" \
  || miss "missing critical file: lpm_frontend/package.json"
test -f "Dockerfile.backend" \
  && ok "Dockerfile.backend" \
  || miss "missing critical file: Dockerfile.backend"
test -f ".env" \
  && ok ".env" \
  || miss "missing critical file: .env"
test -f "docker-compose.yml" \
  && ok "docker-compose.yml" \
  || miss "missing critical file: docker-compose.yml"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 249 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~219d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/mindverse/Second-Me"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡ TL;DR

Second Me is an open-source AI self-training platform that lets users create a personalized AI clone trained on their own memories and context, using Hierarchical Memory Modeling (HMM) and Me-Alignment algorithms. It combines locally trained AI models (llama.cpp, transformers, torch) with a decentralized network layer, letting users deploy their AI identity across multiple applications while keeping their data private and under their control.

It is a monorepo with clear separation:
- lpm_frontend/ — Next.js TypeScript frontend (ESLint, Prettier, and Stylelint configured)
- backend Python code at the repo root, inferred from the directory structure
- Docker orchestration via docker-compose.yml (standard and GPU variants)
- integrate/ — plugin/integration points (WeChat bot)
- dependencies/ — dependencies and model files bundled locally rather than fetched from package registries
👥 Who it's for
AI researchers, privacy-conscious technologists, and domain experts who want to build AI agents that authentically represent them without surrendering data to centralized platforms like OpenAI. Also appeals to developers building AI-native applications that need personalized context and identity-preserving AI interactions.
🌱 Maturity & risk

Early-stage research software: the codebase has a substantial Python (~1.8M) and TypeScript (~464K) footprint, includes multi-variant Docker support (CUDA, Apple Silicon, standard), and references peer-reviewed papers (arXiv links in the README for AI-native Memory 1.0 and 2.0). However, the /dependencies/ folder with custom graphrag and llama.cpp packages, along with the experimental .dev27 version tag, suggests this is still active research/development rather than production-stable; note too that commit activity has slowed (last commit 7mo ago, per the verdict above).

Moderate risk: the project depends on heavily customized forks (graphrag-modified.tar.gz, llama.cpp.zip in dependencies/) rather than published packages, creating upgrade and reproducibility challenges. The WeChat bot integration (integrate/wechat_bot.py) and Docker GPU support (three separate Dockerfiles) add complexity, and CUDA support adds brittleness across hardware. No visible CI/CD pipeline or automated test suite in the file list suggests a manual testing burden.
Active areas of work
Active research direction toward AI-native Memory 2.0 (referenced in README badges). Recent work includes hierarchical memory modeling and the Me-Alignment algorithm for identity preservation. Multi-platform Docker support suggests ongoing infrastructure investment. Documentation in progress: docs/ folder contains guides for custom model config (Ollama), embedding model switching, and local vs. public chat APIs.
🚀 Get running

```bash
git clone https://github.com/mindverse/Second-Me.git
cd Second-Me
cp .env.example .env   # configure API keys and model paths
make install           # if the Makefile has this target

# Frontend: cd lpm_frontend && npm install && npm run dev
# Backend:  python -m pip install -r requirements.txt && python main.py
# Or use Docker: docker-compose up (docker-compose-gpu.yml for CUDA)
```

Daily commands:

```bash
# Development with Docker Compose:
docker-compose up
# or with GPU support:
docker-compose -f docker-compose-gpu.yml up

# Frontend (Next.js):
cd lpm_frontend && npm run dev   # runs on http://localhost:3000

# Backend (if running standalone):
python app.py   # or equivalent entrypoint inferred from structure
```
🗺️ Map of the codebase

- lpm_frontend/src/app/dashboard/page.tsx — Main dashboard entry point; every contributor must understand the primary user interface and navigation structure
- lpm_frontend/package.json — Defines Next.js, React, and all frontend dependencies; required to set up a local development environment
- Dockerfile.backend — Core backend containerization; essential for understanding deployment and service initialization
- .env — Environment configuration for API keys and service endpoints; critical for running both frontend and backend
- docker-compose.yml — Orchestrates all services (frontend, backend, database, cache); the primary way to run the full stack
- integrate/wechat_bot.py — Integration point for WeChat; shows how external platforms connect to Second-Me's core AI training and inference
- lpm_frontend/src/app/dashboard/train/page.tsx — AI self-training interface; central to the product's core value proposition of personal AI amplification
🧩 Components & responsibilities

- Next.js Frontend (lpm_frontend) — Render dashboard pages, handle user input, call backend APIs, stream responses from the backend
🛠️ How to make changes

Add a new Training Page
- Create a new .tsx file at lpm_frontend/src/app/dashboard/train/[feature]/page.tsx, following the structure of identity or memories (lpm_frontend/src/app/dashboard/train/page.tsx)
- Add a navigation link in the train hub page that references your new feature (lpm_frontend/src/app/dashboard/train/page.tsx)
- Import UI components and integrate with backend API routes if needed (lpm_frontend/src/app/dashboard/layout.tsx)
Add a new Backend Service (Docker)
- Create a new Dockerfile or extend Dockerfile.backend with your service initialization (Dockerfile.backend)
- Add a new service definition in docker-compose.yml with the proper environment variables and volume mounts (docker-compose.yml)
- Update .env with any required API keys or service endpoints for the new service (.env)
Add a new External Integration
- Create a new Python module in the integrate/ folder following the pattern of wechat_bot.py (integrate/wechat_bot.py)
- Add dependencies to integrate/requirements.txt (integrate/requirements.txt)
- Implement message handling that sends user input to the backend AI endpoint and streams responses (integrate/wechat_bot.py); a minimal sketch follows this list
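A minimal sketch of such an integration module, assuming the backend exposes a streaming chat endpoint. The /api/chat path, the 8002 port (taken from the docker-compose port list mentioned under Security observations), and the payload shape are assumptions; check docs/Local Chat API.md for the real contract:

```python
# integrate/example_bot.py: hypothetical integration skeleton.
# The endpoint path, port, and payload shape are ASSUMPTIONS; verify them
# against docs/Local Chat API.md before relying on this.
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # pull configuration from .env, matching the repo's pattern

BACKEND_URL = os.getenv("BACKEND_URL", "http://localhost:8002")  # assumed port


def relay_message(user_text: str) -> str:
    """Send one user message to the backend and collect the streamed reply."""
    resp = requests.post(
        f"{BACKEND_URL}/api/chat",    # hypothetical endpoint
        json={"message": user_text},  # hypothetical payload
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()
    # Accumulate streamed lines into one reply string.
    return "\n".join(line for line in resp.iter_lines(decode_unicode=True) if line)


if __name__ == "__main__":
    print(relay_message("Hello, second me!"))
```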
Add a new Application Page (API/MCP/Network)
- Create page.tsx at lpm_frontend/src/app/dashboard/applications/[category]/page.tsx (lpm_frontend/src/app/dashboard/applications/page.tsx)
- Add layout and navigation in the applications layout file (lpm_frontend/src/app/dashboard/applications/layout.tsx)
- Wire up backend API routes or third-party service calls as needed (lpm_frontend/src/app/dashboard/applications/api-mcp/page.tsx)
🔧 Why these technologies
- Next.js 14 + React 18 — Modern SSR/SSG framework for fast, SEO-friendly dashboard UI with TypeScript type safety
- Python + PyTorch + Transformers — Industry-standard stack for LLM fine-tuning and inference; supports CUDA for GPU acceleration
- GraphRAG (Custom Fork) — Enables AI-native memory indexing via knowledge graphs; core innovation for personal AI training
- Chroma Vector DB — Fast semantic search for retrieving relevant memories during inference; supports embeddings at scale
- SQLite + Docker Volumes — Lightweight persistent storage for user data and training artifacts; easy local development
- Docker & docker-compose — Reproducible multi-service deployment; isolates frontend, backend, and database; supports CPU/GPU variants
⚖️ Trade-offs already made

- Local-first LLM inference (Ollama/llama.cpp) vs cloud API
  - Why: Privacy-preserving personal AI that doesn't send user data to third parties; user independence
  - Consequence: Requires more compute resources locally; slower inference than large cloud models; complexity in GPU setup
- GraphRAG for memory indexing vs flat vector store
  - Why: Structured knowledge graphs better capture entity relationships and enable reasoning
  - Consequence: Higher implementation complexity; slower indexing; requires custom dependency management
- SQLite for main storage vs distributed database
  - Why: Simple setup for single-user or small teams; direct file-based persistence
  - Consequence: No horizontal scaling; limited concurrency; not suitable for multi-tenant SaaS
- Docker multi-variant (CPU, CUDA, Apple Metal)
  - Why: Supports heterogeneous hardware (cloud, gaming GPUs, M-series Macs)
  - Consequence: Maintenance burden for multiple Dockerfiles; build complexity; potential runtime inconsistencies
🚫 Non-goals (don't propose these)
- Real-time collaborative editing (single-user focus)
- Multi-tenant SaaS platform (personal AI, not hosted service)
- Web-scale distributed training (local fine-tuning only)
- Mobile app (web-only; Next.js frontend)
🪤 Traps & gotchas

1. Custom dependency packages: /dependencies/ contains graphrag-modified.tar.gz and llama.cpp.zip — not fetched from PyPI; you must extract and reference these locally or modify pip install paths.
2. GPU CUDA rebuild: docker/app/rebuild_llama_cuda.sh exists; if using the GPU, CUDA 11.8+ and cuDNN must match your host system.
3. Environment secrets: .env is not in the .gitignore listing; ensure you never commit API keys or model paths.
4. ChromaDB + SQLite: Docker SQL init lives at docker/sqlite/init.sql — if migrating databases, manual schema sync is required.
5. Multiple Dockerfiles: choose the right variant (standard, CUDA, Apple Silicon) upfront; mixing them can cause library conflicts.
🏗️ Architecture

(Architecture diagram not captured in this export; see the live page.)
💡 Concepts to learn

- Hierarchical Memory Modeling (HMM) — Core algorithm in Second Me that structures memories at multiple levels of abstraction for efficient retrieval and identity representation — critical to understanding how the AI self is trained
- Me-Alignment Algorithm — Second Me's proprietary method to align model outputs with the user's authentic identity and values — essential for the 'self' in 'Second Me'
- Vector Embeddings & Semantic Search — Second Me uses ChromaDB for vector storage of memories; understanding embeddings is crucial for customizing how the AI retrieves and ranks relevant context (first sketch after this list)
- Quantization (llama.cpp) — Second Me bundles llama.cpp for running large models locally on consumer hardware via quantized weights — understanding this is key to GPU/CPU trade-offs (second sketch after this list)
- Knowledge Graph Extraction (GraphRAG) — Second Me's customized GraphRAG fork structures memories as graphs rather than flat text — enables richer relational queries about the user's identity
- Decentralized AI Identity / DID — Second Me's vision to deploy AI selves across a network as portable digital identities; understanding this shapes how you design extensibility and privacy controls
- Docker Multi-Architecture Builds — Three separate Dockerfiles (standard, CUDA, Apple Silicon) highlight the complexity of supporting different hardware — important for understanding deployment flexibility and pain points
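To make the embeddings concept concrete, here is a minimal, self-contained ChromaDB sketch; the collection name and memory texts are invented for illustration, and Second-Me's real schema lives behind docker/app/init_chroma.py:

```python
# Minimal semantic-memory sketch with ChromaDB. Collection name and
# documents are illustrative, not Second-Me's actual schema.
import chromadb

client = chromadb.Client()  # in-memory; Second-Me persists via Docker volumes
memories = client.create_collection("memories")

# Index a few "memories": Chroma embeds them with its default embedding model.
memories.add(
    ids=["m1", "m2", "m3"],
    documents=[
        "I prefer concise answers with code examples.",
        "My current project is a Next.js dashboard.",
        "I am reading up on knowledge graphs.",
    ],
)

# Retrieve the memories most relevant to a new prompt.
results = memories.query(query_texts=["What is the user working on?"], n_results=2)
print(results["documents"][0])
```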
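And for the quantization concept, a sketch using the llama-cpp-python bindings (an assumption, since Second-Me bundles its own llama.cpp build); the model path below is hypothetical:

```python
# Running a quantized GGUF model locally: illustrative only. The binding
# choice (llama-cpp-python) and the model path are ASSUMPTIONS, not the
# repo's actual setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.Q4_K_M.gguf",  # hypothetical quantized weights
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # raise to offload layers to a CUDA/Metal GPU
)

out = llm("In one sentence, what is a 'second me'?", max_tokens=64)
print(out["choices"][0]["text"])
```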
🔗 Related repos

- lm-sys/FastChat — Similar open-source LLM serving framework for local model deployment; Second Me could adopt FastChat's inference optimization patterns
- ollama/ollama — Lightweight local LLM runner that Second Me explicitly integrates with via docs/Custom Model Config(Ollama).md — a natural companion for model management
- chroma-core/chroma — Vector database that Second Me uses for memory storage (docker/app/init_chroma.py); core dependency for semantic search
- microsoft/graphrag — Knowledge graph extraction framework; Second Me forks and customizes graphrag (dependencies/graphrag-modified.tar.gz) for identity-aware memory indexing
- vercel/next.js — Frontend framework powering lpm_frontend/; Second Me's UI depends on Next.js patterns and configuration
🪄 PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add GitHub Actions CI/CD workflow for multi-platform Docker builds
The repo has multiple Dockerfiles (backend, backend.apple, backend.cuda) and docker-compose variants (gpu/cpu) but no GitHub Actions workflow to validate builds. This prevents regressions when contributors modify dependencies or Docker configurations. A workflow should test all three backend Dockerfiles and the frontend build.
- [ ] Create .github/workflows/docker-build.yml to build and test Dockerfile.backend, Dockerfile.backend.apple, and Dockerfile.backend.cuda
- [ ] Add step to validate docker-compose.yml and docker-compose-gpu.yml syntax
- [ ] Add step to build lpm_frontend using Dockerfile.frontend
- [ ] Test that docker/app/init_chroma.py and docker/app/check_gpu_support.sh execute without errors in container context
- [ ] Document in CONTRIBUTING.md that all Docker builds must pass before merging
Create comprehensive integration tests for wechat_bot.py with mocking
The integrate/wechat_bot.py module exists with dependencies (wxpy, python-dotenv) but has no visible test suite. Given that the WeChat bot integration is a critical user-facing feature, it needs tests to prevent breaking changes when the backend API or message-handling logic is refactored. A starter sketch follows the checklist below.
- [ ] Create an integrate/tests/ directory with __init__.py
- [ ] Write integrate/tests/test_wechat_bot.py with mocked wxpy.Bot to test message receive/send flow
- [ ] Add test for .env variable loading (OPENAI_API_KEY, etc.) via python-dotenv
- [ ] Add test to verify bot correctly formats prompts sent to backend API (likely POST requests based on docs)
- [ ] Update integrate/requirements.txt to include pytest and pytest-mock development dependencies
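A starting-point sketch for that test file. The handler name handle_message, its import path, and the requests-based backend call are assumptions about wechat_bot.py's internals; adapt them to the real module:

```python
# integrate/tests/test_wechat_bot.py: illustrative skeleton. The function
# name `handle_message` and the requests-based backend call are ASSUMPTIONS
# about wechat_bot.py's structure; adapt to the real module.
from unittest import mock

import pytest


@pytest.fixture
def fake_backend():
    """Patch the HTTP layer so no real backend is needed."""
    with mock.patch("requests.post") as post:
        post.return_value.status_code = 200
        post.return_value.json.return_value = {"reply": "hello from the model"}
        yield post


def test_message_is_relayed_to_backend(fake_backend):
    # Stub wxpy before import, since the real module may try to log in to
    # WeChat at import time.
    with mock.patch.dict("sys.modules", {"wxpy": mock.MagicMock()}):
        from integrate import wechat_bot  # hypothetical import path

        reply = wechat_bot.handle_message("ping")  # hypothetical handler

    fake_backend.assert_called_once()
    assert "hello" in reply
```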
Add API contract tests for backend endpoints referenced in docs/
The docs/ folder contains guides for a 'Local Chat API' and a 'Public Chat API', but there are no visible tests validating the backend API contracts, so contributors may inadvertently break endpoints when modifying the Python backend. Tests should validate request/response schemas; a sketch follows the checklist below.
- [ ] Create tests/ directory at repo root (or backend tests/) with test_chat_api.py
- [ ] Write tests for endpoints documented in docs/Local Chat API.md (likely /chat or /complete endpoints)
- [ ] Write tests for endpoints documented in docs/Public Chat API.md
- [ ] Add test to verify embedding model switching works (docs/Embedding Model Switching.md references this feature)
- [ ] Add pytest configuration (pytest.ini or pyproject.toml) and integrate with GitHub Actions workflow from Idea #1
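Under the same caveat, a contract-test sketch: the /api/chat path, the 8002 port, and the response schema are guesses to be replaced with whatever docs/Local Chat API.md actually specifies.

```python
# tests/test_chat_api.py: illustrative contract test. BASE_URL, the /api/chat
# path, and the response schema are ASSUMPTIONS; replace them with the
# endpoints documented in docs/Local Chat API.md.
import os

import pytest
import requests

BASE_URL = os.getenv("SECOND_ME_URL", "http://localhost:8002")  # assumed port


@pytest.mark.integration  # requires a running backend (e.g. docker-compose up)
def test_chat_endpoint_contract():
    resp = requests.post(
        f"{BASE_URL}/api/chat",     # hypothetical endpoint
        json={"message": "hello"},  # hypothetical request schema
        timeout=60,
    )
    assert resp.status_code == 200
    body = resp.json()
    # Contract: a string "reply" field must always be present.
    assert isinstance(body.get("reply"), str)
```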
🌿 Good first issues

- Add a comprehensive test suite for the Me-Alignment algorithm in the Python backend — currently no /tests/ or test files are visible in the listing, which is risky for a research codebase that other users rely on.
- Document the ChromaDB + embedding model integration with concrete code examples in docs/Embedding Model Switching.md — the file exists but likely needs a step-by-step walkthrough for users unfamiliar with vector databases.
- Create a GitHub Actions CI workflow file (.github/workflows/test.yml) to auto-run linting on the frontend (ESLint and Prettier are in place) and validate Docker builds across all three variants (standard, CUDA, Apple Silicon).
⭐ Top contributors
- @kevin-mindverse — 43 commits
- @ryangyuan — 16 commits
- @ScarletttMoon — 10 commits
- @yingapple — 8 commits
- @yexiangle — 8 commits
📝 Recent commits
- d0e4025 — Update README.md (#403) (yingapple)
- 179c52e — Merge pull request #349 from mindverse/develop (kevin-mindverse)
- 56e6767 — Merge branch 'master' into develop (kevin-mindverse)
- 601f371 — change log (#346) (kevin-mindverse)
- a65f4a5 — hotfix for strage problem which leads to the error during graphrag. (#347) (yingapple)
- 8ada304 — feat: add cloud deployment options (#334) (kevinaimonster)
- da2b704 — fix(model tokenizer): just use the model tokenizer without anythink else. (yingapple)
- bec5b88 — fix(deployment): resolve model deployment issue on CUDA + Windows environment (yingapple)
- 1e7d607 — Feat/ Enhance Issue Templates (#333) (kevin-mindverse)
- c3855f3 — Fix/0429/fix all log (#318) (ryangyuan)
🔒 Security observations

- Critical · Exposed Environment File in Docker Volume — docker-compose.yml, the './.env:/app/.env' volume line. The .env file is mounted directly into the container and likely contains sensitive credentials, API keys, and database passwords; if the container is compromised or the host is accessed, all secrets are exposed. Fix: use Docker secrets or a secrets management system (HashiCorp Vault, AWS Secrets Manager). Never mount .env files directly; load secrets through environment variable interpolation or the docker-compose secrets feature.
- High · Outdated Python Dependencies with Known Vulnerabilities — integrate/requirements.txt and dependencies section. The list includes python-dotenv==0.19.0 (released Feb 2021) and wxpy==0.3.9.8 (deprecated); these versions lack security patches. torch>=1.8.0 and transformers>=4.5.0 are acceptable but should be updated regularly. Fix: update to python-dotenv>=1.0.0, upgrade torch and transformers to current stable versions, remove wxpy or replace it with a maintained alternative, and add dependency scanning to CI/CD.
- High · Multiple Exposed Ports Without Security Controls — docker-compose.yml, ports section. Ports 8002 and 8080 are exposed without network segmentation, TLS/SSL, or authentication, so the backend API is directly reachable by anyone on the host network. Fix: front the services with a reverse proxy (nginx/Traefik) doing TLS termination, restrict port access via firewall rules, add API authentication (JWT, API keys), and isolate container communication with network policies.
- High · Incomplete Security Policy — SECURITY.md. The file is a template with placeholder text: no vulnerability disclosure procedure, security contact, or response timeline, which makes responsible disclosure impossible. Fix: complete SECURITY.md with a security contact email, a disclosure timeline, supported versions for security updates, and an acknowledgment process; include a bug bounty program if applicable.
- Medium · Excessive Memory Limits in Docker Configuration — docker-compose.yml, deploy/resources section. The container memory limit is 64GB with a 6GB reservation; such a large allocation enlarges the blast radius of memory-leak exploits and denial-of-service attacks. Fix: set realistic limits based on actual requirements (typically 2-8GB for similar applications), monitor memory with alerts, and set resource requests/limits for all containers.
- Medium · Insufficient Input Validation Controls — integrate/wechat_bot.py and backend API endpoints. The WeChat bot and multiple API endpoints have no visible input validation framework; given the AI/LLM workload, prompt injection and data exfiltration are risks. Fix: validate and sanitize all external inputs, use parameterized queries for database operations, add prompt-injection detection for LLM inputs, and apply rate limiting and request size restrictions.
- Medium · No HTTPS/TLS Configuration Visible — docker-compose.yml and frontend/backend configuration. Ports 8002 and 8080 serve plaintext HTTP with no TLS/SSL termination visible, so frontend-backend traffic is unencrypted. Fix: enable TLS for all communication (e.g. Let's Encrypt certificates in production), configure HSTS headers, and consider certificate pinning for sensitive APIs.
- Medium · Potential SQL Injection Risk — docker/sqlite/init.sql and the backend database layer. The init script exists but implementation details are not visible; the SQLite database in the /app/data volume could be vulnerable if the backend does not use an ORM or query builder correctly. Fix: use an ORM (SQLAlchemy, Tortoise ORM) instead of raw SQL, use parameterized queries (see the sketch below), log and monitor SQL queries, and audit the database layer regularly.
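For the SQL injection observation, the parameterized-query pattern it recommends looks like this (table and column names are illustrative, not the repo's actual schema):

```python
# Parameterized sqlite3 queries: the safe pattern recommended above.
# Table/column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")  # the repo persists under /app/data instead
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")

user_input = "'; DROP TABLE memories; --"  # hostile input stays inert below

# BAD (injectable):  conn.execute(f"INSERT INTO memories (body) VALUES ('{user_input}')")
# GOOD: let the driver bind the value as data, never as SQL text.
conn.execute("INSERT INTO memories (body) VALUES (?)", (user_input,))

rows = conn.execute(
    "SELECT body FROM memories WHERE body = ?", (user_input,)
).fetchall()
print(rows)  # [("'; DROP TABLE memories; --",)]
```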
LLM-derived; treat as a starting point, not a security audit.
👉 Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.