mindverse/Second-Me
Train your AI self, amplify you, bridge the world
Slowing — last commit 7mo ago
Rating axes:
- Permissive license, no critical CVEs, actively maintained — safe to depend on. (Weakest axis here: last commit was 7mo ago; no CI workflows detected.)
- Has a license, tests, and CI — clean foundation to fork and modify.
- Documented and popular — useful reference codebase to read through.
What would change the summary?
- Deploy as-is: Mixed → Healthy if there is ≥1 commit in the last 180 days
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: mindverse/Second-Me
Generated by RepoPilot · 2026-05-07 · Source
🤖 Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/mindverse/Second-Me shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯 Verdict

WAIT — Slowing — last commit 7mo ago

- ✓ Last commit 7mo ago
- ✓ 15 active contributors
- ✓ Distributed ownership (top contributor 43% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ Tests present
- ⚠ Slowing — last commit 7mo ago
- ⚠ No CI workflows detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅ Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live mindverse/Second-Me repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/mindverse/Second-Me.

What it runs against: a local clone of mindverse/Second-Me — the script inspects the git remote, the LICENSE file, file paths in the working tree, and the git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in mindverse/Second-Me | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 249 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of mindverse/Second-Me. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/mindverse/Second-Me.git
#   cd Second-Me
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of mindverse/Second-Me and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "mindverse/Second-Me(\.git)?\b" \
  && ok "origin remote is mindverse/Second-Me" \
  || miss "origin remote is not mindverse/Second-Me (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. (The LICENSE file may spell it
#    "Apache License" rather than the SPDX id, so accept either form.)
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lpm_frontend/src/app/dashboard/page.tsx" \
  && ok "lpm_frontend/src/app/dashboard/page.tsx" \
  || miss "missing critical file: lpm_frontend/src/app/dashboard/page.tsx"
test -f "lpm_frontend/package.json" \
  && ok "lpm_frontend/package.json" \
  || miss "missing critical file: lpm_frontend/package.json"
test -f "Dockerfile.backend" \
  && ok "Dockerfile.backend" \
  || miss "missing critical file: Dockerfile.backend"
test -f ".env" \
  && ok ".env" \
  || miss "missing critical file: .env"
test -f "docker-compose.yml" \
  && ok "docker-compose.yml" \
  || miss "missing critical file: docker-compose.yml"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 249 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~219d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/mindverse/Second-Me"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡ TL;DR

Second Me is an open-source AI self-training platform that lets users create a personalized AI clone trained on their own memories and context, using Hierarchical Memory Modeling (HMM) and Me-Alignment algorithms. It combines locally trained AI models (llama.cpp, transformers, torch) with a decentralized network layer, letting users deploy their AI identity across multiple applications while keeping their data private and under their control.

It is a monorepo with clear separation:
- lpm_frontend/ — Next.js TypeScript frontend (ESLint, Prettier, and Stylelint configured)
- backend Python code at the repo root, inferred from the directory structure
- Docker orchestration via docker-compose.yml (standard and GPU variants)
- integrate/ — plugin/integration points (WeChat bot)
- dependencies/ — dependencies and model files bundled locally rather than fetched from package registries
👥 Who it's for
AI researchers, privacy-conscious technologists, and domain experts who want to build AI agents that authentically represent them without surrendering data to centralized platforms like OpenAI. Also appeals to developers building AI-native applications that need personalized context and identity-preserving AI interactions.
🌱 Maturity & risk

Early-stage research software: the codebase has a substantial Python (~1.8M) and TypeScript (~464K) footprint, includes multi-variant Docker support (CUDA, Apple Silicon, standard), and references peer-reviewed papers (arXiv links in the README for AI-native Memory 1.0 and 2.0). However, the /dependencies/ folder with custom graphrag and llama.cpp packages, along with the experimental .dev27 version tag, suggests this is still active research/development rather than production-stable; note too that commit activity has slowed (last commit 7mo ago, per the verdict above).

Moderate risk: the project depends on heavily customized forks (graphrag-modified.tar.gz, llama.cpp.zip in dependencies/) rather than published packages, creating upgrade and reproducibility challenges. The WeChat bot integration (integrate/wechat_bot.py) and Docker GPU support (three separate Dockerfiles) add complexity, and CUDA support adds brittleness across hardware. No visible CI/CD pipeline or automated test suite in the file list suggests a manual testing burden.
Active areas of work
Active research direction toward AI-native Memory 2.0 (referenced in README badges). Recent work includes hierarchical memory modeling and the Me-Alignment algorithm for identity preservation. Multi-platform Docker support suggests ongoing infrastructure investment. Documentation in progress: docs/ folder contains guides for custom model config (Ollama), embedding model switching, and local vs. public chat APIs.
🚀 Get running

```bash
git clone https://github.com/mindverse/Second-Me.git
cd Second-Me
cp .env.example .env   # configure API keys and model paths
make install           # if the Makefile has this target

# Frontend: cd lpm_frontend && npm install && npm run dev
# Backend:  python -m pip install -r requirements.txt && python main.py
# Or use Docker: docker-compose up (docker-compose-gpu.yml for CUDA)
```

Daily commands:

```bash
# Development with Docker Compose:
docker-compose up
# or with GPU support:
docker-compose -f docker-compose-gpu.yml up

# Frontend (Next.js):
cd lpm_frontend && npm run dev   # runs on http://localhost:3000

# Backend (if running standalone):
python app.py   # or equivalent entrypoint inferred from structure
```
🗺️ Map of the codebase

- lpm_frontend/src/app/dashboard/page.tsx — Main dashboard entry point; every contributor must understand the primary user interface and navigation structure
- lpm_frontend/package.json — Defines Next.js, React, and all frontend dependencies; required to set up a local development environment
- Dockerfile.backend — Core backend containerization; essential for understanding deployment and service initialization
- .env — Environment configuration for API keys and service endpoints; critical for running both frontend and backend
- docker-compose.yml — Orchestrates all services (frontend, backend, database, cache); the primary way to run the full stack
- integrate/wechat_bot.py — Integration point for WeChat; shows how external platforms connect to Second-Me's core AI training and inference
- lpm_frontend/src/app/dashboard/train/page.tsx — AI self-training interface; central to the product's core value proposition of personal AI amplification
🧩 Components & responsibilities

- Next.js Frontend (lpm_frontend) — Render dashboard pages, handle user input, call backend APIs, stream responses from the backend
🛠️ How to make changes

Add a new Training Page
- Create a new .tsx file at lpm_frontend/src/app/dashboard/train/[feature]/page.tsx, following the structure of identity or memories (lpm_frontend/src/app/dashboard/train/page.tsx)
- Add a navigation link in the train hub page that references your new feature (lpm_frontend/src/app/dashboard/train/page.tsx)
- Import UI components and integrate with backend API routes if needed (lpm_frontend/src/app/dashboard/layout.tsx)
Add a new Backend Service (Docker)
- Create a new Dockerfile or extend Dockerfile.backend with your service initialization (Dockerfile.backend)
- Add a new service definition in docker-compose.yml with the proper environment variables and volume mounts (docker-compose.yml)
- Update .env with any required API keys or service endpoints for the new service (.env)
Add a new External Integration
- Create a new Python module in the integrate/ folder following the pattern of wechat_bot.py (integrate/wechat_bot.py)
- Add dependencies to integrate/requirements.txt (integrate/requirements.txt)
- Implement message handling that sends user input to the backend AI endpoint and streams responses (integrate/wechat_bot.py); a minimal sketch follows this list
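A minimal sketch of such an integration module, assuming the backend exposes a streaming chat endpoint. The /api/chat path, the 8002 port (taken from the docker-compose port list mentioned under Security observations), and the payload shape are assumptions; check docs/Local Chat API.md for the real contract:

```python
# integrate/example_bot.py: hypothetical integration skeleton.
# The endpoint path, port, and payload shape are ASSUMPTIONS; verify them
# against docs/Local Chat API.md before relying on this.
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # pull configuration from .env, matching the repo's pattern

BACKEND_URL = os.getenv("BACKEND_URL", "http://localhost:8002")  # assumed port


def relay_message(user_text: str) -> str:
    """Send one user message to the backend and collect the streamed reply."""
    resp = requests.post(
        f"{BACKEND_URL}/api/chat",    # hypothetical endpoint
        json={"message": user_text},  # hypothetical payload
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()
    # Accumulate streamed lines into one reply string.
    return "\n".join(line for line in resp.iter_lines(decode_unicode=True) if line)


if __name__ == "__main__":
    print(relay_message("Hello, second me!"))
```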
Add a new Application Page (API/MCP/Network)
- Create page.tsx at lpm_frontend/src/app/dashboard/applications/[category]/page.tsx (lpm_frontend/src/app/dashboard/applications/page.tsx)
- Add layout and navigation in the applications layout file (lpm_frontend/src/app/dashboard/applications/layout.tsx)
- Wire up backend API routes or third-party service calls as needed (lpm_frontend/src/app/dashboard/applications/api-mcp/page.tsx)
🔧 Why these technologies
- Next.js 14 + React 18 — Modern SSR/SSG framework for fast, SEO-friendly dashboard UI with TypeScript type safety
- Python + PyTorch + Transformers — Industry-standard stack for LLM fine-tuning and inference; supports CUDA for GPU acceleration
- GraphRAG (Custom Fork) — Enables AI-native memory indexing via knowledge graphs; core innovation for personal AI training
- Chroma Vector DB — Fast semantic search for retrieving relevant memories during inference; supports embeddings at scale
- SQLite + Docker Volumes — Lightweight persistent storage for user data and training artifacts; easy local development
- Docker & docker-compose — Reproducible multi-service deployment; isolates frontend, backend, and database; supports CPU/GPU variants
⚖️ Trade-offs already made

- Local-first LLM inference (Ollama/llama.cpp) vs cloud API
  - Why: Privacy-preserving personal AI that doesn't send user data to third parties; user independence
  - Consequence: Requires more compute resources locally; slower inference than large cloud models; complexity in GPU setup
- GraphRAG for memory indexing vs flat vector store
  - Why: Structured knowledge graphs better capture entity relationships and enable reasoning
  - Consequence: Higher implementation complexity; slower indexing; requires custom dependency management
- SQLite for main storage vs distributed database
  - Why: Simple setup for single-user or small teams; direct file-based persistence
  - Consequence: No horizontal scaling; limited concurrency; not suitable for multi-tenant SaaS
- Docker multi-variant (CPU, CUDA, Apple Metal)
  - Why: Supports heterogeneous hardware (cloud, gaming GPUs, M-series Macs)
  - Consequence: Maintenance burden for multiple Dockerfiles; build complexity; potential runtime inconsistencies
🚫 Non-goals (don't propose these)
- Real-time collaborative editing (single-user focus)
- Multi-tenant SaaS platform (personal AI, not hosted service)
- Web-scale distributed training (local fine-tuning only)
- Mobile app (web-only; Next.js frontend)
🪤 Traps & gotchas

1. Custom dependency packages: /dependencies/ contains graphrag-modified.tar.gz and llama.cpp.zip — not fetched from PyPI; you must extract and reference these locally or modify pip install paths.
2. GPU CUDA rebuild: docker/app/rebuild_llama_cuda.sh exists; if using the GPU, CUDA 11.8+ and cuDNN must match your host system.
3. Environment secrets: .env is not in the .gitignore listing; ensure you never commit API keys or model paths.
4. ChromaDB + SQLite: Docker SQL init lives at docker/sqlite/init.sql — if migrating databases, manual schema sync is required.
5. Multiple Dockerfiles: choose the right variant (standard, CUDA, Apple Silicon) upfront; mixing them can cause library conflicts.
🏗️ Architecture

(Architecture diagram not captured in this export; see the live page.)
💡 Concepts to learn

- Hierarchical Memory Modeling (HMM) — Core algorithm in Second Me that structures memories at multiple levels of abstraction for efficient retrieval and identity representation — critical to understanding how the AI self is trained
- Me-Alignment Algorithm — Second Me's proprietary method to align model outputs with the user's authentic identity and values — essential for the 'self' in 'Second Me'
- Vector Embeddings & Semantic Search — Second Me uses ChromaDB for vector storage of memories; understanding embeddings is crucial for customizing how the AI retrieves and ranks relevant context (first sketch after this list)
- Quantization (llama.cpp) — Second Me bundles llama.cpp for running large models locally on consumer hardware via quantized weights — understanding this is key to GPU/CPU trade-offs (second sketch after this list)
- Knowledge Graph Extraction (GraphRAG) — Second Me's customized GraphRAG fork structures memories as graphs rather than flat text — enables richer relational queries about the user's identity
- Decentralized AI Identity / DID — Second Me's vision to deploy AI selves across a network as portable digital identities; understanding this shapes how you design extensibility and privacy controls
- Docker Multi-Architecture Builds — Three separate Dockerfiles (standard, CUDA, Apple Silicon) highlight the complexity of supporting different hardware — important for understanding deployment flexibility and pain points
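To make the embeddings concept concrete, here is a minimal, self-contained ChromaDB sketch; the collection name and memory texts are invented for illustration, and Second-Me's real schema lives behind docker/app/init_chroma.py:

```python
# Minimal semantic-memory sketch with ChromaDB. Collection name and
# documents are illustrative, not Second-Me's actual schema.
import chromadb

client = chromadb.Client()  # in-memory; Second-Me persists via Docker volumes
memories = client.create_collection("memories")

# Index a few "memories": Chroma embeds them with its default embedding model.
memories.add(
    ids=["m1", "m2", "m3"],
    documents=[
        "I prefer concise answers with code examples.",
        "My current project is a Next.js dashboard.",
        "I am reading up on knowledge graphs.",
    ],
)

# Retrieve the memories most relevant to a new prompt.
results = memories.query(query_texts=["What is the user working on?"], n_results=2)
print(results["documents"][0])
```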
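And for the quantization concept, a sketch using the llama-cpp-python bindings (an assumption, since Second-Me bundles its own llama.cpp build); the model path below is hypothetical:

```python
# Running a quantized GGUF model locally: illustrative only. The binding
# choice (llama-cpp-python) and the model path are ASSUMPTIONS, not the
# repo's actual setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.Q4_K_M.gguf",  # hypothetical quantized weights
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # raise to offload layers to a CUDA/Metal GPU
)

out = llm("In one sentence, what is a 'second me'?", max_tokens=64)
print(out["choices"][0]["text"])
```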
🔗 Related repos

- lm-sys/FastChat — Similar open-source LLM serving framework for local model deployment; Second Me could adopt FastChat's inference optimization patterns
- ollama/ollama — Lightweight local LLM runner that Second Me explicitly integrates with via docs/Custom Model Config(Ollama).md — a natural companion for model management
- chroma-core/chroma — Vector database that Second Me uses for memory storage (docker/app/init_chroma.py); core dependency for semantic search
- microsoft/graphrag — Knowledge graph extraction framework; Second Me forks and customizes graphrag (dependencies/graphrag-modified.tar.gz) for identity-aware memory indexing
- vercel/next.js — Frontend framework powering lpm_frontend/; Second Me's UI depends on Next.js patterns and configuration
🪄 PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add GitHub Actions CI/CD workflow for multi-platform Docker builds
The repo has multiple Dockerfiles (backend, backend.apple, backend.cuda) and docker-compose variants (gpu/cpu) but no GitHub Actions workflow to validate builds. This prevents regressions when contributors modify dependencies or Docker configurations. A workflow should test all three backend Dockerfiles and the frontend build.
- [ ] Create .github/workflows/docker-build.yml to build and test Dockerfile.backend, Dockerfile.backend.apple, and Dockerfile.backend.cuda
- [ ] Add step to validate docker-compose.yml and docker-compose-gpu.yml syntax
- [ ] Add step to build lpm_frontend using Dockerfile.frontend
- [ ] Test that docker/app/init_chroma.py and docker/app/check_gpu_support.sh execute without errors in container context
- [ ] Document in CONTRIBUTING.md that all Docker builds must pass before merging
Create comprehensive integration tests for wechat_bot.py with mocking
The integrate/wechat_bot.py module exists with dependencies (wxpy, python-dotenv) but has no visible test suite. Given that the WeChat bot integration is a critical user-facing feature, it needs tests to prevent breaking changes when the backend API or message-handling logic is refactored. A starter sketch follows the checklist below.
- [ ] Create an integrate/tests/ directory with __init__.py
- [ ] Write integrate/tests/test_wechat_bot.py with mocked wxpy.Bot to test message receive/send flow
- [ ] Add test for .env variable loading (OPENAI_API_KEY, etc.) via python-dotenv
- [ ] Add test to verify bot correctly formats prompts sent to backend API (likely POST requests based on docs)
- [ ] Update integrate/requirements.txt to include pytest and pytest-mock development dependencies
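A starting-point sketch for that test file. The handler name handle_message, its import path, and the requests-based backend call are assumptions about wechat_bot.py's internals; adapt them to the real module:

```python
# integrate/tests/test_wechat_bot.py: illustrative skeleton. The function
# name `handle_message` and the requests-based backend call are ASSUMPTIONS
# about wechat_bot.py's structure; adapt to the real module.
from unittest import mock

import pytest


@pytest.fixture
def fake_backend():
    """Patch the HTTP layer so no real backend is needed."""
    with mock.patch("requests.post") as post:
        post.return_value.status_code = 200
        post.return_value.json.return_value = {"reply": "hello from the model"}
        yield post


def test_message_is_relayed_to_backend(fake_backend):
    # Stub wxpy before import, since the real module may try to log in to
    # WeChat at import time.
    with mock.patch.dict("sys.modules", {"wxpy": mock.MagicMock()}):
        from integrate import wechat_bot  # hypothetical import path

        reply = wechat_bot.handle_message("ping")  # hypothetical handler

    fake_backend.assert_called_once()
    assert "hello" in reply
```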
Add API contract tests for backend endpoints referenced in docs/
The docs/ folder contains guides for a 'Local Chat API' and a 'Public Chat API', but there are no visible tests validating the backend API contracts, so contributors may inadvertently break endpoints when modifying the Python backend. Tests should validate request/response schemas; a sketch follows the checklist below.
- [ ] Create tests/ directory at repo root (or backend tests/) with test_chat_api.py
- [ ] Write tests for endpoints documented in docs/Local Chat API.md (likely /chat or /complete endpoints)
- [ ] Write tests for endpoints documented in docs/Public Chat API.md
- [ ] Add test to verify embedding model switching works (docs/Embedding Model Switching.md references this feature)
- [ ] Add pytest configuration (pytest.ini or pyproject.toml) and integrate with GitHub Actions workflow from Idea #1
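Under the same caveat, a contract-test sketch: the /api/chat path, the 8002 port, and the response schema are guesses to be replaced with whatever docs/Local Chat API.md actually specifies.

```python
# tests/test_chat_api.py: illustrative contract test. BASE_URL, the /api/chat
# path, and the response schema are ASSUMPTIONS; replace them with the
# endpoints documented in docs/Local Chat API.md.
import os

import pytest
import requests

BASE_URL = os.getenv("SECOND_ME_URL", "http://localhost:8002")  # assumed port


@pytest.mark.integration  # requires a running backend (e.g. docker-compose up)
def test_chat_endpoint_contract():
    resp = requests.post(
        f"{BASE_URL}/api/chat",     # hypothetical endpoint
        json={"message": "hello"},  # hypothetical request schema
        timeout=60,
    )
    assert resp.status_code == 200
    body = resp.json()
    # Contract: a string "reply" field must always be present.
    assert isinstance(body.get("reply"), str)
```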
🌿 Good first issues

- Add a comprehensive test suite for the Me-Alignment algorithm in the Python backend — currently no /tests/ or test files are visible in the listing, which is risky for a research codebase that other users rely on.
- Document the ChromaDB + embedding model integration with concrete code examples in docs/Embedding Model Switching.md — the file exists but likely needs a step-by-step walkthrough for users unfamiliar with vector databases.
- Create a GitHub Actions CI workflow file (.github/workflows/test.yml) to auto-run linting on the frontend (ESLint and Prettier are in place) and validate Docker builds across all three variants (standard, CUDA, Apple Silicon).
⭐ Top contributors
- @kevin-mindverse — 43 commits
- @ryangyuan — 16 commits
- @ScarletttMoon — 10 commits
- @yingapple — 8 commits
- @yexiangle — 8 commits
📝 Recent commits
- d0e4025 — Update README.md (#403) (yingapple)
- 179c52e — Merge pull request #349 from mindverse/develop (kevin-mindverse)
- 56e6767 — Merge branch 'master' into develop (kevin-mindverse)
- 601f371 — change log (#346) (kevin-mindverse)
- a65f4a5 — hotfix for strage problem which leads to the error during graphrag. (#347) (yingapple)
- 8ada304 — feat: add cloud deployment options (#334) (kevinaimonster)
- da2b704 — fix(model tokenizer): just use the model tokenizer without anythink else. (yingapple)
- bec5b88 — fix(deployment): resolve model deployment issue on CUDA + Windows environment (yingapple)
- 1e7d607 — Feat/ Enhance Issue Templates (#333) (kevin-mindverse)
- c3855f3 — Fix/0429/fix all log (#318) (ryangyuan)
🔒 Security observations

- Critical · Exposed Environment File in Docker Volume — docker-compose.yml, the './.env:/app/.env' volume line. The .env file is mounted directly into the container and likely contains sensitive credentials, API keys, and database passwords; if the container is compromised or the host is accessed, all secrets are exposed. Fix: use Docker secrets or a secrets management system (HashiCorp Vault, AWS Secrets Manager). Never mount .env files directly; load secrets through environment variable interpolation or the docker-compose secrets feature.
- High · Outdated Python Dependencies with Known Vulnerabilities — integrate/requirements.txt and dependencies section. The list includes python-dotenv==0.19.0 (released Feb 2021) and wxpy==0.3.9.8 (deprecated); these versions lack security patches. torch>=1.8.0 and transformers>=4.5.0 are acceptable but should be updated regularly. Fix: update to python-dotenv>=1.0.0, upgrade torch and transformers to current stable versions, remove wxpy or replace it with a maintained alternative, and add dependency scanning to CI/CD.
- High · Multiple Exposed Ports Without Security Controls — docker-compose.yml, ports section. Ports 8002 and 8080 are exposed without network segmentation, TLS/SSL, or authentication, so the backend API is directly reachable by anyone on the host network. Fix: front the services with a reverse proxy (nginx/Traefik) doing TLS termination, restrict port access via firewall rules, add API authentication (JWT, API keys), and isolate container communication with network policies.
- High · Incomplete Security Policy — SECURITY.md. The file is a template with placeholder text: no vulnerability disclosure procedure, security contact, or response timeline, which makes responsible disclosure impossible. Fix: complete SECURITY.md with a security contact email, a disclosure timeline, supported versions for security updates, and an acknowledgment process; include a bug bounty program if applicable.
- Medium · Excessive Memory Limits in Docker Configuration — docker-compose.yml, deploy/resources section. The container memory limit is 64GB with a 6GB reservation; such a large allocation enlarges the blast radius of memory-leak exploits and denial-of-service attacks. Fix: set realistic limits based on actual requirements (typically 2-8GB for similar applications), monitor memory with alerts, and set resource requests/limits for all containers.
- Medium · Insufficient Input Validation Controls — integrate/wechat_bot.py and backend API endpoints. The WeChat bot and multiple API endpoints have no visible input validation framework; given the AI/LLM workload, prompt injection and data exfiltration are risks. Fix: validate and sanitize all external inputs, use parameterized queries for database operations, add prompt-injection detection for LLM inputs, and apply rate limiting and request size restrictions.
- Medium · No HTTPS/TLS Configuration Visible — docker-compose.yml and frontend/backend configuration. Ports 8002 and 8080 serve plaintext HTTP with no TLS/SSL termination visible, so frontend-backend traffic is unencrypted. Fix: enable TLS for all communication (e.g. Let's Encrypt certificates in production), configure HSTS headers, and consider certificate pinning for sensitive APIs.
- Medium · Potential SQL Injection Risk — docker/sqlite/init.sql and the backend database layer. The init script exists but implementation details are not visible; the SQLite database in the /app/data volume could be vulnerable if the backend does not use an ORM or query builder correctly. Fix: use an ORM (SQLAlchemy, Tortoise ORM) instead of raw SQL, use parameterized queries (see the sketch below), log and monitor SQL queries, and audit the database layer regularly.
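For the SQL injection observation, the parameterized-query pattern it recommends looks like this (table and column names are illustrative, not the repo's actual schema):

```python
# Parameterized sqlite3 queries: the safe pattern recommended above.
# Table/column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")  # the repo persists under /app/data instead
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")

user_input = "'; DROP TABLE memories; --"  # hostile input stays inert below

# BAD (injectable):  conn.execute(f"INSERT INTO memories (body) VALUES ('{user_input}')")
# GOOD: let the driver bind the value as data, never as SQL text.
conn.execute("INSERT INTO memories (body) VALUES (?)", (user_input,))

rows = conn.execute(
    "SELECT body FROM memories WHERE body = ?", (user_input,)
).fetchall()
print(rows)  # [("'; DROP TABLE memories; --",)]
```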
LLM-derived; treat as a starting point, not a security audit.
👉 Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.