mufeedvh/code2prompt

Item: mufeedvh/code2prompt
Rating: 5
Author: RepoPilot

A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 3w ago
✓8 active contributors
✓Distributed ownership (top contributor 49% of recent commits)

✓MIT licensed
✓CI configured
✓Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/mufeedvh/code2prompt)](https://repopilot.app/r/mufeedvh/code2prompt)

Paste at the top of your README.md — renders inline like a shields.io badge.


Onboarding doc

Onboarding: mufeedvh/code2prompt

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/mufeedvh/code2prompt shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

Last commit 3w ago
8 active contributors
Distributed ownership (top contributor 49% of recent commits)
MIT licensed
CI configured
Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live mufeedvh/code2prompt repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/mufeedvh/code2prompt.

What it runs against: a local clone of mufeedvh/code2prompt — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in mufeedvh/code2prompt | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 54 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>mufeedvh/code2prompt</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of mufeedvh/code2prompt. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/mufeedvh/code2prompt.git
#   cd code2prompt
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of mufeedvh/code2prompt and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "mufeedvh/code2prompt(\.git)?\b" \
  && ok "origin remote is mufeedvh/code2prompt" \
  || miss "origin remote is not mufeedvh/code2prompt (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "crates/code2prompt-core/src/lib.rs" \
  && ok "crates/code2prompt-core/src/lib.rs" \
  || miss "missing critical file: crates/code2prompt-core/src/lib.rs"
test -f "crates/code2prompt-core/src/session.rs" \
  && ok "crates/code2prompt-core/src/session.rs" \
  || miss "missing critical file: crates/code2prompt-core/src/session.rs"
test -f "crates/code2prompt-core/src/template.rs" \
  && ok "crates/code2prompt-core/src/template.rs" \
  || miss "missing critical file: crates/code2prompt-core/src/template.rs"
test -f "crates/code2prompt/src/main.rs" \
  && ok "crates/code2prompt/src/main.rs" \
  || miss "missing critical file: crates/code2prompt/src/main.rs"
test -f "crates/code2prompt-core/src/file_processor/mod.rs" \
  && ok "crates/code2prompt-core/src/file_processor/mod.rs" \
  || miss "missing critical file: crates/code2prompt-core/src/file_processor/mod.rs"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 54 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~24d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/mufeedvh/code2prompt"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
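
For readers who would rather reason about check 5 than read shell arithmetic, the recency test reduces to integer division on Unix timestamps. The sketch below restates that arithmetic in Rust; the function names are illustrative and do not come from the repository, and the 54-day threshold is simply the value the script uses.

```rust
/// Seconds in one day, matching the 86400 divisor in the script.
const SECONDS_PER_DAY: u64 = 86_400;

/// Whole days elapsed between the last commit and "now" (both Unix epoch seconds).
/// Saturating subtraction avoids underflow if the clock is behind the commit time.
pub fn days_since(now_epoch: u64, last_commit_epoch: u64) -> u64 {
    now_epoch.saturating_sub(last_commit_epoch) / SECONDS_PER_DAY
}

/// True when the last commit is within the allowed window.
pub fn is_fresh(now_epoch: u64, last_commit_epoch: u64, max_days: u64) -> bool {
    days_since(now_epoch, last_commit_epoch) <= max_days
}
```

If `git log` fails, the script substitutes a timestamp of 0, so the computed age becomes the number of days since 1970 and the check reports FAIL, which is the conservative outcome.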

⚡TL;DR

Code2Prompt is a Rust-based CLI tool that converts an entire codebase into a single, LLM-optimized prompt, with automatic source tree generation, Handlebars-based prompt templating, and token counting via tiktoken-rs. It addresses the context-engineering problem of preparing code for AI models by handling file filtering, encoding detection, git integration, and multiple output formats (Markdown, XML, JSON). The repository is a monorepo with three crates: crates/code2prompt-core (473KB of Rust, the core engine), crates/code2prompt (CLI wrapper), and crates/code2prompt-python (PyO3 bindings). The core modules are file_processor/ (CSV, TSV, JSON-L, Jupyter notebooks), configuration.rs (config parsing), template.rs (Handlebars rendering), tokenizer.rs (token counting), filter.rs/sort.rs/selection.rs (file filtering logic), and git.rs (git integration).

👥Who it's for

AI engineers and developers building AI agents who need to efficiently package codebases as LLM context; users of Claude, ChatGPT, or local models who manually prepare prompts; MCP (Model Context Protocol) server builders; and Python SDK users via the PyPI-distributed code2prompt-rs package.

🌱Maturity & risk

Actively maintained, with 473KB of Rust code and releases published on crates.io and PyPI. The project has CI/CD pipelines (ci.yml, release.yml, website.yml), a Discord community, and a documented website. The monorepo structure with core, CLI, and Python bindings suggests production-ready tooling, and the maintenance signals in the Verdict (last commit 3 weeks ago, 8 active contributors) indicate ongoing development.

Low risk: the repository lives under a single owner account (mufeedvh), but recent commits are spread across several contributors, and there is an established release pipeline and package distribution. The dependency count is moderate (tiktoken-rs, git2 with vendored OpenSSL/libgit2, handlebars, anyhow, etc.), which limits but does not remove supply-chain exposure. The vendored git2 dependencies add build complexity but improve portability. No visible indication of stalled development.

Active areas of work

Active CI/CD with release automation; website generation via Astro (33KB); template library with 10+ specialized prompts (refactor, bug-fixing, security audits, CTF solvers, documentation); Python SDK distribution via PyPI; likely ongoing feature expansion given Discord community presence.

🚀Get running

Clone the repo, then build the CLI: git clone https://github.com/mufeedvh/code2prompt && cd code2prompt && cargo install --path crates/code2prompt. Or use pre-built: cargo install code2prompt or brew install code2prompt or pip install code2prompt-rs.

Daily commands: CLI: code2prompt . --output-file prompt.txt, or code2prompt path/to/project. Copy to clipboard: code2prompt . (the single dot is the current directory). SDK: python -c "from code2prompt import Code2Prompt; c = Code2Prompt('.'); print(c.generate_prompt())". Dev: cargo run --release -- . (the trailing dot is the path argument passed to the CLI).

🗺️Map of the codebase

crates/code2prompt-core/src/lib.rs — Core library entry point; defines the primary API and orchestrates all prompt generation functionality.
crates/code2prompt-core/src/session.rs — Session management that coordinates file processing, filtering, and prompt generation; central workflow logic.
crates/code2prompt-core/src/template.rs — Template engine for prompt rendering using Handlebars; critical for output customization.
crates/code2prompt/src/main.rs — CLI entry point; routes all user commands and initializes the core session pipeline.
crates/code2prompt-core/src/file_processor/mod.rs — File processor dispatcher; handles content detection and delegates to format-specific processors (CSV, JSON, Jupyter, etc.).
crates/code2prompt-core/src/configuration.rs — Configuration schema and loader; defines all tool settings and .c2pconfig parsing.
crates/code2prompt-core/src/tokenizer.rs — Token counting for multiple LLM models; critical for estimating prompt size before submission.

🛠️How to make changes

Add support for a new file format

Create a new processor module in crates/code2prompt-core/src/file_processor/ named after the format (e.g., yaml.rs). (crates/code2prompt-core/src/file_processor/yaml.rs)
Implement the FileProcessor trait with process_file() and is_applicable() methods. (crates/code2prompt-core/src/file_processor/yaml.rs)
Register the new processor in file_processor/mod.rs by adding it to the dispatcher logic. (crates/code2prompt-core/src/file_processor/mod.rs)
Add integration tests in crates/code2prompt-core/tests/file_processor_test.rs covering your format. (crates/code2prompt-core/tests/file_processor_test.rs)
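
The steps above can be sketched roughly as follows. This is a std-only illustration under stated assumptions: the trait shape, the method signatures, `YamlProcessor`, and `dispatch` are all hypothetical, since this document names `FileProcessor`, `process_file()`, and `is_applicable()` but does not show their real definitions. Check crates/code2prompt-core/src/file_processor/mod.rs before copying anything.

```rust
use std::path::Path;

/// Hypothetical trait shape; the real trait in file_processor/mod.rs may differ.
pub trait FileProcessor {
    /// Returns true when this processor should handle the given path.
    fn is_applicable(&self, path: &Path) -> bool;
    /// Converts raw file content into the text that goes into the prompt.
    fn process_file(&self, content: &str) -> Result<String, String>;
}

/// Illustrative YAML processor: keeps non-empty, non-comment lines.
pub struct YamlProcessor;

impl FileProcessor for YamlProcessor {
    fn is_applicable(&self, path: &Path) -> bool {
        matches!(
            path.extension().and_then(|ext| ext.to_str()),
            Some("yaml") | Some("yml")
        )
    }

    fn process_file(&self, content: &str) -> Result<String, String> {
        let kept: Vec<&str> = content
            .lines()
            .filter(|line| {
                let trimmed = line.trim();
                !trimmed.is_empty() && !trimmed.starts_with('#')
            })
            .collect();
        Ok(kept.join("\n"))
    }
}

/// Dispatcher sketch: the first applicable processor wins;
/// otherwise the content passes through unchanged.
pub fn dispatch(
    processors: &[Box<dyn FileProcessor>],
    path: &Path,
    content: &str,
) -> Result<String, String> {
    for processor in processors {
        if processor.is_applicable(path) {
            return processor.process_file(content);
        }
    }
    Ok(content.to_string())
}
```

Keeping `is_applicable` separate from `process_file` lets the dispatcher stay a plain loop, so registering a new format is a one-line change to the processor list in this sketch.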

Add a new LLM prompt template

Create a new Handlebars template file in crates/code2prompt-core/templates/ (e.g., my-task.hbs). (crates/code2prompt-core/templates/my-task.hbs)
Use Handlebars syntax to access context variables such as {{repo_tree}}, {{files}}, and {{total_tokens}}. (crates/code2prompt-core/src/template.rs)
If adding a built-in template, register it in crates/code2prompt-core/src/builtin_templates.rs. (crates/code2prompt-core/src/builtin_templates.rs)
Test template rendering in crates/code2prompt-core/tests/template_test.rs. (crates/code2prompt-core/tests/template_test.rs)
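
One cheap check to run before rendering a new template is whether it only references variables the context provides. The sketch below is not the Handlebars engine: it is a std-only scanner for simple `{{name}}` placeholders, and the function names are hypothetical. The variable names in the usage note are the ones listed above and should be confirmed against template.rs.

```rust
use std::collections::BTreeSet;

/// Collects the names used in simple `{{name}}` placeholders.
/// Block helpers such as `{{#each files}}` or `{{/each}}`, and triple-brace
/// forms, are skipped because they are not plain identifiers.
pub fn placeholder_names(template: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        let after_open = &rest[start + 2..];
        let end = match after_open.find("}}") {
            Some(index) => index,
            None => break, // unterminated placeholder: stop scanning
        };
        let inner = after_open[..end].trim();
        let is_simple = !inner.is_empty()
            && inner.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
        if is_simple {
            names.insert(inner.to_string());
        }
        rest = &after_open[end + 2..];
    }
    names
}

/// Returns the placeholders that are not in the allowed context variables.
pub fn unknown_placeholders(template: &str, allowed: &[&str]) -> Vec<String> {
    placeholder_names(template)
        .into_iter()
        .filter(|name| !allowed.contains(&name.as_str()))
        .collect()
}
```

Calling `unknown_placeholders(template, &["repo_tree", "files", "total_tokens"])` on a draft template returns any misspelled or unsupported names, which is easier to act on than a rendering error.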

Add a new CLI command or option

Define new command struct in crates/code2prompt/src/args.rs using clap derive macros. (crates/code2prompt/src/args.rs)
Implement command handler in crates/code2prompt/src/model/commands.rs to orchestrate the session. (crates/code2prompt/src/model/commands.rs)
Route the command in crates/code2prompt/src/main.rs based on CLI args. (crates/code2prompt/src/main.rs)
Update configuration schema in crates/code2prompt-core/src/configuration.rs if needed. (crates/code2prompt-core/src/configuration.rs)
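
The define, handle, and route split described above can be sketched without any dependencies. The real project uses clap derive macros in args.rs; the hand-written parser here is a stand-in so the example stays self-contained, and `Command`, `parse_args`, and `route` are hypothetical names. Only the `--output-file` flag is taken from this document; defaulting the path to the current directory is an assumption.

```rust
/// Hypothetical subset of CLI commands; the real args.rs uses clap derive macros.
#[derive(Debug, PartialEq)]
pub enum Command {
    Generate { path: String, output_file: Option<String> },
    Help,
}

/// Step 1: turn raw arguments into a typed command.
pub fn parse_args(args: &[&str]) -> Result<Command, String> {
    let mut path: Option<String> = None;
    let mut output_file: Option<String> = None;
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--help" | "-h" => return Ok(Command::Help),
            "--output-file" => {
                let value = iter.next().ok_or("--output-file needs a value")?;
                output_file = Some(value.to_string());
            }
            flag if flag.starts_with("--") => {
                return Err(format!("unknown option: {flag}"));
            }
            positional => {
                if path.is_some() {
                    return Err(format!("unexpected extra path: {positional}"));
                }
                path = Some(positional.to_string());
            }
        }
    }
    // Assumption for this sketch: default to the current directory.
    let path = path.unwrap_or_else(|| ".".to_string());
    Ok(Command::Generate { path, output_file })
}

/// Steps 2 and 3: route each command to its handler and describe the action.
pub fn route(command: &Command) -> String {
    match command {
        Command::Help => "print usage".to_string(),
        Command::Generate { path, output_file: Some(file) } => {
            format!("generate prompt for {path} and write it to {file}")
        }
        Command::Generate { path, output_file: None } => {
            format!("generate prompt for {path}")
        }
    }
}
```

Parsing into an enum first means the routing `match` is exhaustive, so adding a command variant produces a compile error until a handler is written for it.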

Add support for a new LLM model tokenizer

Open crates/code2prompt-core/src/tokenizer.rs where the model enumeration and tokenizer implementations live. (crates/code2prompt-core/src/tokenizer.rs)
Add your model to the Model enum and implement token counting logic (using available token counting crates like tiktoken-rs). (crates/code2prompt-core/src/tokenizer.rs)
Add tests to crates/code2prompt-core/tests/ verifying token count accuracy against the official tokenizer. (crates/code2prompt-core/tests/tokenizer_test.rs)
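
As a rough picture of what "add your model to the enum" involves, the sketch below adds a variant, a name lookup, and a counting branch. Everything in it is hypothetical: the variant names, `from_name`, and `count_tokens` are not taken from tokenizer.rs, and the counting rules are crude stand-ins (whitespace words, and one token per four characters) rather than real BPE tokenization. A real implementation would call into a tokenizer crate such as tiktoken-rs.

```rust
/// Hypothetical model list; the real enum in tokenizer.rs may use other names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Model {
    /// Placeholder for a model that already exists in the enum.
    ExistingModel,
    /// The variant added in this sketch.
    MyNewModel,
}

impl Model {
    /// Maps a user-facing name (case-insensitive) to a variant.
    pub fn from_name(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "existing-model" => Some(Model::ExistingModel),
            "my-new-model" => Some(Model::MyNewModel),
            _ => None,
        }
    }
}

/// Stand-in counters, NOT real tokenizers:
/// - ExistingModel counts whitespace-separated words.
/// - MyNewModel estimates one token per four characters, rounded up.
pub fn count_tokens(model: Model, text: &str) -> usize {
    match model {
        Model::ExistingModel => text.split_whitespace().count(),
        Model::MyNewModel => (text.chars().count() + 3) / 4,
    }
}
```

Because `count_tokens` matches on the enum exhaustively, the compiler flags every place that needs updating when a variant is added, which is the main reason to keep the model list as an enum instead of a string.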

🔧Why these technologies

Rust + Cargo workspace — Type safety, zero-cost abstractions, and fast native binaries enable reliable CLI tools; the workspace structure separates core logic from the CLI and Python bindings.
Handlebars templating — Flexible, logic-minimal template syntax allows users to customize LLM prompts without code changes; supports arbitrary context variables.
Clap for CLI parsing — Declarative, derive-based argument parsing with automatic help/validation; integrates well with Rust's type system.
PyO3 for Python bindings — Enables Python SDK (code2prompt-python) without duplicating core

🪤Traps & gotchas

No obvious environment variable requirements in the config data. git2's vendored libgit2/OpenSSL adds build time (~1-2 min); ensure Rust toolchain is up-to-date. Clipboard integration (arboard) may fail on headless systems—check DISPLAY/Wayland setup. Token counting via tiktoken-rs requires downloading model vocabularies on first use (cached locally). Custom templates must be valid Handlebars or rendering will fail with cryptic errors. The .c2pconfig file format is TOML but underdocumented in the README snippet.
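
For the headless clipboard case, a pre-flight check on Linux could look like the sketch below. It only inspects the `DISPLAY` and `WAYLAND_DISPLAY` environment variables; the function names are hypothetical, nothing here calls arboard, and on macOS or Windows these variables are not the relevant signal.

```rust
/// True when the value is present and non-empty.
fn is_set(value: Option<&str>) -> bool {
    value.map_or(false, |text| !text.is_empty())
}

/// Pure check, easy to unit test: is an X11 or Wayland display advertised?
pub fn display_available(display: Option<&str>, wayland_display: Option<&str>) -> bool {
    is_set(display) || is_set(wayland_display)
}

/// Reads the two variables from the process environment.
pub fn display_available_from_env() -> bool {
    let display = std::env::var("DISPLAY").ok();
    let wayland = std::env::var("WAYLAND_DISPLAY").ok();
    display_available(display.as_deref(), wayland.as_deref())
}
```

When `display_available_from_env()` returns false, writing to a file with `--output-file` is the safer path than relying on the clipboard.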

🏗️Architecture

💡Concepts to learn

Context Engineering / Prompt Engineering — The domain problem Code2Prompt solves; understanding how to structure code context for LLM consumption directly impacts AI output quality

🔗Related repos

chroma-core/chroma — Vector database used by LLM pipelines to index and retrieve code snippets—complementary to Code2Prompt for RAG workflows
anthropics/anthropic-sdk-python — Official SDK for Claude API; Code2Prompt prompts are often fed into Claude, making this a natural integration target
openai/tiktoken — OpenAI's tokenizer; Code2Prompt uses tiktoken-rs, a Rust port, for the same token-counting task
github/gitignore — Community .gitignore templates; Code2Prompt respects these patterns, and users often reference this repo to understand filtering behavior
stanfordnlp/stanza — NLP parsing library; potential alternative for more sophisticated code context extraction beyond current file-level processing

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for file_processor modules (CSV, TSV, JSONL, Jupyter)

The repo has specialized file processors for CSV, TSV, JSONL, and Jupyter notebooks (crates/code2prompt-core/src/file_processor/), but file_processor_test.rs appears minimal. These processors handle format-specific parsing and edge cases (e.g., escaped delimiters in CSV, cell extraction in Jupyter) that warrant dedicated unit tests. This improves reliability for users processing diverse file types and makes future refactoring safer.

[ ] Examine existing file_processor_test.rs to understand current coverage
[ ] Add test cases for csv.rs: quoted fields, escaped characters, empty cells, malformed rows
[ ] Add test cases for tsv.rs: tab handling, quoted values, unicode whitespace
[ ] Add test cases for jsonl.rs: multiple JSON objects per line, invalid JSON handling, missing newlines
[ ] Add test cases for ipynb.rs: cell content extraction, metadata preservation, notebook format variations
[ ] Verify tests pass and add documentation comment explaining edge cases covered

Add integration tests for template rendering with actual LLM prompts (smoke tests)

The repo includes 11 built-in templates (document-the-code, refactor, fix-bugs, security audits, CTF solvers, etc.) in crates/code2prompt-core/templates/, but there's no integration test validating that these templates render correctly with real codebase inputs. Adding smoke tests ensures template variables are correctly bound and that prompt generation doesn't fail on typical codebases. This catches template regressions early.

[ ] Create a new test file: crates/code2prompt-core/tests/template_rendering_integration_test.rs
[ ] Set up a minimal sample codebase fixture (e.g., a small Rust/Python/JS project structure)
[ ] For each template in crates/code2prompt-core/templates/, render it with the sample codebase
[ ] Verify output contains expected sections (source tree, file contents, custom template instructions)
[ ] Verify token counts are calculated (using default tokenizer) and are non-zero
[ ] Add assertions for XML and Markdown format variants (referencing default_template_*.hbs)

Add CI workflow for Python SDK (crates/code2prompt-python) to validate PyPI publishability

The repo has a Python SDK in crates/code2prompt-python/ with pyproject.toml and PyPI publication (seen in README badges), but there's no dedicated CI workflow validating Python builds, tests, or pre-release checks. The existing ci.yml likely focuses on Rust. This gap means Python SDK breakages could slip through. Add a workflow to build the wheel, run Python tests, and validate pyproject.toml metadata.

[ ] Review .github/workflows/ci.yml to confirm Python testing is missing or incomplete
[ ] Create .github/workflows/python-ci.yml with a Python version matrix that matches the pyo3 abi3 setting in Cargo.toml (abi3-py312 makes Python 3.12 the minimum supported version, so 3.9-3.11 would not apply)
[ ] Add step to build Python wheel using 'maturin build' in crates/code2prompt-python/
[ ] Add step to validate pyproject.toml and metadata (using 'twine check' or 'build' module)
[ ] Add step to run any Python tests in crates/code2prompt-python/python-sdk/ (create if missing)
[ ] Verify workflow runs on PR and push to main branch; include build output artifacts

🌿Good first issues

Add tokenizer tests for edge cases (very long lines, unusual encodings, Unicode): crates/code2prompt-core/src/tokenizer.rs currently lacks unit tests for boundary conditions
Document the .c2pconfig schema and add examples: the repo lacks a formal spec for configuration options and template variables available in Handlebars contexts
Implement missing file processor for XML files: file_processor/ has CSV, TSV, JSON-L, and Jupyter but no structured XML handler despite XML being a common template output format

⭐Top contributors

@ODAncona — 49 commits
Olivier D'Ancona — 22 commits
@dependabot[bot] — 21 commits
@HadiCherkaoui — 4 commits
@yulonglin — 1 commit

📝Recent commits

b1cb9b8 — Merge pull request #314 from mufeedvh/dependabot/github_actions/main/softprops/action-gh-release-3 (ODAncona)
e4ca7e4 — Bump softprops/action-gh-release from 2 to 3 (dependabot[bot])
29b7873 — Merge pull request #310 from HadiCherkaoui/feat/xml-output-format (ODAncona)
0cd1dbd — Update crates/code2prompt-core/src/default_template_md.hbs (HadiCherkaoui)
db86efb — fix: register extension and no_codeblock as built-in template identifiers (HadiCherkaoui)
0b9993c — refactor: remove dead params from wrap_code_block (HadiCherkaoui)
8c4cc00 — feat: delegate code-block fencing from wrap_code_block to templates (HadiCherkaoui)
e73c34d — Merge pull request #309 from yulonglin/patch-1 (ODAncona)
465a2cb — Fix argument typo in quickstart: --output-file (yulonglin)
0250ffa — Merge pull request #285 from mufeedvh/dependabot/cargo/main/tokio-1.49.0 (ODAncona)

🔒Security observations

The codebase demonstrates good security practices overall with modern Rust ecosystem dependencies and workspace structure. The main security concerns are: (1) potential lag in vendored OpenSSL updates, (2) dynamic template handling without explicit security boundaries, (3) git repository processing without validation, and (4) clipboard access without safety mechanisms. No critical vulnerabilities or hardcoded secrets were identified in the file structure analysis. The project uses well-maintained dependencies with regular updates (Tokio, Serde, etc.). Recommendations focus on implementing input validation, sandboxing template processing, and documenting security implications of clipboard and git repository access.

Medium · Vendored OpenSSL with Potential Update Lag — Cargo.toml - workspace.dependencies (git2 with vendored-openssl feature). The git2 dependency uses 'vendored-openssl' feature, which bundles OpenSSL. While this improves portability, vendored dependencies may lag behind upstream security patches. If a vulnerability is discovered in OpenSSL, it may take time for git2 to update and rebuild the vendored version. Fix: Monitor git2 releases for security updates and maintain a regular update schedule. Consider using native SSL implementations where possible or ensure CI/CD pipeline tracks OpenSSL CVEs.
Medium · Unvetted Third-party Template Handling — crates/code2prompt-core/src/template.rs and template files in crates/code2prompt-core/templates/. The codebase handles user-provided Handlebars templates which are rendered dynamically. If template content comes from untrusted sources without proper sanitization, it could lead to template injection or information disclosure attacks. Fix: Implement input validation and sanitization for user-provided templates. Consider using a sandboxed template engine or restricting template syntax to prevent arbitrary code execution. Document security implications of template usage.
Medium · Git Repository Access Without Validation — crates/code2prompt-core/src/git.rs. The git2 integration in crates/code2prompt-core/src/git.rs processes git repositories. Malicious git hooks or repository metadata could potentially be exploited if not properly validated before execution. Fix: Validate git repository state before processing. Disable automatic git hook execution. Consider running git operations in a sandboxed environment or with restricted permissions. Validate repository structure and metadata.
Low · Clipboard Access Functionality — crates/code2prompt/src/clipboard.rs. The clipboard feature (crates/code2prompt/src/clipboard.rs) reads from and writes to system clipboard. While not inherently vulnerable, clipboard access can expose sensitive data if the clipboard contains confidential information or if the application is compromised. Fix: Implement optional clipboard access with user confirmation. Add warnings about clipboard content exposure. Consider adding clipboard data sanitization or temporary clipboard clearing after use. Document privacy implications.
Low · Character Encoding Detection Library — Cargo.toml - chardetng dependency. The chardetng library is used for encoding detection. While generally safe, automatic encoding detection can be exploited in edge cases to bypass security filters or cause unexpected behavior with specially crafted files. Fix: Keep chardetng and other encoding-related dependencies updated. Implement additional validation for detected encodings. Consider explicitly specifying supported encodings instead of auto-detection where possible.
Low · Release Build Panic Configuration — Cargo.toml - [profile.release]. The release profile uses 'panic = abort', which is good for security, but no explicit panic handling strategy is defined. This could potentially lead to information leakage through panic messages in error scenarios. Fix: Implement comprehensive panic hook handlers to sanitize error messages before display. Avoid exposing stack traces or internal paths in production. Consider using a custom panic handler that logs safely.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
