XAMPPRocky/tokei

Count your code, quickly.

Healthy

Healthy across the board

Use as dependency: Concerns (weakest axis)

non-standard license (Other)

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 1d ago
  • 67+ active contributors
  • Distributed ownership (top contributor 9% of recent commits)
  • Other licensed
  • CI configured
  • Tests present
  • Non-standard license (Other) — review terms
What would change the summary?
  • Use as dependency: Concerns → Mixed if the license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/xampprocky/tokei)](https://repopilot.app/r/xampprocky/tokei)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: XAMPPRocky/tokei

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/XAMPPRocky/tokei shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 1d ago
  • 67+ active contributors
  • Distributed ownership (top contributor 9% of recent commits)
  • Other licensed
  • CI configured
  • Tests present
  • ⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live XAMPPRocky/tokei repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/XAMPPRocky/tokei.

What it runs against: a local clone of XAMPPRocky/tokei — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in XAMPPRocky/tokei | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>XAMPPRocky/tokei</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of XAMPPRocky/tokei. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/XAMPPRocky/tokei.git
#   cd tokei
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of XAMPPRocky/tokei and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "XAMPPRocky/tokei(\.git)?\b" \
  && ok "origin remote is XAMPPRocky/tokei" \
  || miss "origin remote is not XAMPPRocky/tokei (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "src/main.rs" \
  && ok "src/main.rs" \
  || miss "missing critical file: src/main.rs"
test -f "src/language/language_type.rs" \
  && ok "src/language/language_type.rs" \
  || miss "missing critical file: src/language/language_type.rs"
test -f "src/language/mod.rs" \
  && ok "src/language/mod.rs" \
  || miss "missing critical file: src/language/mod.rs"
test -f "src/stats.rs" \
  && ok "src/stats.rs" \
  || miss "missing critical file: src/stats.rs"
test -f "src/cli.rs" \
  && ok "src/cli.rs" \
  || miss "missing critical file: src/cli.rs"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/XAMPPRocky/tokei"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Tokei is a fast Rust command-line tool that counts lines of code, comments, and blanks across 200+ programming languages. It handles nested code (e.g., Rust embedded in Markdown) and prints formatted statistics tables. It is designed as a modern, parallel alternative to cloc, using Rayon for concurrent file processing and grep-searcher for syntax-aware line categorization.

The project is a single binary with a clear modular structure: src/language/ defines the 200+ language definitions via language_type.tera.rs (a Tera template), src/cli.rs wraps the core counting logic, src/stats.rs aggregates metrics, and src/utils/ provides helpers. src/lib.rs exports the public API, and build.rs generates language_type.rs from languages.json.

👥Who it's for

Software engineers and teams who need rapid code metrics for CI/CD pipelines, code reviews, and project audits; DevOps engineers deploying it in Docker (see publish_image.yaml); and developers integrating it as a Rust library (via docs.rs). Package maintainers across Homebrew, Chocolatey, and Repology rely on it.

🌱Maturity & risk

Production-ready and actively maintained. Version 14.0.0 with dual MIT/Apache-2.0 licensing, comprehensive CI (mean_bean_ci.yml runs fuzzing and tests), release automation via release-plz.yaml, and 159K lines of Rust. No obvious stale issues given the robust GitHub Actions setup and recent activity patterns.

Low risk: narrow dependency surface (aho-corasick for pattern matching, serde ecosystem for serialization), no single-maintainer bottleneck evident from contributor setup. Fuzzing infrastructure (fuzz/fuzz_targets/) mitigates parser bugs. MSRV pinned to Rust 1.71, reducing compatibility drift.

Active areas of work

Active release cycle via release-plz.yaml (semantic versioning automation). Recent focus on CI reliability (mean_bean_ci.yml, mean_bean_deploy.yml) and container distribution (publish_image.yaml). Fuzzing campaign in fuzz/ directory prevents regressions in the parser.

🚀Get running

git clone https://github.com/XAMPPRocky/tokei.git
cd tokei
cargo build --release
./target/release/tokei --help

Daily commands:

cargo build --features cli
cargo run -- <path>          # Count lines in a directory or file
cargo run -- --json <path>    # Output JSON
cargo run -- --help           # Full CLI options

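For library use: cargo build --lib builds only the library target. On its own it does not switch off any features; to leave out the cli feature as well, add --no-default-features (this assumes cli is a default feature, so check Cargo.toml).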

🗺️Map of the codebase

  • src/main.rs — Entry point for the CLI application; defines the main command-line interface and orchestrates the core tokei workflow.
  • src/language/language_type.rs — Core enum defining all supported programming languages and their syntax rules; essential for understanding how tokei classifies files.
  • src/language/mod.rs — Language module definition; coordinates language detection, syntax analysis, and the overall language abstraction layer.
  • src/stats.rs — Statistics aggregation and reporting logic; calculates lines of code, comments, blanks, and other metrics.
  • src/cli.rs — Command-line argument parsing and configuration; defines all user-facing options and input handling.
  • Cargo.toml — Project manifest; declares dependencies, features (cli, cbor, yaml), and build configuration for the entire project.
  • languages.json — Language definition database; maps file extensions and patterns to syntax rules (comments, strings, nested structures).

🧩Components & responsibilities

  • CLI Parser (cli.rs, cli_utils.rs) (Clap, env_logger, colored) — Parses command-line arguments, validates options, and constructs the Config object passed to core analysis.
    • Failure mode: Invalid arguments cause graceful error message and early exit; no partial processing.
  • File Input & Filtering (input.rs, config.rs) (std::fs, walkdir-like traversal, regex for patterns) — Discovers files in specified paths, applies .tokeignore rules, and yields eligible files to the analysis pipeline.
    • Failure mode: Permission denied or missing files are logged and skipped; analysis continues for accessible files.
  • Language Detection (language_type.rs, languages.rs) (Enum dispatch, languages.json lookup tables) — Maps file extensions and names to LanguageType enum; provides syntax rule metadata for each language.
    • Failure mode: Unmapped extensions default to 'Unknown' language;

🛠️How to make changes

Add Support for a New Programming Language

  1. Add the new language variant to the LanguageType enum (src/language/language_type.rs; per the TL;DR, build.rs generates this file from languages.json, so check whether a manual edit is needed)
  2. Define its syntax rules (line comments, block comments, strings, nested structures) in the Tera template (src/language/language_type.tera.rs)
  3. Add the file extensions and language name mapping (languages.json)
  4. Register the language in the appropriate lookup map or function (src/language/languages.rs)
  5. Add a test data file in tests/data/ (e.g., tests/data/newlang.ext) and verify it via tests/accuracy.rs
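
As a rough illustration of steps 1, 3, and 4, the sketch below shows a miniature enum with an extension lookup and per-language comment markers. `MiniLanguage`, its variants, the `nl` extension, and both method names are hypothetical stand-ins chosen for this example; they are not tokei's actual `LanguageType` API, which is far larger.

```rust
/// Hypothetical miniature of a language enum (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MiniLanguage {
    Rust,
    Python,
    /// The variant a contributor would add for a new language.
    NewLang,
}

impl MiniLanguage {
    /// Map a file extension (without the dot) to a language, ignoring ASCII case.
    pub fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "rs" => Some(Self::Rust),
            "py" | "pyw" => Some(Self::Python),
            "nl" => Some(Self::NewLang), // assumed extension for the new language
            _ => None,
        }
    }

    /// Line-comment prefixes used when classifying lines of this language.
    pub fn line_comments(self) -> &'static [&'static str] {
        match self {
            Self::Rust => &["//"],
            Self::Python => &["#"],
            Self::NewLang => &["--"],
        }
    }
}
```

With this shape, `MiniLanguage::from_extension("nl")` yields the new variant, and an unmapped extension yields `None`, which a caller could treat as an unknown language.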

Add a New Output Format (e.g., new serialization feature)

  1. Add a new feature gate in Cargo.toml (e.g., 'protobuf') with corresponding optional dependencies (Cargo.toml)
  2. Create serialization logic in src/ (e.g., src/serialization.rs or extend stats.rs) (src/stats.rs)
  3. Add a new output mode flag in src/cli.rs argument parser (src/cli.rs)
  4. Wire the output format in src/main.rs to call the new serialization function (src/main.rs)
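
The feature-gating idea in these steps can be sketched as follows. Everything here is an assumption made for illustration: the `Totals` struct, the `OutputFormat` enum, and the `csv-output` feature name are hypothetical, and the hand-written JSON string stands in for the Serde-based serialization the project is described as using.

```rust
/// Hypothetical totals; the real crate's statistics types differ.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Totals {
    pub code: usize,
    pub comments: usize,
    pub blanks: usize,
}

/// Output modes. `Csv` only exists when the (hypothetical) `csv-output`
/// Cargo feature is enabled, mirroring an optional format behind a feature gate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OutputFormat {
    Json,
    #[cfg(feature = "csv-output")]
    Csv,
}

impl OutputFormat {
    /// Parse the value of a CLI flag; gated formats are rejected when compiled out.
    pub fn parse(name: &str) -> Option<Self> {
        match name {
            "json" => Some(Self::Json),
            #[cfg(feature = "csv-output")]
            "csv" => Some(Self::Csv),
            _ => None,
        }
    }

    /// Render the totals in the selected format.
    pub fn render(self, t: &Totals) -> String {
        match self {
            Self::Json => format!(
                "{{\"code\":{},\"comments\":{},\"blanks\":{}}}",
                t.code, t.comments, t.blanks
            ),
            #[cfg(feature = "csv-output")]
            Self::Csv => format!(
                "code,comments,blanks\n{},{},{}",
                t.code, t.comments, t.blanks
            ),
        }
    }
}
```

Gating both the enum variant and the match arms keeps the optional dependency out of default builds, and a build without the feature reports an unknown format instead of failing to link.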

Customize Tokei Behavior with a .tokeignore Config File

  1. Check the config parsing logic in src/config.rs to understand supported directives (src/config.rs)
  2. Create a .tokeignore file in your project root with ignore patterns (syntax similar to .gitignore) (.tokeignore)
  3. Run tokei from the project root; it will automatically discover and apply .tokeignore rules via src/input.rs (src/input.rs)

🔧Why these technologies

  • Rust — Memory-safe, blazing-fast execution critical for scanning large codebases; minimal runtime overhead.
  • Clap CLI framework — Robust argument parsing with auto-help generation, enabling rich CLI UX without boilerplate.
  • Serde + JSON/YAML/CBOR — Pluggable serialization backends (feature-gated) allow diverse output formats for integration with other tools.
  • Tera templating for language_type.tera.rs — Avoids hand-writing massive match arms; code-generated LanguageType enum from a single template maintains DRY principle.

⚖️Trade-offs already made

  • Single-threaded file processing with optional parallelism

    • Why: Simplifies state management and syntax rule lookup while maintaining reasonable performance for most use cases.
    • Consequence: Very large monorepos (100k+ files) may benefit from explicit parallelization; current design favors clarity.
  • languages.json as static, embedded database rather than dynamic plugin system

    • Why: Ensures fast startup, no runtime discovery overhead, and deterministic behavior across installations.
    • Consequence: Adding new languages requires code rebuild; community contributions need a maintainer to merge and release.
  • Comment/string detection via regex-based syntax rules rather than full AST parsing

    • Why: Fast, language-agnostic, and avoids the complexity of language-specific parsers.
    • Consequence: Edge cases (nested comments, complex escape sequences) may be misclassified; accuracy trades off against speed.
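
To make the last trade-off concrete, here is a minimal sketch of delimiter-based line classification. It is not tokei's implementation: `Counts` and `count_lines` are hypothetical names, the delimiters are passed in by the caller and assumed non-empty, and the logic only looks at how each trimmed line starts.

```rust
/// Hypothetical per-file tallies (illustration only).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub code: usize,
    pub comments: usize,
    pub blanks: usize,
}

/// Classify each line using delimiter rules only, with no parsing.
/// `block` is the (open, close) pair for block comments.
pub fn count_lines(src: &str, line_comment: &str, block: (&str, &str)) -> Counts {
    let (open, close) = block;
    let mut counts = Counts::default();
    let mut in_block = false;

    for line in src.lines() {
        let trimmed = line.trim();
        if in_block {
            // Every line inside an open block comment counts as a comment,
            // including blank ones.
            counts.comments += 1;
            if trimmed.contains(close) {
                in_block = false;
            }
        } else if trimmed.is_empty() {
            counts.blanks += 1;
        } else if trimmed.starts_with(line_comment) {
            counts.comments += 1;
        } else if trimmed.starts_with(open) {
            counts.comments += 1;
            // The block stays open unless the closer appears later on this line.
            in_block = !trimmed[open.len()..].contains(close);
        } else {
            // Anything else is code, even if a comment starts mid-line.
            counts.code += 1;
        }
    }
    counts
}
```

Because the sketch never inspects the middle of a line, a block comment that opens after some code on the same line goes unnoticed and the following lines are miscounted. That is the kind of edge case the consequence above refers to.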

🚫Non-goals (don't propose these)

  • Does not provide real-time IDE integration or live as-you-type statistics.
  • Does not perform code quality analysis, linting, or cyclomatic complexity measurement.
  • Does not track code authorship, git blame, or contributor attribution.
  • Does not handle obfuscated, minified, or generated code specially; counts all lines equally.
  • Not a build system or dependency manager; purely a metrics collection tool.

🪤Traps & gotchas

  • Build-time code generation: build.rs bakes languages.json into the binary, so edit the file and rebuild for changes to take effect.
  • Rust 1.71+ required: the MSRV is enforced; older Rust toolchains will fail to build.
  • Parallel file I/O: Rayon thread pool behavior depends on RAYON_NUM_THREADS and the CPU core count; single-threaded results may differ.
  • Regex compilation overhead: the first run creates a regex cache; subsequent runs are faster.
  • .tokeignore: the project respects a .tokeignore file (like .gitignore) for exclusions; it is not auto-generated.

💡Concepts to learn

  • Tera templating at build time — Tokei generates 200+ language parsers from a single JSON config via Tera (build.rs). Understanding this meta-programming pattern is key to extending language support without writing Rust code.
  • Parallel file traversal with Rayon — Tokei's speed comes from Rayon's thread pool for concurrent file scanning. Learners must understand work-stealing and rayon::scope to debug performance or add custom aggregation logic.
  • Aho-Corasick multi-pattern matching — Used for efficiently detecting comment delimiters and string boundaries in a single pass. Critical for understanding why nested language detection is fast.
  • Syntax-aware state machine parsing — src/language/syntax.rs implements a finite state machine to track in-string, in-comment, in-block-comment states while scanning. This avoids false positives (e.g., treating '#' inside a string as a comment marker).
  • DashMap for concurrent counting — Results from parallel threads are merged into a lock-free concurrent hashmap. Understanding DashMap vs Mutex<HashMap> trade-offs is essential for scaling to large codebases.
  • Encoding detection with encoding_rs_io — Tokei handles UTF-8, UTF-16, and legacy encodings transparently. Critical for counting legacy codebases without charset-related crashes.
  • Feature flags for optional outputs — CBOR and YAML output are gated behind feature flags (cbor, yaml) to reduce binary size. Learners should understand Cargo features for optional dependencies.

Related repos

  • AlDanial/cloc — Original Perl-based code counter; Tokei is the modern Rust-parallel alternative with similar goals
  • boyter/scc — Go-based code counter with similar feature set; direct competitor in the fast-stats space
  • rust-lang/cargo — Rust package manager; Tokei uses clap-cargo for integration and follows Cargo conventions
  • BurntSushi/ripgrep — Sibling project in the fast-Rust-CLI ecosystem; both leverage ignore crate and regex for efficient file scanning
  • serde-rs/serde — Core serialization dependency; Tokei's JSON/YAML/CBOR output formats depend entirely on Serde ecosystem
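
The parallel-counting concepts listed under "Concepts to learn" can be approximated with the standard library alone. The sketch below swaps in `std::thread::scope` for Rayon and a `Mutex<HashMap>` for DashMap, so it shows the tally-locally-then-merge pattern and not the crates themselves. `count_parallel` and the in-memory `(language, contents)` pairs are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Count non-blank lines per language across in-memory "files",
/// using one scoped thread per chunk of the input.
pub fn count_parallel(files: &[(&str, &str)], workers: usize) -> HashMap<String, usize> {
    let totals: Mutex<HashMap<String, usize>> = Mutex::new(HashMap::new());
    let workers = workers.max(1);
    // Ceiling division, never zero, so `chunks` cannot panic.
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        for chunk in files.chunks(chunk_size) {
            let totals = &totals;
            scope.spawn(move || {
                // Tally locally first so the shared lock is taken once per chunk.
                let mut local: HashMap<&str, usize> = HashMap::new();
                for (lang, contents) in chunk {
                    let n = contents.lines().filter(|l| !l.trim().is_empty()).count();
                    *local.entry(*lang).or_insert(0) += n;
                }
                let mut shared = totals.lock().unwrap();
                for (lang, n) in local {
                    *shared.entry(lang.to_string()).or_insert(0) += n;
                }
            });
        }
    });

    totals.into_inner().unwrap()
}
```

Merging once per chunk keeps lock contention low. A concurrent map removes the single lock at the cost of an extra dependency, which is the trade-off the DashMap bullet points at.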

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for src/language/embedding.rs and language detection edge cases

The repo has 100+ language definitions (visible from the extensive tests/data directory with Daml, Modelica, Arturo, etc.) but tests/accuracy.rs appears to be the only language-focused test file. The embedding.rs module handles nested language detection (e.g., HTML with embedded JavaScript) which is complex logic that deserves dedicated test coverage. New contributors could add property-based tests using the existing proptest dependency to validate edge cases like mixed-language files, unclosed delimiters, and comment/string boundary conditions.

  • [ ] Create tests/language_embedding.rs with test cases for nested language pairs
  • [ ] Add proptest-based property tests in the new file for embedding edge cases
  • [ ] Test specific scenarios: HTML+JS, CSS+JS, markdown+code blocks using test data from tests/data/
  • [ ] Verify the tests catch regressions in src/language/embedding.rs logic

Add integration tests for all CBOR and YAML serialization paths

The repo has optional features for 'cbor' and 'yaml' serialization (visible in Cargo.toml), but there are no visible tests in tests/ that exercise these code paths. This means CBOR/YAML output could silently break without detection. A new contributor could add integration tests that count code in various formats and verify serialization to both CBOR and YAML works correctly.

  • [ ] Create tests/serialization_formats.rs with integration tests for yaml and cbor features
  • [ ] Add test functions that: (1) count lines in test files, (2) serialize to YAML, (3) deserialize and validate
  • [ ] Repeat for CBOR format using serde_cbor
  • [ ] Ensure tests run only when features are enabled using #[cfg(feature = "yaml")]

Add tests for src/cli_utils.rs and src/config.rs configuration loading edge cases

These modules handle user configuration and CLI argument parsing, but there's no visible dedicated test file for them. Configuration bugs (e.g., invalid .tokeignore patterns, malformed config files) directly impact user experience. A contributor could add tests for the configuration pipeline: loading from ~/.config/tokei, parsing .tokeignore files, and merging CLI arguments with config files.

  • [ ] Create tests/config_integration.rs to test src/config.rs and src/cli_utils.rs
  • [ ] Add test cases for: loading from etcetera paths, parsing .tokeignore patterns, invalid TOML configs
  • [ ] Test .tokeignore file patterns against actual test data files in tests/data/
  • [ ] Add a test that verifies CLI args correctly override config file settings
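
One way to frame the last checklist item is as a precedence merge over optional values. The sketch below assumes a hypothetical `Settings` struct with illustrative field names; it does not reflect the actual shape of tokei's configuration types.

```rust
/// Hypothetical subset of settings; field names are illustrative only.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Settings {
    pub hidden: Option<bool>,
    pub columns: Option<usize>,
    pub sort: Option<String>,
}

impl Settings {
    /// Values set in `self` (e.g. from the CLI) win; gaps are filled
    /// from `fallback` (e.g. from a config file).
    pub fn or(self, fallback: Settings) -> Settings {
        Settings {
            hidden: self.hidden.or(fallback.hidden),
            columns: self.columns.or(fallback.columns),
            sort: self.sort.or(fallback.sort),
        }
    }
}
```

A test for the override behavior would then build one `Settings` from CLI arguments and one from a config file, call `cli.or(file)`, and assert that each field came from the expected source.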

🌿Good first issues

  • Add line-count tests for src/language/syntax.rs token classification (currently heavy on regex, light on unit tests for edge cases like nested string escapes). Examine fuzz_targets/ for inspiration.
  • Write integration tests for embedding languages (Rust in Markdown, SQL in Python docstrings). Reference the example output in README showing '|- Rust' nesting but find minimal tests in the repo.
  • Document the languages.json schema in CONTRIBUTING.md with examples (line_comment, block_comment, strings, string_delimiters fields are not formally specified). See language_type.tera.rs for hints on required keys.

📝Recent commits

  • fa44e51 — Add Laravel Blade language support (#1353) (benfaerber)
  • 505d648 — feat: Added C3 programming language. (#1319) (stpettersens)
  • d9cbc01 — Add Djot language support in languages.json (#1321) (kevinschweikert)
  • 6c71dd7 — Update strum/strum_macros to 0.27.2 (#1316) (musicinmybrain)
  • 04c7676 — ci(mean-bean-deploy): use v4 of the artifact actions (#1315) (SerenModz21)
  • ecf173d — ci(mean-bean-deploy): use master branch instead of main (#1314) (SerenModz21)
  • 8cdd6fa — chore: release v14.0.0 (#1299) (github-actions[bot])
  • cba46d5 — Fix downcast type mismatches in clap_builder (#1310) (CordlessCoder)
  • efd5a1d — chore: remove tokei.rs references (XAMPPRocky)
  • c32ed04 — Add support for Koka (#1306) (omegametabroccolo)

🔒Security observations

The Tokei codebase demonstrates reasonable security practices with a modern Rust setup, proper use of Cargo.toml features, and MIT/Apache-2.0 dual licensing. However, there are opportunities for improvement: (1) Fix the incomplete strum dependency specification, (2) Implement automated dependency vulnerability scanning in CI/CD, and (3) Establish a more formal security audit process for optional dependencies. The project does not appear to handle sensitive data or external inputs in a dangerous manner based on the file structure analysis. No SQL injection, XSS, or credential exposure issues were identified. The codebase is primarily a code counting utility with no exposed ports or dangerous configurations evident.

  • Medium · Incomplete Dependency in dev-dependencies — Cargo.toml, dev-dependencies section. The dev-dependency 'strum' version string is incomplete: 'strum = "0.27' is missing the closing quote and patch version. This could cause build inconsistencies or unintended version resolution. Fix: Complete the version specification to 'strum = "0.27.x"' or specify the exact version needed (e.g., '0.27.0')
  • Low · Missing Security Audit Configuration — .github/workflows/mean_bean_ci.yml. No evidence of dependency security scanning setup (e.g., cargo-audit, dependabot) in the CI/CD configuration files. While GitHub Actions workflows are present, there's no explicit security audit step visible. Fix: Add cargo-audit or similar dependency vulnerability scanning to CI/CD pipeline. Consider adding dependabot configuration for automated dependency updates.
  • Low · Broad Optional Feature Dependencies — Cargo.toml, features section. Optional features 'cbor' and 'yaml' pull in serialization libraries that could potentially introduce vulnerabilities if not properly audited. No explicit version pinning strategy visible for optional dependencies. Fix: Regularly audit optional feature dependencies. Consider pinning critical serialization library versions and using MSRV (Minimum Supported Rust Version) testing.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.