
Wilfred/difftastic

a structural diff that understands syntax 🟥🟩

Healthy

Healthy across the board

weakest axis
Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained: safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI: a clean foundation to fork and modify.

Learn from: Healthy

Documented and popular: a useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture: runnable as-is.

  • ✓ Last commit 2d ago
  • ✓ 14 active contributors
  • ✓ MIT licensed
  • ✓ CI configured
  • ✓ Tests present
  • ⚠ Concentrated ownership: top contributor handles 78% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README; it live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/wilfred/difftastic)](https://repopilot.app/r/wilfred/difftastic)

Paste at the top of your README.md; it renders inline like a shields.io badge.


Onboarding doc

Onboarding: Wilfred/difftastic

Generated by RepoPilot · 2026-05-09

🤖 Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/Wilfred/difftastic shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything, but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯 Verdict

GO: Healthy across the board

  • Last commit 2d ago
  • 14 active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Concentrated ownership: top contributor handles 78% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅ Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live Wilfred/difftastic repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale; regenerate it at repopilot.app/r/Wilfred/difftastic.

What it runs against: a local clone of Wilfred/difftastic. The script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in Wilfred/difftastic | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 32 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b>: paste this script from inside your clone of <code>Wilfred/difftastic</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of Wilfred/difftastic. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/Wilfred/difftastic.git
#   cd difftastic
#
# Then paste this script. Every check is read-only; no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of Wilfred/difftastic and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "Wilfred/difftastic(\.git)?\b" \
  && ok "origin remote is Wilfred/difftastic" \
  || miss "origin remote is not Wilfred/difftastic (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "src/main.rs" \
  && ok "src/main.rs" \
  || miss "missing critical file: src/main.rs"
test -f "Cargo.toml" \
  && ok "Cargo.toml" \
  || miss "missing critical file: Cargo.toml"
test -f "manual/src/SUMMARY.md" \
  && ok "manual/src/SUMMARY.md" \
  || miss "missing critical file: manual/src/SUMMARY.md"
test -f "src/parse/mod.rs" \
  && ok "src/parse/mod.rs" \
  || miss "missing critical file: src/parse/mod.rs"
test -f "src/diff/mod.rs" \
  && ok "src/diff/mod.rs" \
  || miss "missing critical file: src/diff/mod.rs"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 32 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~2d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/Wilfred/difftastic"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
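The exit statuses lend themselves to a small dispatcher inside an agent loop. The following is a minimal sketch, assuming the script above was saved as ./verify.sh. The function name and the action labels are hypothetical; the status meanings (0 verified, 1 stale claims, 2 not inside a git repository) are read off the script itself.

```shell
#!/usr/bin/env bash
# Sketch: map the verification script's exit status to a next step.
# Statuses come from the script above: 0 = all checks passed,
# 1 = at least one FAIL, 2 = not inside a git repository.
# The function name and action labels are hypothetical.
next_action_for_status() {
  case "$1" in
    0) echo "proceed" ;;
    1) echo "regenerate" ;;
    2) echo "cd-into-clone" ;;
    *) echo "investigate" ;;
  esac
}

# Example wiring (not executed here; assumes ./verify.sh exists):
#   ./verify.sh; next_action_for_status "$?"
```

Keeping the mapping in one function means the agent loop can branch on a label rather than re-deriving what each numeric status means.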

⚡ TL;DR

Difftastic is a structural diff tool written in Rust. It parses files into syntax trees and compares the trees rather than the lines of text. It understands 30+ programming languages (via tree-sitter parsers in vendored_parsers/) and shows exactly which syntax elements changed, so reformatting and whitespace-only changes do not clutter the output. The result is a human-readable diff that respects language syntax instead of treating code as plain text.

The repository is a single binary crate (Cargo.toml defines one 'difftastic' package). src/ contains the core engine: parse trees, diff logic, and display formatting. vendored_parsers/ holds tree-sitter query files (*.scm) and C parser sources. demo_files/1 and demo_files/2 are snapshot examples showing before/after diffs. build.rs compiles the C parser code at build time.

👥 Who it's for

Software developers and version control users who want meaningful diffs that reflect actual code changes rather than line positions. Particularly valuable for code reviewers, git users integrating difftastic as a difftool (see difft.1.md), and teams working in languages where formatting changes are frequent. Repository maintainers and language tool developers interested in tree-sitter integration also use it as a reference.

🌱 Maturity & risk

Actively developed and production-ready: published on crates.io (v0.70.0), has comprehensive CI in .github/workflows/ (test.yml, coverage.yml, release.yml), and includes vendored parser sources. However, the README candidly lists known issues: performance scales poorly on large changesets and memory usage is high. The version number and the maintained release pipeline suggest recent activity.

A single primary maintainer (Wilfred) poses a long-term sustainability risk. The codebase is complex because of tree-sitter parsing and the diff algorithms, which makes contributions challenging for newcomers. The README acknowledges performance and robustness issues; crashes occur often enough to warrant fixes in releases. The dependency surface is moderate (~15 direct deps) but includes external parsers that require C compilation (vendored_parsers/*-src/**/*.c).

Active areas of work

Active maintenance and releases (0.70.0 is recent). CHANGELOG.md tracks updates. CI/CD is comprehensive with automated testing, coverage reporting (.codecov.yml), and release artifacts. The repo supports multiple platforms (x86_64-pc-windows-msvc override in Cargo.toml indicates Windows-specific builds). Manual localization exists (zh-CN support visible in README badge).

🚀 Get running

git clone https://github.com/Wilfred/difftastic.git
cd difftastic
cargo build --release
./target/release/difft --help

Test it with: cargo test (requires Rust 1.85.0+, specified in rust-version field).

Daily commands: cargo run -- <file1> <file2> or ./target/release/difft <file1> <file2>. Set RUST_LOG=debug for verbose output (log crate + pretty_env_logger configured). See difft.1.md for full command-line options.

🗺️ Map of the codebase

  • src/main.rs: Entry point for the difftastic CLI; orchestrates file parsing, diffing, and output formatting for all commands.
  • Cargo.toml: Defines all dependencies (tree-sitter parsers, display formatting, etc.) and Rust version requirements that shape the entire build.
  • manual/src/SUMMARY.md: Navigation hub for the mdBook manual; essential for understanding documented features, parser support, and configuration options.
  • src/parse/mod.rs: Core parsing abstraction layer that selects and invokes language-specific tree-sitter parsers and handles syntax tree construction.
  • src/diff/mod.rs: Core tree-diffing algorithm that compares syntax trees and produces the structural diff output that is difftastic's primary value.
  • build.rs: Build script that compiles the vendored tree-sitter parsers at build time; critical for cross-platform binary distribution.
  • src/display/mod.rs: Formats and renders diffs for terminal output with syntax highlighting, line wrapping, and side-by-side views.

🧩 Components & responsibilities

  • Parsing Layer (parse/mod.rs) (Tree-sitter C bindings, Rust FFI): Detect file language by extension; load the appropriate tree-sitter grammar; extract and normalize the syntax tree.
    • Failure mode: Parser crashes on malformed input → gracefully fall back to line-based diff; unsupported language → error message with supported list.
  • Diffing Engine (diff/mod.rs) (Tree-edit distance algorithm, memoization): Compare two syntax trees; identify matching subtrees; compute minimum edit distance; produce edit operations (add/delete/modify).
    • Failure mode: Algorithm timeout on deeply nested structures → configurable limits; memory exhaustion → OOM killer; incorrect matching → wrong diffs (rare).
  • Display Layer (display/mod.rs) (ANSI color codes, terminal size detection): Render edit operations as formatted text with syntax highlighting; support multiple output formats (unified, side-by-side); handle terminal dimensions.
    • Failure mode: Unsupported terminal → fallback to plaintext; wide lines → truncation or word-wrap; broken ANSI → garbled output in pipes.
  • CLI (main.rs) (clap (arg parsing) or manual parsing): Parse command-line arguments; route to the appropriate diff mode (files, directories, git); coordinate parsing, diffing, and output.
    • Failure mode: File not found → exit with error code; invalid flags → print usage; permission denied → exit with error.
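The "detect file language by extension" step in the parsing layer can be pictured with a small lookup. This is an illustration in shell only: difftastic itself does this in Rust, and the function name, the three-entry mapping, and the "text" fallback label below are all hypothetical. How the real tool treats an unknown extension should be checked against the source.

```shell
#!/usr/bin/env bash
# Illustrative only: a hypothetical, tiny extension-to-language table.
# This is not the project's real mapping.
guess_language() {
  local path=$1 ext
  case "$path" in
    *.*) ext=${path##*.} ;;   # text after the last dot
    *)   ext="" ;;            # no dot at all, e.g. "Makefile"
  esac
  case "$ext" in
    rs) echo "rust" ;;
    py) echo "python" ;;
    js) echo "javascript" ;;
    *)  echo "text" ;;        # unknown: treat as plain text in this sketch
  esac
}

# Example: guess_language src/main.rs
```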

🔀 Data flow

  • User CLI input (file paths, flags) → main.

🛠️ How to make changes

Add Support for a New Programming Language

  1. Register a new tree-sitter language grammar in the Cargo.toml under [dependencies] (Cargo.toml)
  2. Add a new language variant to the parser registration function (src/parse/parser.rs)
  3. Create a language detection rule (file extension mapping) (src/parse/mod.rs)
  4. Test the parser with sample files in demo_files/ and verify diff output (demo_files/1/index.html)
  5. Document the new language in the supported languages list (manual/src/languages_supported.md)

Modify Diff Output Formatting

  1. Identify the desired output format (unified, side-by-side, etc.) (src/display/mod.rs)
  2. Add or modify the display formatter for the chosen output style (src/display/unified.rs)
  3. Update the CLI argument parser to expose the new display option (src/main.rs)
  4. Test with sample files in demo_files/ and update manual if user-facing (manual/src/usage.md)

Improve the Tree-Diffing Algorithm

  1. Review the current diffing algorithm and identify optimization opportunities (src/diff/mod.rs)
  2. Modify node matching or edit sequence calculation logic (src/syntax.rs)
  3. Test changes against demo files to ensure correctness (demo_files/)
  4. Update the architecture documentation if algorithm changes are significant (manual/src/diffing.md)
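Several of the checklists above end with testing against demo_files/. One way to enumerate before/after pairs is sketched below. It assumes a file keeps the same relative path under demo_files/1 and demo_files/2, which should be checked against the actual directory layout; list_demo_pairs is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: print "before<TAB>after" for every file that exists at the same
# relative path under both directories. Files present on one side only
# are skipped.
list_demo_pairs() {
  local before=$1 after=$2 rel
  ( cd "$before" && find . -type f | sort ) | while IFS= read -r rel; do
    rel=${rel#./}
    if [ -f "$after/$rel" ]; then
      printf '%s\t%s\n' "$before/$rel" "$after/$rel"
    fi
  done
}

# Example (not executed here):
#   list_demo_pairs demo_files/1 demo_files/2
```

Each printed pair can then be passed to cargo run -- <file1> <file2> as described under Get running.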

🔧 Why these technologies

  • Tree-sitter: Provides fast, incremental parsing for 30+ languages with a unified C API; enables syntax-aware diffing without maintaining multiple language parsers.
  • Rust: Memory-safe systems language with zero-cost abstractions and excellent performance; critical for fast tree traversal and diffing on large files.
  • mdBook: Simple Markdown-based documentation generation; lowers the barrier to contribution and keeps docs in sync with source.
  • Terminal color/formatting (Termcolor, similar): Cross-platform colored terminal output support; essential for readable visual diffs without external dependencies.

⚖️ Trade-offs already made

  • Structural diff over line-based diff

    • Why: Syntax-aware diffing provides far better context and identifies true changes rather than cosmetic reformatting.
    • Consequence: Slower than traditional line diffs on very large files; requires parsing overhead; may be unintuitive for users expecting unified diff format.
  • Vendored tree-sitter grammars vs external dependency

    • Why: Ensures reproducible builds and simplifies distribution; users get a single self-contained binary.
    • Consequence: Larger binary size; build time increases; grammar updates require manual maintenance.
  • Terminal-based output only (no GUI)

    • Why: Simpler architecture, integrates naturally with Unix pipelines and git workflows, minimal dependencies.
    • Consequence: Limited visual polish compared to IDE integrations; complex diffs harder to navigate without filtering/search.

🚫 Non-goals (don't propose these)

  • Does not provide real-time collaboration or merge conflict resolution UI.
  • Does not support custom user-defined syntax rules or grammar extensions at runtime.
  • Does not track file history or version control integration beyond being a git difftool.
  • Does not support binary files or image diffs.
  • Does not provide IDE plugins or GUI applications (command-line only).

🪤 Traps & gotchas

  • C compilation: build.rs invokes a C compiler for the vendored tree-sitter parsers, so a C toolchain (gcc/clang/MSVC) is required.
  • Memory & performance: large files with many changes can hang or run out of memory; RUST_LOG=debug can help diagnose.
  • Strict MSRV: rust-version = 1.85.0 is enforced, so code using newer features will fail on older toolchains.
  • Tree-sitter quirks: parsers must be correctly vendored; malformed .scm files fail silently.
  • Cross-platform bits: Windows needs special handling in Cargo.toml overrides (see pkg-fmt = 'zip').
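Because large inputs can hang, one defensive pattern is to run the diff under a wall-clock budget. The sketch below is an assumption-laden illustration: it relies on the timeout utility from GNU coreutils (which exits with status 124 when the budget is exceeded), and run_with_budget is a hypothetical helper name. It is not a feature of difftastic itself.

```shell
#!/usr/bin/env bash
# Sketch: run any command under a time budget given in seconds.
# Relies on `timeout` from GNU coreutils; status 124 means the budget
# ran out before the command finished.
run_with_budget() {
  local seconds=$1
  shift
  timeout "$seconds" "$@"
}

# Example (not executed here): allow a structural diff 30 seconds, then
# fall back to a plain unified diff if the budget is exceeded.
#   run_with_budget 30 difft old.rs new.rs
#   [ "$?" -eq 124 ] && diff -u old.rs new.rs
```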


💡 Concepts to learn

  • Syntax Tree Matching (Tree Edit Distance): Difftastic's core algorithm maps AST nodes between old and new files; understanding tree edit distance and longest common subsequence on trees is essential to modifying the diff engine
  • Tree-Sitter Incremental Parsing: Tree-sitter is the parser framework used for all 30+ languages; incremental parsing allows partial re-parsing on changes, critical for understanding performance implications
  • Wu Diff Algorithm (Linear Diff): The wu-diff crate implements O(n*d) diff computation; difftastic uses this for line-level fallback and understanding it helps optimize memory usage on large files
  • Arena Allocation (bumpalo): AST nodes are allocated in typed-arena and bumpalo arenas rather than with one heap allocation per node; this is a key optimization for memory layout and allocation patterns in a Rust diff tool
  • Semantic Whitespace Awareness: Difftastic distinguishes syntactically-significant whitespace (Python indents, JSON separators) from formatting noise; this requires language-specific rules encoded in diff logic
  • MSRV (Minimum Supported Rust Version) Policy: rust-version = 1.85.0 is strict; understanding MSRV rationale (Debian stable support, packager compatibility) helps avoid adding features that break downstream consumers
  • Cross-Compilation & vendored-src Pattern: C parser source is vendored and compiled at build-time via build.rs; this enables reproducible builds and offline compilation but requires careful management of compiler flags across platforms (see Windows MSVC override)

Related repos

  • tree-sitter/tree-sitter: Underlying parsing library that difftastic depends on via vendored parsers; understanding tree-sitter is essential for extending language support
  • so-fancy/diff-so-fancy: Alternative diff highlighter for git; operates on line-diff output whereas difftastic replaces diff entirely, making it a philosophical competitor
  • dandavison/delta: Modern git diff pager with syntax highlighting; complements difftastic as a display layer (difftastic can pipe to delta) or alternative for simpler use cases
  • mergiraf/mergiraf: Related tool mentioned in README for AST-aware merging; built on similar syntax-tree principles but solves the merge problem difftastic explicitly does not
  • gitleaks/gitleaks: Uses similar diff-scanning infrastructure for security scanning; difftastic could potentially integrate as a better diff backend for secret detection

🪄 PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for language-specific syntax diffing

The repo supports 20+ languages (Rust, Python, JavaScript, Go, Haskell, etc.) based on the SVG icons in homepage/home_img/, but there's no visible test directory with comprehensive language-specific test cases. Each language parser should have dedicated test files showing correct syntax-aware diffs work for edge cases (nested structures, comments, string literals). This ensures the tree-sitter parsers in vendored_parsers/ produce correct diffs across language-specific constructs.

  • [ ] Create tests/language_specific/ directory structure with subdirs for each major language
  • [ ] Add test files (e.g., tests/language_specific/rust_syntax_test.rs) that parse demo_files/1/src/ and demo_files/2/src/ variants
  • [ ] Verify diffs correctly handle language-specific constructs: comments in Python, type annotations in TypeScript, macros in Rust, etc.
  • [ ] Run tests against vendored_parsers/*-src tree-sitter definitions to ensure parser output quality

Add GitHub Actions workflow for testing against vendored parser updates

The .github/workflows/ contains test.yml, coverage.yml, release.yml, and deploy_docs.yml, but no workflow to validate vendored_parsers/ tree-sitter grammars when they're updated. Tree-sitter parsers in vendored_parsers/*-src/ are critical to correctness but have no dedicated CI validation. A workflow should verify that grammar changes don't break difftastic's parsing on real-world code samples.

  • [ ] Create .github/workflows/parser_validation.yml that runs on changes to vendored_parsers/
  • [ ] Add test steps that compile vendored parsers and run against demo_files/ to ensure no regressions
  • [ ] Include a check that validates .scm highlight queries in vendored_parsers/highlights/ are syntactically valid
  • [ ] Document in CONTRIBUTING.md (or create one) the process for updating vendored parsers

Create CONTRIBUTING.md with architecture documentation and language support matrix

The repo lacks a CONTRIBUTING.md file. Given the complexity (tree-sitter vendored parsers, language-specific handling, structural diffing algorithm), new contributors need clear guidance on how to add a new language or extend an existing one. The 20+ language icons suggest broad support, but there's no single source of truth documenting which languages are fully supported vs. partially supported.

  • [ ] Create CONTRIBUTING.md documenting: architecture overview (src/ modules, parser integration, diff algorithm)
  • [ ] Add a language support matrix table showing each language from homepage/home_img/*.svg with: parser source, highlight query file location (vendored_parsers/highlights/*.scm), and support level
  • [ ] Include step-by-step guide for adding a new language: how to integrate a tree-sitter grammar, write highlight queries, and add tests
  • [ ] Link to this from README.md and reference it in .github/ISSUE_TEMPLATE/bug_report.md for language-specific issues

🌿 Good first issues

  • Add test coverage for vendored_parsers/: currently no visible tests in demo_files/ for specific language parsing; writing round-trip tests (parse → display → parse) for new languages would catch grammar bugs early
  • Improve performance documentation: README mentions 'scales relatively poorly' but offers no guidance on file size limits or optimization flags; profile a real slow case and add PERFORMANCE.md with concrete numbers
  • Localize manual pages beyond zh-CN: difft.1.md is English-only but homepage suggests i18n support; translate difft.1.md to Spanish or French following the pattern in homepage/

📝 Recent commits

  • a81564d: Link directly to manual from release notes, not the homepage (Wilfred)
  • b7a80fc: Roll version (Wilfred)
  • 90a0f1b: Remove Hack parser (Wilfred)
  • 445402e: Suppress a clippy warning (Wilfred)
  • 9985c28: Add debug logging of check-attr output (Wilfred)
  • c7a5a68: Pin GitHub Actions to commit hashes (Wilfred)
  • e1ceb94: Check change_map current value before mutating it (Wilfred)
  • 580277b: Correct MSRV in manual (Wilfred)
  • 765e0a6: Use mdbook admonition (Wilfred)
  • 5f082d7: Use symlink to version placeholder script to avoid duplication (Wilfred)

🔒 Security observations

The codebase demonstrates generally good security practices as a command-line utility focused on diff functionality. No critical vulnerabilities were identified. The main concerns are around dependency version management (overly permissive ranges for 'ignore' crate) and a potentially malformed Cargo.toml file. The application does not appear to handle network requests, databases, or sensitive data processing based on the provided information, which reduces attack surface. Recommended actions: pin the 'ignore' dependency to a specific version, fix the incomplete Cargo.toml, and periodically audit dependencies for known vulnerabilities using tools like 'cargo audit'.

  • Medium · Overly Permissive Dependency Version Constraint (Cargo.toml, ignore dependency). The 'ignore' dependency uses a range constraint ('>= 0.4, < 0.4.24') that allows any version from 0.4.0 through 0.4.23. While there is an upper bound, this still permits a wide range of versions that could contain security vulnerabilities. Fix: Pin to a specific known-good version (e.g., 'ignore = "0.4.23"') or use a more restrictive constraint based on security advisories for the ignore crate.
  • Low · Incomplete Dependency Declaration (Cargo.toml, end of file). The Cargo.toml file appears truncated at the end with 'ha' on the last line, which may indicate a malformed or incomplete dependency declaration. This could cause build issues or leave dependencies undeclared. Fix: Complete or fix the malformed dependency declaration and verify the file is valid TOML.
  • Low · Lazy Static Usage (Cargo.toml, lazy_static dependency). The codebase uses the 'lazy_static' crate for lazy initialization. While not inherently insecure, this pattern can mask race conditions or initialization order issues if not carefully managed. Fix: Review usage of lazy_static in the codebase for potential concurrency issues. Consider migrating to std::sync::LazyLock (in the standard library since Rust 1.80, so available under the 1.85.0 MSRV) or the 'once_cell' crate.

LLM-derived; treat as a starting point, not a security audit.
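To make the range in the first finding concrete, the sketch below checks which version strings fall inside '>= 0.4, < 0.4.24'. It is a rough approximation, not Cargo's resolver: it assumes sort -V (version sort, as in GNU coreutils) is available, treats '>= 0.4' as '>= 0.4.0', and ignores pre-release tags. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# True when version $1 sorts strictly before version $2.
# Uses `sort -V` (version sort); assumed available, as in GNU coreutils.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Approximate check against the quoted range: >= 0.4, < 0.4.24.
# ">= 0.4" is treated as ">= 0.4.0"; pre-release tags are not handled.
ignore_version_allowed() {
  ! version_lt "$1" "0.4.0" && version_lt "$1" "0.4.24"
}

# Example: ignore_version_allowed 0.4.23 && echo "inside the range"
```

Pinning to a single version, as the finding suggests, collapses this range to one accepted value.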


Generated by RepoPilot. Verdict based on maintenance signals; see the live page for receipts. Re-run on a new commit to refresh.
