
pest-parser/pest

The Elegant Parser

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 3w ago
  • 41+ active contributors
  • Distributed ownership (top contributor 44% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/pest-parser/pest)](https://repopilot.app/r/pest-parser/pest)

Paste at the top of your README.md — renders inline like a shields.io badge.



Onboarding: pest-parser/pest

Generated by RepoPilot · 2026-05-09

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/pest-parser/pest shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board


Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live pest-parser/pest repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/pest-parser/pest.

What it runs against: a local clone of pest-parser/pest. The script inspects the git remote, the license declarations, paths in the working tree, and git log. It is read-only and makes no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in pest-parser/pest | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 49 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>pest-parser/pest</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of pest-parser/pest. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/pest-parser/pest.git
#   cd pest
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of pest-parser/pest and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "pest-parser/pest(\.git)?\b" \
  && ok "origin remote is pest-parser/pest" \
  || miss "origin remote is not pest-parser/pest (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
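# Rust crates declare their license expression in Cargo.toml (there is no package.json here).
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
   || grep -qE '^license[[:space:]]*=[[:space:]]*"[^"]*Apache-2\.0' Cargo.toml */Cargo.toml 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"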

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical paths exist (pest/src and meta/src are directories, so test -e, not -f)
for path in pest/src meta/src derive/src/lib.rs generator/src/generator.rs Cargo.toml; do
  test -e "$path" \
    && ok "$path" \
    || miss "missing critical path: $path"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 49 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~19d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/pest-parser/pest"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

pest is a Parsing Expression Grammar (PEG) parser generator written in Rust. It compiles human-readable grammar definitions (.pest files) into fast parsers with built-in error reporting. Users write declarative grammar rules instead of hand-coding recursive-descent logic or resolving ambiguities in context-free grammars.

The repository is a Cargo workspace:
  • pest/ is the core runtime.
  • derive/ provides the #[derive(Parser)] procedural macro.
  • generator/ does the code generation behind that macro.
  • meta/ parses and validates the .pest grammar language.
  • bootstrap/ regenerates meta's own parser.
  • vm/ interprets grammars at runtime.
  • grammars/ holds reference grammars.
  • debugger/ offers interactive grammar testing.

The derive macro turns a grammar into a Rule enum plus a parser that returns a tree of Pair values tagged with those rules. Converting that tree into a typed AST is left to the user.

👥Who it's for

Rust developers building domain-specific languages (DSLs), configuration file parsers, or command-line tools who need a declarative, maintainable alternative to hand-written parsers or regex-based parsing. Authors of compilers, linters, and query languages can use pest to avoid that boilerplate.

🌱Maturity & risk

Production-ready and actively maintained. The project has:
  • CI via GitHub Actions (ci.yml, cifuzz.yml, release.yml)
  • dual MIT/Apache-2.0 licensing
  • a security policy (SECURITY.md)
  • a Rust 1.83.0+ baseline

The workspace has 8 members (pest, derive, generator, meta, bootstrap, debugger, grammars, vm), consistent with a mature ecosystem under ongoing development.

Low risk for core parsing use. The pest crate is well maintained and has few external dependencies. The derive macro (derive/) is a procedural-macro crate with its own test suites. The main risk is the PEG model itself: it does not support left recursion and its repetition is greedy, so some languages need their grammars restructured. The CODEOWNERS structure shows no obvious single-maintainer bottleneck.

Active areas of work

Development is active, with:
  • CI fuzzing (cifuzz.yml)
  • code-coverage tracking (codecov.yml)
  • a bors merge bot (bors.toml)

The derive macro has several test suites under derive/tests/ (grammar.pest, lists.pest, implicit.pest, oneormore.pest), and derive/examples/ includes calc and help-menu examples.

🚀Get running

git clone https://github.com/pest-parser/pest.git
cd pest
cargo build
cargo test

Then explore derive/examples/calc.rs to see a working calculator parser, or read CONTRIBUTING.md and the book at https://pest.rs/book.

Daily commands:
  • Development: cargo build --workspace && cargo test --workspace builds and tests all members.
  • Examples: cargo run -p pest_derive --example calc runs the calculator example from derive/examples/.
  • Debugger: cargo run -p pest_debugger starts interactive grammar testing.
  • Bootstrap: cargo run -p pest_bootstrap regenerates meta's generated parser (grammar.rs) after the meta-grammar (grammar.pest) changes.

🗺️Map of the codebase

  • pest/src — Core pest parser library containing the fundamental parsing engine and trait definitions that all other modules depend on
  • meta/src — Implements pest's own grammar parser and bootstrapping; essential for understanding how pest parses .pest grammar files
  • derive/src/lib.rs — Procedural macro implementation that generates parser code from #[derive(Parser)] annotations; foundational to pest's ergonomic API
  • generator/src/generator.rs — The code generation engine that transforms parsed grammar rules into executable Rust parser code
  • Cargo.toml — Workspace manifest defining all member crates, their dependencies, and configuration; shows how the modular architecture is organized
  • grammars/src/grammars — Reference grammar implementations (JSON, TOML, HTTP, SQL) demonstrating pest patterns and serving as canonical examples
  • vm/src — Virtual machine-based parser execution engine; alternative to code generation for runtime parsing

🛠️How to make changes

Add a new grammar file for parsing a language/format

  1. Create a new .pest file in grammars/src/grammars/ following the naming convention (e.g. yaml.pest for YAML parser) (grammars/src/grammars/yaml.pest)
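  2. Define grammar rules in pest syntax. Each rule has the form name = { ... }, optionally with a modifier before the brace: @ (atomic), $ (compound-atomic), _ (silent), or ! (non-atomic). For example: number = @{ ASCII_DIGIT+ } (grammars/src/grammars/json.pest)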
  3. Expose the parser in grammars/src/lib.rs by adding a module containing a #[derive(Parser)] struct with a #[grammar = ...] attribute, following the pattern of the existing modules such as the JSON one (grammars/src/lib.rs)
  4. Create integration test file in grammars/tests/yaml.rs that uses the generated parser and tests against example files (grammars/tests/json.rs)
  5. Add example test data files in grammars/resources/test/ (e.g. yamlfuzzsample1.yaml, yamlfuzzsample2.yaml) (grammars/resources/test)
  6. Optional: add performance benchmark in grammars/benches/yaml.rs following the json.rs pattern (grammars/benches/json.rs)

Create a derived parser for a custom grammar using the macro API

  1. Create a new Rust source file (e.g. src/parser.rs) and define a struct with #[derive(Parser)] attribute (derive/examples/calc.rs)
  2. Annotate the struct with #[grammar = "path/to/grammar.pest"]. By default the path is resolved relative to the crate's src/ directory (derive/examples/calc.rs)
  3. Write the .pest grammar at that location. Each rule becomes a variant of the generated Rule enum, so rule names must be valid Rust identifiers (derive/examples/calc.pest)
  4. Bring the trait into scope with use pest::Parser; and call YourParser::parse(Rule::rule_name, input). It returns the matched Pairs or a pest::error::Error (derive/examples/calc.rs)
  5. Add test cases in src/tests.rs or similar, using derive/tests/ structure as a template with .pest and corresponding .rs test file (derive/tests/grammar.rs)
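After step 4, the parse result is a tree of pest Pair values that you walk yourself, usually with as_rule(), as_str() and into_inner(). The sketch below shows that post-parse step without the pest crate.

It makes two assumptions:
  • Node, Rule, Ast, to_ast and num are hypothetical stand-ins, not pest types.
  • The tree is built by hand rather than produced by a generated parser.

It folds a flat list of numbers under a sum rule into a left-associative AST.

```rust
// Hypothetical stand-in for the Rule enum pest would generate from a grammar like
//   sum = { num ~ ("+" ~ num)* }
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Num,
    Sum,
}

// Hypothetical stand-in for pest's Pair: rule tag, matched text, child nodes.
#[derive(Debug)]
struct Node<'i> {
    rule: Rule,
    text: &'i str,
    children: Vec<Node<'i>>,
}

// The typed AST the application actually wants.
#[derive(Debug, PartialEq)]
enum Ast {
    Num(i64),
    Add(Box<Ast>, Box<Ast>),
}

// Convert the rule-tagged tree into the typed AST.
fn to_ast(node: &Node) -> Result<Ast, String> {
    match node.rule {
        Rule::Num => node
            .text
            .parse::<i64>()
            .map(Ast::Num)
            .map_err(|e| format!("bad number {:?}: {}", node.text, e)),
        Rule::Sum => {
            let mut operands = node.children.iter().map(to_ast);
            let first = operands.next().ok_or("empty sum")??;
            // Fold left so 1+2+3 becomes (1+2)+3.
            operands.try_fold(first, |acc, next| -> Result<Ast, String> {
                Ok(Ast::Add(Box::new(acc), Box::new(next?)))
            })
        }
    }
}

// Helper that builds a leaf number node.
fn num<'a>(text: &'a str) -> Node<'a> {
    Node { rule: Rule::Num, text, children: Vec::new() }
}

fn main() {
    // The tree a parser might produce for "1+2+3", built by hand here.
    let tree = Node {
        rule: Rule::Sum,
        text: "1+2+3",
        children: vec![num("1"), num("2"), num("3")],
    };
    println!("{:?}", to_ast(&tree));
}
```

With real pest, the match on node.rule would become a match on pair.as_rule(), and node.children would come from pair.into_inner().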

Extend the pest derive macro with custom attributes or behavior

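  1. derive/src/lib.rs is the thin proc-macro entry point that forwards to pest_generator. If you add a new helper attribute next to #[grammar = ...] on a #[derive(Parser)] struct, register its name in the attributes(...) list of the #[proc_macro_derive(Parser, ...)] declaration (derive/src/lib.rs)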
  2. Update generator/src/parse_derive.rs to handle new attribute variants and store configuration in the derived parser struct (generator/src/parse_derive.rs)
  3. Modify code generation in generator/src/generator.rs to emit different Rust code based on the new attributes (generator/src/generator.rs)
  4. Add test cases in derive/tests/ with a new .pest file and corresponding .rs test demonstrating the new behavior (derive/tests/grammar.rs)
  5. Document the attribute usage in CONTRIBUTING.md and update derive/_README.md with examples (derive/_README.md)

Optimize parser performance or add a new parsing strategy

  1. Profile existing parsers using grammars/benches/ benchmarks to identify bottlenecks (grammars/benches/json.rs)
  2. If changing runtime parsing, modify vm/src/lib.rs. It interprets the optimized grammar expressions produced by meta at runtime, rather than executing generated code (vm/src)
  3. If modifying code generation, update generator/src/generator.rs to emit optimized patterns (e.g., memoization, backtracking pruning) (generator/src/generator.rs)
  4. Add new benchmarks in grammars/benches/ to measure the performance impact of changes (grammars/benches/main.rs)
  5. Run fuzzing with grammars/fuzz to ensure correctness is maintained across the optimization (grammars/fuzz)

🔧Why these technologies

  • Rust — Provides memory safety without GC, enabling fast parsing with predictable performance and safe parser code generation
  • Procedural Macros (syn/quote) — Move parser generation to compile time, so there is no runtime grammar interpretation and the generated parser is ordinary Rust code
  • PEG (Parsing Expression Grammar) — Provides clear, declarative grammar syntax with ordered choice semantics; avoids ambiguity and conflicts common in BNF/LALR
  • Dual Architecture (Code Gen + VM) — Code generation gives maximum performance; VM allows runtime grammar loading and easier embedding in dynamic scenarios
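The Dual Architecture entry above is easier to picture once a grammar is plain data walked at runtime. The sketch below is a hypothetical mini-interpreter. The Expr enum and eval function are illustrative names, not pest_vm's API.

It also shows three PEG behaviours that matter when writing pest grammars:
  • ordered choice takes the first alternative that succeeds;
  • repetition is greedy and never gives input back;
  • ! is a zero-width negative lookahead.

```rust
// Illustrative PEG expression tree, walked at runtime (not pest_vm's actual types).
enum Expr {
    Lit(&'static str), // exact string, like "if"
    Range(char, char), // one char in an inclusive range, like 'a'..'z'
    Seq(Vec<Expr>),    // a ~ b ~ ...
    Choice(Vec<Expr>), // a | b | ...  (ordered: first success wins)
    Star(Box<Expr>),   // a*           (greedy, never gives input back)
    Not(Box<Expr>),    // !a           (zero-width negative lookahead)
}

// Match `expr` at byte offset `pos`; return the offset after the match, or None.
fn eval(expr: &Expr, input: &str, pos: usize) -> Option<usize> {
    match expr {
        Expr::Lit(s) => input[pos..].starts_with(*s).then(|| pos + s.len()),
        Expr::Range(lo, hi) => {
            let c = input[pos..].chars().next()?;
            (*lo <= c && c <= *hi).then(|| pos + c.len_utf8())
        }
        // Each item starts where the previous one ended; any failure fails the sequence.
        Expr::Seq(items) => items.iter().try_fold(pos, |p, e| eval(e, input, p)),
        // Every alternative restarts at `pos`; the first one that matches is taken.
        Expr::Choice(alts) => alts.iter().find_map(|e| eval(e, input, pos)),
        Expr::Star(e) => {
            let mut p = pos;
            while let Some(next) = eval(e, input, p) {
                if next == p {
                    break; // stop on an empty match to avoid looping forever
                }
                p = next;
            }
            Some(p)
        }
        // Succeeds without consuming input only if `e` fails here.
        Expr::Not(e) => match eval(e, input, pos) {
            Some(_) => None,
            None => Some(pos),
        },
    }
}

fn main() {
    // keyword = { "if" ~ !('a'..'z') }: "if" only when not followed by a lowercase letter.
    let keyword = Expr::Seq(vec![Expr::Lit("if"), Expr::Not(Box::new(Expr::Range('a', 'z')))]);
    println!("{:?} {:?}", eval(&keyword, "if x", 0), eval(&keyword, "iffy", 0));
}
```

Because Choice never revisits an alternative once one succeeds, ordering matters. "a" | "ab" matches only the a of the input "ab". A Star followed by the same literal ("a"* ~ "a") can never succeed, because the repetition has already consumed every a.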

⚖️Trade-offs already made

  • PEG over LALR/LR parsers

    • Why: PEG provides simpler semantics and ordered choice; no shift/reduce conflicts or disambiguation rules
    • Consequence: Left recursion is not supported, so such rules must be rewritten with repetition operators; pest's grammar validator rejects left-recursive rules at compile time. Some grammar styles are less natural as a result
  • Compile-time code generation via macros rather than runtime parsing

    • Why: Maximizes performance; generated parsers are native Rust code with zero interpretation overhead; enables IDE/LSP support
    • Consequence: Grammar must be known at compile time; slower compile times for large grammars; grammar changes require recompilation
  • Self-hosting (pest parses .pest files using pest itself)

    • Why: Dogfoods pest on a real grammar and keeps the meta-grammar in pest's own notation
    • Consequence: Changing the meta-grammar requires a bootstrap step (the bootstrap crate) to regenerate meta's parser. The circular dependency is broken by keeping the generated grammar.rs in the tree
  • Separate VM crate alongside code generation

    • Why: Provides flexibility for runtime grammar loading and dynamic use cases; research vehicle for optimization
    • Consequence: Two code paths to maintain; VM slower than generated code; risk of semantic divergence between the two

🪤Traps & gotchas

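  • Left recursion: a left-recursive rule such as expr = { expr ~ "+" ~ term | term } would recurse forever at the same position, so pest's grammar validator rejects it at compile time. Rewrite it with repetition instead: expr = { term ~ ("+" ~ term)* }.
  • Greedy repetition: ordered choice (|) backtracks to try the next alternative only when an alternative fails. By contrast, * and + consume as much as they can and never give input back, so a rule like "a"* ~ "a" can never match.
  • Grammar file location: the derive macro reads the .pest file at compile time. The #[grammar = "..."] path is resolved relative to the crate's src/ directory, not relative to the file that contains the #[derive(Parser)] struct.
  • Error positions: pest reports a failure at the furthest position it reached, together with the rules it expected there. With broad alternatives, that position and rule list can be unhelpful.
  • No regex support: character ranges like 'a'..'z' are built-in primitives; more complex matching requires composing rules by hand.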
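The left-recursion rewrite can be sketched as a hand-written recursive-descent parser in plain Rust. No pest is involved, and parse_num and parse_expr are illustrative names.

The sketch uses subtraction rather than addition because subtraction makes associativity visible. The while loop plays the role of ("-" ~ num)* and folds to the left. That preserves the associativity the left-recursive rule was meant to express.

```rust
// Left-recursive form (rejected by pest; an infinite loop in naive recursive descent):
//   expr = { expr ~ "-" ~ num | num }
// Equivalent form using repetition:
//   expr = { num ~ ("-" ~ num)* }

// num = { ASCII_DIGIT+ }; returns (value, position after the digits).
fn parse_num(input: &str, pos: usize) -> Option<(i64, usize)> {
    let digits = input[pos..].bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return None;
    }
    let value: i64 = input[pos..pos + digits].parse().ok()?;
    Some((value, pos + digits))
}

fn parse_expr(input: &str, pos: usize) -> Option<(i64, usize)> {
    let (mut acc, mut pos) = parse_num(input, pos)?;
    // Each iteration is one ("-" ~ num); accumulating as we go folds left: (10-3)-2.
    while input[pos..].starts_with('-') {
        // A "-" without a number ends the repetition and leaves the "-" unconsumed.
        let Some((rhs, next)) = parse_num(input, pos + 1) else { break };
        acc -= rhs;
        pos = next;
    }
    Some((acc, pos))
}

fn main() {
    println!("{:?}", parse_expr("10-3-2", 0));
}
```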


💡Concepts to learn

  • Parsing Expression Grammar (PEG) — PEG is the core formalism pest implements. Writing correct grammars and debugging parse failures requires understanding three things: ordered choice, which backtracks only to try the next alternative; greedy repetition, which never gives input back; and the resulting absence of ambiguity
  • Procedural Macros (proc_macro) — pest's derive/src/lib.rs uses Rust procedural macros to transform #[derive(Parser)] annotations into parser code at compile-time; fundamental to how pest integrates with Rust
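  • Memoization / Packrat Parsing — Packrat parsers cache the result of each (rule, position) pair, so backtracking never re-parses the same input twice; this trades memory for linear-time parsing. Whether and where pest memoizes is worth confirming in pest/src before relying on it, since it determines performance characteristics and memory trade-offs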
  • Recursive Descent Parsing — pest generates recursive descent parsers (one function per grammar rule); understanding call stacks and rule composition is key to debugging parse failures and stack overflows
  • Abstract Syntax Tree (AST) / Pair Types — pest returns Pair and Pairs types (defined in pest/src/) representing matched parse trees; transforming these into domain ASTs is the post-parse step users must implement
  • Bootstrapping — pest's meta-language parser (meta/) is itself written in pest grammar; changes to grammar syntax require running bootstrap/src/main.rs to regenerate the codegen. Understanding this self-hosting model is critical for core contributor work
  • Lookahead / Negative Lookahead — PEG's lookahead operators (&, !) allow zero-width assertions (check-without-consume); pest grammar relies heavily on these to implement keywords and context-sensitive rules without ambiguity
  • gvanrossum/peg — PEG parser generator written in Python; a useful point of comparison when studying PEG implementations
  • tree-sitter/tree-sitter — Alternative incremental parser generator with wider language support; competes in the DSL-parsing space but uses GLR rather than PEG
  • lalrpop/lalrpop — Rust LALR(1) parser generator; alternative to pest for grammar-based parsing with different trade-offs (LALR vs PEG, handles left recursion natively)
  • rust-bakery/nom — Parser combinator library for Rust; a complementary approach to pest (hand-written combinators vs. declarative grammar), used when fine-grained control is needed
  • pest-parser/pest-templates — Community template collection for common grammars (JSON, TOML, SQL); reference implementations for users starting with pest
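The Memoization / Packrat Parsing entry above can be sketched in plain Rust. The toy grammar, the Packrat struct and its b_calls counter are illustrative and not taken from pest; whether pest itself memoizes should be checked in pest/src.

With memoization off, the second alternative of s parses b from scratch again. With it on, the result for (b, 0) comes from the cache.

```rust
use std::collections::HashMap;

// Toy grammar (illustrative, not from pest):
//   s = { b ~ "x" | b ~ "y" }
//   b = { "(" ~ b ~ ")" | "z" }
// When the first alternative of `s` fails, the second re-parses `b` at the same position.
struct Packrat<'i> {
    input: &'i [u8],
    memoize: bool,
    // Cache keyed by (rule name, start position) -> end position or failure.
    memo: HashMap<(&'static str, usize), Option<usize>>,
    b_calls: usize, // how many times `b` actually ran (cache hits excluded)
}

impl<'i> Packrat<'i> {
    fn new(input: &'i str, memoize: bool) -> Self {
        Packrat { input: input.as_bytes(), memoize, memo: HashMap::new(), b_calls: 0 }
    }

    fn b(&mut self, pos: usize) -> Option<usize> {
        if self.memoize {
            if let Some(&cached) = self.memo.get(&("b", pos)) {
                return cached;
            }
        }
        self.b_calls += 1;
        let input = self.input;
        // "(" ~ b ~ ")" | "z": the alternatives start with different bytes,
        // so the first byte selects the only one that can match.
        let result = match input.get(pos).copied() {
            Some(b'(') => self
                .b(pos + 1)
                .and_then(|p| (input.get(p) == Some(&b')')).then(|| p + 1)),
            Some(b'z') => Some(pos + 1),
            _ => None,
        };
        if self.memoize {
            self.memo.insert(("b", pos), result);
        }
        result
    }

    fn s(&mut self) -> Option<usize> {
        let input = self.input;
        // Alternative 1: b ~ "x"
        if let Some(p) = self.b(0) {
            if input.get(p) == Some(&b'x') {
                return Some(p + 1);
            }
        }
        // Alternative 2: b ~ "y" backtracks to position 0 and needs b(0) again.
        let p = self.b(0)?;
        (input.get(p) == Some(&b'y')).then(|| p + 1)
    }
}

fn main() {
    for memoize in [false, true] {
        let mut parser = Packrat::new("((z))y", memoize);
        let result = parser.s();
        println!("memoize={}: result={:?}, b ran {} times", memoize, result, parser.b_calls);
    }
}
```

Caching one entry per (rule, position) bounds the work to linear in the input size for a fixed grammar. The cost is memory for the table, which is the trade-off the concept entry refers to.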

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add integration tests for the derive macro with edge cases

The derive/ crate has examples and tests (grammar.rs, lists.rs, etc.), but there appear to be no explicit tests for error cases, malformed grammars, or edge cases in proc-macro expansion. Such tests would catch regressions in the macro's code generation and improve robustness. Because derive/src/lib.rs forwards to pest_generator, tests for invalid syntax, nested rules, and attribute combinations would exercise both crates. Invalid grammars fail at compile time, so they likely need compile-fail style tests.

  • [ ] Create derive/tests/edge_cases.pest with intentionally complex/nested rules
  • [ ] Create derive/tests/edge_cases.rs with test cases for grammar errors and macro edge cases
  • [ ] Add tests for invalid attribute combinations in derive/tests/
  • [ ] Run tests with cargo test -p pest_derive and document findings

Add CI workflow for MSRV (Minimum Supported Rust Version) validation

The README specifies rustc 1.83.0+ as the minimum supported version, but no workflow in .github/workflows/ appears to validate it. ci.yml may only test the pinned default toolchain, which was recently bumped to 1.86. An MSRV CI job would prevent accidental use of newer Rust features and ensure the compatibility guarantee is met.

  • [ ] Review .github/workflows/ci.yml to confirm no MSRV testing exists
  • [ ] Create a new workflow .github/workflows/msrv.yml or add job to ci.yml
  • [ ] Configure the job to test with rustc 1.83.0 using rustup with fixed toolchain
  • [ ] Test all workspace members (pest, derive, generator, vm, etc.) against MSRV
  • [ ] Document the MSRV policy in CONTRIBUTING.md

Add benchmarking suite and CI integration for performance regressions

The repo has multiple parser implementations (bootstrap, vm, derive) and benchmarks under grammars/benches/, but no workflow in .github/workflows/ appears to run them. Running benchmarks in CI would help maintain parser performance across versions and catch regressions early. This matters for a parser library, where performance is critical.

  • [ ] Extend grammars/benches/ (or add a root benches/ directory) with Criterion.rs benchmarks for core parsing operations
  • [ ] Add benchmark targets for: rule matching, PEG precedence, whitespace handling, pest_derive codegen
  • [ ] Create .github/workflows/benchmark.yml to run benchmarks on PRs (comparing against master)
  • [ ] Document how to run benchmarks locally in CONTRIBUTING.md
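  • [ ] Fail the job if a regression exceeds an acceptable threshold (e.g., 10%). Criterion compares against a saved baseline but does not fail the run by itself, so the workflow needs a comparison step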

🌿Good first issues

  • Add integration tests for the debugger/ crate under debugger/tests/ to catch regressions in interactive grammar testing; there appears to be no test suite for the debugger binary itself.
  • Expand derive/tests/ with a new test file reduce.pest covering operator precedence and associativity patterns (left-assoc, right-assoc chains), which are common but undertested use cases.
  • Document the bootstrap process in CONTRIBUTING.md: explain when and how to regenerate meta/src/grammar.rs using bootstrap/src/main.rs, and why it matters for grammar changes.
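For the operator precedence and associativity issue above, a common technique is Pratt parsing, which pest exposes as PrattParser. The sketch below is a generic, self-contained evaluator, not pest's PrattParser API; tokenize, binding_power, expr and eval are illustrative names.

Each operator gets a pair of binding powers. Whichever side is larger decides associativity: a larger right power makes the operator left-associative, and a larger left power makes it right-associative. That is the behaviour such tests would pin down.

```rust
use std::convert::TryFrom;

// Tokens for a tiny arithmetic language (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
}

fn tokenize(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n: i64 = 0;
            while let Some(v) = chars.peek().and_then(|d| d.to_digit(10)) {
                n = n * 10 + i64::from(v);
                chars.next();
            }
            out.push(Tok::Num(n));
        } else {
            if !c.is_whitespace() {
                out.push(Tok::Op(c));
            }
            chars.next();
        }
    }
    out
}

// (left, right) binding powers: left < right => left-associative, left > right => right-associative.
fn binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((6, 5)),
        _ => None,
    }
}

fn expr(tokens: &[Tok], pos: &mut usize, min_bp: u8) -> Option<i64> {
    // Prefix position: must be a number.
    let mut lhs = match tokens.get(*pos)? {
        Tok::Num(n) => *n,
        Tok::Op(_) => return None,
    };
    *pos += 1;
    // Infix loop: absorb operators that bind at least as tightly as min_bp.
    while let Some(&Tok::Op(op)) = tokens.get(*pos) {
        let (l_bp, r_bp) = binding_power(op)?;
        if l_bp < min_bp {
            break; // leave this operator for an outer call
        }
        *pos += 1;
        let rhs = expr(tokens, pos, r_bp)?;
        lhs = match op {
            '+' => lhs + rhs,
            '-' => lhs - rhs,
            '*' => lhs * rhs,
            '/' => lhs.checked_div(rhs)?,
            '^' => lhs.checked_pow(u32::try_from(rhs).ok()?)?,
            _ => unreachable!("binding_power only accepts the operators above"),
        };
    }
    Some(lhs)
}

fn eval(src: &str) -> Option<i64> {
    let tokens = tokenize(src);
    let mut pos = 0;
    let value = expr(&tokens, &mut pos, 0)?;
    // Reject leftover tokens, e.g. "1 2".
    (pos == tokens.len()).then_some(value)
}

fn main() {
    println!("{:?} {:?}", eval("1 - 2 - 3"), eval("2 ^ 3 ^ 2"));
}
```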


📝Recent commits

  • 02c5ad8 — docs: add Panics sections to line_col() and PrattParser::parse() (#1167) (MasterOfFeelingFish)
  • 2a69a98 — fix(test): make miette_error test independent of terminal (#1164) (MasterOfFeelingFish)
  • d1a5bef — fix doc indentation check (#1166) (tomtau)
  • 52f49b1 — bump default CI toolchain to 1.86 (#1165) (tomtau)
  • 64111e5 — Keep grammar.rs in tree (#1159) (konstin)
  • 587f8e5 — bump toolchain to 1.85 (#1160) (tomtau)
  • 79dd30d — bump version to 2.8.6 (#1152) (tomtau)
  • 182ffb4 — Fix: change display impl to match as str (#1151) (Teolhyn)
  • a70abdb — New debugging span (using range). #1123 (#1148) (re-masashi)
  • a4a6f55 — Fix memchr pest dependency to 2.4.0 (#1147) (nyurik)

🔒Security observations

The pest parser codebase demonstrates good security practices, including:
  • a published SECURITY.md with vulnerability-reporting procedures
  • use of GitHub private security advisories
  • integration with rustsec/advisory-db
  • an active CI/CD pipeline
  • fuzzing infrastructure

There are two concerns. First, the version-support window is narrow (only the latest minor version). Second, parser implementations are inherently complex, which widens the attack surface for resource-exhaustion attacks such as excessive backtracking or stack exhaustion on deeply nested input. No hardcoded secrets, injection vulnerabilities, or obvious misconfigurations were detected in the provided file structure. The main security focus should be continuous fuzzing and enforcing resource limits in the parser.

  • Low · Fuzzing Infrastructure Underfunded — SECURITY.md, grammars/fuzz/. SECURITY.md notes that the fuzzing setup (cargo fuzz) 'would benefit from more runtime, targets and corpora,' so parser-logic bugs may remain undiscovered. Fix: give fuzzing more runtime, expand fuzz targets and test corpora, and extend the existing cifuzz.yml workflow for continuous fuzzing.
  • Low · Narrow Version Support Window — SECURITY.md. SECURITY.md lists only the most recent minor version (2.7.x) as supported, so older versions receive no security updates. Recent commits bump the version to 2.8.6, so that list may itself be out of date. This creates a risk for users on older versions who may not update promptly. Fix: Consider supporting at least 2-3 recent minor versions with backported security patches, or establish a clear deprecation timeline and communicate it widely to users.
  • Low · Parser Complexity Attack Surface — pest/, bootstrap/, derive/, generator/, vm/. The codebase is a PEG parser generator with several components (bootstrap, derive, generator, vm). Parser implementations are complex and can be vulnerable to pathological inputs that cause excessive backtracking, stack exhaustion, or memory exhaustion if not carefully validated. Fix: Implement resource limits (max recursion depth, input size limits, timeout mechanisms). Add comprehensive input validation and sanitization. Maintain active fuzzing for edge cases.

LLM-derived; treat as a starting point, not a security audit.
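The resource-limit mitigation suggested in the last observation above (a maximum recursion depth) can be sketched in plain Rust for a hand-written recursive-descent parser. group, parse_groups and ParseError are hypothetical names, and this is not how pest implements limits. It only shows how to turn nesting deep enough to exhaust the stack into an ordinary error.

```rust
// Errors for the depth-limited parser (illustrative names).
#[derive(Debug, PartialEq)]
enum ParseError {
    Unexpected(usize), // unexpected byte (or end of input) at this offset
    TooDeep(usize),    // nesting limit exceeded at this offset
}

// group = { "(" ~ group* ~ ")" }, refusing to nest deeper than `max_depth`.
fn group(input: &[u8], pos: usize, depth: usize, max_depth: usize) -> Result<usize, ParseError> {
    if depth > max_depth {
        return Err(ParseError::TooDeep(pos));
    }
    if input.get(pos) != Some(&b'(') {
        return Err(ParseError::Unexpected(pos));
    }
    let mut p = pos + 1;
    while input.get(p) == Some(&b'(') {
        p = group(input, p, depth + 1, max_depth)?;
    }
    if input.get(p) != Some(&b')') {
        return Err(ParseError::Unexpected(p));
    }
    Ok(p + 1)
}

// Parse one top-level group (depth 1); returns the offset after it.
fn parse_groups(src: &str, max_depth: usize) -> Result<usize, ParseError> {
    group(src.as_bytes(), 0, 1, max_depth)
}

fn main() {
    let deep = format!("{}{}", "(".repeat(100_000), ")".repeat(100_000));
    println!("{:?}", parse_groups("(()())", 2));
    println!("{:?}", parse_groups(&deep, 1000));
}
```

The limit is checked before any input is consumed, so a hostile input fails after at most max_depth frames instead of overflowing the stack.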


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
