RepoPilot

google/re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 4mo ago
  • 6 active contributors
  • Distributed ownership (top contributor 49% of recent commits)
  • BSD-3-Clause licensed
  • CI configured
  • Tests present
  • Slowing — last commit 4mo ago

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/google/re2)](https://repopilot.app/r/google/re2)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/google/re2 on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: google/re2

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/google/re2 shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • Last commit 4mo ago
  • 6 active contributors
  • Distributed ownership (top contributor 49% of recent commits)
  • BSD-3-Clause licensed
  • CI configured
  • Tests present
  • ⚠ Slowing — last commit 4mo ago

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live google/re2 repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/google/re2.

What it runs against: a local clone of google/re2 — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in google/re2 | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 137 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>google/re2</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of google/re2. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/google/re2.git
#   cd re2
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of google/re2 and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "google/re2(\.git)?\b" \
  && ok "origin remote is google/re2" \
  || miss "origin remote is not google/re2 (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "BSD-3-Clause" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \
  && ok "license is BSD-3-Clause" \
  || miss "license drift — was BSD-3-Clause at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in re2/re2.h re2/regexp.h re2/prog.h re2/parse.cc re2/compile.cc; do
  test -f "$f" \
    && ok "$f" \
    || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 137 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~107d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/google/re2"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

RE2 is a C++ regular expression library that guarantees linear-time matching on any input by using finite automata instead of backtracking. It trades some regex feature support (no backreferences or lookarounds) for absolute safety: untrusted regex patterns cannot cause catastrophic backtracking or exponential blowup. Used in production at Google since 2006, it prioritizes security and predictability over feature completeness. It is a multi-language repository: the core C++ engine lives in /re2/, Python bindings in /python/ (with _re2.cc wrapping the core), a TypeScript web app with WASM bindings in /app/ (app.ts and its own _re2.cc), build support for both Bazel (BUILD.bazel, WORKSPACE.bazel, MODULE.bazel) and CMake (CMakeLists.txt, .github/cmake.sh), and benchmark infrastructure in /benchlog/.
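The safety claim is easiest to appreciate by provoking the failure mode RE2 eliminates. The sketch below uses Python's stdlib re (a backtracking engine) purely as an illustration: the same pathological pattern that stalls a backtracker is handled in linear time by an automaton-based engine like RE2.

```python
import re
import time

def attempt_seconds(pattern: str, text: str) -> float:
    """Time one failing match attempt against a backtracking engine."""
    start = time.perf_counter()
    assert re.match(pattern, text) is None  # the trailing 'b' defeats '$'
    return time.perf_counter() - start

# '(a+)+$' makes a backtracker try every way of splitting the run of
# 'a's between the inner and outer '+' before it can report failure.
pathological = r"(a+)+$"
fast = attempt_seconds(pathological, "a" * 10 + "b")  # ~2^9 splits
slow = attempt_seconds(pathological, "a" * 22 + "b")  # ~2^21 splits

print(f"n=10: {fast:.6f}s  n=22: {slow:.6f}s")
# Twelve extra characters, yet the attempt takes vastly longer. RE2
# processes both inputs in time proportional to their length.
```

The asymmetry, not the absolute numbers, is the point: each added character roughly doubles the backtracker's work, while an automaton's work grows by a constant.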

👥Who it's for

Systems engineers and infrastructure teams who need to safely parse untrusted user-supplied regex patterns (logging systems, protocol parsers, security filters), plus language binding developers who need a robust regex engine for Python, JavaScript, or other language runtimes built on RE2's C++ core.

🌱Maturity & risk

Highly mature and production-ready: the project has been battle-tested in Google infrastructure since 2006, maintains active CI across Bazel, CMake, and Python builds (.github/workflows/ shows bazel.yml, cmake.sh, python.yml), and has comprehensive documentation. Recent activity visible in the module ecosystem (MODULE.bazel, .bcr/ templates suggest ongoing Bazel Central Registry maintenance), indicating continued active development rather than maintenance-only mode.

Low risk overall. Single primary engine written in pure C++ with no external dependencies (no third-party libs in the file list), strong test infrastructure via CI workflows, and a conservative feature set that avoids chasing every regex dialect. However, as a mature library with a fixed feature scope, breaking changes would be rare; the risk is more about whether it supports your specific regex feature rather than instability.

Active areas of work

Active ecosystem expansion: visible work on Bazel Central Registry integration (.bcr/ directory with metadata.template.json and presubmit.yml), ongoing Python binding maintenance (.github/workflows/python.yml), and WASM/web support (app/ directory with rollup config and TypeScript). CI workflows suggest regular testing across multiple build systems and platforms.

🚀Get running

git clone https://github.com/google/re2.git
cd re2
# For C++ core:
make test
# For Python bindings:
cd python && python setup.py test
# For web WASM build:
cd app && npm install && npm run build

Daily commands:

  • C++ library: make, or bazel build :re2 (BUILD.bazel defines the targets).
  • Python bindings: pip install . from /python/, or cd python && python setup.py install.
  • WASM web app: cd app && npm install && npm run build (uses rollup; see rollup.config.js).

See the Makefile for C++ recipe details and .github/workflows/ for CI patterns.

🗺️Map of the codebase

  • re2/re2.h — Public API header defining the RE2 class and Match interface that all users and internal consumers depend on.
  • re2/regexp.h — Core internal representation of parsed regular expressions; essential for understanding the compilation pipeline.
  • re2/prog.h — Compiled finite automaton representation; central to execution engines (DFA, NFA, onepass).
  • re2/parse.cc — Regex syntax parser that converts raw pattern strings into the internal Regexp AST; foundational for all pattern handling.
  • re2/compile.cc — Compiler that transforms parsed Regexp into executable Prog bytecode; bridges AST to execution engines.
  • re2/dfa.cc — Deterministic finite automaton execution engine optimized for linear-time matching on most patterns.
  • re2/nfa.cc — Nondeterministic finite automaton engine that tracks submatch boundaries without backtracking; used when capture groups are needed and critical for correctness on edge cases.

🛠️How to make changes

Add support for a new regex operator

  1. Add the token type to the Parser enum and update the tokenizer in parse.cc to recognize the operator syntax (re2/parse.cc)
  2. Add a new Regexp node type to the Op enum in regexp.h and update the Regexp constructor/factory (re2/regexp.h)
  3. Implement parsing logic in parse.cc to construct the new Regexp node from matched tokens (re2/parse.cc)
  4. Implement the compilation logic in compile.cc to emit appropriate Prog instructions for the new operator (re2/compile.cc)
  5. Add test cases in re2/testing/parse_test.cc and compile_test.cc to validate syntax and execution (re2/testing/parse_test.cc)
  6. If needed, update execution engines (dfa.cc, nfa.cc) to handle the new Prog instruction types (re2/dfa.cc)

Add a new matching mode or search strategy

  1. Define new search method in re2/re2.h with appropriate signature (PartialMatch, Search, etc.) (re2/re2.h)
  2. Implement the matching logic in re2/re2.cc, creating a new Prog execution path if needed (re2/re2.cc)
  3. If creating a specialized execution engine, implement it in a new file or extend dfa.cc/nfa.cc (re2/dfa.cc)
  4. Add bindings in python/_re2.cc or app/_re2.cc if exposing to other languages (python/_re2.cc)
  5. Write tests in re2/testing/re2_arg_test.cc or a new test file to validate the new mode (re2/testing/re2_arg_test.cc)

Optimize pattern matching for a specific use case

  1. Analyze the pattern structure in prefilter.cc to extract fast-path conditions (fixed strings, character classes) (re2/prefilter.cc)
  2. Enhance the prefilter tree in prefilter_tree.cc to prune non-matching inputs earlier (re2/prefilter_tree.cc)
  3. Add pattern-specific optimizations in simplify.cc to reduce AST complexity before compilation (re2/simplify.cc)
  4. Optionally extend compile.cc to recognize special patterns and generate specialized Prog code (re2/compile.cc)
  5. Benchmark the optimization using the test harnesses in re2/testing/exhaustive_test.cc and benchlog (re2/testing/exhaustive_test.cc)
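The prefilter idea behind steps 1–2 can be illustrated with a toy in Python: extract a literal that any match must contain, then use a cheap substring scan to reject most inputs before invoking the regex engine. This is a simplified sketch of the concept, not the algorithm in re2/prefilter.cc; the helper names are invented for illustration.

```python
import re
import string

def required_literal(pattern: str) -> str:
    """Toy prefilter: the leading run of plain literal characters.

    If a pattern starts with literal characters, any matching string
    must contain them, so their absence lets us skip the regex engine.
    """
    plain = string.ascii_letters + string.digits
    literal = []
    i = 0
    while i < len(pattern) and pattern[i] in plain:
        literal.append(pattern[i])
        i += 1
    # A following quantifier makes the last literal optional or repeated,
    # so it cannot be relied on to appear exactly as written.
    if literal and i < len(pattern) and pattern[i] in "*?{":
        literal.pop()
    return "".join(literal)

def prefiltered_search(pattern: str, text: str):
    lit = required_literal(pattern)
    if lit and lit not in text:
        return None                  # fast path: substring scan only
    return re.search(pattern, text)  # slow path: full regex engine

# 'ERROR' is a required literal, so non-matching log lines are rejected
# by a substring check without ever touching the regex engine.
assert prefiltered_search(r"ERROR \d+", "WARN 17: disk slow") is None
assert prefiltered_search(r"ERROR \d+", "ERROR 42: disk full") is not None
```

The real prefilter tree generalizes this to ANDs and ORs of required atoms extracted from the whole pattern, not just a leading literal.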

Add Python language bindings for a new feature

  1. Ensure the C++ feature is fully implemented and tested in re2/ (e.g., re2/re2.h, re2/re2.cc) (re2/re2.h)
  2. Add Python SWIG/ctypes bindings in python/_re2.cc, exposing the C++ method with type conversion (python/_re2.cc)
  3. Wrap the binding in a Pythonic interface in python/re2.py with proper docstrings and error handling (python/re2.py)
  4. Add test cases in python/re2_test.py to verify the Python API matches expected behavior (python/re2_test.py)
  5. Update python/setup.py if build dependencies or compiler flags change (python/setup.py)
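Step 3's "Pythonic wrapper" layer can look like this sketch: a thin class that validates eagerly, raises a Python exception instead of surfacing an engine-level status, and documents itself. All names here are hypothetical stand-ins for whatever the _re2 extension exposes; stdlib re is used underneath so the sketch is self-contained.

```python
import re

class Pattern:
    """Pythonic wrapper over a lower-level regex handle.

    In real bindings the underlying engine would be the _re2 extension
    module; stdlib re stands in here for illustration only.
    """

    def __init__(self, pattern: str):
        try:
            self._compiled = re.compile(pattern)  # real code: the _re2 layer
        except re.error as e:
            # Convert the engine's failure into an idiomatic Python error
            # that carries the offending pattern for easier debugging.
            raise ValueError(f"invalid pattern {pattern!r}: {e}") from e
        self.pattern = pattern

    def fullmatch(self, text: str) -> bool:
        """Return True if the whole of `text` matches the pattern."""
        return self._compiled.fullmatch(text) is not None

p = Pattern(r"[a-z]+\d")
assert p.fullmatch("abc1")
assert not p.fullmatch("abc")
```

The design choice worth copying is eager validation: failing at construction time keeps bad patterns from surfacing deep inside matching code.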

🪤Traps & gotchas

  • No external dependencies (intentional for safety), but this means no lock file in the traditional sense — version pinning happens via Bazel WORKSPACE.bazel or CMake toolchain files, not package.json-style.
  • The /benchlog/ directory contains custom benchmark formats (.c2, .mini, .r70, .wreck) with no standard parsing docs; benchmarking is not immediately accessible without reading benchplot.py.
  • The WASM build in /app/ requires the Emscripten toolchain separately; npm install alone won't create the web version.
  • Python bindings require a system C++ compiler and Python dev headers (python3-dev on Linux).
  • No single default-branch indicator is visible; check the GitHub UI for the actual default branch before assuming main/master.

🏗️Architecture

💡Concepts to learn

  • Finite Automata (NFA/DFA) — RE2's core strategy: converts regex patterns into finite state machines that guarantee linear-time matching, avoiding the exponential backtracking that makes other engines unsafe for untrusted input.
  • Thompson NFA Construction — The specific algorithm RE2 uses to convert regex syntax into non-deterministic finite automata; understanding this is essential for modifying the parse.cc and compile.cc modules.
  • Deterministic Finite Automaton (DFA) Subset Construction — RE2 uses DFA on-the-fly compilation from NFA to avoid exponential state explosion; the dfa.cc module implements this, making it critical for understanding how matching actually executes.
  • Memoization (Caching NFA States) — RE2 caches partial DFA states during matching to avoid recomputation; this is why the engine is both memory-bounded and linear-time, a key trade-off to understand.
  • Backtracking Regex Pathology (Catastrophic Backtracking) — The fundamental problem RE2 solves: why patterns like (a+)+ on non-matching input cause exponential runtime in PCRE/Perl; RE2's design is defined by eliminating this class of bug.
  • POSIX vs. Perl Regex Semantics — RE2 supports both POSIX (egrep-style) and Perl regex modes with different greedy/non-greedy and ordering semantics; the parse.cc module branches on this flag, and understanding the difference is essential for using RE2 correctly.
  • WebAssembly (WASM) FFI and Emscripten — The /app/ directory demonstrates compiling the C++ regex engine to WASM for browser execution; critical for understanding how RE2 reaches JavaScript without reimplementation.
  • google/abseil-cpp — Abseil C++ library is RE2's preferred dependency for base types and utilities (strings, status codes); understanding how RE2 integrates with Abseil idioms is essential for C++ modifications.
  • facebook/folly — Alternative high-performance C++ utility library; used in some regex-heavy systems as an alternative to Abseil, useful comparison for understanding design choices in RE2.
  • chromium/v8 — V8 JavaScript engine includes a regex matcher; V8 developers study RE2 for inspiration on safe regex semantics, and the web community often benchmarks RE2 against V8's approach.
  • oniguruma/oniguruma — Production regex library used in Ruby and other languages; represents the opposite design philosophy (backtracking-based, feature-complete) and a common alternative when RE2's feature limits are too restrictive.
  • rust-lang/regex — Rust's regex crate inspired by RE2's DFA-based design; shows how RE2 concepts have influenced modern regex library design in other languages.
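The first three concepts above (Thompson construction, tracking NFA state sets, and the linear-time guarantee) fit into one small sketch. This toy matcher supports only literals, '|', '*', and grouping, and bears no resemblance to RE2's real pipeline in parse.cc/compile.cc/nfa.cc, but it shows the central move: advance a set of NFA states in lockstep with the input, so each character is examined once and backtracking never occurs.

```python
# Toy Thompson-style matcher: literals, '|', '*', and '(...)' only.
EPS = None  # label for epsilon (empty) transitions

class State:
    def __init__(self):
        self.edges = []  # list of (label, target State)

def parse(pattern):
    """Recursive-descent parse straight into an NFA fragment (start, accept)."""
    pos = 0

    def alternation():
        nonlocal pos
        start, accept = concatenation()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            s2, a2 = concatenation()
            s, a = State(), State()
            s.edges += [(EPS, start), (EPS, s2)]   # branch into either side
            accept.edges.append((EPS, a))
            a2.edges.append((EPS, a))
            start, accept = s, a
        return start, accept

    def concatenation():
        nonlocal pos
        start, accept = repetition()
        while pos < len(pattern) and pattern[pos] not in "|)":
            s2, a2 = repetition()
            accept.edges.append((EPS, s2))         # chain fragments together
            accept = a2
        return start, accept

    def repetition():
        nonlocal pos
        start, accept = atom()
        if pos < len(pattern) and pattern[pos] == "*":
            pos += 1
            s, a = State(), State()
            s.edges += [(EPS, start), (EPS, a)]    # zero iterations allowed
            accept.edges += [(EPS, start), (EPS, a)]  # or loop back around
            start, accept = s, a
        return start, accept

    def atom():
        nonlocal pos
        ch = pattern[pos]
        if ch == "(":
            pos += 1
            start, accept = alternation()
            pos += 1  # skip ')'
            return start, accept
        pos += 1
        s, a = State(), State()
        s.edges.append((ch, a))                    # single literal edge
        return s, a

    return alternation()

def eps_closure(states):
    """All states reachable from `states` via epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        st = stack.pop()
        for label, tgt in st.edges:
            if label is EPS and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen

def match(pattern, text):
    start, accept = parse(pattern)
    current = eps_closure({start})
    for ch in text:  # one pass over the input: no backtracking, ever
        current = eps_closure(
            {t for s in current for (lbl, t) in s.edges if lbl == ch})
    return accept in current

assert match("a(b|c)*d", "abcbd")
assert not match("a(b|c)*d", "abxd")
```

RE2's DFA layer adds one more idea on top of this: cache each distinct state set as a DFA state so repeated inputs reuse the work, bounded by a memory budget.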

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add WebAssembly performance benchmarking suite to app/

The app/ directory contains a TypeScript/WebAssembly build of RE2 (app/_re2.cc, app/app.ts, app/index.html) but lacks any performance benchmarking infrastructure. This is critical for a regex library where performance is a core value proposition. A new contributor could create a benchmark suite similar to benchlog/ but for the WASM/web build, allowing measurement of pattern compilation time, match performance, and memory usage in browser environments.

  • [ ] Create app/benchmarks.ts with harness for common regex patterns
  • [ ] Add benchmark result visualization to app/index.html
  • [ ] Create GitHub Action workflow (.github/workflows/benchmark-wasm.yml) to track performance regressions on commits
  • [ ] Document benchmark methodology in app/README or top-level CONTRIBUTING.md

Expand Python bindings test coverage and add type stubs (py.typed)

The python/ directory has re2_test.py and _re2.cc bindings, but lacks comprehensive type hints and a py.typed marker file. Modern Python tooling (mypy, pyright) cannot properly type-check code using the re2 module. This is especially important for a safety-focused library. Adding type stubs (re2.pyi) and the py.typed marker would enable static type checking for library users.

  • [ ] Create python/re2.pyi with comprehensive type annotations for all public APIs in python/re2.py
  • [ ] Add python/py.typed marker file (empty sentinel file)
  • [ ] Extend python/re2_test.py with test cases for type-checkable patterns
  • [ ] Update python/setup.py to include py.typed and .pyi files in package data
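Step 1's stub could start from a sketch like this. Stub syntax is ordinary Python, so the block below runs as-is; every signature here is an assumption based on the bindings mirroring the stdlib re API, so check each name against python/re2.py before committing.

```python
# Sketch of a python/re2.pyi stub. Signatures are assumptions modeled on
# the stdlib `re` API; verify against python/re2.py before use.
from typing import Iterator, Optional

class error(Exception): ...

class Match:
    def group(self, index: int = 0) -> str: ...
    def start(self, index: int = 0) -> int: ...
    def end(self, index: int = 0) -> int: ...

class Pattern:
    pattern: str
    def search(self, text: str) -> Optional[Match]: ...
    def match(self, text: str) -> Optional[Match]: ...
    def fullmatch(self, text: str) -> Optional[Match]: ...
    def finditer(self, text: str) -> Iterator[Match]: ...

def compile(pattern: str) -> Pattern: ...
def search(pattern: str, text: str) -> Optional[Match]: ...
```

With this file plus the empty py.typed marker shipped in the wheel, mypy and pyright will type-check caller code against the stubbed signatures.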

Add sanitizer coverage tracking and fuzz testing automation in CI

The repo contains re2/fuzzing/re2_fuzzer.cc but there is no visible CI integration for continuous fuzzing or coverage reporting. The existing fuzzing code is orphaned from the main CI workflows (.github/workflows/ci.yml, ci-bazel.yml, ci-cmake.yml). A new contributor could integrate OSS-Fuzz or libFuzzer into the CI pipeline with coverage tracking, ensuring regex parsing and matching robustness against malicious input—core to RE2's safety mission.

  • [ ] Create .github/workflows/ci-fuzzing.yml to run re2/fuzzing/re2_fuzzer.cc on each commit with ASAN/UBSAN
  • [ ] Integrate Codecov or Clang coverage generation for fuzzer code paths
  • [ ] Document fuzzing setup and how-to-run in CONTRIBUTING.md (add fuzzing section if missing)
  • [ ] Update BUILD.bazel and CMakeLists.txt to ensure fuzzer targets are always built and tested

🌿Good first issues

  • Add Python 3.12+ type hints to /python/_re2.cc and the Python-facing interface; the existing bindings lack stub files (.pyi) that would enable type checkers and IDE autocomplete.
  • Expand /benchlog/ documentation: create README.md explaining the .c2, .mini, .r70, .wreck benchmark formats and add comment headers to benchplot.py explaining how to add new benchmark suites.
  • Write integration tests in /python/ that cross-check RE2 behavior against PCRE edge cases for anchors (^, $, \b) and character classes to document known incompatibilities.
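The third issue — cross-checking anchor and word-boundary semantics — can start from a table-driven test like this sketch. It exercises stdlib re only (as the reference side); the RE2 side would run the same table through the re2 module and diff the outcomes. The expectations asserted here are facts about Python's stdlib re, not claims about RE2.

```python
import re

# Edge cases where engines commonly disagree: anchors and \b at string
# boundaries. Each row: (pattern, text, expected bool for stdlib re.search).
CASES = [
    (r"^$",      "",     True),   # empty string satisfies both anchors
    (r"\bfoo\b", "foo",  True),   # \b holds at both ends of the string
    (r"\b",      "",     False),  # no boundary inside an empty string
    (r"a$",      "a\n",  True),   # stdlib '$' also matches before a final newline
]

for pattern, text, expected in CASES:
    got = re.search(pattern, text) is not None
    assert got == expected, (pattern, text, got)
print("all stdlib-re reference cases hold")
```

Rows where the two engines disagree become the documented incompatibilities the issue asks for.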

Top contributors


📝Recent commits

  • 972a15c — re2: remove unnecessary & in MutexLock usage (rsc)
  • 019bed1 — re2: make CI checks also PR checks (rsc)
  • e7aec59 — python: update minimum version in packaging metadata (rsc)
  • 927f5d5 — 2025-11-05 (rsc)
  • aee5886 — .github/workflows: delete pr.yml (rsc)
  • cedd1b5 — CMakeLists: allow disabling 'make install' rules (rsc)
  • 11c643a — .github/workflows: drop Python 3.9, add Python 3.14 (rsc)
  • 6b6f716 — re2: prepare to tag 2025-11-05 (rsc)
  • 880c551 — re2: hoist a few loads out of BitState ShouldVisit (rsc)
  • dcd1f64 — BUILD: fix for Bazel 9.0.0rc1 (rsc)

🔒Security observations

The RE2 codebase demonstrates strong security foundations with explicit safety goals and a dedicated security reporting process (SECURITY.md). However, the web application component (app/) has moderate security concerns primarily related to dependency management and version pinning. The use of overly permissive npm dependency versions with wildcards and EOL software (Rollup v2) presents elevated risk for transitive vulnerability exposure. The core C++ library itself (re2/) appears well-architected with explicit bounds checking and memory budgeting to prevent DoS and stack overflow attacks. Recommendations focus on tightening dependency version constraints, upgrading to supported versions, and implementing standard web security practices for the web application layer.

  • High · Overly Permissive npm Dependency Versions — app/package.json. The package.json uses wildcard (*) and tilde (~) version specifiers for dependencies, which allows installation of potentially vulnerable versions. Specifically: 'lit': '*' allows any version, including major versions with breaking changes or security issues; '@rollup/plugin-node-resolve': '*', '@rollup/plugin-terser': '*', '@web/rollup-plugin-html': '*', '@web/rollup-plugin-import-meta-assets': '*', 'tslib': '*', and 'typescript': '*' all use permissive versioning. Fix: Pin dependencies to specific versions (e.g., 'lit': '3.1.0') or use caret (^) with upper-bound constraints. Implement a lock file strategy (package-lock.json or yarn.lock) and regularly audit dependencies using 'npm audit' or 'yarn audit'. Consider using 'npm ci' in CI/CD pipelines instead of 'npm install'.
  • Medium · Rollup v2 Approaching End of Life — app/package.json. The package.json specifies 'rollup': '~2', which restricts to Rollup v2.x. Rollup v2 has reached end of life and no longer receives security updates. Version 3 and 4 contain important security fixes and improvements. Fix: Upgrade to the latest stable version of Rollup (currently v4). Update 'rollup': '~2' to 'rollup': '^4' and verify compatibility with related plugins like @rollup/plugin-node-resolve and @rollup/plugin-terser.
  • Medium · Potential XSS Risk in Web Application — app/. The presence of rollup configuration with @web/rollup-plugin-html and app/index.html, combined with TypeScript compilation (app/app.ts), suggests a web application that could be vulnerable to XSS if user input is not properly sanitized. The use of 'lit' library should mitigate some XSS risks through template escaping, but this needs verification. Fix: Ensure all user input is properly escaped and sanitized before rendering. Verify that the Lit library's templating is used correctly to prevent XSS. Implement Content Security Policy (CSP) headers. Perform security testing including manual and automated XSS testing.
  • Low · Missing Security Headers Configuration — app/build.sh, app/rollup.config.js. No evidence of security headers configuration (such as HSTS, X-Content-Type-Options, X-Frame-Options, CSP) in the visible build configuration files or application setup. Fix: Configure appropriate security headers in the web server or application middleware. At minimum, implement: X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Strict-Transport-Security header, and Content-Security-Policy.
  • Low · Build Script Visibility — app/build.sh. The app/build.sh script is stored in the repository and executed during build. If this script contains sensitive information or is modified maliciously, it could compromise the build process. Fix: Ensure build scripts do not contain secrets or credentials. Use environment variables for sensitive data. Implement code review requirements for build script changes. Consider using signed commits and branch protection rules.
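A quick way to act on the first two findings is to scan package.json for permissive specifiers before tightening them. The sketch below reads an inline example so it runs anywhere; in practice you would load app/package.json. The flagging rules are a deliberate simplification of semver range semantics.

```python
import json

def permissive_deps(package_json: str) -> dict:
    """Map dependency names to specifiers that float too freely:
    bare wildcards ('*' or empty) and tilde/caret ranges (flagged
    here for review, even though caret-with-bounds can be acceptable)."""
    data = json.loads(package_json)
    flagged = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in data.get(section, {}).items():
            if spec in ("*", "") or spec[0] in "~^":
                flagged[name] = spec
    return flagged

# Inline example mirroring the findings above.
example = json.dumps({
    "devDependencies": {"lit": "*", "rollup": "~2", "typescript": "*"},
})
flagged = permissive_deps(example)
assert flagged == {"lit": "*", "rollup": "~2", "typescript": "*"}
```

Anything the scan flags is a candidate for pinning, followed by committing a lock file so CI installs are reproducible.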

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
