rouge-ruby/rouge

Item: rouge-ruby/rouge
Rating: 5
Author: RepoPilot

A pure Ruby code highlighter that is compatible with Pygments

Healthy

Healthy across the board

worst of 4 axes

Use as dependency: Concerns

non-standard license (Other)

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓ Last commit 5d ago
✓ 32+ active contributors
✓ Distributed ownership (top contributor 38% of recent commits)
✓ Other licensed
✓ CI configured
✓ Tests present
⚠ Non-standard license (Other): review terms

What would change the summary?

→ Use as dependency: Concerns → Mixed if license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/rouge-ruby/rouge)](https://repopilot.app/r/rouge-ruby/rouge)

Paste at the top of your README.md — renders inline like a shields.io badge.

Social card (1200×630): auto-renders when someone shares https://repopilot.app/r/rouge-ruby/rouge on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: rouge-ruby/rouge

Generated by RepoPilot · 2026-05-10 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
2. Treat sections marked "AI · unverified" as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rouge-ruby/rouge shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

Last commit 5d ago
32+ active contributors
Distributed ownership (top contributor 38% of recent commits)
Other licensed
CI configured
Tests present
⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live rouge-ruby/rouge repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/rouge-ruby/rouge.

What it runs against: a local clone of rouge-ruby/rouge — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rouge-ruby/rouge | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 35 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>rouge-ruby/rouge</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rouge-ruby/rouge. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/rouge-ruby/rouge.git
#   cd rouge
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rouge-ruby/rouge and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "rouge-ruby/rouge(\.git)?\b" \
  && ok "origin remote is rouge-ruby/rouge" \
  || miss "origin remote is not rouge-ruby/rouge (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "lib/rouge.rb" \
  && ok "lib/rouge.rb" \
  || miss "missing critical file: lib/rouge.rb"
test -f "lib/rouge/cli.rb" \
  && ok "lib/rouge/cli.rb" \
  || miss "missing critical file: lib/rouge/cli.rb"
test -f "lib/rouge/lexer.rb" \
  && ok "lib/rouge/lexer.rb" \
  || miss "missing critical file: lib/rouge/lexer.rb"
test -f "lib/rouge/formatters/html.rb" \
  && ok "lib/rouge/formatters/html.rb" \
  || miss "missing critical file: lib/rouge/formatters/html.rb"
test -f "lib/rouge/token.rb" \
  && ok "lib/rouge/token.rb" \
  || miss "missing critical file: lib/rouge/token.rb"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 35 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~5d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rouge-ruby/rouge"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Rouge is a pure Ruby syntax highlighter that tokenizes source code in 200+ languages and outputs styled HTML or ANSI 256-color terminal text. It is Pygments-compatible, meaning its HTML output works directly with Pygments CSS themes, and it is the default highlighter for Jekyll static sites.

The gem has a monolithic structure: lib/rouge/ contains the core (lexers, formatters, themes); lib/rouge/lexers/ holds 200+ language-specific tokenizers; lib/rouge/formatters/ provides HTML/ANSI/HTMLInline output adapters; lib/rouge/themes/ contains pre-built color schemes. bin/rougify is the CLI entry point. lib/rouge/demos/ stores sample code for each supported language.

👥Who it's for

Jekyll users and Ruby developers who need syntax highlighting without external Python dependencies. Documentation sites, blogs, and static site generators relying on Jekyll use Rouge to highlight code snippets in generated HTML. Gem maintainers also use Rouge when they want to avoid shipping Pygments as a dependency.

🌱Maturity & risk

Highly mature and production-ready. The project has CI/CD via GitHub Actions (ruby.yml workflow), YARD documentation, rubocop linting configured (.rubocop.yml present), semantic versioning via Gem releases, and active maintenance visible in the CHANGELOG. It's the de facto standard syntax highlighter in the Ruby ecosystem.

Low risk overall. The top contributor accounts for 38% of recent commits, so day-to-day maintenance still leans on a small core group, but the feature set is stable. The main ongoing risk is the lexer ecosystem: 200+ language lexers need continued maintenance as language grammars evolve. No complex external service dependencies are visible, and pure Ruby means there is no compiled-extension maintenance burden.

Active areas of work

Active development on lexer coverage (demo files exist for 50+ languages suggesting ongoing additions). GitHub workflows are configured for automated testing. The CHANGELOG and issue templates (lexer-bug, lexer-request, library-bug, enhancement-request) suggest the team actively triages both lexer gaps and core functionality requests.

🚀Get running

git clone https://github.com/rouge-ruby/rouge.git
cd rouge
bundle install
bundle exec rougify foo.rb

Or to develop: bundle exec guard watches for changes (Guardfile present). Run tests via bundle exec rake.

Daily commands:
As a library: require 'rouge', then Rouge::Lexers::Ruby.new.lex(code).
As CLI: rougify file.rb, or rougify style monokai.sublime > output.css.
As dev: bundle exec guard for watch-mode testing, or bundle exec rougify --help to explore options.

🗺️Map of the codebase

lib/rouge.rb — Entry point that loads the entire Rouge library; defines the main API surface for lexer discovery and usage
lib/rouge/cli.rb — Command-line interface implementation; essential for understanding how Rouge is invoked outside the library
lib/rouge/lexer.rb — Base Lexer class that all 200+ language lexers inherit from; core abstraction for tokenization logic
lib/rouge/formatters/html.rb — HTML formatter implementation; primary output format and compatibility layer with Pygments
lib/rouge/token.rb — Token class that represents syntax elements; fundamental data structure flowing through all lexers
lib/rouge/regex_lexer.rb — RegexLexer base class used by most lexers; implements state machine pattern for tokenization
Contributing.md — Contributor guide covering lexer development conventions and submission process; required reading for lexer additions

🧩Components & responsibilities

CLI Handler (Ruby OptionParser) — Parses command-line arguments, orchestrates lexer/formatter selection, handles file I/O
- Failure mode: Syntax errors in CLI invocation or unsupported language/format combinations
Lexer Registry (Ruby class metadata (ancestors, name resolution)) — Maintains mapping of language names/aliases to lexer classes; enables dynamic lookup
- Failure mode: Lexer class not found or multiple conflicting registrations for same language
RegexLexer (Ruby Regex engine, String scanning) — Implements state machine pattern for tokenization; manages regex rule evaluation and state transitions
- Failure mode: Infinite loops in regex patterns, stack overflow in deeply nested states, performance degradation on pathological input
Token Stream (Ruby Enumerator for lazy evaluation) — Represents sequence of lexical elements flowing from lexer to formatter
- Failure mode: Malformed tokens (invalid type/value), incorrect token boundaries, missing tokens
HTML Formatter (String concatenation, HTML entity escaping) — Converts token stream to HTML with CSS classes; manages span nesting and attribute escaping
- Failure mode: HTML injection vulnerabilities if token values not properly escaped, malformed span structure
Theme Engine (Ruby class hierarchy for theme composition) — Provides color palettes and styling rules for token types; supports multiple theme variants
- Failure mode: Missing color definition for token type, theme inheritance conflicts
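
The hand-off between the components above can be sketched in plain Ruby with no Rouge dependency. This is a minimal illustration, not Rouge's implementation: `toy_lex`, `to_html`, and `TOKEN_CLASSES` are hypothetical names, and the short class names only mimic the Pygments-style convention ('k', 'c') mentioned in this document. It shows a lazily produced token stream and a formatter that escapes token values before wrapping them in spans.

```ruby
# Toy pipeline: lexer -> lazy token stream -> HTML formatter.
# Illustrative only; none of these names come from Rouge itself.

# Pygments-style short CSS classes for a few token types (assumed subset).
TOKEN_CLASSES = { keyword: 'k', name: 'n', comment: 'c', text: nil }.freeze

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Returns an Enumerator, so tokens are produced only as the consumer asks.
def toy_lex(source)
  Enumerator.new do |out|
    source.scan(/#.*|\w+|\s+|./) do |chunk|
      type =
        if chunk.start_with?('#') then :comment
        elsif %w[def end].include?(chunk) then :keyword
        elsif chunk.match?(/\A\w+\z/) then :name
        else :text
        end
      out << [type, chunk]
    end
  end
end

# Escapes every token value, then wraps it in a span if its type has a class.
def to_html(tokens)
  tokens.map do |type, value|
    escaped = value.gsub(/[&<>"]/, HTML_ESCAPES)
    css = TOKEN_CLASSES[type]
    css ? %(<span class="#{css}">#{escaped}</span>) : escaped
  end.join
end

puts to_html(toy_lex('def a<b # hi'))
```

Because the formatter only sees `[type, value]` pairs, swapping `to_html` for another output function leaves the lexer untouched, which is the decoupling the component list describes.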

🔀Data flow

User input (source code + language) → CLI Handler — User provides code to highlight and specifies language/format via command line or library API

🛠️How to make changes

Add a new language lexer

Create a new lexer file in lib/rouge/lexers/ following naming convention (language_name.rb) (lib/rouge/lexers/newlang.rb)
Inherit from RegexLexer or Lexer base class and define state machine using token rules (lib/rouge/lexer.rb)
Register lexer with 'title', 'desc', 'tag', 'aliases', and 'filenames' metadata (lib/rouge/lexers/newlang.rb)
Add demo code file for the language in lib/rouge/demos/ (lib/rouge/demos/newlang)
Submit pull request with tests; reference Contributing.md for guidelines (Contributing.md)
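
To make the "state machine using token rules" step more concrete, here is a small sketch of the idea using only Ruby's standard `StringScanner`. It is not Rouge's DSL or its implementation: `RULES`, `toy_tokenize`, and the token names are hypothetical, and a real lexer would be written with the rule syntax the project provides. The sketch shows ordered rules per state, pushing a state on an opening quote, and popping it on the closing quote.

```ruby
require 'strscan'

# Toy state-machine lexer: each state is an ordered list of
# [regex, token, action]. action is nil, :pop, or a state name to push.
RULES = {
  root: [
    [/"/,           :string,   :string], # opening quote: push :string
    [/\d+/,         :number,   nil],
    [/[a-z_]\w*/i,  :name,     nil],
    [/[=+\-*\/]/,   :operator, nil],
    [/\s+/,         :text,     nil]
  ],
  string: [
    [/\\./,         :escape,   nil],
    [/"/,           :string,   :pop],    # closing quote: back to previous state
    [/[^"\\]+/,     :string,   nil]
  ]
}.freeze

def toy_tokenize(source)
  scanner = StringScanner.new(source)
  stack = [:root]
  tokens = []
  until scanner.eos?
    # First rule that matches at the current position wins.
    rule = RULES.fetch(stack.last).find { |regex, _, _| scanner.scan(regex) }
    if rule.nil?
      # Nothing matched in this state: emit one char as an error token
      # so the scanner always advances.
      tokens << [:error, scanner.getch]
      next
    end
    _, token, action = rule
    tokens << [token, scanner.matched]
    case action
    when :pop then stack.pop if stack.size > 1
    when Symbol then stack.push(action)
    end
  end
  tokens
end

p toy_tokenize('x = "a\"b" 42')
```

The error fallback matters: without it, input that no rule matches would leave the scanner stuck at the same position, which is the infinite-loop failure mode listed for the RegexLexer component.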

Add a new output formatter

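Create new formatter class inheriting from the Formatter base class (lib/rouge/formatter.rb)
Implement the method that converts the token stream to the target format; in current Rouge, 'format(tokens)' delegates to a 'stream(tokens)' method that subclasses override, so check the base class before choosing which to define (lib/rouge/formatters/custom_formatter.rb)
Register the formatter by declaring a 'tag' in the class, and confirm in lib/rouge.rb that the new file gets loaded (lib/rouge.rb)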
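
As a rough picture of what a formatter does with a token stream, the sketch below turns `[type, text]` pairs into ANSI 256-color terminal output. It is a standalone toy, not a Rouge formatter: the class name `ToyAnsiFormatter`, the palette indexes, and the symbol token types are all assumptions made for illustration, and it does not inherit from Rouge's base class.

```ruby
# Toy formatter: turns [type, text] pairs into ANSI 256-color output.
# Class name, palette, and token names are illustrative assumptions.
class ToyAnsiFormatter
  # xterm 256-color palette indexes (0-255) per token type.
  PALETTE = { keyword: 197, string: 186, comment: 244 }.freeze
  RESET = "\e[0m"

  def format(tokens)
    tokens.map { |type, text| paint(type, text) }.join
  end

  private

  def paint(type, text)
    color = PALETTE[type]
    return text unless color

    # "\e[38;5;Nm" selects foreground color N from the 256-color palette.
    "\e[38;5;#{color}m#{text}#{RESET}"
  end
end

tokens = [[:keyword, 'def'], [:text, ' '], [:name, 'greet']]
puts ToyAnsiFormatter.new.format(tokens)
```

Token types without a palette entry pass through unstyled, so an incomplete palette degrades to plain text instead of raising.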

Add a new color theme

Create theme class inheriting from Theme base class with color palette defined (lib/rouge/themes/mytheme.rb)
Define token type colors using the Theme DSL (style method) (lib/rouge/theme.rb)
Register theme with name and make available via theme lookup (lib/rouge/themes/mytheme.rb)
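
One way to think about a theme is as a lookup from token type to style, with more specific types falling back to their parents. The sketch below illustrates that idea only; `ToyTheme`, its `style` and `style_for` methods, and the dotted type names are hypothetical and are not Rouge's Theme DSL, which should be read from lib/rouge/theme.rb directly.

```ruby
# Toy theme: resolves a style for a dotted token type by walking up its
# ancestry, e.g. "Keyword.Constant" -> "Keyword" -> default.
class ToyTheme
  def initialize(default:)
    @default = default
    @styles = {}
  end

  # Records a style for one token type; returns self to allow chaining.
  def style(token_type, **attrs)
    @styles[token_type] = attrs
    self
  end

  # Looks up the most specific style defined for the type or its parents.
  def style_for(token_type)
    parts = token_type.split('.')
    until parts.empty?
      found = @styles[parts.join('.')]
      return found if found

      parts.pop
    end
    @default
  end
end

theme = ToyTheme.new(default: { fg: '#f8f8f2' })
theme.style('Keyword', fg: '#f92672', bold: true)
theme.style('Comment', fg: '#75715e', italic: true)

p theme.style_for('Keyword.Constant') # falls back to the 'Keyword' style
p theme.style_for('Name.Function')    # nothing defined, uses the default
```

A fallback chain like this is one way to avoid the "missing color definition for token type" failure mode: a new, more specific token type still renders with its parent's style.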

🔧Why these technologies

Pure Ruby implementation — Provides seamless integration with Ruby projects without native dependencies; easy distribution via RubyGems
State machine regex lexing — Balances simplicity and performance; regex-based approach is maintainable while covering most language syntax patterns
Token stream architecture — Decouples lexers from formatters; same tokens can be output as HTML, ANSI, LaTeX, or custom formats
Pygments compatibility layer — Allows migration from Pygments and reuse of existing CSS stylesheets designed for Pygments output

⚖️Trade-offs already made

Regex-based lexing instead of AST parsing
- Why: Faster development of new lexers and lower CPU overhead for most use cases
- Consequence: Less precise tokenization for complex language grammars; some edge cases may be incorrectly highlighted
State machine via regex patterns instead of compiler-like architecture
- Why: Simpler for contributors to write lexers without compiler knowledge
- Consequence: Some advanced language features may be difficult to express; less suitable for real-time IDE-grade highlighting
No external dependencies beyond Ruby stdlib
- Why: Simplifies installation and reduces dependency conflicts
- Consequence: Cannot leverage specialized lexing libraries; all logic must be implemented in pure Ruby

🚫Non-goals (don't propose these)

Real-time incremental highlighting for IDEs
Full semantic understanding of source code (e.g., cross-file type checking)
Language server protocol (LSP) implementation
Compilation of source code or validation beyond syntax
Desktop IDE integration or plugins
Handling of obscure dialect variants outside main language specifications

🪤Traps & gotchas

Lexer state machine complexity: The tokens DSL in lexers uses stateful regex matching with push/pop/goto state transitions. Subtle bugs occur when regex ordering matters or when token groups overlap.
Unicode handling: Ruby versions <2.6 have inconsistent Unicode regex behavior; lexers targeting special chars must test across versions.
Demo file format: lib/rouge/demos/ files must match expected language syntax exactly; malformed demos cause test failures silently.
Theme precedence: Inline styles (HTMLInline formatter) override theme colors; ensure consistent application.
No external dependencies: Pure Ruby means regex edge cases can't be delegated to a proper parser generator. Lexers are hand-crafted state machines prone to missing edge cases.
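
The rule-ordering trap can be shown with a tiny first-match-wins matcher. This is a standalone sketch, not Rouge code: `KEYWORD`, `IDENTIFIER`, and `first_match` are hypothetical names chosen for the example.

```ruby
# First-match-wins rule lists: the same rules in a different order
# tokenize the same input differently. Names are illustrative.
KEYWORD    = [/\b(?:if|end)\b/, :keyword].freeze
IDENTIFIER = [/[a-z_]\w*/, :name].freeze

# Tries each rule at the start of the text and returns the first hit.
def first_match(rules, text)
  rules.each do |regex, token|
    match = text.match(/\A(?:#{regex.source})/)
    return [token, match[0]] if match
  end
  nil
end

p first_match([KEYWORD, IDENTIFIER], 'if')   # keyword rule listed first
p first_match([IDENTIFIER, KEYWORD], 'if')   # identifier rule shadows it
p first_match([KEYWORD, IDENTIFIER], 'iffy') # \b stops "if" matching inside "iffy"
```

With the identifier rule listed first, the keyword rule is never reached for `if`, which is the kind of silent mis-highlighting that only shows up when the output is inspected.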

💡Concepts to learn

Lexer state machine — Rouge lexers use regex-based state machines (push/pop/goto transitions) to tokenize code—understanding states is essential for writing or debugging lexers or fixing edge-case tokenization bugs
Token stream / token types — Rouge converts source code into a stream of (token_type, text) tuples (e.g., (Keyword, 'def'), (Name, 'foo'))—formatters style these based on type, so lexer correctness depends on assigning right types
Pygments compatibility — Rouge's HTML output uses CSS class names compatible with Pygments stylesheets—understanding this compatibility is critical for theme migration and CSS reuse
Syntax tree vs regex-based tokenization — Rouge uses regex state machines, not AST-based parsing—this is fast and works for most languages but misses context-dependent syntax (e.g., ambiguous symbols); knowing this limitation explains why some edge cases fail
ANSI 256-color codes — The ANSI formatter outputs terminal escape sequences for 256 colors; understanding the format and limitations is needed to debug terminal output or add new color features
CSS class naming conventions — Rouge's HTML output uses flat, abbreviated class names like 'highlight', 'c', 'k' (from the Pygments spec), not a nested or BEM-style scheme; understanding the naming scheme is required to write custom CSS or extend themes
Ruby regex anchors and lookahead/lookbehind — Lexer tokens are defined using Ruby regex with lookahead (?=...) and lookbehind (?<=...) assertions to avoid consuming characters—mastering these is essential for writing correct lexer rules
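
The lookahead and lookbehind assertions mentioned in the last item can be tried in isolation with plain Ruby regexes. The patterns and sample strings below are made up for illustration and are not taken from any Rouge lexer.

```ruby
source = 'total = sum(items) + offset'

# Lookahead: match a name only when "(" follows, without consuming the "(".
call_name = /[a-z_]\w*(?=\()/
# Lookbehind: match digits only when preceded by "$", without consuming the "$".
price = /(?<=\$)\d+/

p source.scan(call_name)
p 'cost: $42 or 17'.scan(price)

# The character right after the match is still the "(" that the
# lookahead inspected, so a later rule can tokenize it separately.
p source[call_name.match(source).end(0)]
```

Since the asserted characters are not consumed, the next rule in a lexer still gets to emit them as their own token, for example as punctuation.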

🔗Related repos

rouge-ruby/rougify-web — Official web interface for Rouge; lets users paste code and choose lexer/theme in browser instead of CLI
jekyll/jekyll — Jekyll static site generator; uses Rouge as the default syntax highlighter (ships rouge gem as dependency)
github-linguist/linguist — GitHub's language detection library; complementary tool that identifies file language, often paired with Rouge for highlighting
tmm1/pygments.rb — Ruby wrapper around Pygments (Python); alternative to Rouge if you need an exact Pygments feature not yet ported to pure Ruby
chriskempson/base16 — Color scheme standard that Rouge's theme system is based on; source of the base16 themes shipped in lib/rouge/themes/

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive test coverage for lexer demos in lib/rouge/demos/

The repo has 200+ language lexers with demo files in lib/rouge/demos/, but there's no visible test suite validating that each demo file actually tokenizes correctly without errors. This would catch regressions when lexers are modified and ensure demo files remain valid.

[ ] Create test/test_demos.rb that iterates through all files in lib/rouge/demos/
[ ] For each demo file, instantiate the corresponding lexer and verify it can tokenize the demo content without raising exceptions
[ ] Add assertions for token count > 0 to ensure meaningful output
[ ] Run this test in the GitHub Actions CI workflow (.github/workflows/ruby.yml) to catch breaking changes

Create a GitHub Action workflow for testing lexer output consistency across Ruby versions

The existing .github/workflows/ruby.yml likely tests the library, but there's no visible workflow that validates lexer token output consistency (important for ensuring HTML/ANSI output doesn't change unexpectedly between minor versions). This would prevent subtle tokenization regressions.

[ ] Create .github/workflows/lexer-output-validation.yml that tests multiple Ruby versions (2.7, 3.0, 3.1, 3.2+)
[ ] For a subset of critical lexers (Ruby, Python, JavaScript, etc.), generate token output and compare against baseline snapshots
[ ] Store baseline snapshots in test/lexer_output_snapshots/ and allow reviewers to update them when intentional changes are made
[ ] Add a job that fails if unexpected changes are detected in token types or structure

Document lexer implementation patterns in docs/LexerDevelopment.md with concrete examples

docs/LexerDevelopment.md exists but the repo has 200+ lexers with varying complexity levels. New contributors struggle to understand which patterns to use. Adding annotated examples of simple, medium, and complex lexers would dramatically reduce PR review cycles.

[ ] Expand docs/LexerDevelopment.md with three worked examples: a simple lexer (e.g., lib/rouge/lexers/conf.rb), a stateful lexer (e.g., lib/rouge/lexers/ruby.rb), and one with custom callbacks
[ ] Document common pitfalls when writing regex-based vs. stateful lexers with specific code references
[ ] Add a section on testing lexers locally with examples of how to use bin/rougify to validate output
[ ] Include a checklist template that contributors can use when submitting new lexer PRs

🌿Good first issues

Audit lexer coverage: The file list shows 50+ demo files but 200+ supported languages—create a script in lib/ that lists lexers without corresponding demo files and open an issue proposing demo PRs for the top 10 missing ones (e.g., lib/rouge/lexers/julia.rb may lack a demo).
Fix Rubocop violations: .rubocop.yml and .rubocop_todo.yml exist, suggesting known style issues—run bundle exec rubocop locally, pick one violation from the todo file (e.g., line length in a specific lexer), and fix it with tests. Good way to learn lexer structure.
Add theme documentation: docs/LexerDevelopment.md exists but docs/ThemeDevelopment.md likely doesn't—create a guide explaining the Theme class, color slots (e.g., String, Keyword, Comment), and walk through creating a custom theme by copying lib/rouge/themes/base16.rb as reference.
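
For the lexer coverage audit in the first item, a script along these lines could be a starting point. It is a sketch under one loud assumption: that each demo file in lib/rouge/demos/ is named after the basename of its lexer file in lib/rouge/lexers/. That may not hold for every lexer, so the output is a list of candidates to check by hand. The function name `lexers_without_demos` is hypothetical.

```ruby
require 'set'

# Lists lexer files with no same-named demo file.
# Assumes demos are named after the lexer file's basename, which may not
# hold for every lexer; treat the output as candidates to check by hand.
def lexers_without_demos(repo_root)
  lexers = Dir.glob(File.join(repo_root, 'lib/rouge/lexers/*.rb'))
              .map { |path| File.basename(path, '.rb') }
  demos = Dir.glob(File.join(repo_root, 'lib/rouge/demos/*'))
             .map { |path| File.basename(path) }
             .to_set
  lexers.reject { |name| demos.include?(name) }.sort
end

# Pass the path to a local clone as the first argument; defaults to ".".
puts lexers_without_demos(ARGV.fetch(0, '.'))
```

The script only reads directory listings, so it is safe to run from the root of a clone before deciding which demos to propose.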

⭐Top contributors

@jneen — 38 commits
@tancnle — 12 commits
@larouxn — 11 commits
@nsfisis — 8 commits
@UlyssesZh — 3 commits

📝Recent commits

bb51c1a — Add a lexer for Apache Thrift (#2284) (kpumuk)
821f59b — c/cpp: Update keywords/builtin types (#2283) (nsfisis)
effe9d4 — fix regex syntax for biml (#2280) (jneen)
3668f1e — limit error checking patterns in console lexer to 20, and fix rubocop (#2278) (jneen)
7123c43 — Maint.igor pro manual builtins (#2255) (jneen)
b30b188 — enable aria-hidden for line numbers in html_table (#2275) (UlyssesZh)
6ad058f — nest <code> in <pre> (#2276) (UlyssesZh)
d26ea45 — add formatter tag for html_linewise (#2273) (UlyssesZh)
cc1f395 — BIML: properly delegate to C# within strings (#2267) (jneen)
c3dfa0a — limit number of prompt strings to 20 (#2268) (jneen)

🔒Security observations

Rouge is a well-maintained open-source syntax highlighting library with a moderate security posture. The primary concern is ensuring XSS prevention when outputting HTML in web contexts. The library itself has minimal attack surface as it's primarily a text processing tool. No hardcoded secrets, exposed credentials, or critical infrastructure issues were identified. The lack of visible dependencies and security policy documentation represents minor gaps. The project would benefit from establishing a security policy and regularly auditing dependencies.

Medium · Potential Code Injection via Lexer Input — lib/rouge/cli.rb, lib/rouge (lexer implementations). Rouge is a syntax highlighter that processes arbitrary code input from users. If the highlighting output is used in web contexts without proper escaping, there's a risk of XSS attacks. The library processes user-supplied code that could contain malicious payloads. Fix: Ensure that all HTML output from Rouge is properly escaped before rendering in web contexts. Validate and sanitize user inputs before passing to lexers. Use context-aware output encoding.
Low · No Explicit Security Policy — .github/. The repository lacks a SECURITY.md file or security policy documentation for reporting vulnerabilities responsibly. Fix: Create a SECURITY.md file in the root directory with clear instructions for responsible disclosure of security vulnerabilities.
Low · Dependency Management Not Visible — Gemfile. The Gemfile content was not provided for analysis. Without visibility into dependencies, transitive vulnerabilities cannot be assessed. Fix: Review Gemfile.lock for outdated dependencies. Run bundle audit regularly to identify known vulnerabilities in dependencies. Consider using Dependabot for automated dependency updates.
Low · CLI Tool Potential Input Validation — bin/rougify, lib/rouge/cli.rb. The CLI tool (bin/rougify) processes user input that should be validated to prevent abuse or unexpected behavior. Fix: Implement input validation and sanitization for all CLI arguments. Add bounds checking for file size inputs and process timeouts.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
