rouge-ruby/rouge
A pure Ruby code highlighter that is compatible with Pygments
Healthy across the board
Worst of 4 axes: non-standard license (Other)
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 5d ago
- ✓32+ active contributors
- ✓Distributed ownership (top contributor 38% of recent commits)
- ✓Other licensed
- ✓CI configured
- ✓Tests present
- ⚠Non-standard license (Other) — review terms
What would change the summary?
- Use as dependency: Concerns → Mixed if license terms are clarified
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: rouge-ruby/rouge
Generated by RepoPilot · 2026-05-10 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/rouge-ruby/rouge shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 5d ago
- 32+ active contributors
- Distributed ownership (top contributor 38% of recent commits)
- Other licensed
- CI configured
- Tests present
- ⚠ Non-standard license (Other) — review terms
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live rouge-ruby/rouge
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/rouge-ruby/rouge.
What it runs against: a local clone of rouge-ruby/rouge — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in rouge-ruby/rouge | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 35 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of rouge-ruby/rouge. If you don't
# have one yet, run these first:
#
# git clone https://github.com/rouge-ruby/rouge.git
# cd rouge
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of rouge-ruby/rouge and re-run."
  exit 2
fi
# 1. Repo identity
# Anchored at end of line so that forks named e.g. rouge-ruby/rouge-fork do not match.
git remote get-url origin 2>/dev/null | grep -qE "rouge-ruby/rouge(\.git)?/?$" \
  && ok "origin remote is rouge-ruby/rouge" \
  || miss "origin remote is not rouge-ruby/rouge (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "lib/rouge.rb" \
  && ok "lib/rouge.rb" \
  || miss "missing critical file: lib/rouge.rb"
test -f "lib/rouge/cli.rb" \
  && ok "lib/rouge/cli.rb" \
  || miss "missing critical file: lib/rouge/cli.rb"
test -f "lib/rouge/lexer.rb" \
  && ok "lib/rouge/lexer.rb" \
  || miss "missing critical file: lib/rouge/lexer.rb"
test -f "lib/rouge/formatters/html.rb" \
  && ok "lib/rouge/formatters/html.rb" \
  || miss "missing critical file: lib/rouge/formatters/html.rb"
test -f "lib/rouge/token.rb" \
  && ok "lib/rouge/token.rb" \
  || miss "missing critical file: lib/rouge/token.rb"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 35 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~5d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/rouge-ruby/rouge"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Rouge is a pure Ruby syntax highlighter that tokenizes source code in 200+ languages and outputs styled HTML or ANSI 256-color terminal text. It is Pygments-compatible, meaning its HTML output works directly with Pygments CSS themes, and it is the default highlighter for Jekyll static sites.
The gem has a monolithic structure: lib/rouge/ contains the core (lexers, formatters, themes); lib/rouge/lexers/ holds 200+ language-specific tokenizers; lib/rouge/formatters/ provides HTML/ANSI/HTMLInline output adapters; lib/rouge/themes/ contains pre-built color schemes. bin/rougify is the CLI entry point. lib/rouge/demos/ stores sample code for each supported language.
👥Who it's for
Jekyll users and Ruby developers who need syntax highlighting without external Python dependencies. Documentation sites, blogs, and static site generators relying on Jekyll use Rouge to highlight code snippets in generated HTML. Gem maintainers also use Rouge when they want to avoid shipping Pygments as a dependency.
🌱Maturity & risk
Highly mature and production-ready. The project has CI/CD via GitHub Actions (ruby.yml workflow), YARD documentation, rubocop linting configured (.rubocop.yml present), semantic versioning via Gem releases, and active maintenance visible in the CHANGELOG. It's the de facto standard syntax highlighter in the Ruby ecosystem.
Low risk overall. Commit concentration is a mild concern (the top contributor accounts for 38% of recent commits, which is typical for mature OSS gems), but the feature set is stable. The main ongoing risk comes from the lexer ecosystem: 200+ language lexers need continued maintenance as language grammars evolve. No complex external service dependencies are visible, and pure Ruby means there is no compiled extension to maintain.
Active areas of work
Active development on lexer coverage (demo files exist for 50+ languages suggesting ongoing additions). GitHub workflows are configured for automated testing. The CHANGELOG and issue templates (lexer-bug, lexer-request, library-bug, enhancement-request) suggest the team actively triages both lexer gaps and core functionality requests.
🚀Get running
git clone https://github.com/rouge-ruby/rouge.git
cd rouge
bundle install
bundle exec rougify foo.rb
To develop, run bundle exec guard to watch for changes (a Guardfile is present). Run the tests with bundle exec rake.
Daily commands:
- As a library: require 'rouge', then Rouge::Lexers::Ruby.new.lex(code).
- As a CLI: rougify file.rb, or rougify style monokai.sublime > output.css.
- For development: bundle exec guard for watch-mode testing, or bundle exec rougify --help to explore options.
🗺️Map of the codebase
- lib/rouge.rb: Entry point that loads the entire Rouge library; defines the main API surface for lexer discovery and usage
- lib/rouge/cli.rb: Command-line interface implementation; essential for understanding how Rouge is invoked outside the library
- lib/rouge/lexer.rb: Base Lexer class that all 200+ language lexers inherit from; core abstraction for tokenization logic
- lib/rouge/formatters/html.rb: HTML formatter implementation; primary output format and compatibility layer with Pygments
- lib/rouge/token.rb: Token class that represents syntax elements; fundamental data structure flowing through all lexers
- lib/rouge/regex_lexer.rb: RegexLexer base class used by most lexers; implements the state machine pattern for tokenization
- Contributing.md: Contributor guide covering lexer development conventions and the submission process; required reading for lexer additions
🧩Components & responsibilities
- CLI Handler (Ruby OptionParser) — Parses command-line arguments, orchestrates lexer/formatter selection, handles file I/O
- Failure mode: Syntax errors in CLI invocation or unsupported language/format combinations
- Lexer Registry (Ruby class metadata (ancestors, name resolution)) — Maintains mapping of language names/aliases to lexer classes; enables dynamic lookup
- Failure mode: Lexer class not found or multiple conflicting registrations for same language
- RegexLexer (Ruby Regex engine, String scanning) — Implements state machine pattern for tokenization; manages regex rule evaluation and state transitions
- Failure mode: Infinite loops in regex patterns, stack overflow in deeply nested states, performance degradation on pathological input
- Token Stream (Ruby Enumerator for lazy evaluation) — Represents sequence of lexical elements flowing from lexer to formatter
- Failure mode: Malformed tokens (invalid type/value), incorrect token boundaries, missing tokens
- HTML Formatter (String concatenation, HTML entity escaping) — Converts token stream to HTML with CSS classes; manages span nesting and attribute escaping
- Failure mode: HTML injection vulnerabilities if token values not properly escaped, malformed span structure
- Theme Engine (Ruby class hierarchy for theme composition) — Provides color palettes and styling rules for token types; supports multiple theme variants
- Failure mode: Missing color definition for token type, theme inheritance conflicts
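The hand-off between the Token Stream and HTML Formatter components above can be illustrated with a minimal sketch. This is not Rouge's implementation: the `format_html` helper, the `CSS_CLASS` table, and the symbol token types are hypothetical names chosen for illustration, and only the short Pygments-style class names ('k', 'n', 's') and the `highlight` wrapper class come from the text. It shows why escaping token text before wrapping it in spans matters for the HTML injection failure mode.

```ruby
# Hypothetical mapping from token type to a Pygments-style short CSS class.
CSS_CLASS = { keyword: 'k', name: 'n', string: 's', text: nil }.freeze

# Characters that must be escaped before token text is placed in HTML.
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

# Convert a stream of [type, text] pairs into highlighted HTML.
# Every token's text is escaped, so source code containing markup
# is rendered as text rather than interpreted by the browser.
def format_html(tokens)
  body = tokens.map do |type, text|
    escaped = text.gsub(/[&<>"']/, HTML_ESCAPES)
    css = CSS_CLASS[type]
    css ? %(<span class="#{css}">#{escaped}</span>) : escaped
  end.join
  %(<pre class="highlight"><code>#{body}</code></pre>)
end

puts format_html([[:keyword, 'puts'], [:text, ' '], [:string, '"<b>hi</b>"']])
```

Because the formatter only sees [type, text] pairs, the same token stream could be handed to a different formatter (for example a terminal one) without changing the lexer.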
🔀Data flow
- User input (source code + language) → CLI Handler: the user provides code to highlight and specifies the language and format via the command line or library API
🛠️How to make changes
Add a new language lexer
- Create a new lexer file in lib/rouge/lexers/ following the naming convention language_name.rb (lib/rouge/lexers/newlang.rb)
- Inherit from RegexLexer or the Lexer base class and define the state machine using token rules (lib/rouge/lexer.rb)
- Register the lexer with 'title', 'aliases', and 'filenames' metadata (lib/rouge/lexers/newlang.rb)
- Add a demo code file for the language in lib/rouge/demos/ (lib/rouge/demos/newlang)
- Submit a pull request with tests; see Contributing.md for guidelines (Contributing.md)
Add a new output formatter
- Create a new formatter class inheriting from the Formatter base class (lib/rouge/formatters/formatter.rb)
- Implement the 'format(tokens)' method to convert the token stream to the target format (lib/rouge/formatters/custom_formatter.rb)
- Register the formatter in lib/rouge.rb or the appropriate registry if one exists (lib/rouge.rb)
Add a new color theme
- Create a theme class inheriting from the Theme base class with a color palette defined (lib/rouge/themes/mytheme.rb)
- Define token type colors using the Theme DSL (style method) (lib/rouge/theme.rb)
- Register the theme with a name and make it available via theme lookup (lib/rouge/themes/mytheme.rb)
🔧Why these technologies
- Pure Ruby implementation — Provides seamless integration with Ruby projects without native dependencies; easy distribution via RubyGems
- State machine regex lexing — Balances simplicity and performance; regex-based approach is maintainable while covering most language syntax patterns
- Token stream architecture — Decouples lexers from formatters; same tokens can be output as HTML, ANSI, LaTeX, or custom formats
- Pygments compatibility layer — Allows migration from Pygments and reuse of existing CSS stylesheets designed for Pygments output
⚖️Trade-offs already made
- Regex-based lexing instead of AST parsing
  - Why: Faster development of new lexers and lower CPU overhead for most use cases
  - Consequence: Less precise tokenization for complex language grammars; some edge cases may be incorrectly highlighted
- State machine via regex patterns instead of compiler-like architecture
  - Why: Simpler for contributors to write lexers without compiler knowledge
  - Consequence: Some advanced language features may be difficult to express; less suitable for real-time IDE-grade highlighting
- No external dependencies beyond Ruby stdlib
  - Why: Simplifies installation and reduces dependency conflicts
  - Consequence: Cannot leverage specialized lexing libraries; all logic must be implemented in pure Ruby
🚫Non-goals (don't propose these)
- Real-time incremental highlighting for IDEs
- Full semantic understanding of source code (e.g., cross-file type checking)
- Language server protocol (LSP) implementation
- Compilation of source code or validation beyond syntax
- Desktop IDE integration or plugins
- Handling of obscure dialect variants outside main language specifications
🪤Traps & gotchas
- Lexer state machine complexity: The tokens DSL in lexers uses stateful regex matching with push/pop/goto state transitions. Subtle bugs occur when regex ordering matters or when token groups overlap.
- Unicode handling: Ruby versions <2.6 have inconsistent Unicode regex behavior; lexers targeting special chars must test across versions.
- Demo file format: lib/rouge/demos/ files must match expected language syntax exactly. Malformed demos cause test failures silently.
- Theme precedence: Inline styles (HTMLInline formatter) override theme colors; ensure consistent application.
- No external dependencies: Pure Ruby means regex edge cases can't be delegated to a proper parser generator. Lexers are hand-crafted state machines prone to missing edge cases.
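To make the push/pop and rule-ordering gotcha concrete, here is a small stand-alone sketch of a regex state machine built on the standard library's StringScanner. It is not Rouge's tokens DSL: the `RULES` table, the `tiny_lex` function, and the symbol token types are hypothetical, and it assumes first-match-wins rule ordering within each state.

```ruby
require 'strscan'

# Hypothetical rule table (not Rouge's DSL): state => [[regex, token type, action]].
# Rules are tried top to bottom, so the keyword rule must precede the generic
# name rule; swapping the two would turn every 'def' into a :name token.
RULES = {
  root: [
    [/"/,       :string,  [:push, :string]],
    [/\bdef\b/, :keyword, nil],
    [/\w+/,     :name,    nil],
    [/\s+/,     :text,    nil],
    [/./m,      :error,   nil]
  ],
  string: [
    [/"/,       :string,  [:pop]],
    [/[^"]+/,   :string,  nil]
  ]
}.freeze

# Tokenize source into [type, text] pairs using a stack of states.
def tiny_lex(source)
  scanner = StringScanner.new(source)
  stack = [:root]
  tokens = []
  until scanner.eos?
    regex, type, action = RULES.fetch(stack.last).find { |re, _, _| scanner.match?(re) }
    raise "no rule matched at position #{scanner.pos}" unless regex

    tokens << [type, scanner.scan(regex)]
    case action&.first
    when :push then stack.push(action[1])
    when :pop  then stack.pop
    end
  end
  tokens
end

tiny_lex('def x "a b"').each { |type, text| puts "#{type}: #{text.inspect}" }
```

The catch-all `/./m` rule in the root state keeps the loop from stalling on unexpected input, which is one simple guard against the "missing edge case" failures described above.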
💡Concepts to learn
- Lexer state machine — Rouge lexers use regex-based state machines (push/pop/goto transitions) to tokenize code—understanding states is essential for writing or debugging lexers or fixing edge-case tokenization bugs
- Token stream / token types — Rouge converts source code into a stream of (token_type, text) tuples (e.g., (Keyword, 'def'), (Name, 'foo'))—formatters style these based on type, so lexer correctness depends on assigning right types
- Pygments compatibility — Rouge's HTML output uses CSS class names compatible with Pygments stylesheets—understanding this compatibility is critical for theme migration and CSS reuse
- Syntax tree vs regex-based tokenization — Rouge uses regex state machines, not AST-based parsing—this is fast and works for most languages but misses context-dependent syntax (e.g., ambiguous symbols); knowing this limitation explains why some edge cases fail
- ANSI 256-color codes — The ANSI formatter outputs terminal escape sequences for 256 colors; understanding the format and limitations is needed to debug terminal output or add new color features
- CSS class naming conventions (Pygments short names) — Rouge's HTML output uses flat class names like 'highlight', 'c', 'k' (from Pygments spec)—understanding the naming scheme is required to write custom CSS or extend themes
- Ruby regex anchors and lookahead/lookbehind — Lexer tokens are defined using Ruby regex with lookahead (?=...) and lookbehind (?<=...) assertions to avoid consuming characters—mastering these is essential for writing correct lexer rules
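For the ANSI 256-color concept in the list above, the escape sequence format can be sketched in a few lines. The `ansi256` helper is a hypothetical name for illustration and is not part of Rouge; the sequence itself (`ESC[38;5;<n>m` to set the foreground color, `ESC[0m` to reset) is the standard xterm 256-color form.

```ruby
# Wrap text in an xterm 256-color foreground escape sequence,
# followed by a reset so later output is unaffected.
def ansi256(text, color_index)
  raise ArgumentError, 'color index must be in 0..255' unless (0..255).cover?(color_index)

  "\e[38;5;#{color_index}m#{text}\e[0m"
end

puts ansi256('def', 208) + ' ' + ansi256('greet', 75)
```

A terminal formatter would apply something like this per token, choosing the color index from the active theme.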
🔗Related repos
- rouge-ruby/rougify-web: Official web interface for Rouge; lets users paste code and choose a lexer/theme in the browser instead of the CLI
- jekyll/jekyll: Jekyll static site generator; uses Rouge as the default syntax highlighter (ships the rouge gem as a dependency)
- github-linguist/linguist: GitHub's language detection library; complementary tool that identifies file language, often paired with Rouge for highlighting
- tmm1/pygments.rb: Ruby wrapper around Pygments (Python); alternative to Rouge if you need an exact Pygments feature not yet ported to pure Ruby
- chriskempson/base16: Color scheme standard that Rouge's theme system is based on; source of the base16 themes shipped in lib/rouge/themes/
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive test coverage for lexer demos in lib/rouge/demos/
The repo has 200+ language lexers with demo files in lib/rouge/demos/, but there's no visible test suite validating that each demo file actually tokenizes correctly without errors. This would catch regressions when lexers are modified and ensure demo files remain valid.
- [ ] Create test/test_demos.rb that iterates through all files in lib/rouge/demos/
- [ ] For each demo file, instantiate the corresponding lexer and verify it can tokenize the demo content without raising exceptions
- [ ] Add assertions for token count > 0 to ensure meaningful output
- [ ] Run this test in the GitHub Actions CI workflow (.github/workflows/ruby.yml) to catch breaking changes
Create a GitHub Action workflow for testing lexer output consistency across Ruby versions
The existing .github/workflows/ruby.yml likely tests the library, but there's no visible workflow that validates lexer token output consistency (important for ensuring HTML/ANSI output doesn't change unexpectedly between minor versions). This would prevent subtle tokenization regressions.
- [ ] Create .github/workflows/lexer-output-validation.yml that tests multiple Ruby versions (2.7, 3.0, 3.1, 3.2+)
- [ ] For a subset of critical lexers (Ruby, Python, JavaScript, etc.), generate token output and compare against baseline snapshots
- [ ] Store baseline snapshots in test/lexer_output_snapshots/ and allow reviewers to update them when intentional changes are made
- [ ] Add a job that fails if unexpected changes are detected in token types or structure
Document lexer implementation patterns in docs/LexerDevelopment.md with concrete examples
docs/LexerDevelopment.md exists but the repo has 200+ lexers with varying complexity levels. New contributors struggle to understand which patterns to use. Adding annotated examples of simple, medium, and complex lexers would dramatically reduce PR review cycles.
- [ ] Expand docs/LexerDevelopment.md with three worked examples: a simple lexer (e.g., lib/rouge/lexers/conf.rb), a stateful lexer (e.g., lib/rouge/lexers/ruby.rb), and one with custom callbacks
- [ ] Document common pitfalls when writing regex-based vs. stateful lexers with specific code references
- [ ] Add a section on testing lexers locally with examples of how to use bin/rougify to validate output
- [ ] Include a checklist template that contributors can use when submitting new lexer PRs
🌿Good first issues
- Audit lexer coverage: The file list shows 50+ demo files but 200+ supported languages. Create a script in lib/ that lists lexers without corresponding demo files and open an issue proposing demo PRs for the top 10 missing ones (e.g., lib/rouge/lexers/julia.rb may lack a demo).
- Fix Rubocop violations: .rubocop.yml and .rubocop_todo.yml exist, suggesting known style issues. Run bundle exec rubocop locally, pick one violation from the todo file (e.g., line length in a specific lexer), and fix it with tests. Good way to learn lexer structure.
- Add theme documentation: docs/LexerDevelopment.md exists but docs/ThemeDevelopment.md likely doesn't. Create a guide explaining the Theme class, color slots (e.g., String, Keyword, Comment), and walk through creating a custom theme by copying lib/rouge/themes/base16.rb as reference.
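The lexer coverage audit in the first item could start from a sketch like the following. It assumes that each demo file is named after the lexer file's basename with no extension (as in lib/rouge/lexers/newlang.rb and lib/rouge/demos/newlang elsewhere in this document); that naming assumption should be checked against the real repo, and `lexers_without_demos` is a hypothetical helper name.

```ruby
require 'set'

# List lexer basenames (without .rb) that have no same-named file in demos_dir.
# Assumption: a demo for lexers/foo.rb lives at demos/foo.
def lexers_without_demos(lexers_dir, demos_dir)
  demos = Dir.children(demos_dir).to_set
  Dir.glob(File.join(lexers_dir, '*.rb'))
     .map { |path| File.basename(path, '.rb') }
     .reject { |name| demos.include?(name) }
     .sort
end

# Example, run from the repository root:
#   puts lexers_without_demos('lib/rouge/lexers', 'lib/rouge/demos')
```

If the repo turns out to name demos by lexer tag rather than by file name, the mapping step would need to change accordingly.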
⭐Top contributors
- @jneen — 38 commits
- @tancnle — 12 commits
- @larouxn — 11 commits
- @nsfisis — 8 commits
- @UlyssesZh — 3 commits
📝Recent commits
- bb51c1a: Add a lexer for Apache Thrift (#2284) (kpumuk)
- 821f59b: c/cpp: Update keywords/builtin types (#2283) (nsfisis)
- effe9d4: fix regex syntax for biml (#2280) (jneen)
- 3668f1e: limit error checking patterns in console lexer to 20, and fix rubocop (#2278) (jneen)
- 7123c43: Maint.igor pro manual builtins (#2255) (jneen)
- b30b188: enable aria-hidden for line numbers in html_table (#2275) (UlyssesZh)
- 6ad058f: nest <code> in <pre> (#2276) (UlyssesZh)
- d26ea45: add formatter tag for html_linewise (#2273) (UlyssesZh)
- cc1f395: BIML: properly delegate to C# within strings (#2267) (jneen)
- c3dfa0a: limit number of prompt strings to 20 (#2268) (jneen)
🔒Security observations
Rouge is a well-maintained open-source syntax highlighting library with a moderate security posture. The primary concern is ensuring XSS prevention when outputting HTML in web contexts. The library itself has minimal attack surface as it's primarily a text processing tool. No hardcoded secrets, exposed credentials, or critical infrastructure issues were identified. The lack of visible dependencies and security policy documentation represents minor gaps. The project would benefit from establishing a security policy and regularly auditing dependencies.
- Medium · Potential Code Injection via Lexer Input (lib/rouge/cli.rb, lib/rouge lexer implementations). Rouge is a syntax highlighter that processes arbitrary code input from users. If the highlighting output is used in web contexts without proper escaping, there's a risk of XSS attacks. The library processes user-supplied code that could contain malicious payloads. Fix: Ensure that all HTML output from Rouge is properly escaped before rendering in web contexts. Validate and sanitize user inputs before passing to lexers. Use context-aware output encoding.
- Low · No Explicit Security Policy (.github/). The repository lacks a SECURITY.md file or security policy documentation for reporting vulnerabilities responsibly. Fix: Create a SECURITY.md file in the root directory with clear instructions for responsible disclosure of security vulnerabilities.
- Low · Dependency Management Not Visible (Gemfile). The Gemfile content was not provided for analysis. Without visibility into dependencies, transitive vulnerabilities cannot be assessed. Fix: Review Gemfile.lock for outdated dependencies. Run bundle audit regularly to identify known vulnerabilities in dependencies. Consider using Dependabot for automated dependency updates.
- Low · CLI Tool Potential Input Validation (bin/rougify, lib/rouge/cli.rb). The CLI tool (bin/rougify) processes user input that should be validated to prevent abuse or unexpected behavior. Fix: Implement input validation and sanitization for all CLI arguments. Add bounds checking for file size inputs and process timeouts.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.