highlightjs/highlight.js
JavaScript syntax highlighter with language auto-detection and zero dependencies.
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained; safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 6mo ago
- ✓70+ active contributors
- ✓Distributed ownership (top contributor 19% of recent commits)
- ✓BSD-3-Clause licensed
- ✓CI configured
- ✓Tests present
- ⚠Slowing — last commit 6mo ago
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: highlightjs/highlight.js
Generated by RepoPilot · 2026-05-06 · Source
Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in the Verify before trusting section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/highlightjs/highlight.js shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
Verdict
GO — Healthy across all four use cases
- Last commit 6mo ago
- 70+ active contributors
- Distributed ownership (top contributor 19% of recent commits)
- BSD-3-Clause licensed
- CI configured
- Tests present
- ⚠ Slowing — last commit 6mo ago
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live highlightjs/highlight.js
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/highlightjs/highlight.js.
What it runs against: a local clone of highlightjs/highlight.js — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in highlightjs/highlight.js | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 221 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of highlightjs/highlight.js. If you don't
# have one yet, run these first:
#
# git clone https://github.com/highlightjs/highlight.js.git
# cd highlight.js
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of highlightjs/highlight.js and re-run."
  exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "highlightjs/highlight.js(\.git)?\b" \
  && ok "origin remote is highlightjs/highlight.js" \
  || miss "origin remote is not highlightjs/highlight.js (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-3-Clause)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \
  && ok "license is BSD-3-Clause" \
  || miss "license drift — was BSD-3-Clause at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical files exist
test -f "src/highlight.js" \
  && ok "src/highlight.js" \
  || miss "missing critical file: src/highlight.js"
test -f "src/core.d.ts" \
  && ok "src/core.d.ts" \
  || miss "missing critical file: src/core.d.ts"
test -f "src/languages/bash.js" \
  && ok "src/languages/bash.js" \
  || miss "missing critical file: src/languages/bash.js"
test -f "package.json" \
  && ok "package.json" \
  || miss "missing critical file: package.json"
test -f "docs/language-guide.rst" \
  && ok "docs/language-guide.rst" \
  || miss "missing critical file: docs/language-guide.rst"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 221 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~191d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/highlightjs/highlight.js"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
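An agent loop usually needs more than the exit code, so the ok:/FAIL: lines can be parsed into a structured summary. The following is a minimal sketch; `summarizeVerifyOutput` is a hypothetical helper name and is not part of RepoPilot or highlight.js.

```javascript
// Hypothetical helper: turn the verify script's stdout into a summary object.
// It relies only on the "ok: " and "FAIL: " line prefixes the script prints.
function summarizeVerifyOutput(stdout) {
  const passed = [];
  const failed = [];
  for (const line of stdout.split(/\r?\n/)) {
    if (line.startsWith('ok: ')) {
      passed.push(line.slice('ok: '.length));
    } else if (line.startsWith('FAIL: ')) {
      failed.push(line.slice('FAIL: '.length));
    }
  }
  // Any FAIL line means the artifact should be regenerated before use.
  return { passed, failed, stale: failed.length > 0 };
}
```

In Node.js the input could come from `require('child_process').spawnSync('bash', ['verify.sh'], { encoding: 'utf8' }).stdout`, assuming the script above was saved as verify.sh.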
TL;DR
Highlight.js is a JavaScript syntax highlighter with zero external dependencies that runs in browsers and Node.js. It auto-detects 190+ programming languages and marks up code blocks with CSS classes that themes then color. Core strengths: small bundle size (~50KB minified), language detection that needs no explicit language hint, and a declarative, regex-based pattern system for defining language grammars.
Repository layout (partly inferred): src/ contains the core highlighter engine and grammar system, src/languages/ contains the 190+ language definition files, demo/ contains a browser demo (demo/index.html and demo/demo.js), docs/ contains Sphinx documentation (Makefile, conf.py, .readthedocs.yaml for automated builds), and build/ holds generated distribution artifacts. The main entry point appears to be compiled from src/ into minified dist files consumed via CDN (README.CDN.md) or npm.
Who it's for
Web developers building documentation sites, code review tools, blogging platforms, and educational software who need client-side syntax highlighting without pulling in jQuery or other dependencies. It can also be added as a client-side script to statically generated sites (for example Hugo or Jekyll output); the project's own docs are built with Sphinx on Read the Docs (per .readthedocs.yaml). Contributors are mostly language maintainers adding support for niche languages and core maintainers fixing detection edge cases.
Maturity & risk
Highly mature and production-ready. The repository shows ~1.5M lines of JavaScript code, CI/CD with lint.yml and tests.js.yml workflows, a Mocha test suite configured in .mocharc.json, release automation in .github/workflows/release.yml, and documented breaking changes (VERSION_10_BREAKING_CHANGES.md and VERSION_11_UPGRADE.md indicate established versioning discipline). Weekly npm downloads in the millions and an active Discord community indicate a stable, widely adopted codebase.
Low technical risk but maintenance-heavy: zero dependencies eliminate supply-chain risk, but supporting 190+ languages creates a large surface area for bug reports and grammar edge cases. Single-maintainer risk is mitigated by MAINTAINERS_GUIDE.md and issue templates, but language-specific bugs may go unfixed if contributors abandon their grammar definitions. No specific breaking-change alerts are visible; the presence of VERSION_10_BREAKING_CHANGES.md and VERSION_11_UPGRADE.md suggests breaking changes are managed deliberately.
Active areas of work
Active release cycle with .github/workflows/release.yml automation and size tracking via size_report_comment.yml and size_report_create.yml (bundle size appears to be reported on pull requests). The pull request template at .github/pull_request_template.md suggests an active review process. Language additions and bug fixes appear to be the primary work items: the issue templates 1_incorrect-syntax-highlighting.md and z_request_language_repo.md indicate that the most common requests concern language coverage and detection accuracy.
Get running
git clone https://github.com/highlightjs/highlight.js.git && cd highlight.js && npm install && npm test (test runner inferred from .mocharc.json). For development builds: npm run build (check package.json scripts for the exact command). The Makefile lives in docs/ and builds the documentation, not the library. Browser demo: open demo/index.html, or use a demo script if package.json defines one.
Daily commands: npm install && npm test (runs the Mocha suite). For a local docs build: cd docs && make html. For release builds: npm run build; the bundler is not confirmed by this artifact, so check package.json and the build scripts rather than assuming one.
Map of the codebase
- src/highlight.js: Main entry point and core syntax highlighting engine; defines the public API and orchestrates language loading and highlighting logic.
- src/core.d.ts: TypeScript type definitions for the core API; essential for understanding the public interface and IDE support.
- src/languages/bash.js: Reference language definition implementation; demonstrates the canonical pattern for writing a language grammar file.
- package.json: Build configuration, dependencies, and npm scripts; defines the build and release process for the project.
- docs/language-guide.rst: Complete documentation on writing language definitions; critical reference for anyone adding or maintaining a language.
- .eslintrc.lang.js: Linting rules specific to language definitions; ensures code style consistency across 190+ language files.
- CONTRIBUTING.md: Contribution guidelines covering language submission, testing, and PR process; required reading before submitting changes.
Components & responsibilities
- Core Engine (src/highlight.js) (JavaScript, regex, token stacking) — Orchestrates the highlighting pipeline; manages language registry, calls lexer, handles auto-detection, returns HTML output.
- Failure mode: If a language grammar is malformed or regex causes catastrophic backtracking, highlighting may fail or hang; no fallback except plain text.
- Language Definitions (src/languages/) (JavaScript DSL (modes, keywords, contains, regex)) — Defines tokenization rules for each supported language using regex patterns, keywords, and nested modes.
- Failure mode: Incorrect regex or incomplete keyword list leads to mis-highlighting; poor design causes false positives/negatives or performance issues.
- Lexer (JavaScript, regex execution) — Tokenizes code by applying language grammar rules in sequence; manages nested scopes and token stacking.
- Failure mode: Infinite loops or exponential backtracking in regex;
How to make changes
Add a new programming language
- Create a new language definition file in src/languages/ following the canonical pattern from src/languages/bash.js; define keywords, regex patterns, and token types (src/languages/newlang.js)
- Write test cases for your language in the test suite (typically a fixture with sample code and expected token classifications) (.mocharc.json)
- Run ESLint on your file to ensure it passes language-specific linting rules (.eslintrc.lang.js)
- Update SUPPORTED_LANGUAGES.md to document your language with category, aliases, and usage example (SUPPORTED_LANGUAGES.md)
- Submit a PR with your language file; maintainers will review for correctness, performance, and compliance with the language-guide.rst patterns (CONTRIBUTING.md)
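As a rough illustration of the first step, a grammar is a function that receives the `hljs` object and returns a definition object. The sketch below is for an imaginary language; the name "NewLang", its alias, and its keyword list are all hypothetical, and docs/language-guide.rst remains the authority on which keys are accepted.

```javascript
// Hypothetical grammar for an imaginary language. In the repo this function
// would be the default export of src/languages/newlang.js; it is a plain
// function here so the sketch runs without highlight.js installed.
function newlang(hljs) {
  const KEYWORDS = {
    keyword: 'if else while return fn',
    literal: 'true false nil'
  };
  return {
    name: 'NewLang',
    aliases: ['nl'],
    keywords: KEYWORDS,
    contains: [
      hljs.HASH_COMMENT_MODE,   // "# comment" to end of line
      hljs.QUOTE_STRING_MODE,   // "double-quoted strings"
      hljs.C_NUMBER_MODE,       // C-style numeric literals
      // Assumed sigil syntax for variables, e.g. $count
      { scope: 'variable', begin: /\$[A-Za-z_]\w*/ }
    ]
  };
}
```

With the library loaded, the grammar could be tried out via `hljs.registerLanguage('newlang', newlang)` followed by `hljs.highlight(code, { language: 'newlang' })`.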
Fix incorrect syntax highlighting for a language
- Open the language definition file in src/languages/ and identify the regex pattern or keyword list causing the mismatch (src/languages/javascript.js)
- Consult docs/language-guide.rst for the DSL syntax (modes, keywords, contains, regex) and refer to nearby language files for similar patterns (docs/language-guide.rst)
- Update the pattern and verify the fix with test cases in the test suite (.mocharc.json)
- Run the full test suite with npm test to ensure no regressions in other languages (package.json)
Build a custom or minified highlight.js bundle
- Edit package.json build scripts or use the Node.js build API to specify which languages and plugins to include (package.json)
- Run npm run build to generate optimized bundles in the build/ directory (package.json)
- Reference the generated highlight.min.js in your HTML or bundle it with your application (demo/index.html)
Why these technologies
- JavaScript (zero dependencies) — Runs in browsers and Node.js without external dependencies; lightweight and fast for syntax highlighting tasks; used by millions of developers.
- Regex-based lexer with token stacking — Simple enough for language authors to write, efficient for parsing, and proven effective across 180+ languages; avoids complexity of full AST parsing.
- CSS class-based tokenization — Decouples highlighting from styling; allows themes to be swapped independently and integrates seamlessly with existing CSS frameworks.
- npm + CDN distribution — Reaches both Node.js developers (npm) and browser users (jsDelivr CDN); no build required for end users.
Trade-offs already made
- Regex-based parsing instead of full AST
  - Why: Reduces library size and keeps language definitions simple for contributors; enables fast syntax highlighting for interactive use cases.
  - Consequence: Cannot perform semantic analysis; some ambiguous syntax may be mis-highlighted; edge cases in complex languages (e.g., preprocessor directives) require manual tokenization rules.
- No external dependencies
  - Why: Minimizes bundle size, reduces supply-chain risk, and simplifies deployment across diverse environments.
  - Consequence: Must implement all utilities in-house; limits advanced features like incremental parsing or complex grammar composition.
- Language definitions as JavaScript objects
  - Why: Flexible, composable, and integrates with Node.js module system; allows dynamic language registration.
  - Consequence: Higher learning curve than declarative formats (YAML, JSON); requires JavaScript knowledge from language contributors.
- Single-pass lexer with no lookahead
  - Why: Fast and memory-efficient; suitable for real-time syntax highlighting in editors.
  - Consequence: Some context-dependent syntax cannot be highlighted correctly; language author must design patterns to work within single-pass constraints.
Non-goals (don't propose these)
- Does not perform semantic analysis or type checking
- Does not generate executable code or compile output
- Does not support real-time incremental parsing (must re-highlight from scratch)
- Does not handle bidirectional text or RTL languages specially
- Does not provide IDE features like code completion, refactoring, or linting
Traps & gotchas
- Language definitions must pass the lint rules in .eslintrc.lang.js or CI will fail; not every valid JavaScript pattern is allowed.
- Bundle size is actively tracked (size_report workflows), and a PR that increases it by more than about 1KB may be blocked from merging.
- The auto-detection algorithm has priority rules (visible in core). Adding a new language can break detection of existing languages if patterns overlap, so test carefully with highlightAuto().
- CSS class names are part of the public API; changing them breaks downstream themes.
- Breaking changes require VERSION_X_BREAKING_CHANGES.md documentation and may require a major version bump.
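One way to act on the auto-detection warning is to run labelled samples through `highlightAuto()` before and after a grammar change and compare the results. This is a minimal sketch: `checkAutoDetection` and the sample shape are hypothetical, and the `hljs` instance is passed in so the helper works with whichever build is loaded.

```javascript
// Hypothetical helper. `hljs` is a loaded highlight.js instance, for example
// const hljs = require('highlight.js'); and `samples` is an array of
// { code, expected } objects supplied by the caller.
function checkAutoDetection(hljs, samples) {
  return samples.map(({ code, expected }) => {
    const result = hljs.highlightAuto(code);
    return {
      expected,
      detected: result.language,
      relevance: result.relevance,
      // secondBest is only present when another language also scored.
      runnerUp: result.secondBest ? result.secondBest.language : undefined,
      ok: result.language === expected
    };
  });
}
```

A narrow gap between the detected language and the runner-up is a hint that two grammars overlap and that the new patterns may need tighter boundaries.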
Concepts to learn
- Regex-based tokenization: Highlight.js uses compiled regular expressions to tokenize code instead of building ASTs; understanding capture groups, lookahead/lookbehind, and pattern precedence is critical when writing or debugging language grammars in src/languages/*.js
- Language auto-detection heuristics: The library can identify a code sample's language without explicit hints; the detection algorithm scores matches across patterns, keyword density, and common language signatures. Understanding the priority rules prevents false positives when adding new languages
- CSS class-based theming: Highlight.js emits semantic CSS classes (hljs-string, hljs-function, etc.), not inline styles; downstream users apply color via external stylesheets, making themes composable and swappable without recompiling
- Nested scope stacking in grammar rules: Language definitions use begin/end patterns with scope nesting to handle contexts (e.g., strings containing interpolations); the core engine maintains a scope stack, which is critical for correct highlighting in nested language features
- Zero-dependency bundle size optimization: Bundle size is actively tracked by CI (size_report workflows); every language definition and core utility contributes to the final output, so understanding tree shaking, conditional exports, and per-language builds is essential for contributions
- Language grammar priority and specificity: When multiple patterns match the same text (e.g., Python f-strings vs. regular strings), pattern order in the grammar definition and specificity rules determine which highlighting wins; conflicts cause incorrect highlighting
- Mocha test harness with per-language fixtures: Language tests are fixture-driven; sample inputs are highlighted and compared against stored expected output (in the upstream repo these fixtures live under test/markup/ and test/detect/), so regressions surface automatically when patterns change
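The first and third concepts above can be seen together in a toy tokenizer: ordered regex rules claim spans of text, and each span is emitted with a CSS class instead of an inline style. This is an illustration only and is not highlight.js's engine; the rule set and the `toyHighlight` name are hypothetical.

```javascript
// Toy illustration only. Rules are tried in order at the current position,
// so earlier rules win when more than one could match (pattern precedence).
const RULES = [
  { cls: 'hljs-comment', re: /#.*/y },
  { cls: 'hljs-string', re: /"(?:\\.|[^"\\])*"/y },
  { cls: 'hljs-number', re: /\b\d+(?:\.\d+)?\b/y },
  { cls: 'hljs-keyword', re: /\b(?:if|else|return)\b/y }
];

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function toyHighlight(code) {
  let out = '';
  let pos = 0;
  while (pos < code.length) {
    let matched = false;
    for (const { cls, re } of RULES) {
      re.lastIndex = pos; // sticky flag: match must start exactly at pos
      const m = re.exec(code);
      if (m && m[0].length > 0) {
        out += `<span class="${cls}">${escapeHtml(m[0])}</span>`;
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies here: copy one character through unstyled.
      out += escapeHtml(code[pos]);
      pos += 1;
    }
  }
  return out;
}
```

Because the output carries only class names, swapping the stylesheet changes the colors without touching the tokenizer, which is the property the theming concept describes.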
Related repos
- shikijs/shiki: Alternative syntax highlighter using TextMate grammars; competes for the same use case but depends on an external grammar format
- PrismJS/prism: Competing JavaScript syntax highlighter with a similar no-dependency promise and plugin architecture; older ecosystem
- highlightjs/cdn-release: Official CDN distribution repository for highlight.js; where compiled bundles are published to jsDelivr and other CDNs
- highlightjs/highlight.js-obsidian: Community plugin integrating highlight.js into Obsidian markdown notes; shows a real-world integration pattern
- microsoft/TypeScript: Reference implementation for language detection and AST-based syntax analysis; highlight.js's regex-based approach is intentionally simpler for performance
PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive test coverage for language definition validation in src/languages/
The repo has 190+ language files in src/languages/ but no visible dedicated test suite validating language definition structure, regex patterns, and required fields. A test suite could catch invalid patterns, missing required properties, and regressions when contributors add new languages. This would improve quality control for the frequent language contribution PRs.
- [ ] Create test/languages/ directory structure mirroring src/languages/
- [ ] Build a language schema validator that checks for required fields (keywords, patterns, aliases, etc.)
- [ ] Add test file that validates all language definitions against the schema
- [ ] Add tests for common regex pitfalls (catastrophic backtracking, invalid escapes)
- [ ] Integrate test into .github/workflows/tests.js.yml
- [ ] Document in CONTRIBUTING.md how to validate a new language contribution
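The schema validator in this checklist could start as a small recursive function over the grammar object. The sketch below is hypothetical: the rules it enforces (a string name, string aliases, begin/match patterns that cannot match the empty string) are assumptions for illustration, not the project's official schema.

```javascript
// Hypothetical structural checks for a grammar object returned by a language
// definition function. Returns a list of human-readable problems.
function validateGrammar(mode, path = 'grammar', seen = new Set()) {
  if (mode === null || typeof mode !== 'object') {
    return [`${path}: expected an object`];
  }
  const problems = [];
  if (seen.has(mode)) return problems; // modes may reference each other cyclically
  seen.add(mode);

  if (path === 'grammar' && typeof mode.name !== 'string') {
    problems.push(`${path}: missing string "name"`);
  }
  if (mode.aliases !== undefined &&
      !(Array.isArray(mode.aliases) && mode.aliases.every((a) => typeof a === 'string'))) {
    problems.push(`${path}: "aliases" should be an array of strings`);
  }
  for (const key of ['begin', 'match']) {
    const pattern = mode[key];
    // Rebuild without flags so a stateful flag cannot skew the probe.
    if (pattern instanceof RegExp && new RegExp(pattern.source).test('')) {
      problems.push(`${path}: "${key}" can match the empty string`);
    }
  }
  if (mode.contains !== undefined) {
    if (!Array.isArray(mode.contains)) {
      problems.push(`${path}: "contains" should be an array`);
    } else {
      mode.contains.forEach((child, i) => {
        if (child === 'self') return; // 'self' is a placeholder, not a mode object
        problems.push(...validateGrammar(child, `${path}.contains[${i}]`, seen));
      });
    }
  }
  return problems;
}
```

A test file could load each module in src/languages/, call it with the library instance, and fail when the returned list is non-empty.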
Create missing GitHub Action workflow for automated SUPPORTED_LANGUAGES.md validation
The repo maintains SUPPORTED_LANGUAGES.md as source of truth for documented language support, but there's no automated workflow to ensure it stays in sync with actual src/languages/ directory contents. Contributors frequently forget to update this file. A workflow could auto-validate or auto-update this on PRs touching language files.
- [ ] Create .github/workflows/language-list-sync.yml workflow
- [ ] Build script to generate language list from src/languages/ directory
- [ ] Add validation step that compares generated list against SUPPORTED_LANGUAGES.md
- [ ] Configure workflow to run on PRs that modify src/languages/
- [ ] Add option to auto-commit updates or fail with helpful error message
- [ ] Document process in CONTRIBUTING.md
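The comparison step could be a pure function over two lists, leaving file reading and Markdown parsing to the workflow script. This is a sketch under assumptions: `diffLanguageLists` is a hypothetical name, grammar files are assumed to be named after their language, and the caller is assumed to have already extracted identifiers from SUPPORTED_LANGUAGES.md.

```javascript
// Hypothetical comparison helper.
// grammarFiles: file names found in src/languages/ (e.g. from fs.readdirSync).
// documentedNames: language identifiers already parsed out of the docs file.
function diffLanguageLists(grammarFiles, documentedNames) {
  const fromFiles = new Set(
    grammarFiles
      .filter((file) => file.endsWith('.js'))
      .map((file) => file.slice(0, -'.js'.length).toLowerCase())
  );
  const documented = new Set(documentedNames.map((name) => name.toLowerCase()));
  return {
    // Grammar exists in the repo but is not listed in the docs.
    undocumented: [...fromFiles].filter((name) => !documented.has(name)).sort(),
    // Listed in the docs but no grammar file in the repo.
    missingFiles: [...documented].filter((name) => !fromFiles.has(name)).sort()
  };
}
```

The recent commits show third-party grammars being added to SUPPORTED_LANGUAGES.md, so entries in `missingFiles` are not necessarily errors; a real workflow would need an allowlist or a way to recognize third-party rows.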
Add integration tests for CDN build output in src/highlight.js core functionality
The repo references README.CDN.md and .github/workflows/size_report_*.yml, indicating CDN is critical, but there's no visible test suite validating that the core highlight.js API works correctly after bundling (auto-detection, language registration, token generation). Add tests that verify the public API surface works as documented.
- [ ] Create test/api/ directory with tests for src/core.d.ts public interface
- [ ] Add tests for highlight() function with various options (language, auto-detect, etc.)
- [ ] Add tests for registerLanguage() and getLanguage() APIs
- [ ] Add tests for theme CSS class generation and token output format
- [ ] Add tests for language auto-detection against diverse code samples
- [ ] Integrate into .github/workflows/tests.js.yml to validate against built/minified output
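A smoke test for the public API might look like the sketch below, which takes the library instance as a parameter so the same function can run against the source build and the minified bundle. The function name, the throwaway grammar, and the exact markup it expects are assumptions to confirm against a real build.

```javascript
// Hypothetical smoke test. `hljs` is whichever build is under test.
// Returns the labels of failed checks; an empty array means all passed.
function apiSmokeTest(hljs) {
  const failures = [];
  const check = (label, condition) => {
    if (!condition) failures.push(label);
  };

  // Throwaway one-keyword grammar used only by this test.
  hljs.registerLanguage('smoketest', () => ({ name: 'SmokeTest', keywords: 'ping' }));
  check('getLanguage returns the registered grammar', Boolean(hljs.getLanguage('smoketest')));

  const result = hljs.highlight('ping pong', { language: 'smoketest' });
  check('result reports the requested language', result.language === 'smoketest');
  // Assumed markup for a keyword; adjust if the build emits something else.
  check('keyword is wrapped in a hljs-keyword span',
    result.value.includes('<span class="hljs-keyword">ping</span>'));
  return failures;
}
```

Running the same function once per build target keeps the assertions identical, so a failure that appears only on the minified output points at the bundling step rather than the grammar.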
Good first issues
- Add missing test coverage for a language in src/languages/ that has few or no test fixtures. Examine the structure of existing language tests and mirror it for an undertested language such as Groovy or Hy (both are present in the codebase but may lack full test suites).
- Fix language detection conflicts: if Dockerfile and Shell syntax overlap in pattern matching, refine regex boundaries in both language definitions to reduce false positives—test with highlightAuto() against code samples in demo/.
- Document a language's grammar in docs/language-guide.rst by creating a case-study section showing how a specific language's .js file maps to highlighting rules—choose an underexplored language from SUPPORTED_LANGUAGES.md.
Top contributors
- @joshgoebel — 19 commits
- @Dxuian — 4 commits
- @witch-factory — 3 commits
- @vast — 3 commits
- @kshetline — 2 commits
Recent commits
- 5697ae5: (enh) Add 3rd party Abc Notation grammar to Supported Languages (#4291) (NriotHrreion)
- e1360be: (docs) JSON entries were duplicated (#4287) (LukeCz)
- 98b649f: Add 3rd party L4 grammar to Supported Languages (#4282) (serrynaimo)
- c7d9973: Fix for CSS unicode-range support, issue #4253. (#4254) (kshetline)
- 4d386f8: fix(csharp) Support digit separators(highlightjs#4258) (#4280) (te-ing)
- f661c55: (docs) add Bicep to SUPPORTED_LANGUAGES (#4273) (anthony-c-martin)
- e7041c4: (enh) added VS-Dark theme (#4272) (Twineee1)
- 4a1baca: (fix) cybertopia themes render correctly in shadow DOM (#4277) (hbgl)
- 3ec4894: Add support for JSON5 as extension of json.js. (#4259) (kshetline)
- 2ceb591: Add Prisma as third-party language (#4252) (witch-factory)
Security observations
The highlight.js codebase has a moderate security posture with several remediable issues. The most critical concern is the use of outdated, end-of-life dependencies (Node.js 12 and old Python packages) that no longer receive security updates. Documentation gaps in the security policy and missing security configurations in Docker/nginx setup present additional risks. The core JavaScript library itself appears well-structured with no obvious injection vulnerabilities visible in the file structure, but the infrastructure and dependency management require immediate attention. Recommendation: Prioritize updating Node.js to LTS version, refresh Python dependencies, and complete security policy documentation.
- High · Outdated Node.js Base Image (Dockerfile, line 1). The Dockerfile uses 'node:12-slim', an end-of-life Node.js version (EOL since April 2022). This version no longer receives security updates, exposing the container to known vulnerabilities in the Node.js runtime and its dependencies. Fix: Update to a currently supported LTS image (for example 'node:22-slim') to receive ongoing security patches and bug fixes.
- High · Outdated Python Dependencies (docs/requirements.txt). The file pins 'myst-parser == 0.15.0' (from 2021) and 'sphinx-rtd-theme == 0.5.2' (from 2020). These pinned versions are significantly outdated and likely contain known security vulnerabilities. No minimum version constraints are documented for newer versions. Fix: Update to recent versions: 'myst-parser>=0.18.0' and 'sphinx-rtd-theme>=1.3.0'. Use flexible version constraints to allow security patches while preventing major breaking changes.
- Medium · Incomplete Security Policy Documentation (SECURITY.md). The SECURITY.md file appears truncated or incomplete. The section 'Reporting a Vulnerability' ends mid-sentence: 'Many vulnerabilities can simply be reported (and tracked) via o'. This leaves critical security reporting procedures undefined and may confuse users attempting to report vulnerabilities responsibly. Fix: Complete the security reporting procedure documentation. Include clear instructions for responsible disclosure, contact methods (security@, GitHub security advisory, etc.), expected response times, and any bug bounty information.
- Medium · Unnecessary Port Exposure in Docker (Dockerfile, line 18). The Dockerfile exposes port 80 (HTTP) for nginx by default. No HTTPS/TLS configuration is visible, and the service could be accessed over unencrypted HTTP if exposed to networks. Fix: Configure nginx to enforce HTTPS, implement TLS termination with a reverse proxy, or document security implications in deployment guides. Consider using port 443 for HTTPS by default.
- Low · Missing Security Headers Configuration (Dockerfile / nginx configuration). No nginx configuration files are visible in the provided file structure. The default nginx setup likely lacks security headers (X-Frame-Options, X-Content-Type-Options, CSP, etc.) that would protect against common web vulnerabilities. Fix: Create an nginx configuration file with security headers: X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Content-Security-Policy, Strict-Transport-Security, and X-XSS-Protection headers.
- Low · No Runtime Security Scanning in CI/CD (.github/workflows/). The GitHub workflows are not detailed, but there is no evidence of dependency security scanning (npm audit), container scanning, or static analysis security testing (SAST) in the visible workflow files. Fix: Integrate security scanning: npm audit in the lint workflow, Dependabot for automated updates, container scanning for Docker images, and consider tools like Snyk or WhiteSource for comprehensive vulnerability detection.
LLM-derived; treat as a starting point, not a security audit.
Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.