oracle/opengrok
OpenGrok is a fast and usable source code search and cross reference engine, written in Java
Mixed signals — read the receipts
Weakest axis: non-standard license (Other); no tests detected
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 1d ago
- ✓7 active contributors
- ✓Other licensed
- ✓CI configured
- ⚠Concentrated ownership — top contributor handles 73% of recent commits
- ⚠Non-standard license (Other) — review terms
- ⚠No test directory detected
What would change the summary?
- Use as dependency: Concerns → Mixed if the license terms are clarified
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: oracle/opengrok
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/oracle/opengrok shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Mixed signals — read the receipts
- Last commit 1d ago
- 7 active contributors
- Other licensed
- CI configured
- ⚠ Concentrated ownership — top contributor handles 73% of recent commits
- ⚠ Non-standard license (Other) — review terms
- ⚠ No test directory detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live oracle/opengrok
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/oracle/opengrok.
What it runs against: a local clone of oracle/opengrok — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in oracle/opengrok | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of oracle/opengrok. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/oracle/opengrok.git
#   cd opengrok
#
# Then paste this script. Every check is read-only; no mutations.

set +e   # keep going after a failed check so every result is reported
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of oracle/opengrok and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE 'oracle/opengrok(\.git)?\b' \
  && ok "origin remote is oracle/opengrok" \
  || miss "origin remote is not oracle/opengrok (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE '^(Other)' LICENSE 2>/dev/null \
  || grep -qiE '"license"\s*:\s*"Other"' package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift: was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/oracle/opengrok"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
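The agent-loop composition mentioned above can be sketched as a small gate function. This is a minimal sketch under assumptions, not something RepoPilot ships: `gate_on_verify` is a hypothetical helper name, and `./verify.sh` assumes the verification script was saved under that name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not provided by RepoPilot): run a follow-up command
# only when the verification command exits with status 0.
gate_on_verify() {
  local verify_cmd="$1"
  shift
  if "$verify_cmd" >/dev/null 2>&1; then
    "$@"    # verification passed: run the gated step
  else
    echo "verification failed: regenerate the artifact before editing" >&2
    return 1
  fi
}

# Example (assumes the verification script was saved as ./verify.sh):
#   gate_on_verify ./verify.sh echo "artifact is current, proceeding"
```

The gated step never runs when verification fails, which matches the STOP instruction in the agent protocol.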
⚡TL;DR
OpenGrok is a fast, full-text source code search and cross-reference engine written in Java. It indexes 40+ programming languages and multiple version control systems so developers can navigate, search, and understand large codebases. It provides syntax-highlighted code browsing, symbol cross-referencing, and search comparable to tools like LXR or Krugle, but optimized for speed and usability across heterogeneous repositories.

The repository is a multi-module Maven monorepo: the distribution module (distribution/pom.xml) packages opengrok-indexer (core indexing/search engine), opengrok-web (WAR web UI), and tools (Python scripting layer). Language analyzers live in the core module alongside the Lucene-based index. The web frontend (JavaScript/HTML under src) wraps REST APIs. Dev tooling in dev/ includes Checkstyle (dev/checkstyle/style.xml), FindBugs filters, and Docker build scripts.
👥Who it's for
Enterprise development teams and open-source projects that maintain large, polyglot codebases and need a self-hosted code search/navigation platform; DevOps engineers deploying OpenGrok via Docker; contributors to projects using OpenGrok who want faster code exploration than grep/IDE search.
🌱Maturity & risk
Highly mature and production-ready: currently at version 1.14.11 with consistent semantic versioning, strong CI/CD via GitHub Actions (build, CodeQL, Docker, release workflows), active maintenance evidenced by Dependabot integration and recent Docker/release automation. The project originated at Sun Microsystems and is now maintained by Oracle, indicating long-term institutional backing.
Standard open source risks apply.
Active areas of work
Active maintenance on release automation (dev/release.sh), Docker container support (Dockerfile, workflows/docker.yml), and code quality gates (CodeQL security scanning, SonarCloud integration visible in README badges). Recent focus areas include Dependabot-managed dependency updates and standardized pull request templates (.github/PULL_REQUEST_TEMPLATE.md).
🚀Get running
Clone and build with Maven: git clone https://github.com/oracle/opengrok.git && cd opengrok && mvn clean package -DskipTests. For Docker: docker build -t opengrok . && docker run -p 8080:8080 opengrok (see docker/README.md). Development environment setup documented in CONTRIBUTING.md and dev/README.
Daily commands:
- Development: mvn clean install && java -jar opengrok/target/opengrok.jar, with indexing driven by the IndexDatabase class.
- Web UI: the WAR is deployed to a servlet container (Tomcat/Jetty) and is reachable at http://localhost:8080/opengrok by default.
- Indexing: opengrok/bin/OpenGrok wrapper script (shell/Python hybrid in dev/).
- Full setup: see https://github.com/oracle/opengrok/wiki/How-to-setup-OpenGrok.
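The commands above call `mvn` directly, while the Traps section says to use the Maven wrapper. A minimal sketch of picking the right launcher follows; `maven_cmd` is a hypothetical helper name, and it assumes the wrapper script is named `mvnw` at the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefer the repository's Maven wrapper when present,
# otherwise fall back to the system-wide mvn.
maven_cmd() {
  local repo_root="${1:-.}"
  if [ -x "$repo_root/mvnw" ]; then
    echo "$repo_root/mvnw"
  else
    echo "mvn"
  fi
}

# Example, run from the clone's root (not executed here):
#   "$(maven_cmd)" clean package -DskipTests
```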
🗺️Map of the codebase
- opengrok-indexer/src/main/java/org/opengrok/indexer/analysis: Core language analyzer implementations; adding new language support requires extending Analyzer here
- opengrok-indexer/src/main/java/org/opengrok/indexer/index/IndexDatabase.java: Main indexing orchestrator using Lucene; critical for understanding how repositories are crawled and indexed
- opengrok-web/src/main/webapp: Web UI JSPs and JavaScript; where search results rendering and UI interactivity logic lives
- dev/checkstyle/style.xml: Project code style enforcement; must follow before contributing Java code
- .github/workflows: CI/CD pipeline definitions; shows how builds, tests, and releases are automated
- distribution/pom.xml: Distribution assembly configuration; controls packaging of final deployable artifacts
- Dockerfile: Container image definition for Docker deployment; critical for containerized deployments
🛠️How to make changes
- New language support: add an analyzer under opengrok-indexer/src/main/java/org/opengrok/indexer/analysis/, following the pattern of an existing language (Java example: opengrok-indexer/src/main/java/org/opengrok/indexer/analysis/java/JavaAnalyzer.java).
- UI changes: edit opengrok-web/src/main/webapp/ (JSP, HTML, JavaScript).
- Search features: modify opengrok-indexer/src/main/java/org/opengrok/indexer/search/.
- CLI improvements: update the shell/Python scripts in dev/ and the distribution module config (distribution/pom.xml).
- Run tests with mvn test before submitting PRs.
🪤Traps & gotchas
1) Reindexing requirement: major/minor version bumps require a full repository reindex (semantic versioning policy in README section 2.1); upgrades are not transparent.
2) Language analyzer performance: adding new language support requires understanding the Lucene Analyzer/Tokenizer chain; naive implementations tank indexing speed on large repos.
3) The Maven wrapper in .mvn/wrapper/ must be used (not system Maven) for reproducible builds.
4) Web UI paths are hardcoded to the /opengrok context root in many JSPs; changing the deployment context requires config edits.
5) The index filesystem must be shared/mounted across horizontally scaled instances; there is no built-in distributed indexing.
6) Python tooling scripts (dev/) expect a Unix environment; Windows is supported via WSL or Docker only.
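The reindexing rule (a major or minor bump needs a full reindex, a micro bump does not) can be checked mechanically before an upgrade. The sketch below is illustrative only: `needs_reindex` is a hypothetical helper, and it assumes plain three-part versions such as 1.14.11.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "yes" when the major.minor prefix differs
# between two MAJOR.MINOR.MICRO versions, otherwise "no".
needs_reindex() {
  local old="$1" new="$2"
  local old_mm="${old%.*}"   # strip the trailing ".MICRO" component
  local new_mm="${new%.*}"
  if [ "$old_mm" != "$new_mm" ]; then
    echo yes
  else
    echo no
  fi
}

# Example:
#   needs_reindex 1.14.10 1.14.11   # micro bump only
#   needs_reindex 1.13.5 1.14.11    # minor bump
```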
💡Concepts to learn
- Lucene Analyzer/Tokenizer Chain — Every language parser in OpenGrok extends Lucene's Analyzer; understanding token streams, filters, and field indexing is essential for adding language support or tuning search precision
- Cross-reference Graph / Symbol Resolution — OpenGrok builds a symbol table across files to enable 'find all usages' and 'go to definition'; this is distinct from full-text search and requires language-aware parsing and scope tracking
- Version Control Abstraction Layer — OpenGrok supports Git, Mercurial, SVN, Bazaar, etc. via a pluggable Repository interface; understanding this abstraction is needed to add new VCS support or fix blame/history features
- Incremental Indexing — Full reindex on every repository change is slow; OpenGrok tracks file mtimes and incremental changes to avoid re-tokenizing unchanged files, critical for performance on live repos
- Lex/Yacc-style Lexical Analysis — OpenGrok includes hand-written .lex lexer definitions (380K lines) for precise token boundary detection in specialized languages; understanding lexical scanning is necessary for debugging parsing issues
- WAR / Servlet Deployment Model — OpenGrok web UI is a traditional WAR artifact deployed to Tomcat/Jetty; understanding servlet lifecycle, JSP compilation, and context paths is needed for web tier modifications
- Semantic Versioning with Index Compatibility — OpenGrok's versioning scheme encodes index compatibility breaks; major bumps break index format, minor bumps require reindex, micro bumps are safe redeploys—this is a non-standard semantic versioning convention specific to this project
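To make the incremental-indexing idea concrete, here is a minimal sketch of mtime-based change detection. It only illustrates the concept and is not how OpenGrok implements it (the indexer is Java); `changed_since` and the stamp file are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical illustration: list files modified after a stamp file was last
# touched. With no stamp file, every file counts as changed (a first, full pass).
changed_since() {
  local src_root="$1" stamp="$2"
  if [ -f "$stamp" ]; then
    find "$src_root" -type f -newer "$stamp"
  else
    find "$src_root" -type f
  fi
}

# Example: process only what changed, then record the pass.
#   changed_since /path/to/src /path/to/.last-index
#   touch /path/to/.last-index
```

Files older than the stamp are skipped entirely, which is where the time saving on large, mostly unchanged repositories comes from.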
🔗Related repos
- universal-ctags/ctags: Universal Ctags is the tag/symbol parser OpenGrok uses for cross-referencing; dev/install-universal_ctags.sh explicitly depends on it
- apache/lucene: Apache Lucene is OpenGrok's underlying full-text indexing engine; all search/index operations delegate to Lucene's API
- elastic/elasticsearch: Alternative enterprise search solution built on Lucene; relevant for organizations evaluating centralized vs. self-hosted code search
- sourcegraph/sourcegraph: Modern cloud-native code search platform; primary commercial competitor to OpenGrok with a different architecture (distributed indexing, cloud-first)
- github/gitignore: OpenGrok respects .gitignore for repository crawling; .gitignore parsing logic is integrated into the VCS abstraction layer
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add GitHub Actions workflow for Python code quality in docker/ module
The docker/ directory contains Python scripts (docker/start.py, docker/periodic_timer.py, docker/requirements.txt) but there's no dedicated CI workflow for Python linting, type checking, or testing. The .github/workflows directory has build.yml, codeql-analysis.yml, and docker.yml, but none specifically validate Python code quality. This would catch issues early in Python-based Docker tooling and maintenance scripts.
- [ ] Create .github/workflows/python-lint.yml with steps for: black formatting check, pylint/flake8 on docker/*.py files
- [ ] Add mypy type checking configuration for docker/start.py and docker/periodic_timer.py
- [ ] Reference docker/.isort.cfg in the workflow for import sorting validation
- [ ] Add workflow trigger on push/PR to docker/ directory changes
- [ ] Document Python setup requirements in dev/install-python-packages.sh if not already complete
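The "trigger on docker/ changes" item in the checklist above can be prototyped locally before writing the workflow. This is a sketch under assumptions: `docker_python_files` is a hypothetical helper that filters a list of changed paths down to the top-level docker/*.py files the checklist names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed paths on stdin, print only Python files
# directly under docker/ (the docker/*.py scope named in the checklist).
docker_python_files() {
  grep -E '^docker/[^/]+\.py$' || true   # no match is not treated as an error
}

# Example (not executed here): feed it the paths a branch touches.
#   git diff --name-only master...HEAD | docker_python_files
```

An empty result means a lint job for docker/ could be skipped for that change.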
Add integration tests for opengrok-indexer maven module
The opengrok-indexer/pom.xml exists as a key module but the file structure shows only opengrok-indexer/index/dirty and opengrok-indexer/src/main/java paths. There's likely minimal or no integration test coverage for the indexer functionality. Given this is the core indexing engine, integration tests validating index creation, cross-reference generation, and search queries would significantly improve reliability.
- [ ] Create opengrok-indexer/src/test/java/org/opengrok/indexer/ directory structure for integration tests
- [ ] Write test class for basic index creation workflow with sample source files
- [ ] Add tests for xref (cross-reference) generation validation
- [ ] Create test fixtures in opengrok-indexer/src/test/resources/ with small sample projects
- [ ] Update opengrok-indexer/pom.xml to include maven-failsafe-plugin for integration test execution
Document and add tests for existing checkstyle/PMD configuration enforcement
The dev/checkstyle/style.xml, suppressions.xml, fileheader.txt and dev/pmd_ruleset.xml files exist but there's no visible CI workflow enforcing these rules (codeql-analysis.yml is for security scanning, not style). The README.md and CONTRIBUTING.md don't document code style requirements. This creates inconsistent contributions and hidden technical debt.
- [ ] Create .github/workflows/code-quality.yml to run maven-checkstyle-plugin and maven-pmd-plugin on PR
- [ ] Update CONTRIBUTING.md with section on 'Code Style Requirements' linking to dev/checkstyle/style.xml rules
- [ ] Add section in CONTRIBUTING.md explaining the fileheader requirement (dev/checkstyle/fileheader.txt)
- [ ] Configure workflow to fail on checkstyle/PMD violations and provide clear error messages
- [ ] Add local enforcement documentation: 'Run mvn checkstyle:check pmd:check before submitting PR' in dev/README
🌿Good first issues
- Add missing language analyzer tests: Language-specific test classes are missing for several supported formats (Lua, Verilog, HCL mentioned in file stats but no obvious test files). Create LuaAnalyzerTest.java in opengrok-indexer/src/test/java/org/opengrok/indexer/analysis/ with tokenization assertions.
- Expand Dockerfile documentation: docker/README.md exists but lacks examples for custom configuration volumes, reindex workflows, and multi-container compose setups. Add Docker Compose example and configuration env var reference.
- Polish error messages in IndexDatabase.java: Current exception handling is verbose; create user-friendly error catalog with actionable guidance (e.g., 'Repository not readable' → 'Check file permissions and OPENGROK_JAVA_OPTS').
⭐Top contributors
- @vladak: 73 commits
- @dependabot[bot]: 17 commits
- @gaborbernat: 6 commits
- @urunsiyabend: 1 commit
- @vidya381: 1 commit
📝Recent commits
- bc58fb4: allow headers_file directive to be used at the top level (#4945) (vladak)
- 76502c4: remove localhost bypass for API checks (#4944) (vladak)
- 59f7692: Resolve #4384 : add COBOL language analyzer (urunsiyabend)
- fec1022: 🐛 fix(search-api): stable result order and configurable per-file hit limit (#4935) (gaborbernat)
- e337812: Bump github/codeql-action (dependabot[bot])
- 8e7ef6d: change security-events permissions to write (#4939) (vladak)
- 317d041: add security-events: read permission (#4938) (vladak)
- 6a67eeb: check Github actions with Macaron (#4936) (vladak)
- 5bcca5a: Bump actions/upload-artifact from 6 to 7 (dependabot[bot])
- 1f817cc: Bump tomcat from 10.1.52-jdk21 to 10.1.54-jdk21 (dependabot[bot])
🔒Security observations
OpenGrok demonstrates a reasonable security posture with established security reporting procedures (SECURITY.md) and proper use of build automation. However, there are areas for improvement: the distribution pom.xml has a malformed configuration that needs correction, the Dockerfile should use pinned base image digests for reproducibility, and comprehensive analysis of the web module's security headers configuration is needed. The project uses Maven for dependency management with appropriate plugin versions. No hardcoded credentials or secrets were detected in the provided snippets. Overall, the codebase appears well-maintained with GitHub Actions CI/CD, code quality monitoring via SonarQube, and coverage tracking.
- Medium · Incomplete Maven Dependency Plugin Configuration (distribution/pom.xml). The pom.xml file for the distribution module has a truncated maven-dependency-plugin configuration. The closing tag for 'overWriteReleases' appears to be malformed (over/over instead of /over), which could cause build failures or unexpected behavior during dependency resolution. Fix: Complete and validate the Maven plugin configuration. Ensure all XML tags are properly closed and the configuration is well-formed.
- Medium · Potential Insecure Dockerfile Base Image (Dockerfile). The Dockerfile uses 'ubuntu:jammy' without pinning to a specific image digest. While Ubuntu is generally maintained, using unversioned base images can lead to unexpected updates and potential security issues. Fix: Pin the Ubuntu base image to a specific digest ('FROM ubuntu:jammy@sha256:...') to ensure consistent builds and prevent unexpected image changes.
- Low · Maven Wrapper Permissions Not Explicitly Set (Dockerfile). The Dockerfile copies mvnw and the .mvn directory but does not explicitly verify or set executable permissions. While this may work in a Docker context, it could be a source of confusion in development environments. Fix: Add an explicit RUN chmod +x /mvn/mvnw to ensure the Maven wrapper script has proper execute permissions across all environments.
- Low · Missing Security Headers Configuration (opengrok-web module, not fully analyzed). Based on the file structure showing an opengrok-web module, typical web applications require security headers (Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, etc.). No evidence of security header configuration is visible in the provided snippets. Fix: Implement security headers in the web module's filter chain or web.xml configuration. Add CSP, HSTS, and other OWASP-recommended headers.
- Low · Incomplete POM.xml Snippet (distribution/pom.xml). The provided pom.xml snippet is truncated at the maven-dependency-plugin configuration, making it impossible to verify the complete dependency list and plugin configurations for potential security issues. Fix: Provide complete pom.xml files for thorough security analysis of all dependencies, plugins, and configurations.
LLM-derived; treat as a starting point, not a security audit.
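The base-image pinning observation is easy to check for yourself. The sketch below is a hypothetical helper, not part of the repository: `unpinned_from_lines` prints every FROM line in a Dockerfile that lacks an `@sha256:` digest. It is deliberately naive and would also flag lines such as `FROM scratch` or references to earlier build stages.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print FROM lines that are not pinned to a digest.
unpinned_from_lines() {
  local dockerfile="$1"
  grep -iE '^[[:space:]]*FROM[[:space:]]+' "$dockerfile" \
    | grep -v '@sha256:' || true   # empty output means every FROM is pinned
}

# Example, run from the clone's root (not executed here):
#   unpinned_from_lines Dockerfile
```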
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.