RepoPilotOpen in app →

oracle/opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java

Mixed

Mixed signals — read the receipts

weakest axis
Use as dependencyConcerns

non-standard license (Other); no tests detected

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 1d ago
  • 7 active contributors
  • Other licensed
Show all 7 evidence items →
  • CI configured
  • Concentrated ownership — top contributor handles 73% of recent commits
  • Non-standard license (Other) — review terms
  • No test directory detected
What would change the summary?
  • Use as dependency ConcernsMixed if: clarify license terms

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Forkable
[![RepoPilot: Forkable](https://repopilot.app/api/badge/oracle/opengrok?axis=fork)](https://repopilot.app/r/oracle/opengrok)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/oracle/opengrok on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: oracle/opengrok

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/oracle/opengrok shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Mixed signals — read the receipts

  • Last commit 1d ago
  • 7 active contributors
  • Other licensed
  • CI configured
  • ⚠ Concentrated ownership — top contributor handles 73% of recent commits
  • ⚠ Non-standard license (Other) — review terms
  • ⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live oracle/opengrok repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/oracle/opengrok.

What it runs against: a local clone of oracle/opengrok — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in oracle/opengrok | Confirms the artifact applies here, not a fork | | 2 | License is still Other | Catches relicense before you depend on it | | 3 | Default branch master exists | Catches branch renames | | 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>oracle/opengrok</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of oracle/opengrok. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/oracle/opengrok.git
#   cd opengrok
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of oracle/opengrok and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "oracle/opengrok(\\.git)?\\b" \\
  && ok "origin remote is oracle/opengrok" \\
  || miss "origin remote is not oracle/opengrok (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Other\"" package.json 2>/dev/null) \\
  && ok "license is Other" \\
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
  && ok "default branch master exists" \\
  || miss "default branch master no longer exists"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/oracle/opengrok"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

OpenGrok is a fast, full-text source code search and cross-reference engine written in Java that indexes multiple programming languages (40+) and version control systems to enable developers to navigate, search, and understand large codebases. It provides syntax-highlighted code browsing, symbol cross-referencing, and search capabilities comparable to tools like LXR or Krugle, but optimized for speed and usability across heterogeneous repositories. Multi-module Maven monorepo: opengrok-dist (distribution root) depends on opengrok (core indexing/search engine), opengrok-web (WAR web UI), and tools (Python scripting layer). Language analyzers are in the core module alongside the Lucene-based index. The web frontend (JavaScript/HTML under src) wraps REST APIs. Dev tooling in /dev includes Checkstyle (style.xml), FindBugs filters, and Docker build scripts.

👥Who it's for

Enterprise development teams and open-source projects that maintain large, polyglot codebases and need a self-hosted code search/navigation platform; DevOps engineers deploying OpenGrok via Docker; contributors to projects using OpenGrok who want faster code exploration than grep/IDE search.

🌱Maturity & risk

Highly mature and production-ready: currently at version 1.14.11 with consistent semantic versioning, strong CI/CD via GitHub Actions (build, CodeQL, Docker, release workflows), active maintenance evidenced by Dependabot integration and recent Docker/release automation. The project originated at Sun Microsystems and is now maintained by Oracle, indicating long-term institutional backing.

Standard open source risks apply.

Active areas of work

Active maintenance on release automation (dev/release.sh), Docker container support (Dockerfile, workflows/docker.yml), and code quality gates (CodeQL security scanning, SonarCloud integration visible in README badges). Recent focus areas include Dependabot-managed dependency updates and standardized pull request templates (.github/PULL_REQUEST_TEMPLATE.md).

🚀Get running

Clone and build with Maven: git clone https://github.com/oracle/opengrok.git && cd opengrok && mvn clean package -DskipTests. For Docker: docker build -t opengrok . && docker run -p 8080:8080 opengrok (see docker/README.md). Development environment setup documented in CONTRIBUTING.md and dev/README.

Daily commands: Development: mvn clean install && java -jar opengrok/target/opengrok.jar with indexing configuration via IndexDatabase class. Web UI: runs on embedded Jetty at http://localhost:8080/opengrok by default post-deployment. Indexing: opengrok/bin/OpenGrok wrapper script (shell/Python hybrid in dev/). Full setup: see https://github.com/oracle/opengrok/wiki/How-to-setup-OpenGrok.

🗺️Map of the codebase

🛠️How to make changes

For new language support: add analyzer to opengrok/src/org/opengrok/indexer/analysis/ matching existing language pattern (Java example: opengrok/src/org/opengrok/indexer/analysis/java/JavaAnalyzer.java). For UI changes: edit opengrok-web/src/main/webapp/ (JSP, HTML, JavaScript). For search features: modify opengrok/src/org/opengrok/indexer/search/. For CLI improvements: update dev/ shell/Python scripts and opengrok-dist distribution config. Run tests with mvn test before submitting PRs.

🪤Traps & gotchas

  1. Reindexing requirement: major/minor version bumps require full repository reindex (semantic versioning policy in README section 2.1); not transparent upgrade. 2) Language analyzer performance: adding new language support requires understanding Lucene Analyzer/Tokenizer chain; naive implementations tank indexing speed on large repos. 3) Maven wrapper in .mvn/wrapper/ must be used (not system Maven) for reproducible builds. 4) Web UI paths hardcoded to /opengrok context root in many JSPs; changing deployment context requires config edits. 5) Index filesystem must be shared/mounted across horizontally scaled instances; no built-in distributed indexing. 6) Python tooling scripts (dev/) expect Unix environment; Windows support via WSL or Docker only.

💡Concepts to learn

  • Lucene Analyzer/Tokenizer Chain — Every language parser in OpenGrok extends Lucene's Analyzer; understanding token streams, filters, and field indexing is essential for adding language support or tuning search precision
  • Cross-reference Graph / Symbol Resolution — OpenGrok builds a symbol table across files to enable 'find all usages' and 'go to definition'; this is distinct from full-text search and requires language-aware parsing and scope tracking
  • Version Control Abstraction Layer — OpenGrok supports Git, Mercurial, SVN, Bazaar, etc. via a pluggable Repository interface; understanding this abstraction is needed to add new VCS support or fix blame/history features
  • Incremental Indexing — Full reindex on every repository change is slow; OpenGrok tracks file mtimes and incremental changes to avoid re-tokenizing unchanged files, critical for performance on live repos
  • Lex/Yacc-style Lexical Analysis — OpenGrok includes hand-written .lex lexer definitions (380K lines) for precise token boundary detection in specialized languages; understanding lexical scanning is necessary for debugging parsing issues
  • WAR / Servlet Deployment Model — OpenGrok web UI is a traditional WAR artifact deployed to Tomcat/Jetty; understanding servlet lifecycle, JSP compilation, and context paths is needed for web tier modifications
  • Semantic Versioning with Index Compatibility — OpenGrok's versioning scheme encodes index compatibility breaks; major bumps break index format, minor bumps require reindex, micro bumps are safe redeploys—this is a non-standard semantic versioning convention specific to this project
  • ctags/ctags — Universal-ctags is the tag/symbol parser OpenGrok uses for cross-referencing; dev/install-universal_ctags.sh explicitly depends on it
  • apache/lucene — Apache Lucene is OpenGrok's underlying full-text indexing engine; all search/index operations delegate to Lucene's API
  • ElasticSearch/elasticsearch — Alternative enterprise code search solution built on Lucene; relevant for organizations evaluating centralized vs. self-hosted code search
  • sourcegraph/sourcegraph — Modern cloud-native code search platform; primary commercial competitor to OpenGrok with different architecture (distributed indexing, cloud-first)
  • github/gitignore — OpenGrok respects .gitignore for repository crawling; .gitignore parsing logic integrated into VCS abstraction layer

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add GitHub Actions workflow for Python code quality in docker/ module

The docker/ directory contains Python scripts (docker/start.py, docker/periodic_timer.py, docker/requirements.txt) but there's no dedicated CI workflow for Python linting, type checking, or testing. The .github/workflows directory has build.yml, codeql-analysis.yml, and docker.yml, but none specifically validate Python code quality. This would catch issues early in Python-based Docker tooling and maintenance scripts.

  • [ ] Create .github/workflows/python-lint.yml with steps for: black formatting check, pylint/flake8 on docker/*.py files
  • [ ] Add mypy type checking configuration for docker/start.py and docker/periodic_timer.py
  • [ ] Reference docker/.isort.cfg in the workflow for import sorting validation
  • [ ] Add workflow trigger on push/PR to docker/ directory changes
  • [ ] Document Python setup requirements in dev/install-python-packages.sh if not already complete

Add integration tests for opengrok-indexer maven module

The opengrok-indexer/pom.xml exists as a key module but the file structure shows only opengrok-indexer/index/dirty and opengrok-indexer/src/main/java paths. There's likely minimal or no integration test coverage for the indexer functionality. Given this is the core indexing engine, integration tests validating index creation, cross-reference generation, and search queries would significantly improve reliability.

  • [ ] Create opengrok-indexer/src/test/java/org/opengrok/indexer/ directory structure for integration tests
  • [ ] Write test class for basic index creation workflow with sample source files
  • [ ] Add tests for xref (cross-reference) generation validation
  • [ ] Create test fixtures in opengrok-indexer/src/test/resources/ with small sample projects
  • [ ] Update opengrok-indexer/pom.xml to include maven-failsafe-plugin for integration test execution

Document and add tests for existing checkstyle/PMD configuration enforcement

The dev/checkstyle/style.xml, suppressions.xml, fileheader.txt and dev/pmd_ruleset.xml files exist but there's no visible CI workflow enforcing these rules (codeql-analysis.yml is for security scanning, not style). The README.md and CONTRIBUTING.md don't document code style requirements. This creates inconsistent contributions and hidden technical debt.

  • [ ] Create .github/workflows/code-quality.yml to run maven-checkstyle-plugin and maven-pmd-plugin on PR
  • [ ] Update CONTRIBUTING.md with section on 'Code Style Requirements' linking to dev/checkstyle/style.xml rules
  • [ ] Add section in CONTRIBUTING.md explaining the fileheader requirement (dev/checkstyle/fileheader.txt)
  • [ ] Configure workflow to fail on checkstyle/PMD violations and provide clear error messages
  • [ ] Add local enforcement documentation: 'Run mvn checkstyle:check pmd:check before submitting PR' in dev/README

🌿Good first issues

  • Add missing language analyzer tests: Language-specific test classes missing for several supported formats (Lua, Verilog, HCL mentioned in file stats but no obvious test files). Create LuaAnalyzerTest.java in opengrok/src/test/org/opengrok/indexer/analysis/ with tokenization assertions.
  • Expand Dockerfile documentation: docker/README.md exists but lacks examples for custom configuration volumes, reindex workflows, and multi-container compose setups. Add Docker Compose example and configuration env var reference.
  • Polish error messages in IndexDatabase.java: Current exception handling is verbose; create user-friendly error catalog with actionable guidance (e.g., 'Repository not readable' → 'Check file permissions and OPENGROK_JAVA_OPTS').

Top contributors

Click to expand

📝Recent commits

Click to expand
  • bc58fb4 — allow headers_file directive to be used at the top level (#4945) (vladak)
  • 76502c4 — remove localhost bypass for API checks (#4944) (vladak)
  • 59f7692 — Resolve #4384 : add COBOL language analyzer (urunsiyabend)
  • fec1022 — 🐛 fix(search-api): stable result order and configurable per-file hit limit (#4935) (gaborbernat)
  • e337812 — Bump github/codeql-action (dependabot[bot])
  • 8e7ef6d — change security-events permissions to write (#4939) (vladak)
  • 317d041 — add security-events: read permission (#4938) (vladak)
  • 6a67eeb — check Github actions with Macaron (#4936) (vladak)
  • 5bcca5a — Bump actions/upload-artifact from 6 to 7 (dependabot[bot])
  • 1f817cc — Bump tomcat from 10.1.52-jdk21 to 10.1.54-jdk21 (dependabot[bot])

🔒Security observations

OpenGrok demonstrates a reasonable security posture with established security reporting procedures (SECURITY.md) and proper use of build automation. However, there are areas for improvement: the distribution pom.xml has a malformed configuration that needs correction, the Dockerfile should use pinned base image digests for reproducibility, and comprehensive analysis of the web module's security headers configuration is needed. The project uses Maven for dependency management with appropriate plugin versions. No hardcoded credentials or secrets were detected in the provided snippets. Overall, the codebase appears well-maintained with GitHub Actions CI/CD, code quality monitoring via SonarQube, and coverage tracking.

  • Medium · Incomplete Maven Dependency Plugin Configuration — distribution/pom.xml. The pom.xml file for the distribution module has a truncated maven-dependency-plugin configuration. The closing tag for 'overWriteReleases' appears to be malformed (over/over instead of /over), which could cause build failures or unexpected behavior during dependency resolution. Fix: Complete and validate the Maven plugin configuration. Ensure all XML tags are properly closed and the configuration is well-formed.
  • Medium · Potential Insecure Dockerfile Base Image — Dockerfile. The Dockerfile uses 'ubuntu:jammy' without pinning to a specific image digest. While Ubuntu is generally maintained, using unversioned base images can lead to unexpected updates and potential security issues. The image should be pinned to a specific digest for reproducibility. Fix: Pin the Ubuntu base image to a specific digest: 'FROM ubuntu:jammy@sha256:...' to ensure consistent builds and prevent unexpected security updates or image changes.
  • Low · Maven Wrapper Permissions Not Explicitly Set — Dockerfile. The Dockerfile copies mvnw and .mvn directory but does not explicitly verify or set executable permissions. While this may work in Docker context, it could be a source of confusion in development environments. Fix: Add explicit RUN chmod +x /mvn/mvnw to ensure the Maven wrapper script has proper execute permissions across all environments.
  • Low · Missing Security Headers Configuration — opengrok-web module (not fully analyzed). Based on the file structure showing an opengrok-web module, typical web applications require security headers (Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, etc.). No evidence of security header configuration is visible in the provided snippets. Fix: Implement security headers in the web module's filter chain or web.xml configuration. Add CSP, HSTS, and other OWASP-recommended headers.
  • Low · Incomplete POM.xml Snippet — distribution/pom.xml. The provided pom.xml snippet is truncated at the maven-dependency-plugin configuration, making it impossible to verify the complete dependency list and plugin configurations for potential security issues. Fix: Provide complete pom.xml files for thorough security analysis of all dependencies, plugins, and configurations.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Mixed signals · oracle/opengrok — RepoPilot