urbanadventurer/WhatWeb
Next generation web scanner
Mixed signals — read the receipts
Worst of 4 axes: copyleft license (GPL-2.0), so review compatibility; no tests detected.
Has a license and CI, but no tests detected: a workable foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 5w ago
- ✓11 active contributors
- ✓GPL-2.0 licensed
- ✓CI configured
- ⚠Concentrated ownership — top contributor handles 69% of recent commits
- ⚠GPL-2.0 is copyleft — check downstream compatibility
- ⚠No test directory detected
What would change the summary?
- → Use as dependency: Concerns → Mixed if the project is relicensed under MIT/Apache-2.0 (rare for established libraries)
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: urbanadventurer/WhatWeb
Generated by RepoPilot · 2026-05-10 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/urbanadventurer/WhatWeb shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Mixed signals — read the receipts
- Last commit 5w ago
- 11 active contributors
- GPL-2.0 licensed
- CI configured
- ⚠ Concentrated ownership — top contributor handles 69% of recent commits
- ⚠ GPL-2.0 is copyleft — check downstream compatibility
- ⚠ No test directory detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live urbanadventurer/WhatWeb
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/urbanadventurer/WhatWeb.
What it runs against: a local clone of urbanadventurer/WhatWeb — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in urbanadventurer/WhatWeb | Confirms the artifact applies here, not a fork |
| 2 | License is still GPL-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 67 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of urbanadventurer/WhatWeb. If you don't
# have one yet, run these first:
#
# git clone https://github.com/urbanadventurer/WhatWeb.git
# cd WhatWeb
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of urbanadventurer/WhatWeb and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "urbanadventurer/WhatWeb(\.git)?\b" \
&& ok "origin remote is urbanadventurer/WhatWeb" \
|| miss "origin remote is not urbanadventurer/WhatWeb (artifact may be from a fork)"
# 2. License matches what RepoPilot saw (SPDX id, or the standard GPLv2 text header)
{ grep -qiE "^GPL-2\.0" LICENSE 2>/dev/null \
|| { grep -q "GNU GENERAL PUBLIC LICENSE" LICENSE 2>/dev/null \
&& grep -q "Version 2, June 1991" LICENSE 2>/dev/null; }; } \
&& ok "license is GPL-2.0" \
|| miss "license drift: was GPL-2.0 at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
&& ok "default branch master exists" \
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f "lib/whatweb.rb" \
&& ok "lib/whatweb.rb" \
|| miss "missing critical file: lib/whatweb.rb"
test -f "lib/whatweb/scan.rb" \
&& ok "lib/whatweb/scan.rb" \
|| miss "missing critical file: lib/whatweb/scan.rb"
test -f "lib/plugins.rb" \
&& ok "lib/plugins.rb" \
|| miss "missing critical file: lib/plugins.rb"
test -f "lib/plugin_support.rb" \
&& ok "lib/plugin_support.rb" \
|| miss "missing critical file: lib/plugin_support.rb"
test -f "lib/target.rb" \
&& ok "lib/target.rb" \
|| miss "missing critical file: lib/target.rb"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 67 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~37d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/urbanadventurer/WhatWeb"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
WhatWeb is a Ruby-based web technology identification scanner that recognizes 1800+ web technologies, including CMSs, frameworks, JavaScript libraries, web servers, and embedded devices, by analyzing HTTP responses and page content. It answers "What is that website?" by detecting CMS platforms (WordPress, Drupal), analytics packages, version numbers, email addresses, SQL errors, and other indicators, in either passive mode (a single HTTP request) or aggressive scanning modes.

Architecture: a monolithic Ruby application. lib/whatweb.rb is the main module; lib/whatweb/ contains the core subsystems (scan.rb handles scanning logic, parser.rb processes responses, redirect.rb tracks redirects, banner.rb generates output). Detection plugins live in the main plugins/ directory, with my-plugins/ for custom and tutorial plugins and plugins-disabled/ for disabled ones. The lib/logging/ directory contains pluggable output formatters (JSON, XML, MongoDB, Elasticsearch, verbose, brief). Addons in addons/ provide specialized scanning (country detection, Nikto verification).
👥Who it's for
Security researchers, penetration testers, and bug bounty hunters who need to quickly identify web technologies and versions during reconnaissance or vulnerability assessments. Also used by system administrators auditing their own infrastructure and by developers analyzing competitor websites.
🌱Maturity & risk
Production-ready and actively maintained. v0.6.4 released April 3, 2026; GPLv2 licensed; 1824+ plugins in the ecosystem; Travis CI configured (.travis.yml present). The Ruby codebase is substantial (reported as 3M+, which for roughly 1800 plugin files is more plausibly bytes of source than lines), with an established plugin architecture and multiple output formats (JSON, XML, MongoDB, Elasticsearch). Recent development is bursty rather than continuous: the latest commits cluster around the v0.6.4 release, and the most recent commit was about five weeks before this report.
Low risk for scanning public sites. Moderate maintenance risk because ownership is concentrated: Andrew Horton (@urbanadventurer) is the primary maintainer. No automated test suite is visible (.simplecov is present, but no test directory appears in the file list). Dependencies are managed via the Gemfile; the gem dependency count is not available from the metadata. Breaking changes seem unlikely because detections live in self-contained plugins, though this is an inference, not a documented compatibility guarantee.
Active areas of work
The analysis did not include CHANGELOG contents, PR, or issue data, so current work can only be inferred from the recent commit list: v0.6.4 shipped with performance improvements, and recently merged PRs add or fix device and appliance detections (Citrix NetScaler, MikroTik, TP-Link, Ubiquiti UISP). The tutorial plugins (plugin-tutorial-1 through 7) also point to an ongoing focus on plugin development and documentation.
🚀Get running
git clone https://github.com/urbanadventurer/WhatWeb.git
cd WhatWeb
bundle install # uses Gemfile for dependencies
make install # or follow INSTALL.md
./whatweb http://example.com
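Daily commands:
# CLI usage (primary interface)
./whatweb [options] <URL>
./whatweb -a 3 http://example.com # aggression level 3 (aggressive)
./whatweb --log-json=results.json http://example.com # also write results to a JSON file
# With Makefile (Ruby needs no compile step; check which targets the Makefile defines)
make install # system-wide install, see INSTALL.md
make test # only if a test target exists
# As a library (unverified: confirm the entry point in lib/whatweb.rb and lib/whatweb/scan.rb)
require_relative 'lib/whatweb'
WhatWeb.scan('http://example.com') # hypothetical method name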
🗺️Map of the codebase
- lib/whatweb.rb: Main entry point and orchestrator; initializes the scanner, manages plugins, and coordinates the scanning workflow.
- lib/whatweb/scan.rb: Core scanning logic that executes HTTP requests, applies plugins, and aggregates results for each target.
- lib/plugins.rb: Plugin loading, registration, and execution framework; every custom detection depends on understanding this loader.
- lib/plugin_support.rb: Plugin API and helper methods that define the contract between the core scanner and individual detection plugins.
- lib/target.rb: Target abstraction representing a single URL to scan; manages normalization and state throughout the scan lifecycle.
- lib/extend-http.rb: HTTP client extensions and Net::HTTP monkey-patches that enable WhatWeb's HTTP fingerprinting capabilities.
- plugins/: Detection plugin directory (1824 plugins); most of WhatWeb's detection power resides here as plugin patterns.
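To make the loader's job concrete, here is a minimal sketch of collecting plugin files from several directories in a deterministic order. It assumes a simple glob-based loader; the method name discover_plugin_files is hypothetical and this is not the code in lib/plugins.rb. Sorting explicitly avoids depending on filesystem order.

```ruby
require "tmpdir"

# Hypothetical sketch: list plugin files from several directories, sorted
# within each directory so the load order is reproducible across machines.
def discover_plugin_files(dirs)
  dirs.flat_map do |dir|
    # Dir.glob returns [] for a missing directory instead of raising.
    Dir.glob(File.join(dir, "*.rb")).sort
  end
end

# Usage: build throwaway plugin directories and list what would be loaded.
Dir.mktmpdir do |root|
  core = File.join(root, "plugins")
  mine = File.join(root, "my-plugins")
  [core, mine].each { |d| Dir.mkdir(d) }
  File.write(File.join(core, "wordpress.rb"), "# plugin\n")
  File.write(File.join(core, "apache.rb"), "# plugin\n")
  File.write(File.join(mine, "custom.rb"), "# plugin\n")
  File.write(File.join(core, "README"), "not a plugin\n")

  files = discover_plugin_files([core, mine, File.join(root, "missing")])
  puts files.map { |f| File.basename(f) }.inspect
  # prints ["apache.rb", "wordpress.rb", "custom.rb"]
end
```

A real loader would then load each path; that step is omitted here because WhatWeb plugin files expect the WhatWeb runtime to be present.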
🛠️How to make changes
Add a New Detection Plugin
- Create a new Ruby file in the plugins/ directory named after the technology, e.g. plugins/my-framework.rb.
- Define the plugin with the same Plugin.define do ... end block that existing plugins use, including metadata such as name, authors, version, and description (the plugin API lives in lib/plugin_support.rb).
- Implement detection as match patterns (regex, string match, MD5 hash) against HTTP response headers, body, or HTML (plugins/my-framework.rb).
- Report the matched version and metadata with a certainty (confidence) value; the plugin loader in lib/plugins.rb registers and runs the new plugin automatically on the next scan.
Add a New Output Format
- Create a new logging formatter in lib/logging/myformat.rb, subclassing the same base class that the existing formatters in lib/logging/ use.
- Implement the same methods as the existing formatters (an initializer plus the method that writes each target's results); compare against a formatter already in lib/logging/ (lib/logging.rb).
- Register the formatter in the logging router and make it selectable from the CLI as a --log-myformat option (lib/logging.rb).
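The pluggable-formatter idea is a strategy pattern: the scanner calls one common method and each formatter decides how to render. A minimal sketch follows; the class names and the out(target, status, results) signature are illustrative assumptions, so check lib/logging/ for the real base class and method names before writing a WhatWeb formatter.

```ruby
require "json"
require "stringio"

# Hypothetical base class: every formatter renders one target's results.
class BaseFormatter
  def initialize(io)
    @io = io
  end

  def out(target, status, results)
    raise NotImplementedError, "#{self.class} must implement #out"
  end
end

# One JSON object per line, convenient for log shippers.
class JSONLinesFormatter < BaseFormatter
  def out(target, status, results)
    @io.puts JSON.generate({ target: target, http_status: status, plugins: results })
  end
end

# Compact one-line summary, e.g. "Apache[2.4.58], jQuery".
class BriefFormatter < BaseFormatter
  def out(target, status, results)
    summary = results.map { |name, data| data[:version] ? "#{name}[#{data[:version]}]" : name }
    @io.puts "#{target} [#{status}] #{summary.join(', ')}"
  end
end

# The caller only uses the shared interface, so formatters can be combined.
buffer = StringIO.new
formatters = [JSONLinesFormatter.new(buffer), BriefFormatter.new(buffer)]
results = { "Apache" => { version: "2.4.58" }, "jQuery" => {} }
formatters.each { |f| f.out("http://example.com", 200, results) }
puts buffer.string
```

Adding a format then means adding one subclass; the scanning code does not change.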
Customize HTTP Behavior or Headers
- For one-off changes, the existing --user-agent and --header CLI options may be enough, with no code changes.
- Otherwise, modify lib/extend-http.rb to add custom HTTP headers, User-Agent strings, or other request customizations, either by overriding Net::HTTP methods or by adding helper methods there.
- Restart WhatWeb to load the changes; HTTP customizations apply to every request made during plugin detection (requests are issued from lib/whatweb/scan.rb).
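For reference, this is how plain stdlib Net::HTTP attaches a custom User-Agent and extra headers to a request. It is a minimal sketch that builds the request without sending it. The build_request helper and the X-Scan-Id header are hypothetical, and this does not reflect how lib/extend-http.rb is structured internally.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a GET request with a custom User-Agent and headers.
def build_request(url, user_agent:, extra_headers: {})
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = user_agent # replaces Net::HTTP's default agent
  extra_headers.each { |name, value| request[name] = value }
  request
end

req = build_request("http://example.com/", user_agent: "MyScanner/0.1",
                    extra_headers: { "X-Scan-Id" => "demo" })
puts req["User-Agent"] # prints MyScanner/0.1
puts req["X-Scan-Id"]  # prints demo
# Sending it would look like:
#   Net::HTTP.start("example.com", 80) { |http| http.request(req) }
```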
Add Remote Data Integration (e.g., IP Geolocation)
- Create a plugin (e.g., plugins/ip-geolocation.rb) that fetches external data during scanning.
- Call remote APIs with lib/helper.rb utilities or Net::HTTP, and cache results in a plugin-level cache (lib/simple_cookie_jar.rb stores cookies; it is not a general-purpose cache).
- Integrate lookup results into the plugin output, and account for performance and rate limits so that lookups do not slow the scan (plugins/ip-geolocation.rb).
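Caching is what keeps remote lookups from dominating scan time when many targets share an address. A minimal sketch of a memoizing cache follows; LookupCache is a hypothetical name, and the block passed to it stands in for a real API client, which is not shown. The mutex is there on the assumption that lookups may run from several threads.

```ruby
# Hypothetical memoizing cache: each key is looked up at most once per run.
class LookupCache
  def initialize(&lookup)
    @lookup = lookup
    @store = {}
    @mutex = Mutex.new # guards @store if several threads call #fetch
  end

  def fetch(key)
    @mutex.synchronize do
      # Hash#fetch also returns cached nil results without re-querying.
      @store.fetch(key) { @store[key] = @lookup.call(key) }
    end
  end
end

calls = 0
geo = LookupCache.new do |ip|
  calls += 1 # stands in for a slow remote API request
  ip.start_with?("192.0.2.") ? "TEST-NET-1" : "unknown"
end
geo.fetch("192.0.2.10")
geo.fetch("192.0.2.10")
puts "lookups performed: #{calls}" # prints lookups performed: 1
```

Holding the lock during the lookup keeps the code simple but serializes lookups; a per-key lock would allow parallel lookups of different addresses.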
🔧Why these technologies
- Ruby — Rapid development, excellent for regex-heavy pattern matching, strong HTTP libraries, large available gem ecosystem for integrations
- Net::HTTP (stdlib) — Built-in HTTP client with monkey-patching support; allows WhatWeb to customize headers, cookies, and low-level behavior without heavy dependencies
- Plugin Architecture (dynamic loading) — Decouples detection logic from the core engine; enables community contributions and selective enabling/disabling of detectors (plugins are loaded at startup, so changes take effect on the next run)
- Multiple Output Formats (JSON, XML, Elasticsearch) — Supports diverse downstream integrations—parsing by security tools, ingestion into SIEM, automated reporting, and data pipelines
⚖️Trade-offs already made
- Sequential plugin execution per target
  - Why: Simplifies plugin state management and avoids race conditions; reduces complexity for plugin developers.
  - Consequence: A single target does not benefit from extra cores; throughput across many targets comes from scanning targets concurrently (WhatWeb's --max-threads option) or from external tools such as GNU parallel.
- Plugins bundled with core rather than distributed package-by-package
  - Why: Zero-friction out-of-the-box scanning; all 1824 detections are available immediately without external dependency management.
  - Consequence: Larger repository footprint and slower clone; individual plugins cannot be updated without a new WhatWeb release.
- Regex patterns and static fingerprints (not ML/dynamic learning)
  - Why: Deterministic, reproducible results; no training data required; fast offline execution; easy plugin auditing.
  - Consequence: Cannot adapt to web app variants not covered by a pattern; may have higher false-positive/negative rates than ML approaches.
- No built-in intercepting proxy or MITM TLS interception
  - Why: Reduces legal/compliance surface; simpler architecture; users can route traffic through external proxies.
  - Consequence: Inspecting or modifying HTTPS traffic requires an external intercepting proxy (Burp, Fiddler, mitmproxy), which WhatWeb can be pointed at with its --proxy option.
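The concurrency model in the first trade-off can be sketched as a worker pool: plugins run one after another for each target, while a fixed number of threads process different targets at once. This is a minimal sketch under the assumption that a thread-count option such as --max-threads plays this role; scan_all and the lambda-based plugins are hypothetical, not WhatWeb's API.

```ruby
# Hypothetical worker pool: sequential plugins per target, targets in parallel.
def scan_all(targets, plugins, max_threads: 4)
  queue = Queue.new
  targets.each { |t| queue << t }
  results = {}
  lock = Mutex.new

  workers = Array.new([max_threads, targets.size].min) do
    Thread.new do
      loop do
        begin
          target = queue.pop(true) # non-blocking pop
        rescue ThreadError
          break # queue drained, this worker is done
        end
        # Plugins run sequentially for this target; no plugin state is shared.
        findings = plugins.filter_map { |name, check| name if check.call(target) }
        lock.synchronize { results[target] = findings }
      end
    end
  end
  workers.each(&:join)
  results
end

plugins = {
  "HTTPS" => ->(t) { t.start_with?("https://") },
  "Example" => ->(t) { t.include?("example") }
}
p scan_all(["https://example.com", "http://test.local"], plugins, max_threads: 2)
```

The order of keys in the returned hash depends on thread scheduling, so callers should not rely on it.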
🚫Non-goals (don't propose these)
- Does not perform active exploitation or payload delivery; it only fingerprints targets (passively, or with extra probing requests in aggressive mode) and extracts metadata
- Does not handle authentication workflows—no form login, OAuth, or session management beyond simple cookie passing
- Does not render JavaScript: there is no headless browser integration, so technologies that only appear after client-side scripts run cannot be detected
- Does not provide real-time monitoring or continuous scanning—designed for point-in-time assessments
- Does not include graphical user interface—CLI-only tool
🪤Traps & gotchas
1. Plugin loading order matters: lib/plugins.rb loads plugins in filesystem order, and plugin dependencies (if any) are not explicitly declared.
2. HTTP client state persistence: lib/target.rb maintains cookies and redirect chains across multiple requests; if targets are not properly isolated, data may leak between scans.
3. Aggression levels change plugin behavior: plugins check the aggression level (likely via options), and more aggressive modes execute different code paths.
4. Output formatters may be mutually exclusive: the analysis suggests the logging system routes to a single formatter, so JSON and XML cannot be written at once. Treat this as unverified, since each format has its own --log-* option; test combining them before relying on either behavior.
5. SimpleCov configured but no tests visible: .simplecov exists, but there is no spec/ or test/ directory in the file list, so the coverage baseline is unknown.
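Aggression gating can be illustrated with a small sketch. It assumes WhatWeb's level numbering of 1 (stealthy), 3 (aggressive), and 4 (heavy), and simplifies the rules: passive checks always run, extra requests happen at level 3 only when the passive check already matched, and at level 4 they always happen. run_plugin and the lambdas are hypothetical stand-ins, not WhatWeb's plugin API.

```ruby
# Assumed level numbering; the gating rules below are a simplification.
STEALTHY, AGGRESSIVE, HEAVY = 1, 3, 4

def run_plugin(passive:, aggressive:, level:)
  findings = passive.call # always runs: uses the response already fetched
  if level >= HEAVY || (level >= AGGRESSIVE && !findings.empty?)
    findings += aggressive.call # stands in for additional HTTP requests
  end
  findings
end

requests = 0
passive = -> { ["WordPress"] }
aggressive = -> { requests += 1; ["WordPress version from extra request"] }
run_plugin(passive: passive, aggressive: aggressive, level: STEALTHY)   # no extra request
run_plugin(passive: passive, aggressive: aggressive, level: AGGRESSIVE) # one extra request
puts "extra requests: #{requests}" # prints extra requests: 1
```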
💡Concepts to learn
- Plugin Architecture & Dynamic Code Loading — WhatWeb's 1824+ plugins are loaded dynamically at runtime via lib/plugins.rb; understanding how the loader finds .rb files in plugins/ (e.g. with Dir.glob) and loads them is essential to adding detections without modifying the core engine
- Aggression Levels (Stealthy/Aggressive Scanning) — WhatWeb's core selling point is the ability to trade speed for accuracy via aggression modes; plugins behave differently at each level (stealthy=1 HTTP request, aggressive=multiple requests); critical to understanding scanning strategy
- HTTP Fingerprinting & Response Pattern Matching — Plugins detect technologies by regex/pattern matching against HTTP headers, cookies, HTML content, and meta tags; understanding how lib/whatweb/parser.rb extracts these signals and lib/plugin_support.rb matches them is core to plugin development
- Pluggable Output Formatters (Strategy Pattern) — lib/logging/ contains 9 independent formatter classes (JSON, XML, MongoDB, Elasticsearch); understanding this strategy pattern allows adding new output formats without modifying core scanning logic
- HTTP Redirect Chain Following — lib/whatweb/redirect.rb tracks and analyzes redirect chains (301/302/Meta refreshes); technologies can be identified mid-redirect or by analyzing the destination chain; important for detecting reverse proxies and CDNs
- Version Detection & Semantic Versioning Matching — lib/version_class.rb handles parsing and comparing version strings (e.g., 'WordPress 5.8.1'); plugins extract versions from meta tags, headers, or file paths; critical for vulnerability correlations in penetration tests
- Cookie Jar & Session State Management — lib/simple_cookie_jar.rb maintains cookies across multiple HTTP requests to the same target; lib/target.rb uses this for authenticated scanning; understanding state persistence prevents session leaks and enables cookie-based detection
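Version comparison is easy to get wrong with plain strings. A minimal sketch using RubyGems' Gem::Version follows. WhatWeb ships its own lib/version_class.rb, which this does not reproduce; the sketch only shows the general technique of comparing dotted versions numerically. extract_version and vulnerable? are hypothetical helpers.

```ruby
require "rubygems" # Gem::Version; normally already loaded by Ruby

# Pull a dotted version number out of a WordPress generator tag.
def extract_version(text)
  text[/WordPress (\d+(?:\.\d+)*)/, 1]
end

# True when the detected version predates the release that fixed an issue.
def vulnerable?(found, fixed_in)
  Gem::Version.new(found) < Gem::Version.new(fixed_in)
end

found = extract_version('<meta name="generator" content="WordPress 5.8.1">')
puts found                       # prints 5.8.1
puts vulnerable?(found, "5.8.2") # prints true
puts "5.10" < "5.9"              # prints true: string comparison gets this wrong
puts vulnerable?("5.10", "5.9")  # prints false
```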
🔗Related repos
- projectdiscovery/nuclei: Modern modular scanner for vulnerability detection; template-driven and flexible in output formats like WhatWeb, but focused on active testing rather than passive technology identification.
- wpscanteam/wpscan: Specialized WordPress security scanner; WhatWeb can identify WordPress, but WPScan goes deeper into version-specific vulnerabilities and plugin enumeration.
- aquasecurity/trivy: General-purpose scanner for containers and artifacts; complements WhatWeb by analyzing an application's built dependencies rather than its live HTTP responses.
- urbanadventurer/WhatWeb-addons: Described as an official companion repository for specialized scanning addons and community-contributed detections (unverified; the main repository already ships an addons/ directory).
- offensive-security/kali-linux: WhatWeb is a standard tool in the Kali Linux penetration testing distribution, which reflects its established use in the penetration testing community.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for lib/plugins.rb and lib/plugin_support.rb
WhatWeb has 1824+ plugins but there's no evidence of unit tests in the repo structure. The plugin system is critical infrastructure - lib/plugins.rb and lib/plugin_support.rb are the core loaders/managers. Adding tests here would catch regressions in plugin loading, pattern matching, and version detection across the plugin ecosystem.
- [ ] Create spec/lib/plugins_spec.rb with tests for plugin discovery and loading logic in lib/plugins.rb
- [ ] Create spec/lib/plugin_support_spec.rb testing the plugin DSL and pattern matching in lib/plugin_support.rb
- [ ] Add fixtures in spec/fixtures/ with 2-3 minimal test plugins
- [ ] Integrate tests into the Rakefile (already present) so they run via rake test
- [ ] Ensure .simplecov (already present) tracks coverage for these critical files
Add GitHub Actions CI workflow for Ruby linting and plugin validation
.travis.yml exists (legacy CI) but there's a .rubocop.yml present, suggesting code style enforcement is intended but never automated in CI. Add a modern GitHub Actions workflow that runs RuboCop on every PR and validates plugin syntax by requiring all plugins in plugins-disabled/ and my-plugins/ directories to load without errors.
- [ ] Create .github/workflows/lint-and-validate.yml that runs rubocop on the lib/ and plugins/ directories
- [ ] Add a step that validates plugin loading: iterate through my-plugins/*.rb and plugins-disabled/*.rb, require each, and catch SyntaxError/LoadError
- [ ] Configure workflow to trigger on push and pull_request to main/master branches
- [ ] Reference .rubocop.yml configuration in the workflow file
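The plugin-validation step could start from a compile-only check, which catches syntax errors without executing Plugin.define (that would need WhatWeb loaded). A minimal sketch follows; syntax_errors is a hypothetical helper, and RubyVM::InstructionSequence is CRuby-specific. It does not catch LoadError, which only appears when a file is actually required.

```ruby
require "tmpdir"

# Compile each matching file without running it; map path => first error line.
def syntax_errors(globs)
  Dir.glob(globs).sort.each_with_object({}) do |path, errors|
    RubyVM::InstructionSequence.compile_file(path)
  rescue SyntaxError => e
    errors[path] = e.message.lines.first.to_s.strip
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "good.rb"), "Plugin.define do\n  name \"Good\"\nend\n")
  File.write(File.join(dir, "bad.rb"), "Plugin.define do\n  name \"Bad\"\n") # missing end
  errors = syntax_errors([File.join(dir, "*.rb")])
  errors.each { |path, msg| warn "#{File.basename(path)}: #{msg}" }
  puts "#{errors.size} plugin file(s) failed to compile" # prints 1 plugin file(s) failed to compile
end
```

In CI, the step would exit non-zero whenever the returned hash is non-empty.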
Create comprehensive documentation for lib/whatweb/scan.rb with examples
lib/whatweb/scan.rb is the orchestration engine for the entire scanner but has no dedicated documentation. The repo has plugin-development/plugin-tutorials.txt but nothing explaining the scan pipeline itself. Document the scan flow, HTTP request/response handling, and how plugins integrate with the scanner lifecycle.
- [ ] Review lib/whatweb/scan.rb to understand the main scan orchestration logic
- [ ] Create docs/SCAN_PIPELINE.md explaining: initialization, target resolution, plugin execution order, result aggregation
- [ ] Document the integration points with lib/whatweb/redirect.rb (redirect following) and lib/extend-http.rb (HTTP customization)
- [ ] Add 2-3 code examples showing how to call the scan API programmatically
- [ ] Link this new doc from README.md and INSTALL.md in relevant sections
🌿Good first issues
- Add RSpec unit tests for lib/whatweb/parser.rb, which is critical but has no visible test coverage; a good starting point is the URL redirect handling in lib/whatweb/redirect.rb
- Create missing documentation for my-plugins/plugin-tutorial-5.rb through plugin-tutorial-7.rb in the Plugin Development Guide; currently only tutorials 1-4 are referenced in README
- Add dry-run mode documentation and CLI flag (--dry-run) to show which plugins will execute for a target without actually running them; useful for debugging plugin selection logic
⭐Top contributors
- @urbanadventurer — 69 commits
- @definity — 10 commits
- @John — 5 commits
- @frapava98 — 4 commits
- @juananpe — 4 commits
📝Recent commits
- d279d93: Update version.rb (urbanadventurer)
- f78efbd: Update README.md (urbanadventurer)
- dc924b4: Update CHANGELOG.md (urbanadventurer)
- c42e1b9: Update CHANGELOG for v0.6.4 performance improvements (urbanadventurer)
- fd7a914: performance improvements!!! (urbanadventurer)
- 17e2fa7: Merge pull request #423 from johndekroon/citrix-netscaler-fix (urbanadventurer)
- 5b32cd5: Merge pull request #421 from johndekroon/mikrotik (urbanadventurer)
- d47f562: Merge pull request #420 from johndekroon/tplink (urbanadventurer)
- 1a89af5: Merge pull request #419 from johndekroon/ubiquiti-uisp (urbanadventurer)
- 1dc52fd: also match on title (John)
🔒Security observations
Failed to generate security analysis.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.