RepoPilot

megadose/holehe

holehe allows you to check whether an email address is used on different sites such as Twitter and Instagram, and retrieves information from sites that have a forgotten-password function.

Mixed

Stale — last commit 2y ago

Use as dependency: Concerns (weakest axis)

Copyleft license (GPL-3.0): review compatibility; last commit was 2y ago…

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • 22+ active contributors
  • GPL-3.0 licensed
  • CI configured
  • Stale — last commit 2y ago
  • Concentrated ownership — top contributor handles 56% of recent commits
  • GPL-3.0 is copyleft — check downstream compatibility
  • No test directory detected
What would change the summary?
  • Use as dependency: Concerns → Mixed if relicensed under MIT/Apache-2.0 (rare for established libs)

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Forkable](https://repopilot.app/api/badge/megadose/holehe?axis=fork)](https://repopilot.app/r/megadose/holehe)

Paste at the top of your README.md — renders inline like a shields.io badge.



Onboarding: megadose/holehe

Generated by RepoPilot · 2026-05-07

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/megadose/holehe shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Stale — last commit 2y ago

  • 22+ active contributors
  • GPL-3.0 licensed
  • CI configured
  • ⚠ Stale — last commit 2y ago
  • ⚠ Concentrated ownership — top contributor handles 56% of recent commits
  • ⚠ GPL-3.0 is copyleft — check downstream compatibility
  • ⚠ No test directory detected


Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that your local clone of megadose/holehe still matches what RepoPilot saw. If any check fails, the artifact is stale; regenerate it at repopilot.app/r/megadose/holehe.

What it runs against: a local clone of megadose/holehe — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in megadose/holehe | Confirms the artifact applies here, not a fork |
| 2 | License is still GPL-3.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 634 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>megadose/holehe</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of megadose/holehe. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/megadose/holehe.git
#   cd holehe
#
# Then paste this script. Every check is read-only; nothing is modified.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of megadose/holehe and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "megadose/holehe(\.git)?\b" \
  && ok "origin remote is megadose/holehe" \
  || miss "origin remote is not megadose/holehe (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The GPL-3.0 text opens with an
#    indented "GNU GENERAL PUBLIC LICENSE" / "Version 3" header, not "GPL-3.0".
(grep -qi "GNU GENERAL PUBLIC LICENSE" LICENSE 2>/dev/null \
   && grep -qi "Version 3" LICENSE 2>/dev/null) \
  && ok "license is GPL-3.0" \
  || miss "license drift: was GPL-3.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in holehe/core.py holehe/__init__.py holehe/modules/__init__.py \
         holehe/instruments.py holehe/localuseragent.py; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 634 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~604d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/megadose/holehe"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Holehe is an OSINT tool that checks whether an email address is registered on 120+ websites (social media, CMS, CRM, forums, etc.) by querying password-reset endpoints, which do not notify the account owner. For each checked service it reports, in a standardized JSON format, whether the account exists and, where the site leaks them, a partially obfuscated recovery email and phone number.

Modular architecture: holehe/modules/ organizes checks by category (cms/, crm/, forum/, social_media/, etc.), with one Python file per site. Core orchestration happens in holehe/core.py and holehe/instruments.py, and the package entry point is holehe/__init__.py. The CLI and the async HTTP client integration (httpx and trio) are handled in holehe/core.py.

👥Who it's for

Security researchers, penetration testers, and OSINT investigators who need to enumerate accounts linked to a target email address without alerting the account owner. Also used by incident responders mapping attack surface during breach investigations.

🌱Maturity & risk

Mature in scope, with 120+ integrated site checks across diverse categories (social_media, crm, cms, forum, crowdfunding, company modules), PyPI distribution, and Docker support. It is not actively maintained, however: the last commit was about 2 years before this analysis. No test suite is visible in the file list, and ownership is concentrated in one maintainer (megadose).

High risk of site integrations breaking silently—each of 120+ modules relies on reverse-engineered password reset endpoints that can change without notice. Single active maintainer (megadose) with no visible test coverage means regressions propagate easily. Dependencies are not listed in the provided data, making supply-chain risk assessment impossible.

Active areas of work

Cannot be fully determined from the provided data: no PR activity or issue milestones are visible. The recent commits listed below are merged PRs adding modules (Duolingo, Facebook) and a Dockerfile, and the last commit was about 2 years ago. The README also mentions a Maltego Transform integration and an online version at osint.industries.

🚀Get running

git clone https://github.com/megadose/holehe.git && cd holehe && python3 setup.py install

Alternatively, install from PyPI: pip3 install holehe

Then run: holehe test@gmail.com

Daily commands:

  • CLI: holehe your-email@example.com
  • Python API: import trio and httpx, instantiate an httpx.AsyncClient, and await individual module coroutines (see the Python example in the README).
  • Docker: docker build . -t my-holehe-image && docker run my-holehe-image holehe test@gmail.com
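In the README's Python example, a module is awaited as snapchat(email, client, out): it receives the email, a shared HTTP client, and an output list that it appends its result to. The sketch below mirrors that calling convention so it runs offline; it swaps trio and httpx.AsyncClient for asyncio and a stub client, and fake_site and StubClient are hypothetical stand-ins, not holehe code.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for httpx.AsyncClient so the sketch runs offline."""

    async def aclose(self):
        pass


async def fake_site(email, client, out):
    # Hypothetical module: a real one would query the site through `client`.
    out.append({
        "name": "fakesite",
        "rateLimit": False,
        "exists": email.endswith("@example.com"),
        "emailrecovery": None,
        "phoneNumber": None,
        "others": None,
    })


async def main(email):
    out = []               # modules append their result dicts here
    client = StubClient()  # the README uses httpx.AsyncClient() instead
    try:
        await fake_site(email, client, out)
    finally:
        await client.aclose()  # release the client even if a module fails
    return out


results = asyncio.run(main("someone@example.com"))
found = [r["name"] for r in results if r["exists"]]
print(found)  # ['fakesite']
```

Replacing the stub with a real httpx.AsyncClient and a real module import, and running main under trio.run instead of asyncio.run, gives the shape of the README's example.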

🗺️Map of the codebase

  • holehe/core.py — Main orchestration engine that coordinates email checking across all modules and aggregates results; entry point for all checking operations
  • holehe/__init__.py — Package initialization and public API exports; defines what external users interact with
  • holehe/modules/__init__.py — Module loader and registry that dynamically imports and organizes all site-checking modules
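  • holehe/instruments.py: described by this analysis as HTTP request handling and network utilities shared across modules. Verify before relying on this: in trio, "instruments" (trio.abc.Instrument) are scheduler hooks often used for things like progress reporting, so the file's actual role may differ.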
  • holehe/localuseragent.py — User-agent management to avoid detection and blocking; critical for stealth checks
  • holehe/modules/mails/google.py — Google/Gmail existence check using password recovery API; demonstrates core forgotten-password technique
  • holehe/modules/mails/protonmail.py — ProtonMail account detection; shows handling of sites with custom security flows

🧩Components & responsibilities

  • holehe/core.py (Python, httpx, trio): orchestrates the entire checking flow; it loads modules, dispatches email checks, and aggregates results into JSON
    • Failure mode: If core fails, no modules execute; all results lost. Mitigation: modules catch and return partial results.
  • holehe/modules/ (site detectors): each module is an async function that probes one target site, typically via its password-recovery endpoint, and appends a result dict (exists, emailrecovery, phoneNumber, ...) to a shared output list
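To make the module contract concrete, here is a hedged sketch of one module body. The function name newsite, its URL, and the JSON keys account_found and masked_email are invented. The result keys follow the fields this document lists (name, rateLimit, exists, emailrecovery, phoneNumber, others) plus domain, method, and frequent_rate_limit, which are assumed from the README's sample output. A canned stub replaces httpx.AsyncClient so the code runs offline.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class CannedResponse:
    """Offline stand-in for an httpx response (only the parts used below)."""
    status_code: int
    text: str

    def json(self):
        return json.loads(self.text)


class CannedClient:
    """Offline stand-in for httpx.AsyncClient that always returns one response."""

    def __init__(self, response):
        self.response = response

    async def post(self, url, data=None, headers=None):
        return self.response


async def newsite(email, client, out):
    """Hypothetical module: the URL and JSON keys are invented for illustration."""
    result = {
        "name": "newsite",
        "domain": "newsite.example",
        "method": "password recovery",
        "frequent_rate_limit": False,
        "rateLimit": False,
        "exists": False,
        "emailrecovery": None,
        "phoneNumber": None,
        "others": None,
    }
    try:
        r = await client.post("https://newsite.example/api/reset",
                              data={"email": email})
        if r.status_code == 429:
            result["rateLimit"] = True
        else:
            body = r.json()
            result["exists"] = bool(body.get("account_found"))
            result["emailrecovery"] = body.get("masked_email")
    except Exception:
        # Network or parsing failure: report it instead of crashing the run.
        result["rateLimit"] = True
    out.append(result)


out = []
canned = CannedResponse(200, '{"account_found": true, "masked_email": "j***@gmail.com"}')
asyncio.run(newsite("john@example.com", CannedClient(canned), out))
print(out[0]["exists"], out[0]["emailrecovery"])  # True j***@gmail.com
```

Building the full result dict up front means every exit path (success, 429, exception) appends the same set of keys, which keeps downstream aggregation simple.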

🛠️How to make changes

Add a new site module

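  1. Create a new Python file in the appropriate category under holehe/modules/ (e.g., holehe/modules/social_media/newsite.py or holehe/modules/cms/newsite.py); existing modules such as holehe/modules/social_media/twitter.py make useful templates (holehe/modules/social_media/twitter.py)
  2. Implement an async function named after the file (e.g., async def newsite(email, client, out)) that appends a result dict to out with an 'exists' bool and, where the site leaks them, obfuscated 'emailrecovery' and 'phoneNumber' strings (holehe/modules/social_media/twitter.py)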
  3. Use holehe/instruments.py utilities (requests wrapper, User-Agent rotation) to make HTTP calls (holehe/instruments.py)
  4. Module is auto-discovered by holehe/core.py via dynamic import in holehe/modules/__init__.py; no manual registration needed (holehe/modules/__init__.py)
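The auto-discovery step can be illustrated with the standard library's pkgutil.walk_packages. This is a sketch of the general technique, not holehe's loader: it builds a throwaway package in a temporary directory (demo_modules is a hypothetical name) and assumes the convention that each module defines a function named after its file, as in snapchat() inside snapchat.py.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_module_functions(package_name):
    """Import every submodule of package_name recursively and return
    {name: function} for modules that define a function named after the file."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        module = importlib.import_module(info.name)
        short_name = info.name.rsplit(".", 1)[-1]
        func = getattr(module, short_name, None)
        if not info.ispkg and callable(func):
            found[short_name] = func
    return found


# Build a throwaway package: demo_modules/social_media/fakesite.py
root = Path(tempfile.mkdtemp())
category = root / "demo_modules" / "social_media"
category.mkdir(parents=True)
(root / "demo_modules" / "__init__.py").write_text("")
(category / "__init__.py").write_text("")
(category / "fakesite.py").write_text(
    "async def fakesite(email, client, out):\n"
    "    out.append({'name': 'fakesite', 'exists': False})\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import system sees the new files

print(sorted(discover_module_functions("demo_modules")))  # ['fakesite']
```

Dropping a new file into a category directory is enough for it to be picked up, which is the "no manual registration" property the step relies on.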

Customize HTTP request behavior

  1. Edit holehe/instruments.py to add helper methods for common request patterns (headers, proxies, timeout handling) (holehe/instruments.py)
  2. Update holehe/localuseragent.py to add new User-Agent strings or rotation logic (holehe/localuseragent.py)
  3. Individual modules import from instruments.py and call request wrappers (holehe/modules/mails/google.py)
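User-Agent rotation comes down to picking a browser string for each request. The following is a minimal sketch, assuming a dict-of-lists pool; the real structure of holehe/localuseragent.py is not shown in this document, so USER_AGENTS and build_headers are hypothetical. A seeded random.Random keeps the choice reproducible in tests.

```python
import random
from typing import Dict, Optional

# Hypothetical pool; holehe/localuseragent.py may be organized differently.
USER_AGENTS = {
    "chrome": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    ],
    "firefox": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
        "Gecko/20100101 Firefox/121.0",
    ],
}


def build_headers(rng: random.Random, browser: str = "chrome",
                  extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return request headers with a User-Agent drawn from the pool."""
    headers = {
        "User-Agent": rng.choice(USER_AGENTS[browser]),
        "Accept-Language": "en-US,en;q=0.9",
    }
    if extra:  # per-site headers (e.g. Referer, X-Requested-With) override defaults
        headers.update(extra)
    return headers


rng = random.Random(42)
headers = build_headers(rng, extra={"X-Requested-With": "XMLHttpRequest"})
print(headers["User-Agent"] in USER_AGENTS["chrome"])  # True
```

A module would then pass these headers to its client call, e.g. await client.post(url, headers=headers, data=...), since httpx.AsyncClient.post accepts a headers mapping.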

Add email verification for a new mail provider

  1. Create new file in holehe/modules/mails/ (e.g., holehe/modules/mails/tutanota.py) (holehe/modules/mails/tutanota.py)
  2. Analyze target mail provider's password recovery endpoint to detect if email is registered (holehe/modules/mails/protonmail.py)
  3. Use holehe/instruments.py to handle HTTP requests safely and capture response indicators (holehe/instruments.py)
  4. Append a standardized dict to the shared output list, e.g. {'exists': bool, 'emailrecovery': obfuscated string or None}, for holehe/core.py to aggregate (holehe/core.py)
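Because emailrecovery comes back partially obfuscated, an investigator often has to decide which candidate addresses fit the mask. The following is a minimal sketch, assuming masks use '*' with one asterisk per hidden character (a value shaped like 'ex****e@gmail.com'). Individual sites may mask differently, so matches_mask is a hypothetical helper, not part of holehe.

```python
def matches_mask(candidate: str, mask: str, hidden: str = "*") -> bool:
    """Return True if candidate could be the address behind mask.

    Assumes each hidden character replaces exactly one character; a site that
    collapses runs of hidden characters would need a looser comparison.
    """
    if len(candidate) != len(mask):
        return False
    return all(m == hidden or m.lower() == c.lower()
               for c, m in zip(candidate, mask))


mask = "ex****e@gmail.com"
candidates = ["example@gmail.com", "exhale@gmail.com", "examine@gmail.com"]
print([c for c in candidates if matches_mask(c, mask)])
# ['example@gmail.com', 'examine@gmail.com']
```

Two candidates survive here, which is why a mask narrows a search but rarely confirms an identity on its own.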

Integrate with CLI or API caller

  1. Call holehe.core.check() or similar with email address parameter (holehe/core.py)
  2. Core orchestrator dynamically loads all modules from holehe/modules/ directory structure (holehe/modules/__init__.py)
  3. Modules run concurrently under trio, using an httpx async client, to check the email across all sites (holehe/core.py)
  4. Receive the aggregated per-site results (exists, emailrecovery, phoneNumber, rateLimit, and other metadata) (holehe/__init__.py)
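These steps amount to fanning one email out to every module and collecting whatever each one appends. holehe does this with trio. To keep the sketch below stdlib-only, it swaps in asyncio.gather, and it adds a per-module timeout so that one hung or broken site cannot stall or crash the run. The module names and the shape of the fallback entry are hypothetical.

```python
import asyncio


async def run_one(module, email, client, out, timeout):
    """Run one module; record a rate-limited/inconclusive entry if it hangs or fails."""
    try:
        await asyncio.wait_for(module(email, client, out), timeout)
    except Exception:  # asyncio.TimeoutError is an Exception subclass too
        out.append({"name": module.__name__, "exists": False, "rateLimit": True})


async def check_all(email, modules, client=None, timeout=5.0):
    out = []
    # All modules share one client and one output list, and run concurrently.
    await asyncio.gather(*(run_one(m, email, client, out, timeout) for m in modules))
    return sorted(out, key=lambda r: r["name"])


async def fast_site(email, client, out):
    out.append({"name": "fast_site", "exists": True, "rateLimit": False})


async def slow_site(email, client, out):
    await asyncio.sleep(10)  # simulates an endpoint that never answers


async def broken_site(email, client, out):
    raise ValueError("endpoint changed")


results = asyncio.run(
    check_all("a@example.com", [fast_site, slow_site, broken_site], timeout=0.1))
for r in results:
    print(r["name"], r["exists"], r["rateLimit"])
# broken_site False True
# fast_site True False
# slow_site False True
```

Converting failures into result entries rather than exceptions means a single site change shows up as one inconclusive row instead of aborting the whole check.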

🔧Why these technologies

  • Python 3 with httpx (async HTTP client): simple HTTP automation for password-recovery endpoint checks; easy to maintain and extend with new site modules
  • Modular plugin architecture (modules/ subdirectories by category) — Scale to 120+ site checks without monolithic code; auto-discovery allows new sites without registration
  • User-Agent rotation (localuseragent.py) — Evade basic bot detection and rate-limiting when checking multiple sites
  • Async HTTP under trio: concurrent checking across many sites, with request and error handling kept local to each module

⚖️Trade-offs already made

  • Use password recovery endpoints instead of login attempts

    • Why: Avoids triggering account lockouts and security alerts; does not alert target email
    • Consequence: Depends on sites maintaining consistent password recovery flow; requires reverse-engineering each site's UI/API
  • Async requests (httpx + trio) with local User-Agent rotation

    • Why: many sites can be checked concurrently without threads; a local User-Agent list needs no external service
    • Consequence: the Python API requires an async (trio) context, and bursts of concurrent requests across 120+ modules can trip site rate limits
  • No database or caching layer

    • Why: Keeps the tool lightweight with no database or other external service to run; easy to deploy in Docker
    • Consequence: Each email check re-runs all 120+ HTTP requests; no historical results or memoization
  • Minimal error handling per module

    • Why: Sites change frequently; modules fail gracefully and report 'unknown' rather than crash
    • Consequence: False negatives (email exists but check fails); no detailed error logs for debugging module failures

🚫Non-goals (don't propose these)

  • Does not perform authentication or login on target sites
  • Does not retrieve full account details (profile pictures, follower counts, etc.); it only confirms existence plus, where a site leaks them, partially obfuscated recovery emails and phone numbers
  • Does not handle CAPTCHA or interactive challenges
  • Does not support IP proxying or Tor integration (users must configure externally)
  • Not a real-time API service; primarily a CLI/library tool for on-demand checks
  • Does not maintain historical records or trend analysis of email registrations

🪤Traps & gotchas

  • No setup.py or requirements.txt appeared in the analyzed file list, yet the README install path uses python3 setup.py install; confirm how dependencies are declared, since installs from PyPI and from a GitHub clone may differ.
  • Password-reset endpoints are fragile and site-specific; changes to a site's obfuscation logic or endpoint URLs cause silent failures.
  • Individual modules have no built-in rate limiting; the README's 'Rate limit? Change your IP' implies a manual workaround.
  • The Python API requires an async (trio) context; calling module coroutines from blocking code will not work.
  • The Dockerfile is minimal (99 bytes), so the image may not include every dependency.

🏗️Architecture

💡Concepts to learn

  • Password Reset Endpoint Abuse / Account Enumeration — Core technique in holehe: password reset flows often reveal account existence and recovery info without triggering login alerts, making them prime OSINT vectors; understanding when/why sites leak recovery data is critical
  • Async/Await with Trio (structured concurrency) — Holehe relies on trio for concurrent HTTP requests across 120+ sites; trio's cancellation and timeout semantics are essential for avoiding hangs when a single module fails
  • User-Agent Rotation & Bot Detection Evasion — localuseragent.py implements UA spoofing to avoid WAF/bot detection; many sites fingerprint requests, making this critical for reliable reconnaissance
  • Rate Limiting & IP Rotation Strategy — Holehe returns rateLimit=true in output but documentation says 'Change your IP'—understanding adaptive rate limiting, backoff strategies, and proxy integration is essential for scaling checks
  • Response Pattern Matching & Heuristic Account Detection — Each site module must detect 'account exists' vs 'not found' via HTTP status codes, response text patterns, or redirect behavior; inconsistent heuristics are a major failure source
  • OSINT Data Aggregation & Normalization — Holehe standardizes outputs (name, exists, emailrecovery, phoneNumber, others) across wildly different site architectures; this normalization allows downstream correlation and visualization
  • Reverse-Engineering Web Endpoints via Browser DevTools — Each holehe module requires manual reverse-engineering of a site's password reset flow using network inspection; understanding request/response pairs and CSRF tokens is prerequisite knowledge

Related projects

  • soxoj/maigret — Similar OSINT email/username search tool; good reference for multi-site enumeration patterns and concurrent request handling
  • Raikia/UhOh365 — Credited in holehe README; specialized for O365/Azure account enumeration, complementary to holehe's broad site coverage
  • trustedsec/social-engineer-toolkit — SET includes email harvesting and OSINT modules; shares similar reconnaissance goals and password reset endpoint abuse techniques
  • megadose/holehe-maltego — Official Maltego integration for holehe; allows running checks directly from Maltego's graph UI for visual investigation workflows
  • thewhiteh4t/pwnedornot — Companion tool for checking if email appears in known breaches; often used alongside holehe for comprehensive account compromise assessment
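The rate-limiting concept above mentions backoff strategies. One common option is exponential backoff with "full jitter", sketched below as a retry wrapper around a holehe-style check that returns a dict with a rateLimit flag. Per this analysis holehe has no built-in backoff (the README suggests changing IP instead), so check, retry_on_rate_limit, and the injectable sleep are hypothetical.

```python
import asyncio
import random


def backoff_delays(retries, base=0.5, cap=30.0, rng=None):
    """'Full jitter' backoff: the i-th retry waits a random time in
    [0, min(cap, base * 2**i)] seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(retries)]


async def retry_on_rate_limit(check, attempts=4, base=0.5, cap=30.0,
                              rng=None, sleep=asyncio.sleep):
    """Await check() until its result has rateLimit == False or attempts run out.
    `sleep` is injectable so tests need not actually wait."""
    delays = [0.0] + backoff_delays(attempts - 1, base, cap, rng)
    result = {}
    for delay in delays:
        await sleep(delay)
        result = await check()
        if not result.get("rateLimit"):
            break
    return result


calls = {"n": 0}
slept = []


async def flaky_check():
    # Hypothetical check: rate-limited on the first two calls, then succeeds.
    calls["n"] += 1
    return {"name": "demo", "rateLimit": calls["n"] < 3, "exists": calls["n"] >= 3}


async def fake_sleep(seconds):
    slept.append(seconds)  # record the delay instead of waiting


result = asyncio.run(retry_on_rate_limit(flaky_check, rng=random.Random(0), sleep=fake_sleep))
print(result["exists"], calls["n"], len(slept))  # True 3 3
```

Randomizing each wait across the whole interval spreads retries out, so many concurrent modules hitting limits at once do not all retry in lockstep.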

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for module detection logic in holehe/core.py

The repo has 120+ modules across multiple categories (cms, crm, crowfunding, forum, jobs, learning) but no visible test suite. That is a significant gap for a tool that looks up email registrations across diverse platforms. Unit tests would catch regressions when modules are added or modified and confirm that the core detection logic handles different response scenarios correctly.

  • [ ] Create tests/ directory structure mirroring holehe/modules/ organization
  • [ ] Add unit tests for holehe/core.py focusing on module loading and execution flow
  • [ ] Add sample tests for at least 3 module categories (e.g., cms/wordpress.py, crm/hubspot.py, forum/mybb.py) testing both positive and negative email detection cases
  • [ ] Integrate pytest into GitHub Actions workflow (.github/workflows/) to run on each PR
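As a starting point for this checklist, module tests can run fully offline by injecting a stub in place of httpx.AsyncClient. This sketch uses the standard library's unittest.IsolatedAsyncioTestCase (Python 3.8+), which pytest can also collect. examplesite, its endpoint, and the "found" JSON key are hypothetical; a real test would target an actual module such as those named above.

```python
import unittest


class StubResponse:
    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


class StubClient:
    """Offline stand-in for httpx.AsyncClient: returns a canned response, records calls."""

    def __init__(self, response):
        self.response = response
        self.calls = []

    async def post(self, url, data=None):
        self.calls.append((url, data))
        return self.response


async def examplesite(email, client, out):
    """Hypothetical module under test; endpoint and JSON keys are invented."""
    r = await client.post("https://examplesite.test/reset", data={"email": email})
    out.append({"name": "examplesite",
                "rateLimit": r.status_code == 429,
                "exists": r.status_code == 200 and r.json().get("found", False)})


class ExampleSiteTest(unittest.IsolatedAsyncioTestCase):
    async def test_existing_account(self):
        out = []
        client = StubClient(StubResponse(200, {"found": True}))
        await examplesite("a@example.com", client, out)
        self.assertTrue(out[0]["exists"])
        self.assertEqual(client.calls[0][1], {"email": "a@example.com"})

    async def test_unknown_account(self):
        out = []
        await examplesite("b@example.com", StubClient(StubResponse(200, {"found": False})), out)
        self.assertFalse(out[0]["exists"])

    async def test_rate_limited(self):
        out = []
        await examplesite("c@example.com", StubClient(StubResponse(429, {})), out)
        self.assertTrue(out[0]["rateLimit"])
        self.assertFalse(out[0]["exists"])
```

Run it with python -m pytest or python -m unittest. Because the stub records each call, tests can also assert which endpoint and form fields a module sends.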

Create module validation and health check framework

With 120+ modules spread across holehe/modules/, there's no visible mechanism to detect broken modules (site changes, dead endpoints, etc.). A validation framework would help maintainers identify which modules need updates and allow contributors to quickly verify their new modules work correctly before submitting PRs.

  • [ ] Add a new file holehe/module_validator.py with functions to test each module's basic connectivity
  • [ ] Create a CLI command (add to holehe/core.py or instruments.py) that runs all modules against a test email and reports failures
  • [ ] Document the validation process in README.md with a section 'Contributing New Modules'
  • [ ] Add a GitHub Action workflow (.github/workflows/module_health_check.yml) that runs validation weekly to detect broken modules
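A first cut of the proposed validator could run each module once against a probe address and check the shape of what it appends. The sketch below assumes the (email, client, out) module signature and treats name, exists, and rateLimit as required keys. REQUIRED_KEYS, health_check, and the demo modules are hypothetical, and a real weekly job would also need a known-registered test account to catch false negatives.

```python
import asyncio

REQUIRED_KEYS = {"name": str, "exists": bool, "rateLimit": bool}


def validate_result(result):
    """Return a list of problems with one module's result dict (empty if OK)."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in result:
            problems.append(f"missing key {key!r}")
        elif not isinstance(result[key], expected):
            problems.append(f"{key!r} should be {expected.__name__}, "
                            f"got {type(result[key]).__name__}")
    if result.get("rateLimit") is True:
        problems.append("rate-limited: result inconclusive")
    return problems


async def health_check(modules, email, client=None, timeout=10.0):
    """Run modules one at a time (gentle on target sites) and report failures."""
    report = {}
    for module in modules:
        out = []
        try:
            await asyncio.wait_for(module(email, client, out), timeout)
        except Exception as exc:
            report[module.__name__] = [f"raised {type(exc).__name__}"]
            continue
        problems = ([f"expected 1 result, got {len(out)}"] if len(out) != 1
                    else validate_result(out[0]))
        if problems:
            report[module.__name__] = problems
    return report


async def good_site(email, client, out):
    out.append({"name": "good_site", "exists": False, "rateLimit": False})


async def bad_site(email, client, out):
    out.append({"name": "bad_site", "exists": "yes", "rateLimit": False})


async def silent_site(email, client, out):
    pass  # forgot to append a result


async def crashing_site(email, client, out):
    raise KeyError("csrf_token")


report = asyncio.run(health_check(
    [good_site, bad_site, silent_site, crashing_site], "probe@example.com"))
for name, problems in report.items():
    print(name, problems)
```

Only modules with problems appear in the report, so an empty dict means every module answered in the expected shape.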

Extract duplicate request/response handling logic into holehe/instruments.py utilities

Looking at the module structure, individual modules (like holehe/modules/crm/hubspot.py, holehe/modules/cms/wordpress.py) likely contain repeated code for HTTP requests, response parsing, and forgotten password detection. Consolidating these patterns into holehe/instruments.py would reduce code duplication, improve maintainability, and make it easier for contributors to write new modules.

  • [ ] Audit 5-10 existing modules to identify common patterns (request headers, response status checks, forgotten password endpoint detection)
  • [ ] Add utility functions to holehe/instruments.py for common patterns (e.g., check_password_reset_endpoint(), parse_response_for_account_existence())
  • [ ] Refactor 2-3 sample modules (e.g., hubspot.py, wordpress.py, duolingo.py) to use the new utilities
  • [ ] Update documentation/template for contributors on how to use the new utilities when adding modules

🌿Good first issues

  • Add pytest unit tests for holehe/modules/social_media/twitter.py and holehe/modules/crm/hubspot.py to validate response parsing—currently no test files visible in repo structure.
  • Create holehe/modules/payment/ category for PayPal, Stripe, and Square with 3 new site checks, following the signature pattern from existing modules.
  • Document the async function signature contract in a CONTRIBUTING.md file with a working code template, since README Python example uses snapchat but module file structure is not explained.


📝Recent commits

  • 14da70f — Merge pull request #194 from secureman/master (megadose)
  • 4df5bf8 — Merge pull request #200 from sujal-choudhary/patch-1 (megadose)
  • 04efb43 — Update facebook.py (sujal-choudhary)
  • e048d38 — Create facebook.py (sujal-choudhary)
  • 7bb246b — new module Duolingo.py (secureman)
  • bdacc5e — Merge pull request #186 from anon-dev-gh/add-dockerfile (megadose)
  • 9bfa409 — Update README.md (megadose)
  • a4c45e7 — Adds Dockerfile and updates README.md (anon-dev-gh)
  • bec2f58 — Update README.md (megadose)
  • 88454a3 — Update README.md (megadose)

🔒Security observations

The holehe OSINT tool raises significant security and ethical concerns; the main risks are listed below.

  • High · Potential Information Disclosure via Email Enumeration — holehe/core.py, holehe/modules/*. The holehe tool is designed to check if emails are registered on various websites by using forgotten password functions and other enumeration techniques. This could enable attackers to perform email enumeration attacks, building lists of valid email addresses for targeted attacks, phishing campaigns, or social engineering. Fix: Implement rate limiting, CAPTCHA challenges, and consider adding warnings about responsible use. Document ethical guidelines prominently. Consider implementing signature-based detection to prevent abuse.
  • High · Unvalidated External HTTP Requests — holehe/instruments.py, holehe/modules/*. The codebase makes numerous HTTP requests to external websites (120+ services) to check email registration status. Without visible request validation, input sanitization, or timeout controls, this could be vulnerable to SSRF (Server-Side Request Forgery) attacks or be used to conduct DDoS attacks against third-party services. Fix: Implement strict request validation, URL whitelisting, timeout limits (e.g., 10-30 seconds), rate limiting, and consider using a circuit breaker pattern. Add request signing and user-agent validation.
  • Medium · Missing Dependency Lock File — Project root (missing file). No requirements.txt, poetry.lock, or pipfile.lock provided in the analysis. Without locked dependencies, the tool could be vulnerable to supply chain attacks if transitive dependencies are compromised or updated with malicious code. Fix: Create and maintain a requirements.txt with pinned versions or use poetry/pipenv with lock files. Regularly audit dependencies using tools like safety or pip-audit. Implement automated dependency scanning in CI/CD.
  • Medium · Potential User-Agent Spoofing — holehe/localuseragent.py. The presence of 'localuseragent.py' suggests the tool uses rotating user agents to bypass detection. While this is common in OSINT tools, it could be used to violate websites' terms of service or robots.txt, and may expose the tool to legal liability. Fix: Add explicit user-agent identification that clearly identifies requests as coming from the holehe tool. Respect robots.txt and website terms of service. Document which websites explicitly allow this tool.
  • Medium · Docker Base Image Version Not Pinned — Dockerfile. The Dockerfile uses 'python:3.11-slim-bullseye' which could receive security updates that may introduce breaking changes. Without a specific digest hash, supply chain attacks via base image manipulation are possible. Fix: Pin to specific digest: 'python:3.11-slim-bullseye@sha256:...' and regularly update. Implement image scanning with Trivy or similar tools in CI/CD pipeline.
  • Medium · No Input Validation on Email Parameter — holehe/core.py. The tool accepts email addresses as input but visible validation is not evident in the file structure. Malformed emails or special characters could cause unexpected behavior or errors. Fix: Implement strict email validation using RFC 5322 standards. Validate and sanitize all inputs before use. Add length limits and character restrictions.
  • Low · Missing Security Headers in Web Interface — Web application (not in repo structure). The README mentions an online version at osint.industries. Without visible security configuration, the web interface may lack security headers (CSP, X-Frame-Options, etc.). Fix: Implement security headers: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security, etc. Use HTTPS only with HSTS.
  • Low · No Rate Limiting Visible in Core Logic — holehe/core.py, holehe/instruments.py. The tool can check 120+ websites per email. Without rate limiting between requests, it could be flagged as malicious by WAF/IDS systems or DOS target websites. Fix: Implement exponential backoff and rate limiting (e.g., 1 request per second per domain). Add configurable delays. Implement request queuing to avoid overwhelming target servers.
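For the input-validation item above, a cheap pre-flight check can reject obviously malformed input before 120+ requests go out. The sketch below is deliberately pragmatic rather than full RFC 5322 (for example, it rejects quoted local parts and IP-literal domains). is_plausible_email is a hypothetical helper, not part of holehe.

```python
import re

# Pragmatic, not RFC 5322: no whitespace or extra "@" in the local part
# (at most 64 chars), and a dotted domain whose labels are letters, digits
# and inner hyphens, ending in an alphabetic TLD.
_EMAIL_RE = re.compile(
    r"[^@\s]{1,64}@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
    r"[A-Za-z]{2,63}"
)
MAX_LENGTH = 254  # commonly used practical limit for a full address


def is_plausible_email(value: str) -> bool:
    value = value.strip()
    return len(value) <= MAX_LENGTH and _EMAIL_RE.fullmatch(value) is not None


for v in ["test@gmail.com", "user@mail.example.co.uk",
          "not-an-email", "a b@example.com", "x@-bad.com"]:
    print(v, is_plausible_email(v))
```

The CLI could call a check like this before dispatching modules and exit with an error message when it fails.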

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
