megadose/holehe

Item: megadose/holehe
Rating: 3
Author: RepoPilot

holehe allows you to check if the mail is used on different sites like twitter, instagram and will retrieve information on sites with the forgotten password function.

Mixed (weakest axis: Stale, last commit 2y ago)

Use as dependency: Concerns

copyleft license (GPL-3.0) — review compatibility; last commit was 2y ago…

Fork & modify: Healthy

Has a license and CI, which makes a reasonable foundation to fork and modify; note that no test directory was detected.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓ 22+ active contributors
✓ GPL-3.0 licensed
✓ CI configured

⚠ Stale: last commit 2y ago
⚠ Concentrated ownership: top contributor handles 56% of recent commits
⚠ GPL-3.0 is copyleft: check downstream compatibility
⚠ No test directory detected

What would change the summary?

→ Use as dependency: Concerns → Mixed if the project relicenses under MIT/Apache-2.0 (rare for established libs)

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Forkable](https://repopilot.app/api/badge/megadose/holehe?axis=fork)](https://repopilot.app/r/megadose/holehe)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: megadose/holehe

Generated by RepoPilot · 2026-05-07

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
Treat sections marked "AI · unverified" as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify them against the real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/megadose/holehe shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Stale — last commit 2y ago

22+ active contributors
GPL-3.0 licensed
CI configured
⚠ Stale — last commit 2y ago
⚠ Concentrated ownership — top contributor handles 56% of recent commits
⚠ GPL-3.0 is copyleft — check downstream compatibility
⚠ No test directory detected

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live megadose/holehe repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/megadose/holehe.

What it runs against: a local clone of megadose/holehe — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in megadose/holehe | Confirms the artifact applies here, not a fork |
| 2 | License is still GPL-3.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 634 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>megadose/holehe</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of megadose/holehe. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/megadose/holehe.git
#   cd holehe
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of megadose/holehe and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "megadose/holehe(\.git)?\b" \
  && ok "origin remote is megadose/holehe" \
  || miss "origin remote is not megadose/holehe (artifact may be from a fork)"

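# 2. License matches what RepoPilot saw
# GPL-3.0 license files open with the GNU GPL title and "Version 3", not
# the SPDX id, so match on that header (assumes a LICENSE* or COPYING* file).
lic=$(ls LICENSE* COPYING* 2>/dev/null | head -n 1)
head -n 5 "${lic:-/dev/null}" 2>/dev/null | tr -s '[:space:]' ' ' \
  | grep -qiE "GNU GENERAL PUBLIC LICENSE Version 3" \
  && ok "license is GPL-3.0" \
  || miss "license drift: was GPL-3.0 at generation time"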

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in \
  holehe/core.py \
  holehe/__init__.py \
  holehe/modules/__init__.py \
  holehe/instruments.py \
  holehe/localuseragent.py
do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 634 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~604d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/megadose/holehe"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Holehe is an OSINT tool that checks whether an email address is registered on 120+ websites (social media, CMS, CRM, forums, etc.) by probing password reset endpoints, which do not notify the account owner. For each checked service it returns the account existence status plus any partially obfuscated recovery email or phone number, in a standardized JSON format.

The architecture is modular: holehe/modules/ organizes checks by category (cms/, crm/, forum/, social_media/, etc.), with one Python file per site. Core orchestration happens in holehe/core.py and holehe/instruments.py, and the package entry point is holehe/__init__.py. The CLI and the async HTTP client integration are handled in holehe/core.py using httpx and trio.
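
To make the standardized per-service record concrete, here is a minimal sketch of how a caller might bucket results. The keys `name`, `exists`, `rateLimit`, `emailrecovery`, `phoneNumber`, and `others` follow the output fields this document names; the `summarize` helper and all sample values are hypothetical and are not part of holehe.

```python
from typing import Any, Dict, List

# One record per checked service. Keys mirror the fields named in this
# document; the sample values below are invented for illustration.
Result = Dict[str, Any]


def summarize(results: List[Result]) -> Dict[str, List[str]]:
    """Bucket service names by outcome (hypothetical helper, not holehe API)."""
    buckets: Dict[str, List[str]] = {"found": [], "not_found": [], "rate_limited": []}
    for r in results:
        if r.get("rateLimit"):
            # A rate-limited check says nothing about account existence.
            buckets["rate_limited"].append(r["name"])
        elif r.get("exists"):
            buckets["found"].append(r["name"])
        else:
            buckets["not_found"].append(r["name"])
    return buckets


sample = [
    {"name": "site_a", "rateLimit": False, "exists": True,
     "emailrecovery": "ex****e@gmail.com", "phoneNumber": None, "others": None},
    {"name": "site_b", "rateLimit": False, "exists": False,
     "emailrecovery": None, "phoneNumber": None, "others": None},
    {"name": "site_c", "rateLimit": True, "exists": False,
     "emailrecovery": None, "phoneNumber": None, "others": None},
]
summary = summarize(sample)
print(summary)
```

Keeping rate-limited services in their own bucket avoids reading a blocked check as "no account".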

👥Who it's for

Security researchers, penetration testers, and OSINT investigators who need to enumerate accounts linked to a target email address without alerting the account owner. Also used by incident responders mapping attack surface during breach investigations.

🌱Maturity & risk

Not actively maintained: the last commit was roughly two years ago (~604 days at generation time). The project ships 120+ integrated site checks across diverse categories (social_media, crm, cms, forum, crowdfunding, company modules) and offers PyPI distribution and Docker support, so it is easy to install. However, no test suite is visible in the file list, and ownership is concentrated: megadose authored 56% of recent commits.

High risk of site integrations breaking silently: each of the 120+ modules relies on reverse-engineered password reset endpoints that can change without notice. Concentrated ownership (megadose) combined with no visible test coverage means regressions propagate easily. Dependencies are not listed in the provided data, making supply-chain risk assessment impossible.

Active areas of work

Cannot be determined from the provided data: no commit history, PR activity, or issue milestones are visible. The README mentions a Maltego Transform integration and an online version at osint.industries, but with the last commit about two years old, specific current work is unknown.

🚀Get running

git clone https://github.com/megadose/holehe.git && cd holehe && python3 setup.py install

Then run: holehe test@gmail.com

Daily commands:

CLI: holehe your-email@example.com
Python API: import trio and httpx, instantiate an httpx.AsyncClient, and await individual module coroutines (see the Python example in the README).
Docker: docker build . -t my-holehe-image && docker run my-holehe-image holehe test@gmail.com

🗺️Map of the codebase

holehe/core.py — Main orchestration engine that coordinates email checking across all modules and aggregates results; entry point for all checking operations
holehe/__init__.py — Package initialization and public API exports; defines what external users interact with
holehe/modules/__init__.py — Module loader and registry that dynamically imports and organizes all site-checking modules
holehe/instruments.py — HTTP request handling and network utilities; abstracts away request/response patterns used across all modules
holehe/localuseragent.py — User-agent management to avoid detection and blocking; critical for stealth checks
holehe/modules/mails/google.py — Google/Gmail existence check using password recovery API; demonstrates core forgotten-password technique
holehe/modules/mails/protonmail.py — ProtonMail account detection; shows handling of sites with custom security flows

🧩Components & responsibilities

holehe/core.py (Python, httpx, trio) — Orchestrates the entire checking flow: loads modules, dispatches email checks, aggregates results into JSON
- Failure mode: If core fails, no modules execute; all results lost. Mitigation: modules catch and return partial results.
holehe/modules/* (site detectors) — Each module probes one or more target sites by calling the password recovery endpoint and reports a record with fields such as exists and emailrecovery

🛠️How to make changes

Add a new site module

Create a new Python file in the appropriate category under holehe/modules/ (e.g., holehe/modules/social_media/twitter.py or holehe/modules/cms/newsite.py) (holehe/modules/social_media/twitter.py)
Implement an async function that takes the email and reports a record with an 'exists' bool and, when the site exposes it, an 'emailrecovery' value holding the partially obfuscated recovery address (holehe/modules/social_media/twitter.py)
Use holehe/instruments.py utilities (request helpers, User-Agent rotation) to make HTTP calls (holehe/instruments.py)
The module is auto-discovered by holehe/core.py via dynamic import in holehe/modules/__init__.py; no manual registration is needed (holehe/modules/__init__.py)
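
The steps above can be sketched roughly as follows. This assumes the module contract is an async function taking `(email, client, out)` and appending one record to `out`, as in the README example this document refers to; treat that signature as an assumption to check against the real modules. `FakeClient`, `newsite`, the URL, and the marker strings are all hypothetical. The fake client replaces httpx and `asyncio` stands in for trio so the sketch runs offline with only the standard library.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeResponse:
    status_code: int
    text: str


class FakeClient:
    """Offline stand-in for an async HTTP client (hypothetical)."""

    def __init__(self, registered: set) -> None:
        self.registered = registered

    async def post(self, url: str, data: Dict[str, str]) -> FakeResponse:
        # Pretend the reset endpoint answers differently for known emails.
        if data["email"] in self.registered:
            return FakeResponse(200, "reset link sent")
        return FakeResponse(200, "no account found")


async def newsite(email: str, client: Any, out: List[Dict[str, Any]]) -> None:
    """Hypothetical site module: probe a reset endpoint, append one record."""
    record = {"name": "newsite", "rateLimit": False, "exists": False,
              "emailrecovery": None, "phoneNumber": None, "others": None}
    try:
        resp = await client.post("https://newsite.example/password/reset",
                                 data={"email": email})
    except Exception:
        # Transport errors become "could not check" instead of a crash.
        record["rateLimit"] = True
        out.append(record)
        return
    if resp.status_code == 429:
        record["rateLimit"] = True
    elif "reset link sent" in resp.text:
        record["exists"] = True
    out.append(record)


async def demo() -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    client = FakeClient(registered={"known@example.com"})
    await newsite("known@example.com", client, out)
    await newsite("unknown@example.com", client, out)
    return out


records = asyncio.run(demo())
print(records)
```

Appending exactly one record per call, even on failure, keeps the aggregated output the same length as the list of checked sites.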

Customize HTTP request behavior

Edit holehe/instruments.py to add helper methods for common request patterns (headers, proxies, timeout handling) (holehe/instruments.py)
Update holehe/localuseragent.py to add new User-Agent strings or rotation logic (holehe/localuseragent.py)
Individual modules import from instruments.py and call request wrappers (holehe/modules/mails/google.py)
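
As an illustration of the User-Agent rotation idea, here is a minimal sketch. The `USER_AGENTS` pool and `build_headers` helper are hypothetical; the actual strings and data layout in holehe/localuseragent.py may differ and should be read from the source.

```python
import random
from typing import Dict, List

# Hypothetical pool; the real strings live in holehe/localuseragent.py.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_headers(rng: random.Random) -> Dict[str, str]:
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENTS), "Accept": "*/*"}


rng = random.Random(0)  # seeded only so the demo is repeatable
headers = build_headers(rng)
print(headers["User-Agent"])
```

Passing the random generator in as a parameter keeps the helper easy to test with a fixed seed.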

Add email verification for a new mail provider

Create new file in holehe/modules/mails/ (e.g., holehe/modules/mails/tutanota.py) (holehe/modules/mails/tutanota.py)
Analyze target mail provider's password recovery endpoint to detect if email is registered (holehe/modules/mails/protonmail.py)
Use holehe/instruments.py to handle HTTP requests safely and capture response indicators (holehe/instruments.py)
Return a standardized record with 'exists' (bool) and 'emailrecovery' (the partially obfuscated recovery address, if the site exposes one) for holehe/core.py aggregation (holehe/core.py)

Integrate with CLI or API caller

Call holehe.core.check() or similar with email address parameter (holehe/core.py)
Core orchestrator dynamically loads all modules from holehe/modules/ directory structure (holehe/modules/__init__.py)
Modules run concurrently (async, via trio) to check the email across all sites (holehe/core.py)
Receive aggregated JSON with results per site (exists, emailrecovery flags, metadata) (holehe/__init__.py)
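
The dispatch-and-aggregate flow above might look like the following sketch. It uses `asyncio.gather` from the standard library as a stand-in for trio's structured concurrency, and the `make_module` and `check_all` helpers are hypothetical illustrations rather than holehe functions. The `(email, client, out)` module signature is assumed from the README example.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Module = Callable[[str, Any, List[Dict[str, Any]]], Awaitable[None]]


def make_module(name: str, exists: bool, delay: float) -> Module:
    """Build a fake site module (hypothetical) that answers after `delay` s."""
    async def module(email: str, client: Any, out: List[Dict[str, Any]]) -> None:
        await asyncio.sleep(delay)  # simulates network latency
        out.append({"name": name, "exists": exists, "rateLimit": False})
    return module


async def check_all(email: str, modules: List[Module],
                    timeout: float) -> List[Dict[str, Any]]:
    """Run every module concurrently; drop any that exceed `timeout`."""
    out: List[Dict[str, Any]] = []

    async def guarded(mod: Module) -> None:
        try:
            await asyncio.wait_for(mod(email, None, out), timeout)
        except asyncio.TimeoutError:
            pass  # slow site: no record is appended

    await asyncio.gather(*(guarded(m) for m in modules))
    # Sort so the output order does not depend on completion order.
    return sorted(out, key=lambda r: r["name"])


modules = [
    make_module("fast_a", True, 0.01),
    make_module("fast_b", False, 0.02),
    make_module("slow_c", True, 1.0),
]
results = asyncio.run(check_all("user@example.com", modules, timeout=0.2))
print(results)
```

A per-module timeout keeps one unresponsive site from stalling the whole run, which matters when a single invocation fans out to 120+ endpoints.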

🔧Why these technologies

Python 3 with httpx and trio — Async HTTP automation for password recovery endpoint checks; easy to maintain and extend with new site modules
Modular plugin architecture (modules/ subdirectories by category) — Scale to 120+ site checks without monolithic code; auto-discovery allows new sites without registration
User-Agent rotation (localuseragent.py) — Evade basic bot detection and rate-limiting when checking multiple sites
Async HTTP (httpx) on a trio event loop — Checks many sites concurrently while keeping per-module request and error handling simple

⚖️Trade-offs already made

Use password recovery endpoints instead of login attempts
- Why: Avoids triggering account lockouts and security alerts; does not alert target email
- Consequence: Depends on sites maintaining consistent password recovery flow; requires reverse-engineering each site's UI/API
Async requests (httpx on trio) with local User-Agent rotation
- Why: Lets the 120+ site checks run concurrently from a single CLI process instead of waiting on each site in turn
- Consequence: Python API callers must run modules inside a trio event loop; plain blocking calls will fail
No database or caching layer
- Why: Keeps tool lightweight and requires no external dependencies; easy to deploy in Docker
- Consequence: Each email check re-runs all 120+ HTTP requests; no historical results or memoization
Minimal error handling per module
- Why: Sites change frequently; modules fail gracefully and report 'unknown' rather than crash
- Consequence: False negatives (email exists but check fails); no detailed error logs for debugging module failures

🚫Non-goals (don't propose these)

Does not perform authentication or login on target sites
Does not retrieve full account details (profile pictures, follower counts, etc.); only confirms existence
Does not handle CAPTCHA or interactive challenges
Does not support IP proxying or Tor integration (users must configure externally)
Not a real-time API service; primarily a CLI/library tool for offline batch checking
Does not maintain historical records or trend analysis of email registrations

🪤Traps & gotchas

No setup.py or requirements.txt is visible in the provided file list, so the dependency installation method is unknown and may differ between PyPI and a GitHub clone.
Password reset endpoints are fragile and site-specific; changes to obfuscation logic or endpoint URLs will cause silent failures.
No rate limiting is built into individual modules; the README's 'Rate limit? Change your IP' suggests a manual workaround is expected.
An async/trio context is required for Python API usage; blocking calls will fail.
The Docker image may not include all dependencies if the Dockerfile is minimal (only 99 bytes provided).
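
Since rate limiting is left to the caller, one option besides changing IP is to wait and retry. The sketch below shows exponential backoff around a single check. `check_with_backoff` and `make_flaky_check` are hypothetical helpers, not holehe features, and `asyncio` is used here as a standard-library stand-in for trio.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Check = Callable[[], Awaitable[Dict[str, Any]]]


async def check_with_backoff(check: Check, retries: int = 3,
                             base_delay: float = 0.01) -> Dict[str, Any]:
    """Re-run `check` while it reports rateLimit, doubling the wait each time."""
    result = await check()
    for attempt in range(retries):
        if not result.get("rateLimit"):
            break
        await asyncio.sleep(base_delay * (2 ** attempt))
        result = await check()
    return result


def make_flaky_check(limited_calls: int) -> Check:
    """Fake check (hypothetical) that is rate-limited for the first N calls."""
    calls = {"n": 0}

    async def check() -> Dict[str, Any]:
        calls["n"] += 1
        limited = calls["n"] <= limited_calls
        return {"name": "site_a", "rateLimit": limited,
                "exists": not limited, "calls": calls["n"]}

    return check


recovered = asyncio.run(check_with_backoff(make_flaky_check(2)))
gave_up = asyncio.run(check_with_backoff(make_flaky_check(10), retries=2))
print(recovered, gave_up)
```

In real use the base delay would be seconds rather than hundredths of a second; the small value only keeps the demo fast.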

💡Concepts to learn

Password Reset Endpoint Abuse / Account Enumeration — Core technique in holehe: password reset flows often reveal account existence and recovery info without triggering login alerts, making them prime OSINT vectors; understanding when/why sites leak recovery data is critical
Async/Await with Trio (structured concurrency) — Holehe relies on trio for concurrent HTTP requests across 120+ sites; trio's cancellation and timeout semantics are essential for avoiding hangs when a single module fails
User-Agent Rotation & Bot Detection Evasion — localuseragent.py implements UA spoofing to avoid WAF/bot detection; many sites fingerprint requests, making this critical for reliable reconnaissance
Rate Limiting & IP Rotation Strategy — Holehe returns rateLimit=true in output but documentation says 'Change your IP'—understanding adaptive rate limiting, backoff strategies, and proxy integration is essential for scaling checks
Response Pattern Matching & Heuristic Account Detection — Each site module must detect 'account exists' vs 'not found' via HTTP status codes, response text patterns, or redirect behavior; inconsistent heuristics are a major failure source
OSINT Data Aggregation & Normalization — Holehe standardizes outputs (name, exists, emailrecovery, phoneNumber, others) across wildly different site architectures; this normalization allows downstream correlation and visualization
Reverse-Engineering Web Endpoints via Browser DevTools — Each holehe module requires manual reverse-engineering of a site's password reset flow using network inspection; understanding request/response pairs and CSRF tokens is prerequisite knowledge
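
As a small example of the downstream correlation that normalized output enables, the sketch below tests whether a candidate address is compatible with a partially obfuscated `emailrecovery` value. It assumes each `*` hides exactly one character, which holds for the sample shown but varies by site; `mask_matches` is a hypothetical helper.

```python
import re


def mask_matches(mask: str, candidate: str) -> bool:
    """True if `candidate` fits `mask`, where each '*' hides one character.

    Assumption: one '*' per hidden character. Sites that use a fixed
    number of stars regardless of length need a looser pattern.
    """
    pattern = "".join("." if ch == "*" else re.escape(ch) for ch in mask)
    return re.fullmatch(pattern, candidate, flags=re.IGNORECASE) is not None


print(mask_matches("ex****e@gmail.com", "example@gmail.com"))
print(mask_matches("ex****e@gmail.com", "other@gmail.com"))
```

A match only means the candidate is consistent with the mask, not that it is the recovery address.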

soxoj/maigret — Similar OSINT email/username search tool; good reference for multi-site enumeration patterns and concurrent request handling
Raikia/UhOh365 — Credited in holehe README; specialized for O365/Azure account enumeration, complementary to holehe's broad site coverage
trustedsec/social-engineer-toolkit — SET includes email harvesting and OSINT modules; shares similar reconnaissance goals and password reset endpoint abuse techniques
megadose/holehe-maltego — Official Maltego integration for holehe; allows running checks directly from Maltego's graph UI for visual investigation workflows
thewhiteh4t/pwnedornot — Companion tool for checking if email appears in known breaches; often used alongside holehe for comprehensive account compromise assessment

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for module detection logic in holehe/core.py

The repo has 120+ modules across multiple categories (cms, crm, crowfunding, forum, jobs, learning) but there's no visible test suite. This is critical for a tool that performs email validation across diverse platforms. Unit tests would catch regressions when modules are added/modified and ensure the core detection logic works correctly across different response scenarios.

[ ] Create tests/ directory structure mirroring holehe/modules/ organization
[ ] Add unit tests for holehe/core.py focusing on module loading and execution flow
[ ] Add sample tests for at least 3 module categories (e.g., cms/wordpress.py, crm/hubspot.py, forum/mybb.py) testing both positive and negative email detection cases
[ ] Integrate pytest into GitHub Actions workflow (.github/workflows/) to run on each PR
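
A first test for this checklist could target response parsing in isolation, with no network access. The sketch below uses `unittest` from the standard library so it runs without installing pytest; the same cases translate directly to pytest functions. `parse_reset_response` and its marker string are hypothetical and stand in for whatever parsing a real module performs.

```python
import unittest
from typing import Dict


def parse_reset_response(status_code: int, body: str) -> Dict[str, bool]:
    """Hypothetical parser: map a reset-endpoint reply to result flags."""
    if status_code == 429:
        return {"exists": False, "rateLimit": True}
    return {"exists": "reset link sent" in body.lower(), "rateLimit": False}


class ParseResetResponseTests(unittest.TestCase):
    def test_registered_email(self) -> None:
        result = parse_reset_response(200, "Reset link sent to your inbox")
        self.assertEqual(result, {"exists": True, "rateLimit": False})

    def test_unknown_email(self) -> None:
        result = parse_reset_response(200, "No account found")
        self.assertEqual(result, {"exists": False, "rateLimit": False})

    def test_rate_limited(self) -> None:
        result = parse_reset_response(429, "Too many requests")
        self.assertEqual(result, {"exists": False, "rateLimit": True})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseResetResponseTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing against canned responses catches parsing regressions, though it cannot detect a site changing its live endpoint.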

Create module validation and health check framework

With 120+ modules spread across holehe/modules/, there's no visible mechanism to detect broken modules (site changes, dead endpoints, etc.). A validation framework would help maintainers identify which modules need updates and allow contributors to quickly verify their new modules work correctly before submitting PRs.

[ ] Add a new file holehe/module_validator.py with functions to test each module's basic connectivity
[ ] Create a CLI command (add to holehe/core.py or instruments.py) that runs all modules against a test email and reports failures
[ ] Document the validation process in README.md with a section 'Contributing New Modules'
[ ] Add a GitHub Action workflow (.github/workflows/module_health_check.yml) that runs validation weekly to detect broken modules
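
One building block for such a validator is a schema check on each record a module emits. The sketch below is a hypothetical `validate_record` helper; the key names come from the output fields listed in this document, and the type expectations for `exists` and `rateLimit` are assumptions.

```python
from typing import Any, Dict, List

# Keys taken from the output fields named in this document; the bool
# type expectations are assumptions for illustration.
REQUIRED_KEYS = {"name": str, "exists": bool, "rateLimit": bool}
OPTIONAL_KEYS = ("emailrecovery", "phoneNumber", "others")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record looks sane."""
    problems: List[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key in OPTIONAL_KEYS:
        # These may be None but should still be present.
        if key not in record:
            problems.append(f"missing key: {key}")
    return problems


good = {"name": "site_a", "exists": True, "rateLimit": False,
        "emailrecovery": None, "phoneNumber": None, "others": None}
bad = {"name": "site_b", "exists": "yes"}
print(validate_record(good))
print(validate_record(bad))
```

A weekly workflow could run each module, pass its record through a check like this, and report modules whose output is malformed or missing.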

Extract duplicate request/response handling logic into holehe/instruments.py utilities

Looking at the module structure, individual modules (like holehe/modules/crm/hubspot.py, holehe/modules/cms/wordpress.py) likely contain repeated code for HTTP requests, response parsing, and forgotten password detection. Consolidating these patterns into holehe/instruments.py would reduce code duplication, improve maintainability, and make it easier for contributors to write new modules.

[ ] Audit 5-10 existing modules to identify common patterns (request headers, response status checks, forgotten password endpoint detection)
[ ] Add utility functions to holehe/instruments.py for common patterns (e.g., check_password_reset_endpoint(), parse_response_for_account_existence())
[ ] Refactor 2-3 sample modules (e.g., hubspot.py, wordpress.py, duolingo.py) to use the new utilities
[ ] Update documentation/template for contributors on how to use the new utilities when adding modules

🌿Good first issues

Add pytest unit tests for holehe/modules/social_media/twitter.py and holehe/modules/crm/hubspot.py to validate response parsing—currently no test files visible in repo structure.
Create holehe/modules/payment/ category for PayPal, Stripe, and Square with 3 new site checks, following the signature pattern from existing modules.
Document the async function signature contract in a CONTRIBUTING.md file with a working code template, since README Python example uses snapchat but module file structure is not explained.

⭐Top contributors

@megadose — 56 commits
@novitae — 9 commits
@byt3loss — 6 commits
@pjh-github — 6 commits
@mark-sercombe — 3 commits

📝Recent commits