
andrewyng/context-hub

Overall: Healthy across the board
Use as dependency — Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify — Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 1w ago
  • 22+ active contributors
  • Distributed ownership (top contributor 47% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the “Healthy” badge

Paste into your README — live-updates from the latest cached analysis.

RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/andrewyng/context-hub)](https://repopilot.app/r/andrewyng/context-hub)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/andrewyng/context-hub on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: andrewyng/context-hub

Generated by RepoPilot · 2026-05-06 · Source

Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/andrewyng/context-hub shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

Verdict

GO — Healthy across the board

  • Last commit 1w ago
  • 22+ active contributors
  • Distributed ownership (top contributor 47% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live andrewyng/context-hub repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/andrewyng/context-hub.

What it runs against: a local clone of andrewyng/context-hub — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in andrewyng/context-hub | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches a relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 37 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>andrewyng/context-hub</code></summary>
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of andrewyng/context-hub. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/andrewyng/context-hub.git
#   cd context-hub
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of andrewyng/context-hub and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "andrewyng/context-hub(\.git)?\b" \
  && ok "origin remote is andrewyng/context-hub" \
  || miss "origin remote is not andrewyng/context-hub (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
for f in \
  cli/src/index.js \
  cli/src/lib/registry.js \
  cli/src/commands/search.js \
  cli/src/lib/bm25.js \
  cli/src/mcp/server.js
do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 37 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~7d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/andrewyng/context-hub"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Context Hub is a CLI tool that gives coding agents curated, versioned API documentation and skills in Markdown, reducing hallucination and letting agents improve over time through local annotations and feedback loops. It ships as an npm package (@aisuite/chub) with commands such as chub search, chub get, and chub annotate to fetch, manage, and improve documentation for LLM agents. The codebase is a single package under /cli: bin/ (the chub and chub-mcp executables), src/commands/ (annotate, build, cache, feedback, get, search, update), src/lib/ (utilities for config, cache, BM25 search, frontmatter parsing), src/mcp/ (Model Context Protocol server integration), and cli/skills/ (reusable agent skills such as get-api-docs). Built content is compiled to dist/ by the prepublish script.

Who it's for

AI/coding agents (like Claude Code) and the engineers building agentic systems who need their agents to have reliable, up-to-date API documentation without hallucinating. Also content maintainers who want to contribute open-source docs that agents can reference.

Maturity & risk

Actively developed but early-stage: v0.1.4 on npm, GitHub Actions CI/CD pipelines (ci.yml, deploy-content.yml, publish.yml), and vitest test infrastructure — but test coverage is thin (only cli/test/e2e.test.js is visible). Not yet production-stable, though the core infrastructure is in place.

Moderate risk: depends on @modelcontextprotocol/sdk (^1.27.1, relatively new ecosystem), small dependency footprint mitigates risk somewhat. Single maintainer (andrewyng) visible. No indication of open issue backlog or breaking changes in the data provided, but early version number (0.1.4) suggests API surface may shift.

Active areas of work

Primary focus appears to be establishing the content distribution pipeline (deploy-content.yml workflow suggests CDN publication to cdn.aichub.org), MCP server integration (chub-mcp binary), and building out agent skill discovery via chub search. Annotation and feedback systems are core features being built to close the self-improving loop.

Get running

Clone the repo and install: git clone https://github.com/andrewyng/context-hub && cd context-hub/cli && npm install. Then run npm run test to verify setup, or ./bin/chub help to see available commands.

Daily commands — CLI: chub search [query], chub get <id> [--lang py|js], chub annotate <id> <note>, chub feedback <id> <up|down>. Development: npm run test (vitest), npm run test:watch for file watching, and npm run prepublish to build content from the ../content directory into dist/.

Map of the codebase

  • cli/src/index.js — Main CLI entry point that dispatches commands; every contributor must understand the command routing pattern
  • cli/src/lib/registry.js — Core abstraction for searching and retrieving documentation; handles multi-source lookups and caching
  • cli/src/commands/search.js — Search command implementation using BM25 ranking; foundational for user-facing search experience
  • cli/src/lib/bm25.js — BM25 full-text search implementation; critical for relevance ranking of documentation
  • cli/src/mcp/server.js — MCP protocol server that exposes tools to AI agents; bridges CLI functionality to LLM integration
  • cli/package.json — Declares CLI entry points (chub, chub-mcp) and build/publish scripts; defines public API surface
  • README.md — Explains the purpose (curated docs for coding agents) and primary use cases; context for all design decisions
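
The BM25 ranking at the heart of cli/src/lib/bm25.js can be illustrated with a minimal sketch. This is not the repo's actual implementation — the function name, the k1/b defaults, and the whitespace tokenizer are illustrative assumptions:

```javascript
// Minimal BM25 sketch — illustrative only, not the repo's actual bm25.js.
const K1 = 1.2, B = 0.75; // common BM25 defaults (assumed)

const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

function bm25Rank(docs, query) {
  const tokened = docs.map(tokenize);
  const avgLen = tokened.reduce((s, t) => s + t.length, 0) / tokened.length;
  const N = docs.length;

  // Score one query term against one tokenized doc.
  const score = (tokens, term) => {
    const tf = tokens.filter((t) => t === term).length;
    if (tf === 0) return 0;
    const df = tokened.filter((t) => t.includes(term)).length;
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    return (idf * tf * (K1 + 1)) /
      (tf + K1 * (1 - B + (B * tokens.length) / avgLen));
  };

  return docs
    .map((doc, i) => ({
      doc,
      score: tokenize(query).reduce((s, term) => s + score(tokened[i], term), 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Terms that are rare across the corpus (high IDF) and frequent within a doc (high TF, damped by doc length) float that doc to the top — which is why debugging relevance usually starts with inspecting per-term TF/IDF contributions.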

How to make changes

Add a new search command or variant

  1. Create a new command file in cli/src/commands/ (cli/src/commands/newcmd.js)
  2. Import registry to perform searches or fetch docs (cli/src/lib/registry.js)
  3. Register the command in the dispatcher (cli/src/index.js)
  4. Add test file in cli/tests/commands/ (cli/tests/commands/newcmd.test.js)

Add new documentation content

  1. Create a new directory under content/ with package name (e.g., content/mylib/) (content/mylib/docs/package/python/DOC.md)
  2. Add markdown files with YAML frontmatter (keywords, tags, etc.) (cli/src/lib/frontmatter.js)
  3. Run build to index the new content (cli/src/commands/build.js)
  4. Verify content is searchable via registry (cli/src/lib/registry.js)

Expose a new tool via MCP

  1. Define tool schema and implementation in tools.js (cli/src/mcp/tools.js)
  2. Register tool in server initialization (cli/src/mcp/server.js)
  3. Implement underlying command or registry method (cli/src/commands/)
  4. Add test to verify tool is discoverable via MCP (cli/tests/mcp/tools.test.js)

Improve search ranking or filtering

  1. Modify BM25 parameters or add filtering logic (cli/src/lib/bm25.js)
  2. Update registry search method to apply new filters (cli/src/lib/registry.js)
  3. Add unit tests for new ranking behavior (cli/tests/lib/bm25.test.js)
  4. Run search command tests to verify no regressions (cli/tests/commands/search.test.js)

Why these technologies

  • Node.js CLI (bin/chub, bin/chub-mcp) — Lightweight, scriptable entry point for agents; easily invokable from shell or as subprocess
  • BM25 full-text search (bm25.js) — Proven relevance ranking algorithm; no external search engine required; computable in-process with minimal latency
  • Markdown + YAML frontmatter — Human-readable, version-control friendly, maintainable in Git; enables transparent auditing of what agents read
  • MCP (Model Context Protocol) — Standardized protocol for exposing CLI tools to LLM agents; enables integration with Claude, other AI models
  • In-process caching (cache.js) — Reduces repeated I/O and index reloads; keeps latency under ~50ms for cache hits
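
The in-process caching point above can be sketched as a TTL-bounded Map. This is a hypothetical sketch — cache.js's real API and storage layout are not shown in this artifact:

```javascript
// Hypothetical TTL cache sketch — not the actual cli/src/lib/cache.js API.
class TtlCache {
  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  // `now` is injectable to make expiry testable without real clocks.
  get(key, now = Date.now()) {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt <= now) {
      this.entries.delete(key); // lazy eviction on read
      return undefined;
    }
    return hit.value;
  }
  set(key, value, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

A cache hit is a single Map lookup, which is what keeps repeated doc fetches well under the ~50 ms figure cited above.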

Trade-offs already made

  • Index rebuilt on each build rather than incremental updates

    • Why: Simpler implementation; acceptable because content repo is modest in size and static
    • Consequence: Build time linear with content size; not suitable for millions of docs
  • BM25 implemented from scratch rather than using external search library

    • Why: Eliminates dependencies; keeps CLI lightweight and CLI executable size minimal
    • Consequence: No advanced features (faceted search, fuzzy matching); must maintain ranking algorithm ourselves
  • Single-process, single-threaded architecture

    • Why: Simplicity; fits use case of CLI invocations and single-agent queries
    • Consequence: Not suitable for high-concurrency server deployments
  • Content stored as flat markdown files, not a database

    • Why: Enables Git-based versioning, community contributions, full auditability
    • Consequence: Search index must be pre-built; schema changes require rebuild cycle
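
The "full rebuild rather than incremental" trade-off amounts to something like the toy inverted-index build below — every build re-tokenizes every doc, so cost grows linearly with content size (a sketch, not the repo's build.js):

```javascript
// Toy full-rebuild inverted index — illustrates the trade-off, not build.js.
// Each build walks ALL docs: O(total content size), no incremental path.
function buildIndex(docs) {
  const index = new Map(); // term -> Set of doc ids containing it
  for (const [id, text] of Object.entries(docs)) {
    for (const term of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}
```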

Non-goals (don't propose these)

  • Does not provide authentication or access control (all content is public)
  • Does not support real-time document updates or live indexing
  • Does not replace a full documentation platform (e.g., Sphinx, ReadTheDocs)
  • Not designed for private/enterprise documentation (content is open-source only)
  • Does not provide a GUI or web interface

Traps & gotchas

The prepublish script expects a ../content directory parallel to /cli containing source markdown docs. MCP server stdin/stdout lifecycle (stdio-lifecycle.js) requires correct process signal handling. The package uses ES modules exclusively (type: "module"), so CommonJS require() will fail. Telemetry is built in (posthog-node) and sends usage data by default. Language-specific doc variants are selected via --lang flag but require docs to be authored with language variants in frontmatter.

Architecture

Concepts to learn

  • BM25 (Best Matching 25) — Context Hub implements BM25 for full-text search relevance ranking in bm25.js; understanding this probabilistic model is essential to improving search quality and debugging ranking issues
  • Model Context Protocol (MCP) — chub exposes itself as an MCP server (chub-mcp binary) so agents like Claude can call it as a native tool; MCP is the integration layer between chub and AI agents
  • YAML Frontmatter — Markdown docs in context-hub use YAML frontmatter to store metadata (language, version, tags); the frontmatter parser (lib/frontmatter.js) is critical to extracting this metadata for indexing and filtering
  • Content Versioning & Language Variants — Agents can request Python or JavaScript versions of the same API doc (--lang flag); the system must track versions and language-specific variants in metadata and serve correct variants on fetch
  • Telemetry & Analytics Loop — chub integrates PostHog telemetry (lib/telemetry.js) and implements a feedback loop (chub feedback up/down); understanding this loop is critical to contributing to the self-improving agent goal
  • Markdown to Tarball Distribution — The prepublish script builds markdown content into a CDN-hosted tarball (dist/) for efficient agent retrieval; understanding the tar.js integration and CDN URL structure is needed to extend content distribution
  • Local Annotation Persistence — Agents can annotate docs locally (chub annotate <id> <note>) and these persist across sessions in the cache; this is the key mechanism enabling self-improvement and requires understanding cache storage patterns
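
To make the frontmatter concept concrete, here is a minimal parser for flat `key: value` metadata of the kind described above. The real cli/src/lib/frontmatter.js presumably supports full YAML; this sketch handles only simple scalar fields:

```javascript
// Minimal frontmatter split — flat "key: value" lines only.
// The real parser (cli/src/lib/frontmatter.js) likely handles full YAML.
function parseFrontmatter(markdown) {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) return { meta: {}, body: markdown }; // no frontmatter block
  const meta = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: match[2] };
}
```

Fields like `language` and `version` extracted this way are what the indexer filters on when an agent requests `--lang py` versus `--lang js`.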

Related repos

  • anthropics/anthropic-sdk-python — Similar agent-friendly tooling model; relevant for understanding how SDKs expose docs to LLM agents
  • openai/openai-python — Production example of versioned, language-specific API documentation that agents need to access reliably
  • modelcontextprotocol/python-sdk — The Model Context Protocol standard that chub-mcp implements; understand MCP to extend context-hub's agent integration
  • langchain-ai/langchain — Large agent framework that could integrate chub as a retrieval tool; potential integration target
  • slm-lang/slm — Alternative agent skill/knowledge management system; competitive approach worth studying for feature gaps

PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for cli/src/lib/bm25.js search algorithm

The BM25 search implementation in bm25.js is a core component for the 'chub search' command, but there are no visible unit tests for it in cli/test/. This is critical infrastructure that needs test coverage for edge cases (empty queries, special characters, ranking accuracy). Adding tests would improve reliability and make future refactoring safer.

  • [ ] Create cli/test/bm25.test.js with test cases for tokenization, scoring, and ranking
  • [ ] Add tests for edge cases: empty queries, unicode characters, stemming behavior
  • [ ] Test ranking accuracy with realistic documentation snippets from cli/test/fixtures/
  • [ ] Verify tests pass with 'npm test' and report coverage for bm25.js

Add integration tests for cli/src/mcp/server.js and tools.js (MCP protocol implementation)

The Model Context Protocol server in src/mcp/ enables agents to use chub as an MCP tool, but there are no tests validating the MCP protocol compliance or tool invocation behavior. The existing e2e.test.js focuses on CLI commands, not MCP. This needs coverage to ensure agent integrations work reliably.

  • [ ] Create cli/test/mcp-server.test.js to test MCP server initialization and lifecycle (using stdio-lifecycle.js)
  • [ ] Add tests for tool definitions returned by tools.js (validate schema, required parameters)
  • [ ] Test tool invocation end-to-end: send sample MCP requests and verify response format
  • [ ] Validate that tools.js correctly wraps CLI commands (annotate, search, get) for MCP clients

Add tests for cli/src/lib/config.js and identity.js user configuration handling

The config.js and identity.js modules manage user settings and telemetry identity, but there are no visible tests for configuration loading, persistence, or identity generation. This is important for reliability since misconfiguration could break telemetry or user preferences. Tests should cover config file parsing, defaults, and identity generation.

  • [ ] Create cli/test/config.test.js with tests for loading config from files, environment variables, and defaults
  • [ ] Test config persistence and format validation using zod schemas
  • [ ] Create cli/test/identity.test.js to verify identity generation, storage, and retrieval logic
  • [ ] Add tests for edge cases: corrupted config files, missing directories, permission errors

Good first issues

  • Add missing unit tests for cli/src/lib/frontmatter.js and cli/src/lib/normalize.js — currently only e2e.test.js exists; these utilities parse doc structure and need coverage.
  • Expand cli/test/fixtures/acme/docs with additional versioned-api examples — the fixture only has sparse test data, making it hard to test search relevance ranking across realistic doc hierarchies.
  • Document the content schema: create docs/content-schema.md explaining the required YAML frontmatter fields (language, version, etc.) that authors must follow — currently inferred from code only.

Top contributors

Recent commits

  • 04c149c — Merge pull request #227 from andrewyng/increment-version-0.1.4 (rohitprasad15)
  • 19a2c30 — Bump version to 0.1.4 (rohitprasad15)
  • cb6e74d — Update OpenAI docs for latest model guidance (#226) (Ivanye2509)
  • 9be8ba1 — Add versioned chub help (#225) (Ivanye2509)
  • eeb895b — Merge pull request #224 from ryanchentw/feature/update-get-api-docs-skill (Ivanye2509)
  • 1242e30 — feat: update get-api-docs skill (ryanchentw)
  • 596506e — Merge pull request #179 from landing-ai/landingai-ade-skill-release-20260320 (Ivanye2509)
  • cb21fe6 — Updated docs (cat101)
  • 32373f6 — Updated skills (cat101)
  • c0cc60f — Merge pull request #21 from landing-ai/add-landingai-ade-content (Ivanye2509)

Security observations

Context Hub demonstrates a reasonable security posture for an open-source CLI tool, but has several areas for improvement. The most critical issue is the insecure vulnerability-reporting process, which exposes security issues publicly. Secondary concerns include telemetry collection practices, input validation across commands, and the lack of content integrity verification.

  • Medium · Insecure Vulnerability Reporting Process — SECURITY.md. The SECURITY.md file directs users to report vulnerabilities via public GitHub issues with a 'security' label. This exposes vulnerability details publicly before patching, violating responsible disclosure practices. GitHub has a dedicated private vulnerability reporting feature that should be used instead. Fix: Update SECURITY.md to direct users to GitHub's private vulnerability reporting feature (Settings > Security & Analysis > Report a vulnerability) or establish a private security@aichub.org contact email for responsible disclosure.
  • Medium · Telemetry Data Collection Without Explicit Consent — cli/src/lib/telemetry.js, cli/src/lib/analytics.js, package.json (posthog-node dependency). The codebase includes PostHog telemetry (posthog-node dependency) that collects usage data. While SECURITY.md mentions it can be disabled via config, the default behavior appears to be enabled, and users may not be aware they're being tracked. Fix: Ensure telemetry is opt-in rather than opt-out by default. Add prominent documentation during installation and initial setup about data collection. Verify that no sensitive data (command arguments, file paths, API keys) is included in telemetry payloads.
  • Medium · Hashed Machine Identifier Uniqueness Risk — cli/src/lib/identity.js, SECURITY.md. SECURITY.md references a 'hashed machine identifier' for telemetry. If the hashing algorithm is weak or uses insufficient entropy, it could potentially be reversed or collided, compromising user privacy. Fix: Document the hashing algorithm used (should be SHA-256 or stronger). Verify that the identifier includes sufficient entropy (MAC address, hardware ID, and timestamp). Consider using a randomly generated identifier stored locally instead of hardware-based identification.
  • Low · Missing Dependency Version Pinning — cli/package.json (all dependencies). The package.json uses caret (^) versioning for all dependencies, allowing minor and patch version updates automatically. While generally acceptable, this could introduce unexpected breaking changes or security issues in transitive dependencies. Fix: Consider pinning major versions more strictly, or regularly audit dependencies using 'npm audit' and 'npm outdated'. Implement automated dependency scanning in CI/CD (GitHub's Dependabot is already in use based on .github/workflows).
  • Low · No Input Validation on CLI Arguments — cli/src/commands/ (all command files). The codebase includes multiple CLI commands (search, get, annotate, etc.) that accept user input. Without visible input validation in command files, there could be risks for injection attacks or unexpected behavior. Fix: Implement comprehensive input validation and sanitization for all CLI commands using the zod dependency (already present in package.json). Validate file paths, API parameters, and search queries before processing.
  • Low · Tar Archive Extraction Without Validation — cli/src/lib/ (likely in cache.js or registry.js), package.json (tar dependency). The tar dependency is included but there's no visible validation of tar archive contents before extraction. This could potentially allow directory traversal or symlink attacks. Fix: When extracting tar archives, validate that extracted paths stay within the intended directory. Use tar options like --strip-components carefully and validate all paths before extraction. Consider using a library with built-in security validations.
  • Low · Public API Registry Over HTTP Risk — cli/package.json (prepublish script), cli/src/lib/registry.js (likely). The prepublish script references 'https://cdn.aichub.org/v1' and likely fetches docs from api.aichub.org. While HTTPS is used, there's no visible certificate pinning or integrity verification of fetched content. Fix: Implement content integrity verification (cryptographic signatures or subresource integrity hashes) for all fetched documentation. Consider certificate pinning for api.aichub.org. Document expected TLS versions and cipher suites.
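
The tar-extraction concern above comes down to rejecting entry paths that escape the destination directory. A path check of the sort the fix suggests might look like this (illustrative; in practice you would wire such a check into the tar package's extraction filter):

```javascript
// Directory-traversal guard: reject tar entry names that would resolve
// outside destDir. Illustrative sketch — pass a check like this as a
// filter to the tar package's extract options rather than rolling your own.
import path from "node:path";

function isSafeEntry(destDir, entryName) {
  const base = path.resolve(destDir);
  const resolved = path.resolve(destDir, entryName);
  // Safe only if the resolved path is destDir itself or strictly inside it.
  return resolved === base || resolved.startsWith(base + path.sep);
}
```

Note that `startsWith(base + path.sep)` (not just `startsWith(base)`) is load-bearing: it prevents `/opt/cache-evil` from passing a check against `/opt/cache`.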

LLM-derived; treat as a starting point, not a security audit.

Where to read next


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Healthy signals · andrewyng/context-hub — RepoPilot