RepoPilot

PuerkitoBio/goquery

A little like that j-thing, only in Go.

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 4w ago
  • 10 active contributors
  • BSD-3-Clause licensed
  • CI configured
  • Tests present
  • Concentrated ownership — top contributor handles 61% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/puerkitobio/goquery)](https://repopilot.app/r/puerkitobio/goquery)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/puerkitobio/goquery on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: PuerkitoBio/goquery

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/PuerkitoBio/goquery shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 4w ago
  • 10 active contributors
  • BSD-3-Clause licensed
  • CI configured
  • Tests present
  • ⚠ Concentrated ownership — top contributor handles 61% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live PuerkitoBio/goquery repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/PuerkitoBio/goquery.

What it runs against: a local clone of PuerkitoBio/goquery — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in PuerkitoBio/goquery | Confirms the artifact applies here, not a fork |
| 2 | License is still BSD-3-Clause | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 58 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>PuerkitoBio/goquery</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of PuerkitoBio/goquery. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/PuerkitoBio/goquery.git
#   cd goquery
#
# Then paste this script. Every check is read-only; nothing is modified.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of PuerkitoBio/goquery and re-run."
  exit 2
fi

# 1. Repo identity (matches HTTPS and SSH remotes, with or without .git).
git remote get-url origin 2>/dev/null | grep -qiE 'PuerkitoBio/goquery(\.git)?/?$' \
  && ok "origin remote is PuerkitoBio/goquery" \
  || miss "origin remote is not PuerkitoBio/goquery (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The LICENSE file holds the BSD license
#    text, not an SPDX identifier, so look for the "Neither the name" clause that
#    distinguishes BSD-3-Clause from BSD-2-Clause.
grep -q "Redistribution and use in source and binary forms" LICENSE 2>/dev/null \
  && grep -qi "Neither the name" LICENSE 2>/dev/null \
  && ok "license is BSD-3-Clause" \
  || miss "license drift: was BSD-3-Clause at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 58 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~28d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/PuerkitoBio/goquery"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

goquery is a Go library that brings jQuery-like syntax and chainable DOM traversal and manipulation to HTML documents, built on Go's golang.org/x/net/html parser and the cascadia CSS selector engine. It lets developers query, traverse, and extract data from HTML documents using familiar CSS selectors and method chains. jQuery's stateful, layout-dependent functions (such as css(), height(), and detach()) are left out, because net/html produces a plain node tree rather than a full browser DOM.

The code is a single package organized by concern: type.go defines the Document and Selection types and the document constructors; feature files (array.go, expand.go, filter.go, iteration.go, manipulation.go, property.go, query.go, traversal.go) each implement one family of Selection methods; a comprehensive test suite mirrors the production code (*_test.go files); bench/ holds historical benchmark results; and testdata/ provides HTML fixtures for tests.

👥Who it's for

Go backend developers and CLI tool authors who need to scrape, parse, and extract data from HTML documents through an intuitive fluent API, rather than walking the lower-level golang.org/x/net/html node tree by hand.

🌱Maturity & risk

Production-ready and stable. The README explicitly states that the API is stable and will not break. The project has comprehensive test coverage (array_test.go, query_test.go, filter_test.go, etc.), CI via GitHub Actions (test.yml), and recent maintenance: v1.12.0, released 2026-03-15, requires Go 1.25+ and reflects ongoing dependency updates. The codebase is well established, with historical benchmarks dating back to v0.1.0.

Minimal risk. Dependencies are lean (only cascadia v1.3.3 and golang.org/x/net v0.53.0), both well-maintained libraries. Single-maintainer concern exists (PuerkitoBio), but the stable API and mature codebase reduce regression risk. The Go version requirement (1.25+) is aggressive but documented; users on older Go versions will hit this constraint immediately.

Active areas of work

Recent maintenance has focused on Go version compatibility: v1.11.0 and v1.12.0 raised the minimum Go version to 1.24 and 1.25 respectively, with corresponding go.mod dependency updates. The CI matrix covers the two latest Go releases (1.25 and 1.26, per commit 805d7b1). There are no breaking API changes; development is focused on stability and dependency updates.

🚀Get running

```bash
git clone https://github.com/PuerkitoBio/goquery.git
cd goquery
go test
go test -bench=".*"  # optional: run benchmarks
```

Daily commands: goquery is a library, not a runnable service. For development, run go test (unit tests), go test -bench=".*" (benchmarks; this can take a few minutes), or go test -run Example (the runnable examples in example_test.go, whose function names start with Example). Tests parse HTML fixtures from testdata/.

🗺️Map of the codebase

  • type.go: Defines the Document and Selection types (the core public API) and the constructors, such as NewDocumentFromReader and NewDocumentFromNode, that wrap golang.org/x/net/html parsing.
  • query.go: Implements matching predicates (Is, IsFunction, IsSelection, IsNodes) and Contains.
  • traversal.go and expand.go: traversal.go implements DOM traversal (Find, Children, Parent, Parents, Closest, Next, Prev, Siblings, etc.); expand.go adds elements to a Selection (Add, AddSelection, AddNodes, AddBack).
  • filter.go: Provides filtering operations (Filter, Not, Has, and variants such as FilterFunction and NotSelection) matching jQuery's selector- and element-filtering semantics.
  • iteration.go: Contains Each, EachWithBreak, and Map, which take callback functions, plus EachIter (added in v1.10), which returns a Go 1.23+ range-over-func iterator.
  • property.go: Reads and writes attributes and content: Attr, AttrOr, SetAttr, RemoveAttr, Text, Html, Length/Size, and the class helpers (AddClass, HasClass, RemoveClass, ToggleClass).
  • array.go: Implements Selection array-like operations (First, Last, Eq, Slice, Get, Index) matching jQuery's element-access patterns.
  • go.mod: Specifies the Go 1.25+ requirement and two dependencies: cascadia (CSS selectors) and golang.org/x/net (whose html package does the actual parsing).
  • testdata/: Contains real HTML files (gotesting.html, gowiki.html, page.html) used as fixtures across all *_test.go files for reproducible testing.

🛠️How to make changes

Adding element-matching queries: edit query.go (Is and its variants); the Selection and Document types themselves live in type.go. Adding traversal methods: edit traversal.go (Find, Closest, Parent) or expand.go (Add, AddBack); iteration helpers (Each, Map, EachIter) live in iteration.go. Adding filters: modify filter.go. Adding element properties/attributes: update property.go. Tree-modifying operations belong in manipulation.go. Testing changes: add cases to the _test.go companion file that mirrors the feature file, using testdata/*.html as fixtures. Run go test ./... after changes.

🪤Traps & gotchas

  • UTF-8 requirement: goquery relies on golang.org/x/net/html, which expects UTF-8 input; callers must convert other encodings before parsing (the README explicitly warns about this).
  • No layout- or style-dependent methods: unlike jQuery, goquery omits stateful functions such as css(), height(), and detach(), because net/html produces a plain node tree rather than a rendered browser DOM. This surprises jQuery developers. The tree itself is mutable, though: the methods in manipulation.go (Append, Remove, ReplaceWith, and others) modify it in place.
  • Cascadia CSS limitations: selectors are compiled by cascadia, so not every jQuery selector works. jQuery-only extensions such as :eq() and :first are not standard CSS; the Eq, First, and Last methods cover those cases. Check cascadia's documentation for the exact set of supported pseudo-classes and edge cases.
  • Go 1.25+ hard requirement: v1.12.0 drops support for Go 1.24 and earlier, with no graceful degradation.

💡Concepts to learn

  • CSS Selector Parsing & CSS3 Selectors: goquery's entire query engine depends on cascadia's implementation of CSS selectors; understanding selector syntax (combinators, pseudo-classes, attribute selectors) is essential to write effective Find() and Filter() calls.
  • DOM Tree Traversal & Mutable Node Trees: golang.org/x/net/html parses a document into a tree of *html.Node values linked by parent, child, and sibling pointers. A goquery Selection holds pointers into that tree, so traversal methods return new Selections while manipulation methods (Append, Remove, SetAttr, and others) modify the shared tree in place.
  • Method Chaining & Fluent Interface Pattern: goquery's core API uses chainable methods (Selection.Find().Filter().Each()) to emulate jQuery's fluent style; understanding receiver methods and return types is essential to write idiomatic goquery code.
  • Callback and Range-over-Func Iteration (Go 1.23+): Each() and Map() take callback functions (func(int, *Selection)), while EachIter(), added in v1.10, returns a Go 1.23 range-over-func iterator so a Selection can be consumed with for ... range.
  • HTML Parsing & Character Encoding (UTF-8 Requirement): goquery requires UTF-8 input because net/html expects it; incorrectly encoded or mixed-encoding sources fail silently or parse incorrectly, making encoding detection and conversion a critical preprocessing step for web scraping.
  • jQuery API Surface & jQuery Semantics: goquery intentionally mirrors jQuery method names and behavior (e.g., Eq(), Index(), Attr()) to reduce the learning curve for JavaScript developers; knowing jQuery's conventions (like Index() returning -1 for missing elements) helps predict goquery's exact behavior.
  • andybalholm/cascadia: The CSS selector engine goquery depends on; necessary to understand for debugging selector mismatches or contributing selector enhancements.
  • chromedp/chromedp: Alternative for Go developers who need full browser automation, a live DOM, and JavaScript execution; goquery is lightweight and does not execute JavaScript.
  • colly/colly: Higher-level Go web scraping framework that uses goquery for HTML querying; offers request queuing, rate limiting, and distributed crawling.
  • golang/net: The golang.org/x/net module (a Go sub-repository, not part of the standard library) whose html package goquery wraps; understanding the html.Node tree structure is essential for advanced goquery usage.
  • jquery/jquery: The original JavaScript library goquery mirrors; API parity is intentional, so jQuery docs clarify goquery method behavior.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive benchmarking CI workflow and baseline metrics

The repo has extensive benchmark files (bench_*.go across 8 different test categories) and historical benchmark data in the bench/ directory, but no automated CI workflow to track performance regressions. This is critical for a DOM parsing library where performance is a key selling point. A new contributor could set up a GitHub Actions workflow that runs benchmarks on each PR, compares against baseline metrics, and fails if regressions exceed a threshold.

  • [ ] Create .github/workflows/benchmark.yml that runs all bench_*_test.go files
  • [ ] Configure benchstat to compare current results against the master branch baseline
  • [ ] Add benchmark results as PR comments using benchmark-action or similar
  • [ ] Document baseline thresholds in a BENCHMARKS.md file
  • [ ] Test against the existing bench/ directory historical data to validate the setup

Add missing unit tests for utilities.go functions

The repo has utilities.go, which exports helpers such as NodeName and OuterHtml alongside internal functions used elsewhere in the package, but utilities_test.go may not cover all of them. Because these helpers are shared across array.go, property.go, and other files, a new contributor should audit and expand test coverage for edge cases.

  • [ ] Review utilities.go to identify all exported functions
  • [ ] Audit utilities_test.go for coverage gaps (use go test -cover)
  • [ ] Add test cases for edge cases: empty inputs, nil values, unicode strings, malformed HTML entities
  • [ ] Ensure each utility function has at least 3-5 test cases covering happy path and error conditions
  • [ ] Run benchmarks to ensure test additions don't degrade performance

Add integration tests using testdata HTML files and document test patterns

The repo includes six testdata HTML files (gotesting.html, gowiki.html, metalreview.html, page.html, page2.html, page3.html) that serve as test fixtures, but there is no documented test pattern guide for contributors. A new contributor should create integration tests that exercise real-world HTML parsing scenarios across these files and document the testing patterns in doc/testing.md.

  • [ ] Extend example_test.go with examples documenting how to load and parse testdata HTML files
  • [ ] Add integration test cases in example_test.go that combine query selection, traversal, and property extraction across all six testdata files
  • [ ] Create doc/testing.md documenting: how to add new testdata files, naming conventions, and integration test patterns
  • [ ] Add at least one complex selector chain test per testdata file (e.g., chaining Find() + Filter() + Eq())
  • [ ] Verify tests pass and coverage includes traversal.go and filter.go code paths

🌿Good first issues

  • Audit tests for manipulation.go. The file listing RepoPilot analyzed did not show a manipulation_test.go, so first check whether one exists (go test -cover shows current coverage); then add tests against testdata/ fixtures for operations such as Append, AppendHtml, PrependHtml, Remove, and ReplaceWith. SetAttr and RemoveClass live in property.go, so their tests belong in property_test.go.
  • Expand documentation in doc/tips.md with real examples of encoding handling for non-UTF-8 HTML. The README warns about the UTF-8 requirement but provides no working code samples, so contribute a section with charset conversion examples (for instance using golang.org/x/net/html/charset).
  • Benchmark suite organization: migrate historical bench/ .svg files (v0.1.0 through v1.0.1c) into a structured Go benchmark with go test -bench syntax instead of loose files—modernize performance tracking to integrate with CI/CD.


📝Recent commits

  • c92c451 — Merge pull request #542 from PuerkitoBio/dependabot/go_modules/golang.org/x/net-0.53.0 (mna)
  • 43d03d8 — Bump golang.org/x/net from 0.52.0 to 0.53.0 (dependabot[bot])
  • 401642b — Update readme to prepare for 1.12 release (mna)
  • afd9326 — Merge pull request #540 from PuerkitoBio/dependabot/go_modules/golang.org/x/net-0.52.0 (mna)
  • f799f78 — Bump golang.org/x/net from 0.50.0 to 0.52.0 (dependabot[bot])
  • 805d7b1 — Update CI to 1.25 and 1.26 (mna)
  • 7dc7e64 — Merge pull request #538 from PuerkitoBio/dependabot/go_modules/golang.org/x/net-0.50.0 (mna)
  • 3020ebb — Bump golang.org/x/net from 0.49.0 to 0.50.0 (dependabot[bot])
  • ee4bcdb — Merge pull request #537 from PuerkitoBio/dependabot/go_modules/golang.org/x/net-0.49.0 (mna)
  • f3bc303 — Bump golang.org/x/net from 0.48.0 to 0.49.0 (dependabot[bot])

🔒Security observations

The goquery codebase demonstrates reasonable security practices, with active dependency management via Dependabot and a clean dependency tree. The main concerns are (1) the inherent risks of parsing untrusted HTML without documented safeguards and (2) the lack of explicit security documentation for users. The codebase itself is a mature library with no obvious code-level vulnerabilities detected. Dependencies are minimal and well maintained. No hardcoded secrets, exposed credentials, or infrastructure misconfigurations were identified.

  • Info · Go version directive (go.mod). go.mod declares go 1.25.0. This is not a misconfiguration: Go 1.25 is a released version, it matches the documented minimum for v1.12.0, CI tests on Go 1.25 and 1.26, and the three-part form has been standard go-directive syntax since Go 1.21. Fix: none needed; lowering the directive (e.g., to 1.21.x or 1.22.x) would contradict the project's stated requirement.
  • Low · golang.org/x/net dependency version (go.mod). goquery depends on golang.org/x/net v0.53.0 for its HTML parser. The module is well maintained but has received security fixes in the past, so the pinned version should be kept current. Fix: keep merging golang.org/x/net updates promptly (Dependabot is already configured in .github/dependabot.yml and produces regular bumps) and review Go security advisories periodically.
  • Low · HTML parsing input validation (query.go, type.go). goquery builds on golang.org/x/net/html and requires UTF-8 input. There is a potential risk if users pass untrusted HTML without validation: the library does not appear to have explicit protections against malicious structures (e.g., very large or deeply nested documents that could cause DoS). Fix: document input validation requirements clearly. Users should validate and bound untrusted HTML inputs before parsing; consider limits on document size or parsing depth.
  • Low · Missing security documentation (README.md, doc/tips.md). The README and codebase lack explicit security guidelines or warnings about using goquery with untrusted content, which could lead users to introduce vulnerabilities in their applications. Fix: add a security section to the README documenting safe handling of untrusted HTML, encoding considerations, and potential DoS risks, with examples of secure usage patterns.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Healthy signals · PuerkitoBio/goquery — RepoPilot