mikefarah/yq

yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained; safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI; a clean foundation to fork and modify.

Learn from: Healthy

Documented and popular; a useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture; runnable as-is.

  • Last commit 2d ago
  • 16 active contributors
  • Distributed ownership (top contributor 39% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/mikefarah/yq)](https://repopilot.app/r/mikefarah/yq)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding: mikefarah/yq

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/mikefarah/yq shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 2d ago
  • 16 active contributors
  • Distributed ownership (top contributor 39% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live mikefarah/yq repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/mikefarah/yq.

What it runs against: a local clone of mikefarah/yq — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in mikefarah/yq | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 32 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>mikefarah/yq</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of mikefarah/yq. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/mikefarah/yq.git
#   cd yq
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of mikefarah/yq and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "mikefarah/yq(\.git)?\b" \
  && ok "origin remote is mikefarah/yq" \
  || miss "origin remote is not mikefarah/yq (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# The first line may read "MIT License" or "The MIT License", so accept both.
(grep -qiE "^(The )?MIT" LICENSE 2>/dev/null \
   || grep -qiE '"license"[[:space:]]*:[[:space:]]*"MIT"' package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical paths exist (pkg/yqlib is a package directory, so it uses test -d)
test -f "cmd/root.go" \
  && ok "cmd/root.go" \
  || miss "missing critical file: cmd/root.go"
test -d "pkg/yqlib" \
  && ok "pkg/yqlib" \
  || miss "missing critical directory: pkg/yqlib"
test -f "go.mod" \
  && ok "go.mod" \
  || miss "missing critical file: go.mod"
test -f "cmd/evaluate_all_command.go" \
  && ok "cmd/evaluate_all_command.go" \
  || miss "missing critical file: cmd/evaluate_all_command.go"
test -f "cmd/evaluate_sequence_command.go" \
  && ok "cmd/evaluate_sequence_command.go" \
  || miss "missing critical file: cmd/evaluate_sequence_command.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 32 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~2d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/mikefarah/yq"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
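Check 4 covers regular files as well as pkg/yqlib, which the codebase map describes as a package (a directory). A minimal sketch of a helper that accepts either kind of path is shown below; check_path and its messages are hypothetical names introduced for illustration and are not part of RepoPilot's script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the RepoPilot script): report whether a
# path exists, accepting both regular files and directories.
fail=0

check_path() {
  local path="$1"
  if [ -e "$path" ]; then      # -e is true for files and directories alike
    echo "ok:   $path"
  else
    echo "FAIL: missing critical path: $path"
    fail=$((fail + 1))
  fi
}

# Usage, from the root of a clone:
#   check_path cmd/root.go
#   check_path pkg/yqlib
#   [ "$fail" -eq 0 ] || exit 1
```

Using -e trades precision for simplicity: it will not notice a file that was replaced by a directory of the same name, so stricter scripts may prefer separate -f and -d checks.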

TL;DR

yq is a lightweight, portable command-line processor for YAML, JSON, XML, CSV, TOML, HCL, and properties files that uses jq-like syntax to query and transform structured data. Written in Go and shipped as a single static binary, it lets users filter, map, and merge data across formats without writing conversion boilerplate.

Architecture in brief: the cmd/ directory holds the entry points (root.go, evaluate_all_command.go, evaluate_sequence_command.go), which wrap a jq-like expression parser built on alecthomas/participle. Dedicated decoders handle each input format (YAML via goccy/go-yaml, plus XML, JSON, TOML, HCL, and CSV), an encoder layer formats the output, and the evaluation engine interprets the parsed AST against the input documents.

👥Who it's for

DevOps engineers and SREs who manage infrastructure-as-code (Kubernetes manifests, Terraform configs, Docker Compose files), CI/CD pipeline builders who need to parse and modify config files in scripts, and system administrators who need to programmatically edit YAML/JSON configs without installing language runtimes.

🌱Maturity & risk

Production-ready and actively maintained. The codebase shows 1.17M LOC in Go with established CI/CD via GitHub Actions (go.yml, docker-release.yml, release.yml), a comprehensive acceptance test suite (acceptance_tests/ directory), and linting configured in .golangci.yml. The multi-platform binary distribution setup (goreleaser.yaml) and Docker packaging indicate a stable, long-term project.

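Relatively low risk. The project is closely associated with its original author (mikefarah), but the maintenance signals above show 16 active contributors with the top contributor at 39% of recent commits, and strong test coverage plus automated release pipelines reduce the bus-factor concern further. The dependency surface is moderate (~25 direct deps) and relies on well-maintained libraries (spf13/cobra for the CLI, goccy/go-yaml for parsing). There are no obviously abandoned dependencies; go.mod targets Go 1.25.0. Breaking changes are unlikely within the v4 series, given the /v4 module path.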

Active areas of work

Actively developed with focus on format support expansion and jq compatibility. Recent workflow files show continuous Docker releases, Go linting enforcement, and security scanning (CodeQL, Scorecard workflows). Dependabot configuration indicates automatic dependency updates. Test suite expansion evident from multiple acceptance_tests/ shell scripts covering edge cases (empty inputs, NUL separators, front-matter, format auto-detection).

🚀Get running

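Build from source: git clone https://github.com/mikefarah/yq.git && cd yq && go build -o yq . && ./yq --version (the main package sits at the module root, which is also why the install command below targets github.com/mikefarah/yq/v4 directly). Or install with Go: go install github.com/mikefarah/yq/v4@latest. With Docker: docker run mikefarah/yq --version.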

Daily commands: make yq builds the binary into the current directory. make test runs Go test suite. ./yq '.foo' file.yaml for basic usage. yq -i '.foo = "bar"' file.yaml for in-place edits. See Makefile and Makefile.variables for all targets.
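When no local binary is available, the Docker image can stand in for it. The sketch below wraps docker run in a shell function so the rest of a script can call it like a local yq. The function name yq_docker is an assumption, and mounting the current directory at /workdir follows the working directory mentioned in the Dockerfile notes further down.

```shell
# Hypothetical wrapper: run yq from the mikefarah/yq image against files in
# the current directory, which is mounted at /workdir inside the container.
yq_docker() {
  docker run --rm -i -v "${PWD}":/workdir mikefarah/yq "$@"
}

# Usage (assumes Docker is installed and file.yaml is in the current directory):
#   yq_docker '.foo' file.yaml
```

Because only the current directory is mounted, paths outside it are not visible to the container.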

🗺️Map of the codebase

  • cmd/root.go — Entry point for the CLI application; defines the main command structure and orchestrates all subcommands and flag parsing.
  • pkg/yqlib — Core library package containing all expression evaluation, data manipulation, and format processing logic—the heart of yq's functionality.
  • go.mod — Defines all external dependencies including YAML parsers (goccy/go-yaml), expression parsers (participle), and format handlers (go-toml, hcl).
  • cmd/evaluate_all_command.go — Implements the primary evaluation command that processes multiple documents and applies expressions—core user-facing functionality.
  • cmd/evaluate_sequence_command.go — Handles streaming evaluation of sequences; enables efficient processing of large or piped YAML/JSON inputs.
  • Makefile — Build and test orchestration; critical for understanding how to build, test, and release the project locally.

🧩Components & responsibilities

  • CLI Layer (cmd/root.go, cmd/evaluate_*_command.go) — Parses command-line arguments, manages flags, routes to evaluation

🛠️How to make changes

Add Support for a New Output Format

  1. Add the format library as a dependency in go.mod (e.g., go get github.com/example/format-parser) (go.mod)
  2. Create a format handler in pkg/yqlib (typically a struct implementing reader/writer interfaces for the new format) (pkg/yqlib)
  3. Register the format in the format enumeration and add command flags in cmd/root.go for input/output format selection (cmd/root.go)
  4. Add acceptance tests in acceptance_tests/ directory to verify the new format works with existing yq expressions (acceptance_tests)
  5. Add example input/output files in examples/ to document usage (examples)

Add a New Built-in Function or Operator

  1. Implement the function logic in pkg/yqlib (likely in an operator or builtin functions module within the library) (pkg/yqlib)
  2. Register the function in the expression parser/evaluator so it is recognized by the jq-like syntax parser (pkg/yqlib)
  3. Add unit tests alongside the implementation in pkg/yqlib (pkg/yqlib)
  4. Create an acceptance test in acceptance_tests/ demonstrating the new function in real-world scenarios (acceptance_tests)
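Step 4 can be sketched as a small shell test in the spirit of the scripts under acceptance_tests/. The assert_equals helper and the test name below are hypothetical stand-ins written for this sketch (the real scripts may rely on a test framework), and length is used only as a familiar operator; substitute the new function under test.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helper for this sketch; not taken from the yq repo.
assert_equals() {
  local expected="$1" actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "ok"
  else
    echo "FAIL: expected '$expected' but got '$actual'"
    return 1
  fi
}

# Hypothetical test: write a fixture, run an expression, compare the result.
test_length_of_sequence() {
  local fixture actual
  fixture=$(mktemp)
  printf 'items:\n  - a\n  - b\n  - c\n' > "$fixture"
  actual=$(yq '.items | length' "$fixture")
  rm -f "$fixture"
  assert_equals "3" "$actual"   # the fixture holds three items
}

# Usage (assumes a yq binary is on PATH):
#   test_length_of_sequence
```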

Add a New CLI Flag or Command Option

  1. Define the flag in cmd/root.go using Cobra's flag registration (e.g., rootCmd.PersistentFlags().StringVar()) (cmd/root.go)
  2. Pass the flag value through to the evaluation command in cmd/evaluate_all_command.go or cmd/evaluate_sequence_command.go (cmd/evaluate_all_command.go)
  3. Implement the flag's logic in the appropriate pkg/yqlib module (format handler, evaluator, etc.) (pkg/yqlib)
  4. Add acceptance test in acceptance_tests/flags.sh to verify the flag works correctly (acceptance_tests/flags.sh)

🔧Why these technologies

  • Go — Compile to single static binary, cross-platform support, fast execution, easy distribution without runtime dependencies.
  • goccy/go-yaml — High-performance YAML parser and encoder with good spec compliance and minimal external dependencies.
  • Cobra — Industry-standard Go CLI framework providing automatic shell completion, subcommands, and structured flag management.
  • participle/v2 — PEG parser generator enabling expressive jq-like syntax parsing without hand-written parser code.
  • Multiple format libraries — Modular approach using best-of-breed libraries for each format (go-toml, hashicorp/hcl, go-ini, etc.) rather than building all parsers in-house.

⚖️Trade-offs already made

  • jq-like syntax instead of native yq DSL

    • Why: Familiar to users of jq; lowers learning curve; leverages existing jq knowledge base and community.
    • Consequence: Cannot implement 100% of jq semantics; some advanced jq features may not be supported; parser complexity increases.
  • Format-agnostic expression evaluation

    • Why: Single expression engine works across YAML, JSON, XML, CSV, TOML, etc.; reduces code duplication; unified mental model.
    • Consequence: Format-specific features (e.g., XML attributes, CSV column semantics) must be normalized to a common AST; potential semantic loss.
  • Command-line tool rather than library-only

    • Why: Maximum accessibility; users do not need to write Go code; works in shell scripts, CI/CD pipelines, Docker containers.
    • Consequence: CLI design decisions constrain library API; less flexibility for programmatic use cases; version compatibility concerns.
  • Streaming evaluation (evaluate_sequence_command)

    • Why: Handles large multi-document files efficiently; enables piping; scales to huge datasets without loading entire file into memory.
    • Consequence: Adds architectural complexity; some operations (e.g., sorting across documents) must be aware of streaming boundaries.
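The practical difference shows up when an expression needs to see more than one document at once. This is a minimal sketch, assuming the eval (one document at a time) and eval-all (all documents loaded together) subcommands correspond to the two command files named above. The wrapper function names and file names are hypothetical.

```shell
# Apply an expression to each document in turn (sequence evaluation).
each_document() {
  local expression="$1"; shift
  yq eval "$expression" "$@"
}

# Merge every input document into one. Cross-document operations like this
# need all documents loaded first, hence eval-all.
merge_documents() {
  yq eval-all '. as $item ireduce ({}; . * $item)' "$@"
}

# Usage (assumes yq v4 on PATH and hypothetical input files):
#   each_document '.metadata.name' a.yaml b.yaml
#   merge_documents base.yaml override.yaml
```

Preferring eval where possible keeps memory use bounded by the largest single document rather than by the whole input set.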

🚫Non-goals (don't propose these)

  • Support 100% of jq functionality—yq is intentionally a subset tailored for YAML and multi-format workflows.
  • Interactive REPL mode—yq is designed for command-line usage and scripts, not interactive exploration.
  • Schema validation or type enforcement—yq is a processor, not a validator.
  • Custom user-defined functions (Lua support exists but limited)—yq focuses on built-in operators.
  • Real-time file watching or change streams—yq processes static files; file system monitoring is out of scope.

🪤Traps & gotchas

  • No environment variables are required for a basic development build.
  • go.mod declares Go 1.25.0, so older toolchains will fail to build.
  • The in-place edit flag (-i) overwrites the file without keeping a backup; test expressions without -i first.
  • Expressions use jq-like syntax, but not every jq function is implemented; check yq's own documentation rather than assuming feature parity with jq.
  • Acceptance tests are bash scripts that expect a yq binary on PATH; run make yq before running acceptance_tests/*.sh individually.
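One way to guard against the in-place overwrite is to preview first and keep a copy. The helper below is a sketch under that assumption; safe_edit and the .bak suffix are hypothetical choices, not yq features.

```shell
# Hypothetical helper: preview an expression, keep a backup, then edit in place.
safe_edit() {
  local expression="$1" file="$2"
  # 1. Dry run: print the result to stdout without touching the file.
  yq "$expression" "$file" || return 1
  # 2. Keep a copy so the original can be restored.
  cp -p "$file" "$file.bak" || return 1
  # 3. Apply the same expression in place.
  yq -i "$expression" "$file"
}

# Usage (assumes a yq binary is on PATH):
#   safe_edit '.foo = "bar"' file.yaml
```

If the dry run fails, for example on a syntax error in the expression, the function returns before the file or its backup is written.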

💡Concepts to learn

  • Parser Combinator Pattern — yq uses alecthomas/participle (a PEG parser combinator library) to parse jq-like expressions into an AST; understanding combinator-based parsing helps debug expression parsing errors and add new operators
  • Abstract Syntax Tree (AST) Evaluation — Parsed yq expressions become an AST that is evaluated against input documents; understanding AST traversal patterns is essential to extending expression evaluation and custom function support
  • Streaming vs. Buffered Processing — yq must decide whether to load entire documents into memory or stream them; this choice affects memory footprint for large YAML files and multi-document handling (evaluate_all_command handles this differently than evaluate_sequence_command)
  • Cross-Format Encoding/Decoding — yq's killer feature is transparent format conversion; understanding how decoders abstract YAML/JSON/XML/CSV into a common internal representation (likely a generic tree structure) and encoders serialize back is critical to adding format support
  • jq Language Semantics — yq intentionally mimics jq syntax and semantics (pipe operator, array slicing, recursive descent, map/select/reduce functions); consulting jq documentation is essential for understanding expected behavior and edge cases
  • Lua Embedding for Custom Functions — yq includes gopher-lua for user-defined functions and scripting; understanding Lua sandbox interaction helps extend yq with custom transformations and avoid security pitfalls
  • In-Place File Editing Patterns — yq's -i flag requires atomic file operations (read-transform-write with backup or atomic rename); understanding temp file handling and POSIX atomicity guarantees prevents data loss bugs

🔗Related repos

  • stedolan/jq: Original home of the jq JSON processor whose syntax yq emulates; yq aims to bring jq-style queries to YAML and other formats
  • kislyuk/yq: Alternative Python-based yq processor; the two projects coexist in the ecosystem, serving different performance and portability needs (Python vs. static binary)
  • Masterminds/sprig: Go template function library that can be used alongside yq in CI/CD pipelines for templating and transforming config files; a complementary tool in the infrastructure-as-code ecosystem
  • jqlang/jq: Current home of the jq project (the C implementation formerly hosted at stedolan/jq), not a Go port; the reference for jq behavior when checking yq's compatibility
  • hashicorp/hcl: HCL parser library used directly by yq (github.com/hashicorp/hcl/v2 in go.mod); understanding HCL structure helps with Terraform config processing features

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive acceptance tests for CSV and TSV format processing

The repo supports CSV and TSV formats (mentioned in README), but unlike YAML, JSON, XML, and properties formats which have dedicated acceptance test files (output-format.sh, etc.), there are no visible CSV/TSV-specific acceptance tests in acceptance_tests/. This is a high-value gap since CSV/TSV are complex formats with edge cases (quoted fields, escaping, delimiters, headers). New contributors can add tests covering: basic CSV parsing, TSV parsing, quoted field handling, empty fields, header preservation, and round-trip consistency.

  • [ ] Create acceptance_tests/csv-processing.sh with test cases for basic CSV read/write
  • [ ] Create acceptance_tests/tsv-processing.sh with test cases for TSV-specific behavior
  • [ ] Add tests for edge cases: quoted commas, newlines in fields, empty rows, custom delimiters
  • [ ] Add tests for CSV-to-JSON and JSON-to-CSV conversions
  • [ ] Reference these new tests in any test runner scripts (check Makefile and test-yq.yml)
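A round-trip check from this checklist might look like the sketch below. It assumes yq's -p (input format) and -o (output format) flags with csv and json values; the helper names are hypothetical, and exact round-trip equality is only a reasonable expectation for simple, unquoted input.

```shell
# Hypothetical helpers for CSV conversion checks.
csv_to_json() {
  yq -p=csv -o=json '.' "$1"
}

json_to_csv() {
  yq -p=json -o=csv '.' "$1"
}

# Convert CSV to JSON and back, then diff against the original file.
# Returns diff's exit status: 0 when the round trip reproduces the input.
round_trip_csv() {
  local input="$1" json
  json=$(mktemp)
  csv_to_json "$input" > "$json" || { rm -f "$json"; return 1; }
  json_to_csv "$json" | diff - "$input"
  local status=$?
  rm -f "$json"
  return "$status"
}

# Usage (assumes a yq binary is on PATH and a hypothetical sample.csv):
#   round_trip_csv sample.csv && echo "round trip ok"
```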

Add unit tests to close coverage gaps in cmd/utils.go

The utils.go file likely contains core utility functions used across multiple commands (root.go, evaluate_all_command.go, etc.), but the corresponding utils_test.go may have incomplete coverage. Since this is a foundational utility module, improving test coverage here has high impact on overall code quality. New contributors can identify untested functions and add targeted unit tests with realistic inputs/edge cases.

  • [ ] Review cmd/utils.go and cmd/utils_test.go to identify functions with missing test cases
  • [ ] Add unit tests for error handling paths in utility functions
  • [ ] Add tests for edge cases (empty inputs, nil pointers, invalid formats)
  • [ ] Run 'go test -cover ./cmd/...' to measure coverage before and after
  • [ ] Ensure all public utility functions have at least one test case

Create GitHub Actions workflow for testing format auto-detection (inputs-format-auto.sh)

The acceptance_tests/inputs-format-auto.sh test exists but there's no dedicated CI workflow in .github/workflows/ to validate format auto-detection across different file types and edge cases. The existing test-yq.yml likely runs basic tests, but format detection is critical functionality that deserves its own focused workflow with matrix testing across formats (YAML, JSON, XML, CSV, TOML, HCL, properties). This ensures the auto-detection feature doesn't regress.

  • [ ] Review .github/workflows/test-yq.yml to understand current testing structure
  • [ ] Create .github/workflows/test-format-detection.yml with matrix strategy testing
  • [ ] Include tests for: format detection by file extension, content-based detection, mixed format inputs
  • [ ] Add test cases for ambiguous formats (e.g., valid JSON that's also valid YAML)
  • [ ] Configure workflow to run on PR and push to master (the default branch); reference in CONTRIBUTING.md

🌿Good first issues

  • Add acceptance tests for HCL format (hcl-basic.sh, hcl-nested.sh): HCL support exists in dependencies but test coverage in acceptance_tests/ is absent; compare against existing toml-*.sh or yaml-*.sh test patterns
  • Document CSV format edge cases in README: the README mentions CSV support but gives no examples; add a section with realistic CSV-to-JSON/YAML transformation examples similar to the existing XML examples
  • Extend evaluate_sequence_command_test.go to cover error handling for malformed expressions: the test file exists but lacks comprehensive negative test cases for syntax errors and type mismatches; use the table-driven test pattern already established in root_test.go

📝Recent commits

  • cb97935 — fix: TOML encoder uses inline tables for YAML FlowStyle mappings, inconsistent with explicit JSON parsing (#2687) (Copilot)
  • cfe2eee — Preserve empty TOML arrays in tables (#2686) (cyphercodes)
  • 1a433d1 — Bump actions/upload-artifact from 4.6.1 to 7.0.1 (#2663) (dependabot[bot])
  • 1c0d8b9 — Bump actions/checkout from 4.2.2 to 6.0.2 (#2668) (dependabot[bot])
  • 0110a3c — Bump golang.org/x/net from 0.52.0 to 0.53.0 (#2669) (dependabot[bot])
  • 54482d4 — Bump golang from 2a2b4b5 to 5f3787b (#2664) (dependabot[bot])
  • 33f3351 — Bump ossf/scorecard-action from 2.4.1 to 2.4.3 (#2665) (dependabot[bot])
  • 6cb656c — Bump alpine from 2510918 to 5b10f43 (#2667) (dependabot[bot])
  • ecc43d7 — fix: reset TOML decoder between files when evaluating all at once (#2685) (terminalchai)
  • 1deec5e — Fix repeatString overflow test on 32-bit platforms (#2680) (jandubois)

🔒Security observations

The yq codebase demonstrates a reasonable security posture for a CLI tool. Key strengths include: proper non-root user execution in Docker, minimal attack surface (no network/HTTP functionality), active use of Dependabot for dependency management, and clear security reporting policy. Main areas of concern are: the Lua scripting capability which could enable code execution if not properly sandboxed, version consistency between go.mod and Dockerfile, and reliance on complex parsers (YAML, HCL, Lua). No hardcoded secrets or injection vulnerabilities were detected in the provided file structure. The project maintains good security practices with CodeQL enabled and a dedicated SECURITY.md file.

  • Medium · Go version differs between go.mod and Dockerfile (go.mod, Dockerfile). go.mod specifies Go 1.25.0, but the Dockerfile builds with golang:1.26.2. This is not inherently insecure, since the go directive sets the minimum required version, but the mismatch could lead to inconsistent builds, and later 1.25.x or 1.26.x releases may carry security patches. Fix: align the Go version used across environments with the Dockerfile (for example by updating go.mod to 1.26.2), and regularly update to the latest stable Go version to receive security patches.
  • Medium · Lua dependency without apparent sandboxing (go.mod, cmd/ directory). The codebase includes github.com/yuin/gopher-lua v1.1.2, which enables Lua script execution. If user-supplied Lua scripts are executed without proper sandboxing, this could allow arbitrary code execution or access to sensitive operations. Fix: if Lua scripting is exposed as a feature, sandbox it with restricted permissions, document the security implications of executing untrusted scripts, and consider an allowlist of permitted Lua functions.
  • Low · Alpine base image not pinned to a minor version (Dockerfile, production stage). The Dockerfile uses 'alpine:3@sha256:...', which pins to Alpine 3.x but not to a specific minor version. The digest provides immutability, but a digest bump can cross minor versions and introduce subtle behavioral changes. Fix: consider pinning to a specific Alpine version (e.g., alpine:3.20@sha256:...) for more predictable and reproducible builds, while still receiving security patches within that minor version.
  • Low · Permissions on the Docker working directory (Dockerfile, RUN chown -R yq:yq /workdir). The Dockerfile creates a non-root user 'yq' with UID 1000 and changes ownership of /workdir. However, the directory is world-accessible by default, which could allow information disclosure if sensitive files are written there. Fix: set restrictive permissions on /workdir after changing ownership, for example 'RUN chmod 750 /workdir', to prevent other users from accessing the working directory.
  • Low · Dependency on external YAML parsers with complex parsing logic (go.mod). The project uses github.com/goccy/go-yaml v1.19.2 and go.yaml.in/yaml/v4 for YAML parsing. These are reputable libraries, but YAML parsing can be a source of vulnerabilities (billion laughs attacks, XXE-like issues). The project's security scope acknowledges this as acceptable given the CLI nature. Fix: keep dependencies updated and monitor for security advisories; document any YAML parsing limitations or security considerations for users processing untrusted YAML files.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
