k8sgpt-ai/k8sgpt
Giving Kubernetes Superpowers to everyone
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 1d ago
- ✓ 23+ active contributors
- ✓ Distributed ownership (top contributor 29% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/k8sgpt-ai/k8sgpt)
Paste at the top of your README.md — renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/k8sgpt-ai/k8sgpt on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: k8sgpt-ai/k8sgpt
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/k8sgpt-ai/k8sgpt shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 23+ active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live k8sgpt-ai/k8sgpt repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/k8sgpt-ai/k8sgpt.
What it runs against: a local clone of k8sgpt-ai/k8sgpt — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in k8sgpt-ai/k8sgpt | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of k8sgpt-ai/k8sgpt. If you don't
# have one yet, run these first:
#
# git clone https://github.com/k8sgpt-ai/k8sgpt.git
# cd k8sgpt
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of k8sgpt-ai/k8sgpt and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "k8sgpt-ai/k8sgpt(\.git)?$" \
  && ok "origin remote is k8sgpt-ai/k8sgpt" \
  || miss "origin remote is not k8sgpt-ai/k8sgpt (artifact may be from a fork)"
# 2. License matches what RepoPilot saw (the Apache LICENSE text begins
# "Apache License", not the SPDX identifier)
grep -qi "Apache License" LICENSE 2>/dev/null \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical paths exist (pkg/ai is a directory, hence test -d)
for f in main.go cmd/root.go cmd/analyze/analyze.go cmd/auth/auth.go; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done
test -d "pkg/ai" && ok "pkg/ai" || miss "missing critical path: pkg/ai"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/k8sgpt-ai/k8sgpt"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡TL;DR
k8sgpt is a Kubernetes diagnostic tool that scans clusters, identifies issues, and explains them in plain English using AI. It codifies SRE knowledge into specialized analyzers for common Kubernetes problems, then enriches findings with AI backends (OpenAI, Azure, Cohere, Bedrock, Gemini, local models) to provide actionable insights without manual debugging. It is a CLI-driven Go codebase structured as cmd/ subdirectories (analyze, auth, cache for command groups), with pkg/ containing the analyzers and LLM client logic. charts/ contains production-ready Helm deployment templates. The codebase is ~733KB of Go, built on the Cobra CLI framework and Kubernetes client-go for cluster interaction.
👥Who it's for
Platform engineers, SREs, and DevOps teams who manage Kubernetes clusters and need to quickly diagnose cluster issues, resource failures, and misconfigurations. Also useful for newcomers to Kubernetes who lack deep troubleshooting expertise.
🌱Maturity & risk
Production-ready and actively maintained. The project has OpenSSF Best Practices badge, comprehensive CI/CD pipelines (build_container.yaml, golangci_lint.yaml, release.yaml, test.yaml), semantic versioning via release-please, and regular commits. It's available via brew and krew (Kubernetes plugin manager), indicating solid distribution maturity.
Moderate risk: the project has extensive dependencies (AWS SDK v2, Azure SDK, Google Cloud, Cohere, OpenAI) creating a large attack surface; external LLM API keys are required for core functionality, introducing credential management complexity. However, it's backed by an established GitHub organization with governance docs and maintainers list, mitigating single-maintainer risk.
Active areas of work
Active development with semantic PR validation, release-please automation for versioning, and container builds on release. The project has integration documentation (INTEGRATIONS.md), MCP (Model Context Protocol) support, and tracks adopters. Recent focus includes expanding LLM backend support and Kubernetes compatibility.
🚀Get running
git clone https://github.com/k8sgpt-ai/k8sgpt.git
cd k8sgpt
make build
# or install via: brew install k8sgpt-ai/tap/k8sgpt
k8sgpt auth add # Configure LLM backend
k8sgpt analyze # Scan your cluster
Daily commands:
make build # Build binary
make test # Run tests
make lint # Run golangci-lint (see .github/workflows/golangci_lint.yaml)
./k8sgpt analyze # Run analyzer against current kubectl context
🗺️Map of the codebase
- main.go — Entry point for the k8sgpt CLI application; defines the version and initializes the root command.
- cmd/root.go — Root cobra command configuration; all subcommands (analyze, auth, cache, etc.) are registered here.
- cmd/analyze/analyze.go — Core analyze command implementation; orchestrates AI analysis of Kubernetes resources using configured providers.
- pkg/ai — AI provider abstraction layer supporting OpenAI, Azure, Ollama, AWS Bedrock, and SageMaker; critical for extensibility.
- cmd/auth/auth.go — Authentication management for AI provider credentials; essential for secure configuration of API keys.
- cmd/cache/cache.go — Caching layer for analysis results; manages TTL and storage to reduce redundant API calls.
- cmd/integration/integration.go — Integration activation/deactivation for third-party services; extensibility mechanism for external tool support.
🛠️How to make changes
Add a New AI Provider
- Create a new provider file in pkg/ai/ (e.g., pkg/ai/newprovider.go) implementing the AI provider interface with an Analyze() method (pkg/ai/newprovider.go)
- Register the provider in the auth command so it accepts credentials via k8sgpt auth add (cmd/auth/add.go)
- Add the provider type constant and configuration handling in the analyze command (cmd/analyze/analyze.go)
- Write unit tests following existing provider patterns (e.g., amazonbedrock_test.go) (pkg/ai/newprovider_test.go)
Add a New Kubernetes Resource Analyzer
- Create a custom analyzer definition in cmd/customAnalyzer/ that extends the base analyzer interface (cmd/customAnalyzer/add.go)
- Store analyzer configuration using the auth/cache storage pattern (cmd/customAnalyzer/customAnalyzer.go)
- Update analyze.go to load and apply custom analyzers during resource analysis (cmd/analyze/analyze.go)
- Add prompt templates or rules specific to the new resource type (cmd/customAnalyzer/add.go)
Add a New Integration (Third-Party Tool)
- Create an integration handler in cmd/integration/ with activate/deactivate logic (cmd/integration/integration.go)
- Store integration credentials and settings in the auth subsystem (cmd/auth/auth.go)
- Add integration activation/deactivation commands (cmd/integration/activate.go)
- Integrate result delivery hooks into analyze.go to push findings to the third-party tool (cmd/analyze/analyze.go)
Add a New Command
- Create a new subdirectory in cmd/ with the command implementation (e.g., cmd/newcmd/newcmd.go)
- Register the command as a subcommand in cmd/root.go (cmd/root.go)
- Use Cobra's standard pattern with Run or RunE funcs for the command logic (cmd/newcmd/newcmd.go)
- Add unit tests following existing patterns (e.g., cmd/root_test.go) (cmd/newcmd/newcmd_test.go)
🔧Why these technologies
- Cobra CLI Framework — Structured command-line parsing with subcommands, flags, and help documentation; widely used in Kubernetes ecosystem (kubectl plugins).
- k8s.io/client-go — Official Kubernetes Go client for resource discovery, queries, and event monitoring; enables deep cluster introspection.
- Pluggable AI Providers (OpenAI, Azure, Ollama, Bedrock, SageMaker) — No single LLM vendor lock-in; users can swap providers based on cost, latency, or regulatory requirements.
- Helm & Kubernetes Manifests — Enables k8sgpt to run as an in-cluster operator or sidecar; native Kubernetes deployment patterns.
- Viper Configuration — Multi-format config file support (YAML, JSON, TOML) with environment variable overrides; flexible credential management.
- gRPC & Protocol Buffers (buf.build schemas) — Supports high-performance integrations with other tools and future server modes.
⚖️Trade-offs already made
- Stateless CLI-first design with optional in-cluster deployment
  - Why: Simplifies distribution as a kubectl plugin; users can run locally or cluster-wide without persistent state.
  - Consequence: Caching is optional and ephemeral; no built-in multi-user sessions or audit logs by default.
- Support multiple AI providers via provider abstraction
  - Why: Avoids vendor lock-in and allows cost/latency optimization.
  - Consequence: Each new provider requires separate implementation and testing; no unified prompt templating across all models.
- Synchronous analysis per command invocation
  - Why: Simple UX; users get results immediately.
  - Consequence: Long-running analyses (large clusters, slow LLMs) block the CLI; no background job queue.
- Optional caching layer for analysis results
  - Why: Reduces API costs and latency for repeated analyses.
  - Consequence: A stale cache can mask recent cluster changes; requires manual cache invalidation or TTL tuning.
🚫Non-goals (don't propose these)
- Real-time continuous monitoring of Kubernetes clusters
- Multi-tenant SaaS platform with user authentication and billing
- Native Kubernetes operator with custom resource definitions
- Persistent storage of historical analysis data
- Web UI or dashboard (analysis results are CLI-only by default)
🪤Traps & gotchas
LLM API keys must be configured via k8sgpt auth add before analyze runs; kubectl context must be active (uses in-cluster or kubeconfig auth). The project requires write access to kubeconfig during setup. Some analyzers may have memory overhead on very large clusters due to resource enumeration. Model Context Protocol (MCP) is experimental per MCP.md and may have API stability risks.
🏗️Architecture
💡Concepts to learn
- Kubernetes Client-Go — k8sgpt uses client-go (v0.32.3) to query cluster state; understanding informers, listers, and discovery APIs is essential for reading analyzer code
- Large Language Model (LLM) Provider Abstraction — The project supports multiple LLM backends (OpenAI, Azure, Cohere, Bedrock, Gemini); understanding how pkg/llm abstracts provider differences is key to extending integrations
- Kubernetes Resource Analysis Patterns — Analyzers follow a pattern: query API → detect failure states → generate diagnostic summaries; understanding this loop is central to contributing new analyzers
- gRPC & Protocol Buffers — Project uses buf.build-generated gRPC code (visible in go.mod buf.build deps); understanding proto message passing is needed for MCP and remote mode debugging
- Model Context Protocol (MCP) — k8sgpt implements MCP (per MCP.md and mark3labs/mcp-go dep) for AI model integration; understanding this emerging standard is valuable for future-proofing contributions
- Cobra CLI Framework — All CLI commands use Cobra (spf13/cobra); understanding command structure, flags, and subcommand patterns is essential for CLI modifications
- Helm Chart Distribution — k8sgpt ships as Helm chart (charts/k8sgpt/); understanding template rendering and values overrides helps with deployment customization
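The "query API → detect failure states → generate diagnostic summaries" loop from the concepts above can be shown in miniature. The types here are toy stand-ins for what client-go would return — the real analyzers live under pkg/ in the repo:

```go
package main

import "fmt"

// PodStatus is a toy stand-in for state an analyzer would fetch
// via client-go.
type PodStatus struct {
	Name     string
	Phase    string
	Restarts int
}

// Result mirrors the shape of an analyzer finding: which object
// failed and a short diagnostic an LLM can later expand on.
type Result struct {
	Kind, Name, Error string
}

// analyzePods shows the analyzer loop in miniature: iterate fetched
// state, flag failure conditions, emit findings.
func analyzePods(pods []PodStatus) []Result {
	var out []Result
	for _, p := range pods {
		switch {
		case p.Phase == "Pending":
			out = append(out, Result{"Pod", p.Name, "stuck in Pending"})
		case p.Restarts > 3:
			out = append(out, Result{"Pod", p.Name, fmt.Sprintf("restarted %d times", p.Restarts)})
		}
	}
	return out
}

func main() {
	pods := []PodStatus{
		{Name: "web-1", Phase: "Running", Restarts: 0},
		{Name: "db-0", Phase: "Pending", Restarts: 0},
		{Name: "job-x", Phase: "Running", Restarts: 7},
	}
	for _, r := range analyzePods(pods) {
		fmt.Printf("%s/%s: %s\n", r.Kind, r.Name, r.Error)
	}
}
```

A real analyzer differs mainly in the fetch step (informers/listers instead of a slice) and in handing each Result to the configured AI backend for explanation.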
🔗Related repos
- kubernetes/kubernetes — The upstream Kubernetes project k8sgpt diagnoses issues against; understanding core API objects is essential context
- open-policy-agent/opa — Complementary tool for Kubernetes policy validation; often used alongside k8sgpt for compliance scanning
- FairwindsOps/polaris — Similar Kubernetes auditor/scanner that checks best practices; a direct alternative for resource validation
- AlexsJones/sympozium — Sister project mentioned in the README for managing agents in Kubernetes; a natural companion for agent-based diagnostics
- kubescape/kubescape — Kubernetes security scanner; complementary tool often chained with k8sgpt for comprehensive cluster insights
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for cmd/customAnalyzer subcommand
The customAnalyzer feature (cmd/customAnalyzer/) lacks documented integration tests. Given that k8sgpt supports custom analyzers as a core feature for extending functionality, integration tests would validate the add/list/remove workflow and ensure custom analyzer configurations persist correctly across different scenarios. This directly supports the project's goal of 'Giving Kubernetes Superpowers to everyone' by ensuring extensibility works reliably.
- [ ] Create pkg/customAnalyzer/customAnalyzer_test.go with test fixtures
- [ ] Add integration test in cmd/customAnalyzer/ that tests the full lifecycle: add → list → remove
- [ ] Test persistence of custom analyzer configurations using the config/auth storage pattern already in use
- [ ] Verify error handling when adding malformed or duplicate analyzers
- [ ] Reference: cmd/customAnalyzer/{add,list,remove}.go and existing test patterns in cmd/auth/
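The add → list → remove lifecycle in this checklist can be exercised against an in-memory stand-in before wiring up the real config layer. The Store type and its methods below are hypothetical illustrations, not k8sgpt's actual storage API:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is an in-memory stand-in for the config-backed storage the
// checklist describes; the type and methods are hypothetical.
type Store struct {
	analyzers map[string]string // name -> rule/template
}

func NewStore() *Store { return &Store{analyzers: map[string]string{}} }

// Add rejects duplicates — one of the error cases the checklist
// says tests should cover.
func (s *Store) Add(name, rule string) error {
	if _, dup := s.analyzers[name]; dup {
		return errors.New("duplicate analyzer: " + name)
	}
	s.analyzers[name] = rule
	return nil
}

func (s *Store) List() []string {
	out := make([]string, 0, len(s.analyzers))
	for name := range s.analyzers {
		out = append(out, name)
	}
	return out
}

// Remove fails on unknown names, the other error path to test.
func (s *Store) Remove(name string) error {
	if _, ok := s.analyzers[name]; !ok {
		return errors.New("unknown analyzer: " + name)
	}
	delete(s.analyzers, name)
	return nil
}

func main() {
	s := NewStore()
	fmt.Println(s.Add("gpu-quota", "detect exhausted GPU quota"))
	fmt.Println(len(s.List()))
	fmt.Println(s.Remove("gpu-quota"))
	fmt.Println(len(s.List()))
}
```

An integration test would drive the same lifecycle through the real cmd/customAnalyzer commands and assert the persisted config instead of an in-memory map.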
Add GitHub Action workflow for security scanning (SBOM & vulnerability checks)
The repo has OpenSSF Best Practices badge and SECURITY.md but lacks an automated SBOM (Software Bill of Materials) generation and dependency vulnerability scanning workflow. Given the security-sensitive nature (AI model integration, Kubernetes access), a dedicated security workflow in .github/workflows/ would improve supply chain security and align with OpenSSF practices. This complements existing golangci_lint.yaml and test.yaml workflows.
- [ ] Create .github/workflows/security.yaml with SBOM generation using syft or similar
- [ ] Add GitHub Advanced Security scanning or Dependabot alerts validation step
- [ ] Generate and commit SBOM to repo root (e.g., sbom.json) on each release
- [ ] Reference .github/workflows/release.yaml to understand the release workflow trigger points
- [ ] Document SBOM location in SECURITY.md
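A minimal workflow along these lines might look like the sketch below. The trigger, schedule, and action versions are assumptions — align them with the existing release.yaml and verify the actions on the GitHub Marketplace before adopting:

```yaml
# .github/workflows/security.yaml — sketch only; verify action
# names and versions before use.
name: security
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * 1" # weekly scan
jobs:
  sbom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Generate an SPDX SBOM from the Go module graph with syft
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          format: spdx-json
          output-file: sbom.json
      # Scan the SBOM for known CVEs with grype
      - name: Scan SBOM for vulnerabilities
        uses: anchore/scan-action@v3
        with:
          sbom: sbom.json
```

Attaching sbom.json to release artifacts (rather than committing it to the repo root) is a common alternative worth weighing against the checklist item above.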
Add unit tests for cmd/filters subcommand with cache interaction
The filters feature (cmd/filters/{add,list,remove}.go) interacts with the caching system (cmd/cache/), but test coverage for this interaction is not evident from the file structure. This is critical because filters affect which Kubernetes resources are analyzed, making correctness essential. Tests should validate filter persistence, retrieval, and interaction with the cache layer.
- [ ] Create cmd/filters/filters_test.go with unit tests for add, list, and remove operations
- [ ] Mock the cache layer (cmd/cache/) to test filter persistence scenarios
- [ ] Add tests for invalid filter patterns and edge cases (empty filters, special characters)
- [ ] Test that filter changes correctly clear or invalidate cache as expected
- [ ] Ensure tests follow the pattern already established in cmd/auth/ tests
🌿Good first issues
- Add test coverage for cmd/cache command family (currently minimal test files in that directory); good for learning CLI structure and testing patterns
- Expand analyzer coverage documentation: pkg/analyzers/ lacks inline examples of how each analyzer detects issues; adding godoc examples to 3-5 analyzers would help contributors understand the pattern
- Create integration test examples in Makefile for validating analyzer output against known Kubernetes failure scenarios; currently test.yaml workflow doesn't show concrete analyzer test cases
⭐Top contributors
- @renovate[bot] — 29 commits
- @github-actions[bot] — 21 commits
- @AlexsJones — 18 commits
- @three-foxes-in-a-trenchcoat — 7 commits
- @umeshkaul — 3 commits
📝Recent commits
- 30463af — docs: remove FOSSA license badge and check (#1644) (three-foxes-in-a-trenchcoat)
- c82da09 — docs: prepare governance docs for CNCF incubation (#1642) (three-foxes-in-a-trenchcoat)
- dfa21ea — chore(main): release 0.4.32 (#1628) (github-actions[bot])
- c87a31a — fix: amazonbedrockconverse claude models temp and topp (#1629) (CradleKing24)
- ac329d1 — feat: add daemonset analyzer and special cases for pod and job (#1636) (doppoluv)
- 28fe196 — feat: add Azure API Type Support and add Custom HTTP Header (#1638) (lawrencelo8)
- ca0d3eb — fix: improve ConfigMap usage detection for sidecar patterns (#1602) (squatboy)
- 6ba8fb2 — fix: recognize GKE built-in ingress classes 'gce' and 'gce-internal' (#1599) (majiayu000)
- 97fbf04 — fix(deps): update module google.golang.org/grpc to v1.79.3 [security] (#1626) (renovate[bot])
- 74b1ee1 — chore(main): release 0.4.31 (#1619) (github-actions[bot])
🔒Security observations
- High · Outdated Go Toolchain Version — go.mod. The module specifies go 1.24.1 with toolchain go1.24.11; the go directive and toolchain should be kept in sync and tracked against the latest stable Go release so security patches are applied. Fix: Update to the latest stable Go version, keep the toolchain directive synchronized with the go directive, and monitor Go security advisories regularly.
- High · Legacy AWS SDK Dependency — go.mod (github.com/aws/aws-sdk-go v1.55.7). The project depends on the older AWS SDK v1, which may contain unpatched vulnerabilities, even though it already includes aws-sdk-go-v2. Fix: Remove the aws-sdk-go v1 dependency, migrate all code to aws-sdk-go-v2, and audit AWS API calls for secure credential handling.
- High · Third-party AI Provider Dependencies Without Version Pinning — go.mod (multiple AI provider packages). Multiple AI provider dependencies (Ollama, OpenAI, Cohere, Google Generative AI, Bedrock, Watson) are specified without explicit version constraints in critical code paths; these external service integrations could introduce supply-chain risk. Fix: Pin versions for all external AI provider SDKs, add SBOM generation, and audit third-party dependencies regularly.
- Medium · Helm Client Usage Without Source Validation — go.mod (github.com/mittwald/go-helm-client v0.12.14). The Helm client handles chart deployments; charts could be exploited if source validation is not implemented. Fix: Verify chart signatures, validate all chart sources, prefer OCI registries for distribution, and apply role-based access controls to deployments.
- Medium · gRPC and Protocol Buffer Dependencies from Custom Sources — go.mod (buf.build/gen/go/k8sgpt-ai/k8sgpt/* packages). buf.build-generated dependencies may bypass standard security review, since they are auto-generated from protobuf definitions. Fix: Keep buf.build configurations version-controlled and reviewed, sign commits that change protobuf schemas, and document gRPC service security boundaries.
- Medium · Test-Only Dependency gomonkey in go.mod — github.com/agiledragon/gomonkey/v2 v2.13.0. This monkey-patching library appears in go.mod; if used outside tests, it could allow runtime behavior manipulation and security bypasses. Fix: Verify gomonkey is only referenced from *_test.go files, consider build tags to exclude it from production builds, and audit for any runtime patching outside tests.
- Medium · Kubernetes Client Without RBAC Documentation — go.mod (k8s.io/client-go v0.32.3), charts/k8sgpt/templates/. client-go grants Kubernetes API access, and role.yaml/rolebinding.yaml templates exist, but least-privilege RBAC requirements are not documented. Fix: Document the minimum required RBAC permissions in SECURITY.md, use minimally scoped service account tokens, and enable RBAC audit logging.
- Medium · Cloud Storage Credential Handling — go.mod. Dependencies for cloud storage (cloud.google.com/go/storage, Azure SDK, AWS SDK) are included, but this generated finding is incomplete; review credential handling for these SDKs manually.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.