k8sgpt-ai/k8sgpt
Giving Kubernetes Superpowers to everyone
Healthy across the board
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 1d ago
- ✓ 23+ active contributors
- ✓ Distributed ownership (top contributor 29% of recent commits)
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/k8sgpt-ai/k8sgpt)
Paste at the top of your README.md — renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/k8sgpt-ai/k8sgpt on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: k8sgpt-ai/k8sgpt
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/k8sgpt-ai/k8sgpt shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 1d ago
- 23+ active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live k8sgpt-ai/k8sgpt repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/k8sgpt-ai/k8sgpt.
What it runs against: a local clone of k8sgpt-ai/k8sgpt — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in k8sgpt-ai/k8sgpt | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of k8sgpt-ai/k8sgpt. If you don't
# have one yet, run these first:
#
# git clone https://github.com/k8sgpt-ai/k8sgpt.git
# cd k8sgpt
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of k8sgpt-ai/k8sgpt and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "k8sgpt-ai/k8sgpt(\.git)?$" \
  && ok "origin remote is k8sgpt-ai/k8sgpt" \
  || miss "origin remote is not k8sgpt-ai/k8sgpt (artifact may be from a fork)"
# 2. License matches what RepoPilot saw (the Apache LICENSE text begins
# "Apache License", not the SPDX identifier)
grep -qi "Apache License" LICENSE 2>/dev/null \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Critical paths exist (pkg/ai is a directory, hence test -d)
for f in main.go cmd/root.go cmd/analyze/analyze.go cmd/auth/auth.go; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done
test -d "pkg/ai" && ok "pkg/ai" || miss "missing critical path: pkg/ai"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/k8sgpt-ai/k8sgpt"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).
⚡TL;DR
k8sgpt is a Kubernetes diagnostic tool that scans clusters, identifies issues, and explains them in plain English using AI. It codifies SRE knowledge into specialized analyzers for common Kubernetes problems, then enriches findings with AI backends (OpenAI, Azure, Cohere, Bedrock, Gemini, local models) to provide actionable insights without manual debugging. It is a CLI-driven Go codebase structured as cmd/ subdirectories (analyze, auth, cache for command groups), with pkg/ containing the analyzers and LLM client logic. charts/ contains production-ready Helm deployment templates. The codebase is ~733KB of Go, built on the Cobra CLI framework and Kubernetes client-go for cluster interaction.
👥Who it's for
Platform engineers, SREs, and DevOps teams who manage Kubernetes clusters and need to quickly diagnose cluster issues, resource failures, and misconfigurations. Also useful for newcomers to Kubernetes who lack deep troubleshooting expertise.
🌱Maturity & risk
Production-ready and actively maintained. The project has OpenSSF Best Practices badge, comprehensive CI/CD pipelines (build_container.yaml, golangci_lint.yaml, release.yaml, test.yaml), semantic versioning via release-please, and regular commits. It's available via brew and krew (Kubernetes plugin manager), indicating solid distribution maturity.
Moderate risk: the project has extensive dependencies (AWS SDK v2, Azure SDK, Google Cloud, Cohere, OpenAI) creating a large attack surface; external LLM API keys are required for core functionality, introducing credential management complexity. However, it's backed by an established GitHub organization with governance docs and maintainers list, mitigating single-maintainer risk.
Active areas of work
Active development with semantic PR validation, release-please automation for versioning, and container builds on release. The project has integration documentation (INTEGRATIONS.md), MCP (Model Context Protocol) support, and tracks adopters. Recent focus includes expanding LLM backend support and Kubernetes compatibility.
🚀Get running
git clone https://github.com/k8sgpt-ai/k8sgpt.git
cd k8sgpt
make build
# or install via: brew install k8sgpt-ai/tap/k8sgpt
k8sgpt auth add # Configure LLM backend
k8sgpt analyze # Scan your cluster
Daily commands:
make build # Build binary
make test # Run tests
make lint # Run golangci-lint (see .github/workflows/golangci_lint.yaml)
./k8sgpt analyze # Run analyzer against current kubectl context
🗺️Map of the codebase
- main.go — Entry point for the k8sgpt CLI application; defines the version and initializes the root command.
- cmd/root.go — Root cobra command configuration; all subcommands (analyze, auth, cache, etc.) are registered here.
- cmd/analyze/analyze.go — Core analyze command implementation; orchestrates AI analysis of Kubernetes resources using configured providers.
- pkg/ai — AI provider abstraction layer supporting OpenAI, Azure, Ollama, AWS Bedrock, and SageMaker; critical for extensibility.
- cmd/auth/auth.go — Authentication management for AI provider credentials; essential for secure configuration of API keys.
- cmd/cache/cache.go — Caching layer for analysis results; manages TTL and storage to reduce redundant API calls.
- cmd/integration/integration.go — Integration activation/deactivation for third-party services; extensibility mechanism for external tool support.
🛠️How to make changes
Add a New AI Provider
- Create a new provider file in pkg/ai/ (e.g., pkg/ai/newprovider.go) implementing the AI provider interface with an Analyze() method (pkg/ai/newprovider.go)
- Register the provider in the auth command so it accepts credentials via k8sgpt auth add (cmd/auth/add.go)
- Add the provider type constant and configuration handling in the analyze command (cmd/analyze/analyze.go)
- Write unit tests following existing provider patterns (e.g., amazonbedrock_test.go) (pkg/ai/newprovider_test.go)
Add a New Kubernetes Resource Analyzer
- Create a custom analyzer definition in cmd/customAnalyzer/ that extends the base analyzer interface (cmd/customAnalyzer/add.go)
- Store analyzer configuration using the auth/cache storage pattern (cmd/customAnalyzer/customAnalyzer.go)
- Update analyze.go to load and apply custom analyzers during resource analysis (cmd/analyze/analyze.go)
- Add prompt templates or rules specific to the new resource type (cmd/customAnalyzer/add.go)
Add a New Integration (Third-Party Tool)
- Create an integration handler in cmd/integration/ with activate/deactivate logic (cmd/integration/integration.go)
- Store integration credentials and settings in the auth subsystem (cmd/auth/auth.go)
- Add integration activation/deactivation commands (cmd/integration/activate.go)
- Integrate result delivery hooks into analyze.go to push findings to the third-party tool (cmd/analyze/analyze.go)
Add a New Command
- Create a new subdirectory in cmd/ with the command implementation (e.g., cmd/newcmd/newcmd.go)
- Register the command as a subcommand in cmd/root.go (cmd/root.go)
- Use Cobra's standard pattern with Run or RunE funcs for the command logic (cmd/newcmd/newcmd.go)
- Add unit tests following existing patterns (e.g., cmd/root_test.go) (cmd/newcmd/newcmd_test.go)
🔧Why these technologies
- Cobra CLI Framework — Structured command-line parsing with subcommands, flags, and help documentation; widely used in Kubernetes ecosystem (kubectl plugins).
- k8s.io/client-go — Official Kubernetes Go client for resource discovery, queries, and event monitoring; enables deep cluster introspection.
- Pluggable AI Providers (OpenAI, Azure, Ollama, Bedrock, SageMaker) — No single LLM vendor lock-in; users can swap providers based on cost, latency, or regulatory requirements.
- Helm & Kubernetes Manifests — Enables k8sgpt to run as an in-cluster operator or sidecar; native Kubernetes deployment patterns.
- Viper Configuration — Multi-format config file support (YAML, JSON, TOML) with environment variable overrides; flexible credential management.
- gRPC & Protocol Buffers (buf.build schemas) — Supports high-performance integrations with other tools and future server modes.
⚖️Trade-offs already made
- Stateless CLI-first design with optional in-cluster deployment
  - Why: Simplifies distribution as a kubectl plugin; users can run locally or cluster-wide without persistent state.
  - Consequence: Caching is optional and ephemeral; no built-in multi-user sessions or audit logs by default.
- Support multiple AI providers via provider abstraction
  - Why: Avoids vendor lock-in and allows cost/latency optimization.
  - Consequence: Each new provider requires separate implementation and testing; no unified prompt templating across all models.
- Synchronous analysis per command invocation
  - Why: Simple UX; users get results immediately.
  - Consequence: Long-running analyses (large clusters, slow LLMs) block the CLI; no background job queue.
- Optional caching layer for analysis results
  - Why: Reduces API costs and latency for repeated analyses.
  - Consequence: A stale cache can mask recent cluster changes; requires manual cache invalidation or TTL tuning.
🚫Non-goals (don't propose these)
- Real-time continuous monitoring of Kubernetes clusters
- Multi-tenant SaaS platform with user authentication and billing
- Native Kubernetes operator with custom resource definitions
- Persistent storage of historical analysis data
- Web UI or dashboard (analysis results are CLI-only by default)
🪤Traps & gotchas
LLM API keys must be configured via k8sgpt auth add before analyze runs; kubectl context must be active (uses in-cluster or kubeconfig auth). The project requires write access to kubeconfig during setup. Some analyzers may have memory overhead on very large clusters due to resource enumeration. Model Context Protocol (MCP) is experimental per MCP.md and may have API stability risks.
🏗️Architecture
💡Concepts to learn
- Kubernetes Client-Go — k8sgpt uses client-go (v0.32.3) to query cluster state; understanding informers, listers, and discovery APIs is essential for reading analyzer code
- Large Language Model (LLM) Provider Abstraction — The project supports multiple LLM backends (OpenAI, Azure, Cohere, Bedrock, Gemini); understanding how pkg/llm abstracts provider differences is key to extending integrations
- Kubernetes Resource Analysis Patterns — Analyzers follow a pattern: query API → detect failure states → generate diagnostic summaries; understanding this loop is central to contributing new analyzers
- gRPC & Protocol Buffers — Project uses buf.build-generated gRPC code (visible in go.mod buf.build deps); understanding proto message passing is needed for MCP and remote mode debugging
- Model Context Protocol (MCP) — k8sgpt implements MCP (per MCP.md and mark3labs/mcp-go dep) for AI model integration; understanding this emerging standard is valuable for future-proofing contributions
- Cobra CLI Framework — All CLI commands use Cobra (spf13/cobra); understanding command structure, flags, and subcommand patterns is essential for CLI modifications
- Helm Chart Distribution — k8sgpt ships as Helm chart (charts/k8sgpt/); understanding template rendering and values overrides helps with deployment customization
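The "query API → detect failure states → generate diagnostic summaries" loop from the concepts above can be shown in miniature. The types here are toy stand-ins for what client-go would return — the real analyzers live under pkg/ in the repo:

```go
package main

import "fmt"

// PodStatus is a toy stand-in for state an analyzer would fetch
// via client-go.
type PodStatus struct {
	Name     string
	Phase    string
	Restarts int
}

// Result mirrors the shape of an analyzer finding: which object
// failed and a short diagnostic an LLM can later expand on.
type Result struct {
	Kind, Name, Error string
}

// analyzePods shows the analyzer loop in miniature: iterate fetched
// state, flag failure conditions, emit findings.
func analyzePods(pods []PodStatus) []Result {
	var out []Result
	for _, p := range pods {
		switch {
		case p.Phase == "Pending":
			out = append(out, Result{"Pod", p.Name, "stuck in Pending"})
		case p.Restarts > 3:
			out = append(out, Result{"Pod", p.Name, fmt.Sprintf("restarted %d times", p.Restarts)})
		}
	}
	return out
}

func main() {
	pods := []PodStatus{
		{Name: "web-1", Phase: "Running", Restarts: 0},
		{Name: "db-0", Phase: "Pending", Restarts: 0},
		{Name: "job-x", Phase: "Running", Restarts: 7},
	}
	for _, r := range analyzePods(pods) {
		fmt.Printf("%s/%s: %s\n", r.Kind, r.Name, r.Error)
	}
}
```

A real analyzer differs mainly in the fetch step (informers/listers instead of a slice) and in handing each Result to the configured AI backend for explanation.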
🔗Related repos
- kubernetes/kubernetes — The upstream Kubernetes project k8sgpt diagnoses issues against; understanding core API objects is essential context
- open-policy-agent/opa — Complementary tool for Kubernetes policy validation; often used alongside k8sgpt for compliance scanning
- FairwindsOps/polaris — Similar Kubernetes auditor/scanner that checks best practices; a direct alternative for resource validation
- AlexsJones/sympozium — Sister project mentioned in the README for managing agents in Kubernetes; a natural companion for agent-based diagnostics
- kubescape/kubescape — Kubernetes security scanner; complementary tool often chained with k8sgpt for comprehensive cluster insights
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for cmd/customAnalyzer subcommand
The customAnalyzer feature (cmd/customAnalyzer/) lacks documented integration tests. Given that k8sgpt supports custom analyzers as a core feature for extending functionality, integration tests would validate the add/list/remove workflow and ensure custom analyzer configurations persist correctly across different scenarios. This directly supports the project's goal of 'Giving Kubernetes Superpowers to everyone' by ensuring extensibility works reliably.
- [ ] Create pkg/customAnalyzer/customAnalyzer_test.go with test fixtures
- [ ] Add integration test in cmd/customAnalyzer/ that tests the full lifecycle: add → list → remove
- [ ] Test persistence of custom analyzer configurations using the config/auth storage pattern already in use
- [ ] Verify error handling when adding malformed or duplicate analyzers
- [ ] Reference: cmd/customAnalyzer/{add,list,remove}.go and existing test patterns in cmd/auth/
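The add → list → remove lifecycle in this checklist can be exercised against an in-memory stand-in before wiring up the real config layer. The Store type and its methods below are hypothetical illustrations, not k8sgpt's actual storage API:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is an in-memory stand-in for the config-backed storage the
// checklist describes; the type and methods are hypothetical.
type Store struct {
	analyzers map[string]string // name -> rule/template
}

func NewStore() *Store { return &Store{analyzers: map[string]string{}} }

// Add rejects duplicates — one of the error cases the checklist
// says tests should cover.
func (s *Store) Add(name, rule string) error {
	if _, dup := s.analyzers[name]; dup {
		return errors.New("duplicate analyzer: " + name)
	}
	s.analyzers[name] = rule
	return nil
}

func (s *Store) List() []string {
	out := make([]string, 0, len(s.analyzers))
	for name := range s.analyzers {
		out = append(out, name)
	}
	return out
}

// Remove fails on unknown names, the other error path to test.
func (s *Store) Remove(name string) error {
	if _, ok := s.analyzers[name]; !ok {
		return errors.New("unknown analyzer: " + name)
	}
	delete(s.analyzers, name)
	return nil
}

func main() {
	s := NewStore()
	fmt.Println(s.Add("gpu-quota", "detect exhausted GPU quota"))
	fmt.Println(len(s.List()))
	fmt.Println(s.Remove("gpu-quota"))
	fmt.Println(len(s.List()))
}
```

An integration test would drive the same lifecycle through the real cmd/customAnalyzer commands and assert the persisted config instead of an in-memory map.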
Add GitHub Action workflow for security scanning (SBOM & vulnerability checks)
The repo has OpenSSF Best Practices badge and SECURITY.md but lacks an automated SBOM (Software Bill of Materials) generation and dependency vulnerability scanning workflow. Given the security-sensitive nature (AI model integration, Kubernetes access), a dedicated security workflow in .github/workflows/ would improve supply chain security and align with OpenSSF practices. This complements existing golangci_lint.yaml and test.yaml workflows.
- [ ] Create .github/workflows/security.yaml with SBOM generation using syft or similar
- [ ] Add GitHub Advanced Security scanning or Dependabot alerts validation step
- [ ] Generate and commit SBOM to repo root (e.g., sbom.json) on each release
- [ ] Reference .github/workflows/release.yaml to understand the release workflow trigger points
- [ ] Document SBOM location in SECURITY.md
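A minimal workflow along these lines might look like the sketch below. The trigger, schedule, and action versions are assumptions — align them with the existing release.yaml and verify the actions on the GitHub Marketplace before adopting:

```yaml
# .github/workflows/security.yaml — sketch only; verify action
# names and versions before use.
name: security
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * 1" # weekly scan
jobs:
  sbom:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Generate an SPDX SBOM from the Go module graph with syft
      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          format: spdx-json
          output-file: sbom.json
      # Scan the SBOM for known CVEs with grype
      - name: Scan SBOM for vulnerabilities
        uses: anchore/scan-action@v3
        with:
          sbom: sbom.json
```

Attaching sbom.json to release artifacts (rather than committing it to the repo root) is a common alternative worth weighing against the checklist item above.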
Add unit tests for cmd/filters subcommand with cache interaction
The filters feature (cmd/filters/{add,list,remove}.go) interacts with the caching system (cmd/cache/), but test coverage for this interaction is not evident from the file structure. This is critical because filters affect which Kubernetes resources are analyzed, making correctness essential. Tests should validate filter persistence, retrieval, and interaction with the cache layer.
- [ ] Create cmd/filters/filters_test.go with unit tests for add, list, and remove operations
- [ ] Mock the cache layer (cmd/cache/) to test filter persistence scenarios
- [ ] Add tests for invalid filter patterns and edge cases (empty filters, special characters)
- [ ] Test that filter changes correctly clear or invalidate cache as expected
- [ ] Ensure tests follow the pattern already established in cmd/auth/ tests
🌿Good first issues
- Add test coverage for cmd/cache command family (currently minimal test files in that directory); good for learning CLI structure and testing patterns
- Expand analyzer coverage documentation: pkg/analyzers/ lacks inline examples of how each analyzer detects issues; adding godoc examples to 3-5 analyzers would help contributors understand the pattern
- Create integration test examples in Makefile for validating analyzer output against known Kubernetes failure scenarios; currently test.yaml workflow doesn't show concrete analyzer test cases
⭐Top contributors
- @renovate[bot] — 29 commits
- @github-actions[bot] — 21 commits
- @AlexsJones — 18 commits
- @three-foxes-in-a-trenchcoat — 7 commits
- @umeshkaul — 3 commits
📝Recent commits
- 30463af — docs: remove FOSSA license badge and check (#1644) (three-foxes-in-a-trenchcoat)
- c82da09 — docs: prepare governance docs for CNCF incubation (#1642) (three-foxes-in-a-trenchcoat)
- dfa21ea — chore(main): release 0.4.32 (#1628) (github-actions[bot])
- c87a31a — fix: amazonbedrockconverse claude models temp and topp (#1629) (CradleKing24)
- ac329d1 — feat: add daemonset analyzer and special cases for pod and job (#1636) (doppoluv)
- 28fe196 — feat: add Azure API Type Support and add Custom HTTP Header (#1638) (lawrencelo8)
- ca0d3eb — fix: improve ConfigMap usage detection for sidecar patterns (#1602) (squatboy)
- 6ba8fb2 — fix: recognize GKE built-in ingress classes 'gce' and 'gce-internal' (#1599) (majiayu000)
- 97fbf04 — fix(deps): update module google.golang.org/grpc to v1.79.3 [security] (#1626) (renovate[bot])
- 74b1ee1 — chore(main): release 0.4.31 (#1619) (github-actions[bot])
🔒Security observations
- High · Outdated Go Toolchain Version — go.mod. The module specifies go 1.24.1 with toolchain go1.24.11; the go directive and toolchain should be kept in sync and tracked against the latest stable Go release so security patches are applied. Fix: Update to the latest stable Go version, keep the toolchain directive synchronized with the go directive, and monitor Go security advisories regularly.
- High · Legacy AWS SDK Dependency — go.mod (github.com/aws/aws-sdk-go v1.55.7). The project depends on the older AWS SDK v1, which may contain unpatched vulnerabilities, even though it already includes aws-sdk-go-v2. Fix: Remove the aws-sdk-go v1 dependency, migrate all code to aws-sdk-go-v2, and audit AWS API calls for secure credential handling.
- High · Third-party AI Provider Dependencies Without Version Pinning — go.mod (multiple AI provider packages). Multiple AI provider dependencies (Ollama, OpenAI, Cohere, Google Generative AI, Bedrock, Watson) are specified without explicit version constraints in critical code paths; these external service integrations could introduce supply-chain risk. Fix: Pin versions for all external AI provider SDKs, add SBOM generation, and audit third-party dependencies regularly.
- Medium · Helm Client Usage Without Source Validation — go.mod (github.com/mittwald/go-helm-client v0.12.14). The Helm client handles chart deployments; charts could be exploited if source validation is not implemented. Fix: Verify chart signatures, validate all chart sources, prefer OCI registries for distribution, and apply role-based access controls to deployments.
- Medium · gRPC and Protocol Buffer Dependencies from Custom Sources — go.mod (buf.build/gen/go/k8sgpt-ai/k8sgpt/* packages). buf.build-generated dependencies may bypass standard security review, since they are auto-generated from protobuf definitions. Fix: Keep buf.build configurations version-controlled and reviewed, sign commits that change protobuf schemas, and document gRPC service security boundaries.
- Medium · Test-Only Dependency gomonkey in go.mod — github.com/agiledragon/gomonkey/v2 v2.13.0. This monkey-patching library appears in go.mod; if used outside tests, it could allow runtime behavior manipulation and security bypasses. Fix: Verify gomonkey is only referenced from *_test.go files, consider build tags to exclude it from production builds, and audit for any runtime patching outside tests.
- Medium · Kubernetes Client Without RBAC Documentation — go.mod (k8s.io/client-go v0.32.3), charts/k8sgpt/templates/. client-go grants Kubernetes API access, and role.yaml/rolebinding.yaml templates exist, but least-privilege RBAC requirements are not documented. Fix: Document the minimum required RBAC permissions in SECURITY.md, use minimally scoped service account tokens, and enable RBAC audit logging.
- Medium · Cloud Storage Credential Handling — go.mod. Dependencies for cloud storage (cloud.google.com/go/storage, Azure SDK, AWS SDK) are included, but this generated finding is incomplete; review credential handling for these SDKs manually.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.