RepoPilot

chaosblade-io/chaosblade

An easy-to-use and powerful chaos engineering experiment toolkit (a simple, powerful chaos-experiment injection tool open-sourced by Alibaba).

Healthy

Healthy across the board.

Use as dependency — Healthy (weakest axis)

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify — Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 7w ago
  • 18 active contributors
  • Distributed ownership (top contributor 34% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — the badge updates automatically from the latest cached analysis.

Markdown variant (renders as "RepoPilot: Healthy"):
[![RepoPilot: Healthy](https://repopilot.app/api/badge/chaosblade-io/chaosblade)](https://repopilot.app/r/chaosblade-io/chaosblade)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/chaosblade-io/chaosblade on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: chaosblade-io/chaosblade

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/chaosblade-io/chaosblade shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 7w ago
  • 18 active contributors
  • Distributed ownership (top contributor 34% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live chaosblade-io/chaosblade repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/chaosblade-io/chaosblade.

What it runs against: a local clone of chaosblade-io/chaosblade — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in chaosblade-io/chaosblade | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches a relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 79 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>chaosblade-io/chaosblade</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of chaosblade-io/chaosblade. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/chaosblade-io/chaosblade.git
#   cd chaosblade
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of chaosblade-io/chaosblade and re-run."
  exit 2
fi

# 1. Repo identity
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "chaosblade-io/chaosblade(\.git)?\b" \
  && ok "origin remote is chaosblade-io/chaosblade" \
  || miss "origin remote is not chaosblade-io/chaosblade (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. LICENSE files usually spell out
#    "Apache License" rather than the SPDX id, so match both forms.
(grep -qiE "Apache License|Apache-2\.0" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in cli/main.go cli/cmd/cli.go exec/os/executor.go \
         exec/jvm/executor.go exec/kubernetes/executor.go; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 79 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~49d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/chaosblade-io/chaosblade"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

ChaosBlade is Alibaba's chaos engineering toolkit: it injects controlled failures (CPU, memory, network, disk, and process faults) into distributed systems at scale. It supports OS-level experiments, Java/JVM injection, C++ code instrumentation, Docker containers, and Kubernetes workloads through a unified CLI, letting teams test system resilience without hand-rolled fault simulation. The repo is a monorepo: cli/cmd/ holds the command-line interface (create, destroy, prepare, and check commands in separate .go files), build/ holds Docker image definitions and build specifications (build/spec/spec.go), and the external exec-* modules are pulled in as dependencies. The CLI layer (cli/cmd/cli.go, command.go) orchestrates the experiment lifecycle and delegates the actual chaos injection to specialized executor packages for OS, containers, middleware, and cloud platforms.

👥Who it's for

SREs, platform engineers, and DevOps teams running production Kubernetes clusters or distributed systems who need to validate fault tolerance and disaster recovery procedures. Contributors are primarily chaos engineering practitioners and Alibaba Group maintainers extending experiment scenarios.

🌱Maturity & risk

Production-ready and actively maintained. The project shows significant maturity: CI/CD pipelines (GitHub Actions in .github/workflows/), Dockerfile builds for multiple architectures (ARM, musl, UPX), structured code organization (cli/, exec modules), and Makefile-driven builds. The v1.8.0 module version pinned consistently across dependent packages suggests coordinated, ongoing releases.

Moderate risk: the toolkit depends on 16+ external chaosblade-io organization packages (chaosblade-exec-os, chaosblade-exec-cloud, chaosblade-operator, etc.) that live outside this repo, coupling it to external release cycles. The old pinned Kubernetes client (k8s.io/client-go v12.0.0+incompatible) and the Go 1.25 requirement may introduce compatibility issues. A single point of failure in the exec-* package ecosystem could affect multiple failure-injection types.

Active areas of work

Version 1.8.0 is current across all dependent packages. Active maintenance is evidenced by structured CI/CD (ci.yml, release.yml workflows), linting rules (.golangci.yml, .markdownlint.json, .yamllint.yml), and documentation governance (CODE_OF_CONDUCT.md, CONTRIBUTING.md, MAINTAINERS.md). The repo includes build targets for multiple architectures (ARM, musl libc variants) suggesting active deployment across heterogeneous environments.

🚀Get running

git clone https://github.com/chaosblade-io/chaosblade.git
cd chaosblade
make

The Makefile orchestrates the Go build system. Use make build for local binary, make docker-build for containerized distribution, or make install to place the blade binary in your PATH.

Daily commands:

make
# Binary at ./blade or installed via 'make install'
./blade -h
./blade prepare os  # Prepare environment
./blade create os process kill --process mysql  # Inject fault
./blade destroy <uid>  # Clean up

For Kubernetes: chaosblade-operator must be deployed separately; interact via CRDs or blade CLI pointing to kubeconfig.

🗺️Map of the codebase

  • cli/main.go — Entry point for the ChaosBlade CLI tool; all command-line invocations start here.
  • cli/cmd/cli.go — Core CLI framework that parses and routes all experiment commands (create, destroy, query, etc.).
  • exec/os/executor.go — OS-level executor that injects chaos experiments at the system level; fundamental to most Linux-based attacks.
  • exec/jvm/executor.go — JVM executor for Java application chaos injection; handles sandbox integration and class instrumentation.
  • exec/kubernetes/executor.go — Kubernetes executor for container orchestration chaos; integrates with K8s API and CRI.
  • data/experiment.go — Data model for chaos experiments; defines the core structure used throughout the system.
  • go.mod — Module dependencies including exec-cloud, exec-cri, and other critical executor plugins.

🛠️How to make changes

Add a New OS-Level Chaos Experiment

  1. Define the experiment model and flags in a new file under a subdirectory matching the experiment type (e.g., exec/os/kill_process.go for process killing). (exec/os/executor.go)
  2. Implement the Executor interface (Start, Stop, Destroy methods) in your new experiment file. (exec/os/executor.go)
  3. Register the new experiment type in the OS executor's factory method so the CLI can invoke it. (exec/os/executor.go)
  4. Add CLI command handler (e.g., cli/cmd/kill_process.go) that parses flags and calls the executor. (cli/cmd/create.go)
  5. Update cli/cmd/cli.go to wire the new command into the root command hierarchy. (cli/cmd/cli.go)

Add a New Kubernetes Chaos Target

  1. Define the Kubernetes resource spec (pod, node, network policy) in a new file or extend exec/kubernetes/spec.go. (exec/kubernetes/spec.go)
  2. Implement a targeted executor or extend exec/kubernetes/executor.go to handle the new resource type. (exec/kubernetes/executor.go)
  3. Add resource discovery/query logic in cli/cmd/query_k8s.go to expose available targets. (cli/cmd/query_k8s.go)
  4. Create a CLI handler (e.g., cli/cmd/k8s_pod_kill.go) that orchestrates the experiment creation. (cli/cmd/create.go)
  5. Register the new command in cli/cmd/cli.go under the Kubernetes experiment group. (cli/cmd/cli.go)

Add a New JVM Chaos Experiment

  1. Define the bytecode manipulation rules and sandbox configuration in a new file (e.g., exec/jvm/memory_fill.go). (exec/jvm/executor.go)
  2. Integrate with exec/jvm/sandbox.go to generate the correct sandbox configuration for bytecode weaving. (exec/jvm/sandbox.go)
  3. Implement the JVM executor's Start and Stop methods to manage the sandbox agent lifecycle. (exec/jvm/executor.go)
  4. Create a CLI command handler in cli/cmd that parses JVM-specific flags (e.g., pid, class filters). (cli/cmd/create.go)
  5. Wire the command into cli/cmd/cli.go and ensure prepare_jvm.go prerequisites are triggered if needed. (cli/cmd/cli.go)

Add a New Experiment Query Type

  1. Create a new file in cli/cmd/ (e.g., query_memory.go) that discovers available memory-related resources. (cli/cmd/query.go)
  2. Implement the query command handler to fetch and format resource information (processes, services, configs, etc.). (cli/cmd/query.go)
  3. Register the new query command in cli/cmd/cli.go so it appears in the query subcommand group. (cli/cmd/cli.go)

🔧Why these technologies

  • Go/Golang — Cross-platform, statically compiled binary; minimal runtime dependencies; excellent for low-level system interaction (syscalls, networking) and container orchestration APIs.
  • Docker/OCI containers — Enables reproducible distributed chaos experiments and simplifies dependency management for different target environments (musl, ARM, etc.).
  • Kubernetes API client — Native K8s integration for orchestrating chaos at pod, node, and network levels; enables dynamic discovery of targets.
  • JVM Sandbox/bytecode instrumentation — Allows non-invasive injection of chaos into running Java applications without redeployment; operates at class/method instrumentation level.
  • CLI framework (Cobra-like) — Standard for Go CLI tools; hierarchical command structure matches the natural nesting of experiment types (blade create os cpu, blade create jvm memory, etc.).

⚖️Trade-offs already made

  • Single statically-compiled binary instead of an agent-based architecture
    • Why (inferred; the generator left this field blank): a static blade binary runs on any target host with no runtime dependencies or resident agent to install, consistent with the Go rationale above.
    • Consequence (inferred): multi-host and cluster orchestration is pushed outward — for example, Kubernetes scenarios require the separately deployed chaosblade-operator.

🪤Traps & gotchas

  • OS-level fault injection requires elevated privileges (root/sudo); test environments must account for this.
  • Kubernetes integration depends on the external chaosblade-operator being deployed and reachable; local development without a Kubernetes setup will skip cloud-native scenarios.
  • The prepare commands are idempotent but install language-specific tooling; make sure no conflicting package managers are in play.
  • SQLite persistence (glebarez/sqlite) stores experiment state; a failed cleanup leaves orphaned DB records that can interfere with subsequent experiments.
  • Go 1.25 is required; older Go versions will fail module resolution.
  • The -x flags and debug mode need careful handling to avoid logging sensitive payloads.


💡Concepts to learn

  • Chaos Engineering Model — ChaosBlade strictly follows the chaos experimental model (steady-state → hypothesis → failure injection → observation → recovery) to ensure valid fault testing; understanding this framework is core to using the toolkit correctly
  • eBPF (Extended Berkeley Packet Filter) — Visible in dependency cilium/ebpf for low-level system introspection and fault injection without kernel modifications; enables non-disruptive chaos scenarios
  • CRI (Container Runtime Interface) — chaosblade-exec-cri package abstracts over Docker, containerd, and CRI-O via the Kubernetes CRI; essential for cross-container-runtime compatibility
  • Linux cgroups — Underlying mechanism for OS resource chaos (CPU/memory limits, process killing) on Linux; ChaosBlade manipulates cgroup hierarchies for injection
  • Spec-based Declarative Injection — ChaosBlade separates experiment specification (YAML/CLI flags) from execution; the spec is validated before injection, enabling portability and repeatability across environments
  • Instrumentation (Java/C++) — For application-level chaos, ChaosBlade uses bytecode instrumentation (Java via javaagent) and source-level tampering (C++ via compiler hooks) to inject failures in arbitrary methods without code changes
  • State Persistence via SQLite — The toolkit maintains experiment state in a local SQLite DB (glebarez/sqlite) to track active faults, enable cleanup, and prevent duplicate injections; understanding DB schema is critical for debugging
  • chaosblade-io/chaosblade-exec-os — Executor module for OS-level chaos injection (CPU, memory, network, disk, process faults) that chaosblade CLI delegates to
  • chaosblade-io/chaosblade-operator — Kubernetes operator enabling Chaosblade experiments via CRDs on K8s clusters; required for cloud-native chaos engineering
  • chaosblade-io/chaosblade-spec-go — Go library defining chaos experiment specifications and validation rules that all executor modules depend on
  • Gremlin — a commercial, closed-source chaos engineering platform with similar OS/container/cloud abstractions; comparable use cases
  • powerfulseal/powerfulseal — Open-source Kubernetes chaos tool focused on pod and network failures; companion to ChaosBlade for cloud-native testing

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for CLI command workflows

The cli/cmd directory has many command implementations (create.go, destroy.go, prepare.go, query.go, etc.) but only a few have corresponding _test.go files (cli_test.go, command_test.go, destroy_test.go, prepare_jvm_test.go, query_disk_test.go). Critical paths like prepare.go, query.go, revoke.go, and server.go lack test coverage. This is valuable for chaos engineering tools where command reliability is safety-critical.

  • [ ] Create cli/cmd/prepare_test.go with tests for prepare.go command flows
  • [ ] Create cli/cmd/query_test.go with tests for query.go command flows and different experiment types
  • [ ] Create cli/cmd/revoke_test.go with tests for revoke.go cleanup operations
  • [ ] Create cli/cmd/server_test.go with tests for server.go lifecycle commands (start/stop/status)
  • [ ] Ensure tests cover both success and failure scenarios for chaos operations

Add e2e test workflow in GitHub Actions for chaos experiment execution

The .github/workflows directory only has ci.yml and release.yml, but lacks end-to-end tests that verify actual chaos experiments execute correctly. Given this is a chaos engineering toolkit with OS, middleware, cloud, and CRI executors, an e2e workflow would validate that the integrated system works across different targets (similar to how query_disk_test.go and query_network_test.go exist but no execution tests).

  • [ ] Create .github/workflows/e2e-test.yml that spins up test environments
  • [ ] Add tests for basic OS-level experiments (process kill, network delay) using chaosblade-exec-os
  • [ ] Add tests for container/CRI experiments using chaosblade-exec-cri
  • [ ] Configure workflow to run on pull requests and specific paths (cli/, build/)
  • [ ] Document test results in the workflow summary

Create comprehensive documentation for adding new experiment types

While BUILD.md and CONTRIBUTING.md exist, there's no specific developer guide for extending ChaosBlade with new experiments. The modular structure (chaosblade-exec-cloud, chaosblade-exec-middleware, chaosblade-exec-os, chaosblade-spec-go) suggests a plugin architecture, but the documentation gap makes it hard for contributors to add support for new target types or experiment scenarios.

  • [ ] Create docs/DEVELOPING_NEW_EXPERIMENTS.md explaining the executor pattern
  • [ ] Document how to implement new experiment specs using chaosblade-spec-go
  • [ ] Add a walkthrough example for creating a simple new experiment (e.g., new process chaos type)
  • [ ] Document how new executors integrate with the CLI layer (cli/cmd/exp.go patterns)
  • [ ] Include references to existing executor implementations as examples (chaosblade-exec-os, chaosblade-exec-middleware)

🌿Good first issues

  • Add integration tests for the check_os.go command covering all system checks (CPU, memory, disk, process); currently only check_java.go and check.go have test files, leaving OS validation untested.
  • Extend BUILD.md and BUILD_ZH.md with architecture-specific build instructions for ARM and musl variants (Dockerfiles exist in build/image/ but cross-compilation docs are minimal).
  • Create examples/ directory with runnable chaos scenarios (e.g., scripts for killing a process, injecting network latency, and recovering) to complement the CLI documentation.

Top contributors

See the live page for the contributor breakdown.

📝Recent commits

  • 3f698b8 — fix: Dockerfile to reduce vulnerabilities (#1272) (xcaspar)
  • 22213f4 — fix: Dockerfile to reduce vulnerabilities (#1269) (xcaspar)
  • dcebc57 — fix: build/image/musl/Dockerfile to reduce vulnerabilities (#1252) (xcaspar)
  • 234726d — build(deps): bump github.com/opencontainers/selinux (#1259) (dependabot[bot])
  • c14bad3 — build(deps): bump golang.org/x/crypto from 0.36.0 to 0.45.0 (#1256) (dependabot[bot])
  • 199f03e — chore: update maintainer role for Spencer Cai to Individual (#1257) (spencercjh)
  • a414816 — build(deps): bump github.com/opencontainers/runc from 1.0.2 to 1.2.8 (#1249) (dependabot[bot])
  • 51698b4 — chore: Introduce missing unified open source license header and toolchain for chaosblade CLI (#1244) (spencercjh)
  • 7dd785f — chore: update version to v1.8.0 (#1227) (xcaspar)
  • e2cb4f8 — build(deps): bump golang.org/x/net from 0.1.0 to 0.38.0 (#1151) (dependabot[bot])

🔒Security observations

  • High · Outdated Go Version in Dockerfile — Dockerfile (line with FROM golang:1.20.5). The Dockerfile uses golang:1.20.5 as the base image, which is outdated and may contain known security vulnerabilities. Go 1.20.5 was released in June 2023 and is no longer actively maintained. The go.mod file specifies go 1.25, creating a version mismatch between the build environment and project requirements. Fix: Update the Dockerfile to use a newer, actively maintained Go version that aligns with go.mod requirements (Go 1.25 or latest stable). Example: FROM golang:1.25-alpine
  • High · Insecure HTTP in Dockerfile — Dockerfile (wget command). The Dockerfile contains an HTTP request (wget http://www.musl-libc.org/releases/musl-${MUSL_VERSION}.tar.g) without HTTPS, exposing the build process to man-in-the-middle attacks and potential supply chain compromise. The line is incomplete but demonstrates an insecure pattern. Fix: Use HTTPS instead of HTTP for all remote resource downloads. Verify integrity with checksums or signatures. Example: wget https://www.musl-libc.org/releases/musl-${MUSL_VERSION}.tar.gz && echo 'checksum' | sha256sum -c -
  • High · Use of Incompatible Kubernetes Client Version — go.mod (k8s.io/client-go v12.0.0+incompatible). The project uses k8s.io/client-go v12.0.0+incompatible, which is a very old version (released ~2018) and marked as incompatible. This version likely contains numerous known security vulnerabilities and compatibility issues with modern Kubernetes clusters. Fix: Update to a supported version of k8s.io/client-go that matches the k8s.io/apimachinery version (0.20.6). Consider upgrading to a more recent version like v0.28.0 or later for better security and feature support.
  • Medium · Outdated Kubernetes Apimachinery Version — go.mod (k8s.io/apimachinery v0.20.6). k8s.io/apimachinery v0.20.6 is outdated (released in 2021) and may contain unpatched security vulnerabilities. Using old Kubernetes dependencies can expose the system to known exploits. Fix: Upgrade k8s.io/apimachinery to version 0.28.0 or later (2024+) to receive security patches and improvements. Ensure compatibility with other k8s dependencies.
  • Medium · Outdated gopsutil Dependency — go.mod (github.com/shirou/gopsutil v3.21.11+incompatible). github.com/shirou/gopsutil v3.21.11+incompatible is marked as incompatible and is from 2021. This package handles system information gathering and may have security implications if vulnerabilities exist in system call handling. Fix: Update to a compatible, newer version of gopsutil. The current version 5.x or latest v3.x versions are recommended.
  • Medium · Missing Input Validation in CLI Module — cli/cmd/ (create.go, query.go, destroy.go, etc.). The CLI module (cli/cmd/create.go, cli/cmd/query.go, etc.) may process user input for chaos experiments without visible input validation. Given that this is a chaos engineering tool that modifies system state, insufficient input validation could lead to command injection or unintended system modifications. Fix: Implement comprehensive input validation for all CLI parameters. Sanitize and validate experiment specifications, parameters, and configuration inputs. Use allowlists where possible for experiment types and targets.
  • Medium · Potential SQL Injection in SQLite Usage — data/ module (experiment.go, preparation.go, source.go). The project uses github.com/glebarez/sqlite v1.11.0 for data persistence (data/experiment.go, data/preparation.go). Without visible ORM usage or parameterized queries, there is a risk of SQL injection if experiment data or queries are constructed dynamically. Fix: Ensure all database queries use parameterized queries

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
