Shopify/toxiproxy

⏰ 🔥 A TCP proxy to simulate network and system conditions for chaos and resiliency testing

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 3d ago
  • 10 active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • Single-maintainer risk — top contributor 88% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Onboarding: Shopify/toxiproxy

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/Shopify/toxiproxy shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • Last commit 3d ago
  • 10 active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Single-maintainer risk — top contributor 88% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live Shopify/toxiproxy repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/Shopify/toxiproxy.

What it runs against: a local clone of Shopify/toxiproxy — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in Shopify/toxiproxy | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 33 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>Shopify/toxiproxy</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of Shopify/toxiproxy. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/Shopify/toxiproxy.git
#   cd toxiproxy
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of Shopify/toxiproxy and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "Shopify/toxiproxy(\.git)?\b" \
  && ok "origin remote is Shopify/toxiproxy" \
  || miss "origin remote is not Shopify/toxiproxy (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# Match "MIT License" anywhere in LICENSE, since many files open with "The MIT License (MIT)".
(grep -qiE "MIT License" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "api.go" \
  && ok "api.go" \
  || miss "missing critical file: api.go"
test -f "proxy.go" \
  && ok "proxy.go" \
  || miss "missing critical file: proxy.go"
test -f "toxic_collection.go" \
  && ok "toxic_collection.go" \
  || miss "missing critical file: toxic_collection.go"
test -f "stream/io_chan.go" \
  && ok "stream/io_chan.go" \
  || miss "missing critical file: stream/io_chan.go"
test -f "cmd/server/server.go" \
  && ok "cmd/server/server.go" \
  || miss "missing critical file: cmd/server/server.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 33 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~3d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/Shopify/toxiproxy"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Toxiproxy is a TCP proxy written in Go that injects network faults (latency, bandwidth limits, connection drops, timeouts) into live traffic for resilience testing. It exposes an HTTP API for dynamically configuring 'toxics' (failure conditions) on proxied connections, so you can test application behavior under degraded network conditions without modifying your application code.

The codebase is a Go project with a modular layout: core proxy logic in proxy.go and link.go, HTTP API handlers in api.go, the Go client SDK under client/, server and CLI entry points in cmd/server/ and cmd/cli/, metrics collectors in collectors/, and examples in _examples/toxics and _examples/tests. State is held in memory in Proxy structs managed by ProxyCollection; toxics are applied per link.

👥Who it's for

QA engineers and backend developers at organizations like Shopify who need to validate that their microservices and distributed systems gracefully handle network failures, latency, and partial outages during testing, CI/CD, and development without requiring mocks or code changes.

🌱Maturity & risk

Production-ready and actively maintained. Shopify has used it in all development and test environments since October 2014. The repo includes Prometheus metrics collectors (metrics.go, collectors/proxy.go), unit tests alongside the core files (api_test.go, link_test.go), CI workflows under .github/workflows/, and automated dependency updates configured in dependabot.yml. Go is pinned to 1.23.0; releases use goreleaser and produce helm-compatible Docker builds.

Low risk for maintained code. Dependency footprint is lean (gorilla/mux, prometheus, zerolog, urfave/cli, tomb). Single organization maintainer (Shopify) but the tool is stable and well-tested. Monitor the CHANGELOG.md for breaking changes between v2.x releases. No obvious red flags in the release.yml or test workflows.

Active areas of work

Active maintenance with recent updates to dependencies (dependabot.yml), linting setup (.golangci.yml), and documented workflows for testing (test.yml) and releases (release.yml). CREATING_TOXICS.md and METRICS.md suggest ongoing feature additions. The example test suite (_examples/tests/) indicates active development of test harness patterns.

🚀Get running

git clone https://github.com/Shopify/toxiproxy.git
cd toxiproxy
make test    # Run tests
make build   # Build the server and CLI binaries into dist/
./dist/toxiproxy-server  # Start proxy server (listens on :8474 for API, proxies on configured addresses)

Daily commands:

# Start server (listening for HTTP API on port 8474)
go run ./cmd/server/server.go

# Or use Makefile
make run

# Run tests
make test

# Run with config file (populate proxies from a JSON file; config.json is a placeholder name)
toxiproxy-server -config ./config.json

🗺️Map of the codebase

  • api.go — Main HTTP API handler for proxy control—the entry point for all client requests to manage proxies and toxics.
  • proxy.go — Core proxy abstraction that manages TCP connections, streams, and toxic application—the heart of the system.
  • toxic_collection.go — Registry and lifecycle manager for all toxics applied to a proxy—critical for understanding how network conditions are injected.
  • stream/io_chan.go — Bidirectional I/O abstraction for proxied traffic—handles the actual data flow between upstream and downstream.
  • cmd/server/server.go — Server bootstrap and initialization—shows how the proxy daemon starts and configures the API listener.
  • toxics/toxic.go — Toxic interface definition and base implementation—all custom toxics must implement this contract.
  • client/client.go — Go client library for communicating with Toxiproxy HTTP API—reference implementation for client integrations.

🧩Components & responsibilities

  • ApiServer (api.go) (Gorilla Mux, JSON) — HTTP REST API for CRUD operations on proxies and toxics; request validation and response serialization
    • Failure mode: Invalid API requests return 400/500 errors; invalid proxy/toxic names silently rejected

🛠️How to make changes

Add a New Built-in Toxic

  1. Create the toxic struct with config fields in toxics/ directory, implementing the Toxic interface from toxics/toxic.go (toxics/my_toxic.go)
  2. Implement the Pipe(*ToxicStub) method on your toxic struct, as described in CREATING_TOXICS.md (toxics/my_toxic.go)
  3. Register the toxic under a unique name by calling toxics.Register() from an init() function (toxics/my_toxic.go)
  4. Add unit tests following existing patterns (e.g., bandwidth_test.go) to toxics/my_toxic_test.go (toxics/my_toxic_test.go)

Add a New HTTP API Endpoint

  1. Define a handler func in api.go following the pattern func (server *ApiServer) myHandler(w http.ResponseWriter, r *http.Request) (api.go)
  2. Register the route in the Mux setup within api.go using r.HandleFunc() (api.go)
  3. Parse request body using json.NewDecoder() and write responses with json.NewEncoder() (api.go)
  4. Add corresponding test cases in api_test.go following existing HTTP test patterns (api_test.go)

Add a New Client Method

  1. Create a method on ProxyClient in client/client.go that wraps an HTTP call using c.httpClient.Do() (client/client.go)
  2. Add request body marshaling and response body unmarshaling with error handling (client/client.go)
  3. Write test cases in client/client_test.go using httptest.Server mock (client/client_test.go)

Monitor a Custom Metric

  1. Create a prometheus.Collector in metrics.go or a new file under collectors/ (metrics.go)
  2. Register the collector in the RegisterCollectors() function and ensure Prometheus scrapes it via /metrics endpoint (metrics.go)
  3. Reference METRICS.md to document the new metric for users (METRICS.md)

🔧Why these technologies

  • Go — High-performance concurrent network I/O, minimal resource overhead required for a transparent proxy that must handle many simultaneous connections
  • HTTP API with Gorilla Mux — Simple REST interface for test clients to dynamically control proxy behavior without restart; language-agnostic control plane
  • Prometheus metrics — Standard observability integration; enables monitoring proxy health and connection statistics in production CI/test environments
  • zerolog — Structured, zero-allocation JSON logging for debugging proxy behavior without impacting throughput
  • tomb (goroutine lifecycle) — Graceful shutdown and goroutine cleanup for long-lived proxy and toxic processing routines

⚖️Trade-offs already made

  • HTTP API for control, direct TCP for data path

    • Why: Separate concerns: low-overhead data proxying vs. convenient management interface
    • Consequence: Requires two listeners but minimal control-path overhead on data throughput
  • Proxies are stateful, in-process objects (not distributed)

    • Why: Simplicity and determinism for testing; toxics must apply consistently to the same connection
    • Consequence: Cannot scale across multiple Toxiproxy instances for a single proxy; each proxy is single-machine bound
  • Toxics are attached per direction of a link (upstream or downstream)

    • Why: The same toxic logic can be reused for either direction without duplication
    • Consequence: A toxic affects only the stream it is attached to; applying a condition in both directions (e.g., latency each way) means adding the toxic once per stream
  • In-memory proxy registry; no persistence

    • Why: Toxiproxy is ephemeral for testing; restart clears all configuration
    • Consequence: Test harnesses must re-create proxies on each run; no disaster recovery of live proxies

🚫Non-goals (don't propose these)

  • Does not provide TLS/SSL termination or encryption
  • Does not handle authentication or authorization of API clients
  • Does not persist proxy/toxic configuration across restarts
  • Does not route or load-balance across multiple upstream targets
  • Does not support UDP or non-TCP protocols

🪤Traps & gotchas

  • No environment variables are required, but the Toxiproxy server must be running before tests can apply toxics.
  • The HTTP API listens on localhost:8474 by default; change it with the server's host and port flags (see cmd/server/server.go).
  • Per the project README, the file passed to -config is JSON (a list of proxy definitions). YAML files in the repo such as dev.yml and _examples/tests/cluster.yml should not be assumed to be server configs.
  • Link.handleConnection() is reported to be CPU-intensive per connection, so performance may degrade with many concurrent connections.
  • Toxics are applied per direction (upstream or downstream), and order matters when stacking multiple toxics on one proxy.

🏗️Architecture

💡Concepts to learn

  • TCP Proxy with Man-in-the-Middle Injection — Understanding how Toxiproxy intercepts, buffers, and modifies TCP streams without protocol knowledge (link.go's handleConnection loop) is key to adding new toxics
  • Token Bucket Rate Limiting — The bandwidth toxic uses token bucket algorithm to rate-limit throughput per connection — see how toxics limit data flows in real time
  • Goroutine Coordination with tomb.v1 — Toxiproxy uses tomb.Tomb for graceful goroutine lifecycle management (proxy.go spawn/cleanup) — critical for resource leaks in test suites
  • Prometheus Metrics Instrumentation — Collectors export metrics (collectors/proxy.go) for monitoring proxy health and chaos impact — necessary for understanding test-time traffic patterns
  • HTTP REST API for Runtime Reconfiguration — api.go implements dynamic toxic management without server restart — enables mid-test chaos injection (Toxic.Apply pattern in Ruby client)
  • Bidirectional Stream Buffering — link.go maintains separate upstream/downstream buffers for each direction — required to understand how toxics apply asymmetrically (latency on response only, etc.)
  • Chaos Engineering as Code — Toxiproxy enables 'failure as code' patterns (YAML config, HTTP API, client SDKs) — foundational to resilience testing philosophy in distributed systems

🔗Related repos

  • grpc-ecosystem/go-grpc-middleware — Similar chaos-injection approach but for gRPC interceptors rather than TCP proxies
  • linuxkit/linuxkit — Uses Toxiproxy for testing distributed system resilience in container orchestration scenarios
  • Shopify/toxiproxy-ruby — Official Ruby client library for Toxiproxy API — main companion repo for test code
  • Shopify/toxiproxy-python — Official Python client library — enables chaos testing in Python test suites
  • chaos-mesh/chaos-mesh — Kubernetes-native chaos engineering tool that complements Toxiproxy for orchestrated cluster-level failure injection

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for the Prometheus metrics API

The repo has metrics.go and collectors/ directory with Prometheus integration, but test/e2e/ only contains benchmark_test.go. There's no e2e test coverage for the metrics endpoints. Given that metrics.go and collectors/ (proxy.go, runtime.go, common.go) are core observability features, adding e2e tests would ensure the Prometheus scrape endpoints work correctly with toxics applied and validate metric accuracy.

  • [ ] Create test/e2e/metrics_test.go to test Prometheus /metrics endpoint responses
  • [ ] Verify metrics are emitted correctly when proxies are created/destroyed (test proxies using proxy_collection.go)
  • [ ] Test that collectors/runtime.go and collectors/proxy.go metrics are present and valid
  • [ ] Add tests validating metric values change appropriately when toxics are applied/removed
  • [ ] Reference metrics.go and existing collectors/ implementations in assertions

Add stream direction integration tests for bidirectional toxic effects

The stream/direction.go file exists with direction_test.go, but the integration between toxics applied to specific stream directions (Upstream/Downstream) lacks e2e coverage. This is critical because toxics behavior differs by direction, yet there's no test validating that a toxic applied to Upstream doesn't affect Downstream traffic in realistic scenarios.

  • [ ] Create test/e2e/direction_test.go testing toxics with explicit direction filters
  • [ ] Test applying a latency toxic to Upstream direction and verify Downstream traffic is unaffected
  • [ ] Test asymmetric network conditions (e.g., latency in one direction only) using stream/direction.go constants
  • [ ] Validate via testhelper/tcp_server.go that bidirectional communication works correctly with direction-specific toxics
  • [ ] Reference proxy.go and stream/direction.go in test setup

Add CLI integration tests for toxic lifecycle management

The cmd/cli/cli.go exists but there's no dedicated CLI e2e test suite validating the command-line interface for creating, updating, and deleting proxies and toxics. This is a high-value gap since users interact primarily through the CLI, and the test/e2e/ directory only has benchmark_test.go and endpoint.go.

  • [ ] Create test/e2e/cli_test.go with tests for major CLI commands (proxy create, toxic add, toxic delete, etc.)
  • [ ] Test CLI flag parsing for toxics with complex parameters (e.g., --toxictype bandwidth with rate limits)
  • [ ] Validate CLI output formatting and error handling against expected responses
  • [ ] Use testhelper/tcp_server.go to create real backend servers for end-to-end CLI workflows
  • [ ] Reference urfave/cli/v2 usage in cmd/cli/cli.go to ensure all major commands are covered

🌿Good first issues

  • Add example Go test file demonstrating all toxic types together (latency + bandwidth + timeout). Reference exists in _examples/tests/db_test.go but lacks a standalone quickstart.
  • Document metrics endpoints and Prometheus scrape config in METRICS.md with concrete examples — the collectors/common.go patterns aren't explained for new contributors.
  • Add an integration test validating that ProxyCollection.ProxyNames() and filtering work correctly (see api.go endpoints for /proxies query params); there appears to be no dedicated _test.go file for proxy_collection.go.

📝Recent commits

  • 2b329ff — Bump github/codeql-action from 4.35.2 to 4.35.3 (#730) (dependabot[bot])
  • 97a096f — Bump github.com/rs/zerolog from 1.35.0 to 1.35.1 (#729) (dependabot[bot])
  • 0083255 — Bump goreleaser/goreleaser-action from 7.0.0 to 7.2.1 (#728) (dependabot[bot])
  • 040036e — Bump github/codeql-action from 4.35.1 to 4.35.2 (#726) (dependabot[bot])
  • e1e059b — Bump docker/login-action from 4.0.0 to 4.1.0 (#723) (dependabot[bot])
  • be6f90e — Bump github/codeql-action from 4.34.1 to 4.35.1 (#721) (dependabot[bot])
  • 56b4363 — Bump github.com/rs/zerolog from 1.34.0 to 1.35.0 (#720) (dependabot[bot])
  • 316b492 — Bump github/codeql-action from 4.32.6 to 4.34.1 (#719) (dependabot[bot])
  • c4e60d7 — Bump docker/login-action from 3.7.0 to 4.0.0 (#717) (dependabot[bot])
  • 1918bcb — Bump github/codeql-action from 4.32.4 to 4.32.6 (#716) (dependabot[bot])

🔒Security observations

Toxiproxy's security posture is moderately secure with some notable concerns. The main issues are: (1) insecure default binding to 0.0.0.0 in Docker, (2) use of deprecated/unmaintained dependencies (tomb.v1 from 2014), and (3) potential information disclosure via logging configuration. The codebase uses modern dependencies like gorilla/mux and zerolog, but lacks visible input validation in API handlers. The application itself is designed for testing/chaos engineering purposes, so some security trade-offs are intentional. However, production deployments require explicit hardening, particularly around network binding, logging levels, and input validation. No hardcoded credentials or critical injection vulnerabilities were identified in the provided file structure.

  • Medium · Insecure Default Binding Address in Docker — Dockerfile. The Dockerfile uses CMD ['-host=0.0.0.0'] which binds Toxiproxy to all network interfaces by default. This exposes the proxy server to any network that can reach the container, potentially allowing unauthorized access and misuse of the proxy for network tampering. Fix: Change the default binding to a restricted interface (e.g., 'localhost' or '127.0.0.1') or document that users must explicitly configure host binding. For production deployments, use environment variables to allow explicit configuration rather than defaulting to 0.0.0.0.
  • Medium · Deprecated Dependency: gopkg.in/tomb.v1 — go.mod (gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7). The dependency 'gopkg.in/tomb.v1' is deprecated and unmaintained. While not a direct security vulnerability, using outdated dependencies increases the risk of unpatched security issues and compatibility problems. The last update was in 2014. Fix: Evaluate if this dependency can be replaced with a maintained alternative or if its functionality can be replicated with modern context-based cancellation patterns (context.Context). Consider migrating to golang.org/x/sync/errgroup for similar functionality.
  • Low · YAML Configuration File Without Validation — Multiple YAML config files (dev.yml, _examples/tests/cluster.yml, etc.). The presence of 'go.yaml.in/yaml/v2' dependency and YAML configuration files (dev.yml, cluster.yml, resources.yml) suggests YAML parsing. Without visible input validation, there's a potential for YAML injection or unsafe deserialization if user-supplied YAML is processed without sanitization. Fix: Implement strict YAML schema validation. Use a schema validation library to ensure only expected fields are accepted. Sanitize and validate all user-supplied YAML input before processing.
  • Low · Potential Information Disclosure via Logging — Dockerfile (LOG_LEVEL environment variable) and throughout codebase using zerolog. The codebase uses 'github.com/rs/zerolog' for logging and sets 'ENV LOG_LEVEL=info' by default in the Dockerfile. If not properly configured, logging could expose sensitive information about proxied connections, network conditions, or internal system state. Fix: Ensure LOG_LEVEL defaults to 'warn' or 'error' in production. Implement log filtering to prevent exposure of sensitive data (credentials, IP addresses, payload data). Document what data is logged and ensure users understand privacy implications.
  • Low · Missing HEALTHCHECK in Dockerfile — Dockerfile. The Dockerfile uses 'FROM scratch' and provides no health check mechanism. This can lead to deployment issues where unhealthy containers continue running, potentially masking service failures. Fix: Add a HEALTHCHECK instruction to the Dockerfile to verify the proxy is responsive. Example: HEALTHCHECK --interval=30s --timeout=3s CMD [curl or similar tool to check the API endpoint].
  • Low · No Input Validation Visible in API Handler — api.go and client-facing API endpoints. While the codebase includes 'api.go' and uses 'gorilla/mux' for routing, there's no clear evidence of comprehensive input validation on API endpoints that accept proxy configurations and toxic parameters. Fix: Implement strict input validation for all API endpoints. Validate port numbers, host addresses, timeout values, and other parameters. Use allowlists where possible rather than blocklists. Return clear validation error messages without exposing internal system details.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
