failsafe-lib/failsafe

Item: failsafe-lib/failsafe
Rating: 5
Author: RepoPilot

Fault tolerance and resilience patterns for the JVM

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained; safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI; a clean foundation to fork and modify.

Learn from: Healthy

Documented and popular; a useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture; runnable as-is.

✓ Last commit 4mo ago
✓ 7 active contributors
✓ Apache-2.0 licensed
✓ CI configured
✓ Tests present
⚠ Slowing: last commit 4mo ago
⚠ Single-maintainer risk: top contributor 92% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/failsafe-lib/failsafe)](https://repopilot.app/r/failsafe-lib/failsafe)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: failsafe-lib/failsafe

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/failsafe-lib/failsafe shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

Last commit 4mo ago
7 active contributors
Apache-2.0 licensed
CI configured
Tests present
⚠ Slowing — last commit 4mo ago
⚠ Single-maintainer risk — top contributor 92% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live failsafe-lib/failsafe repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/failsafe-lib/failsafe.

What it runs against: a local clone of failsafe-lib/failsafe — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in failsafe-lib/failsafe | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 162 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>failsafe-lib/failsafe</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of failsafe-lib/failsafe. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/failsafe-lib/failsafe.git
#   cd failsafe
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of failsafe-lib/failsafe and re-run."
  exit 2
fi

# 1. Repo identity
# Anchor the match at the end of the URL so sibling repos such as
# failsafe-lib/failsafe-something do not pass.
git remote get-url origin 2>/dev/null | grep -qE "failsafe-lib/failsafe(\.git)?/?$" \
  && ok "origin remote is failsafe-lib/failsafe" \
  || miss "origin remote is not failsafe-lib/failsafe (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# The standard Apache license text opens with "Apache License" and
# "Version 2.0" rather than the SPDX id, so match on those two phrases.
(grep -qi "Apache License" LICENSE 2>/dev/null \
   && grep -qiE "Version 2\.0" LICENSE 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in \
  "core/src/main/java/dev/failsafe/Failsafe.java" \
  "core/src/main/java/dev/failsafe/Policy.java" \
  "core/src/main/java/dev/failsafe/FailsafeExecutor.java" \
  "core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java" \
  "core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java"; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 162 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~132d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/failsafe-lib/failsafe"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Failsafe is a lightweight, zero-dependency library for Java 8+ that adds fault tolerance by wrapping executable logic in composable policies: Retry, CircuitBreaker, RateLimiter, Timeout, Bulkhead, and Fallback. A fluent API chains these policies together to handle transient failures, cascading failures, and resource exhaustion in distributed systems.

Project layout: a Maven build with a parent POM (the release commits name failsafe-parent) and a core module. core/src/main/java/dev/failsafe/ contains the policy implementations (CircuitBreaker, Retry, RateLimiter, Timeout, Bulkhead, Fallback), builder classes (e.g., RetryPolicyBuilder.java, CircuitBreakerBuilder.java), and execution wrappers (ExecutionImpl.java, AsyncExecutionImpl.java). Failsafe.java is the main entry point. Tests mirror the source tree under core/src/test/.

👥Who it's for

Java backend engineers and distributed systems developers building microservices, API clients, and integrations who need to handle transient failures, rate limits, circuit breaking, and timeouts without implementing these patterns from scratch or adopting heavyweight frameworks.

🌱Maturity & risk

Production-ready, though the commit pace has slowed (the last commit was about four months before this analysis). The project uses semantic versioning (the build is currently at 3.3.3-SNAPSHOT, following the 3.3.2 release), is distributed through Maven Central, runs GitHub Actions CI via .github/workflows/maven.yml, is licensed under Apache 2.0, and maintains CHANGELOG.md and VERSIONING.md.

Low risk for production use. The library has zero dependencies, which reduces supply-chain risk. However, it is maintained largely by one person (Jonathan Halterman, with occasional outside contributions), so monitor the GitHub issues and commit frequency. The SNAPSHOT version reflects the in-development branch; depend on a released version and review CHANGELOG.md before upgrading between minor versions for potential breaking changes in the policy APIs.

Active areas of work

The project is at version 3.3.3-SNAPSHOT, indicating active development toward a release. CONTRIBUTING.md outlines guidelines for external contributions. The Maven workflow in .github/workflows/maven.yml handles CI builds automatically. No specific ongoing features are visible in the file list, so check the GitHub issues and pull requests for current work items.

🚀Get running

git clone https://github.com/failsafe-lib/failsafe.git
cd failsafe
mvn clean install -DskipTests
mvn test  # Run the test suite

Daily commands: This is a library, not an executable application. mvn clean package builds the JAR, mvn test runs the test suite, and mvn test -Dtest=SomeTest runs a single test class. The CI workflow in .github/workflows/maven.yml builds and tests automatically on push.

🗺️Map of the codebase

core/src/main/java/dev/failsafe/Failsafe.java: Entry point API for all fault tolerance operations; every integration begins here.
core/src/main/java/dev/failsafe/Policy.java: Base interface for all policies (Retry, CircuitBreaker, RateLimiter, Timeout, Fallback, Bulkhead); the core abstraction.
core/src/main/java/dev/failsafe/FailsafeExecutor.java: Orchestrates policy composition and execution flow; handles sync/async dispatch and result aggregation.
core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java: Implements retry logic, including backoff and jitter delays; the most frequently used policy executor.
core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java: Applies the circuit breaker state machine (closed/open/half-open); critical for preventing cascading failures.
core/src/main/java/dev/failsafe/Execution.java: Execution interface tracking attempts, metadata, and state.
core/src/main/java/dev/failsafe/event/EventListener.java: Listener interface for policy and execution lifecycle events; the hook for monitoring and observability.

🛠️How to make changes

Add a new Retry policy with custom backoff

Create a RetryPolicy via RetryPolicy.builder(), specifying which failures to handle with handle() or handleIf() (core/src/main/java/dev/failsafe/RetryPolicy.java)
Configure the delay strategy via withDelay() or withBackoff() on RetryPolicyBuilder (core/src/main/java/dev/failsafe/RetryPolicyBuilder.java)
Set max retries and jitter via withMaxRetries() and withJitter(), then call build(); the resulting values are held in RetryPolicyConfig (core/src/main/java/dev/failsafe/RetryPolicyConfig.java)
Execute with Failsafe.with(retryPolicy).get(supplier) or run(runnable) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
Optionally listen to ExecutionAttemptedEvent for observability, for example through the builder's onRetry() or onFailedAttempt() listeners (core/src/main/java/dev/failsafe/event/ExecutionAttemptedEvent.java)
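
To make the backoff and jitter settings above concrete, here is a minimal plain-JDK sketch of how an exponential delay with a jitter factor could be computed and used in a retry loop. It is an illustration only, not Failsafe's RetryPolicyExecutor; the class name BackoffSketch, its method names, and the 0.25 jitter factor are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Illustrative only: a hand-rolled retry loop, not Failsafe's implementation. */
final class BackoffSketch {

    /** Delay before retry number `retry` (1-based): base * 2^(retry-1), capped at max. */
    static long backoffMillis(long baseMillis, long maxMillis, int retry) {
        long delay = baseMillis;
        for (int i = 1; i < retry && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Scales a delay by a factor in [1 - jitter, 1 + jitter], driven by random01 in [0, 1]. */
    static long withJitter(long delayMillis, double jitterFactor, double random01) {
        double offset = (random01 * 2 - 1) * jitterFactor;
        return Math.round(delayMillis * (1 + offset));
    }

    /** Calls the task until it succeeds or maxRetries retries have been used up. */
    static <T> T callWithRetries(Supplier<T> task, int maxRetries, long baseMillis, long maxMillis) {
        for (int retry = 0; ; retry++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (retry >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                long delay = backoffMillis(baseMillis, maxMillis, retry + 1);
                sleep(withJitter(delay, 0.25, ThreadLocalRandom.current().nextDouble()));
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "succeeded on call " + calls[0];
        }, 3, 10, 100);
        System.out.println(result);
    }
}
```

The random input to withJitter is passed in explicitly so the delay math can be checked deterministically; in real use the library's own builder options would replace all of this.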

Add a CircuitBreaker to prevent cascading failures

Create a CircuitBreaker via CircuitBreaker.builder() (core/src/main/java/dev/failsafe/CircuitBreaker.java)
Configure failure detection via withFailureThreshold() and withSuccessThreshold() (core/src/main/java/dev/failsafe/CircuitBreakerBuilder.java)
Set the delay before the half-open transition via withDelay() (core/src/main/java/dev/failsafe/CircuitBreakerConfig.java)
Compose with a retry policy: Failsafe.with(retryPolicy, circuitBreaker).get(supplier); the first policy listed is the outermost, so here the retry policy wraps the circuit breaker (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
Listen to CircuitBreakerStateChangedEvent to monitor open/closed/half-open transitions, for example via onOpen(), onClose(), and onHalfOpen() on the builder (core/src/main/java/dev/failsafe/event/CircuitBreakerStateChangedEvent.java)
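
The thresholds and delay above map onto a small state machine. The following plain-JDK sketch shows one way the closed/open/half-open transitions could work, assuming a consecutive-failure threshold and a success threshold of one. It is not Failsafe's CircuitBreakerExecutor; BreakerSketch and its methods are hypothetical names, and the clock is injected so the transitions can be exercised without sleeping.

```java
import java.util.function.LongSupplier;

/** Illustrative only: a simplified circuit breaker, not Failsafe's implementation. */
final class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDelayMillis;
    private final LongSupplier clock;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    BreakerSketch(int failureThreshold, long openDelayMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openDelayMillis = openDelayMillis;
        this.clock = clock;
    }

    /** Returns true if a call may proceed; moves OPEN to HALF_OPEN once the delay has elapsed. */
    synchronized boolean tryAcquire() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openDelayMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    /** A success closes the breaker (this sketch uses a success threshold of one). */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failure opens the breaker if it was a half-open trial or the threshold is reached. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        long[] now = {0};
        BreakerSketch breaker = new BreakerSketch(2, 1000, () -> now[0]);
        breaker.recordFailure();
        breaker.recordFailure();
        System.out.println("after two failures: " + breaker.state());
        now[0] = 1000;
        System.out.println("call allowed after delay: " + breaker.tryAcquire() + ", state: " + breaker.state());
    }
}
```

One deliberate simplification: the half-open state here admits every call, whereas a production breaker would typically limit the number of trial executions.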

Add RateLimiting and Timeout policies

Create a RateLimiter via RateLimiter.smoothBuilder() or RateLimiter.burstyBuilder() with a max execution count and period (core/src/main/java/dev/failsafe/RateLimiter.java)
The builder you pick selects the smooth or bursty algorithm; optionally set withMaxWaitTime() to let executions wait for a permit (core/src/main/java/dev/failsafe/RateLimiterBuilder.java)
Create a Timeout via Timeout.of(duration) or Timeout.builder(duration) (core/src/main/java/dev/failsafe/TimeoutBuilder.java)
Compose all policies: Failsafe.with(retry, circuitBreaker, rateLimiter, timeout).get(supplier) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
Handle RateLimitExceededException and TimeoutExceededException in calling code (core/src/main/java/dev/failsafe/RateLimitExceededException.java)
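
As a rough picture of what a bursty-style limiter does, the sketch below permits up to a fixed number of executions per time window and rejects the rest. It is a plain-JDK illustration under stated assumptions, not Failsafe's RateLimiter: WindowLimiterSketch and tryAcquirePermit are hypothetical names, and a rejected caller here simply gets false, where the steps above mention RateLimitExceededException.

```java
import java.util.function.LongSupplier;

/** Illustrative only: a fixed-window limiter, not Failsafe's implementation. */
final class WindowLimiterSketch {
    private final int maxPermits;
    private final long windowMillis;
    private final LongSupplier clock;
    private long windowStart;
    private int used;

    WindowLimiterSketch(int maxPermits, long windowMillis, LongSupplier clock) {
        this.maxPermits = maxPermits;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns true and consumes a permit if the current window still has capacity. */
    synchronized boolean tryAcquirePermit() {
        long now = clock.getAsLong();
        long elapsed = now - windowStart;
        if (elapsed >= windowMillis) {
            // Jump to the start of the window that contains `now` and reset the count.
            windowStart += (elapsed / windowMillis) * windowMillis;
            used = 0;
        }
        if (used < maxPermits) {
            used++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] now = {0};
        WindowLimiterSketch limiter = new WindowLimiterSketch(2, 1000, () -> now[0]);
        System.out.println(limiter.tryAcquirePermit()); // first permit in the window
        System.out.println(limiter.tryAcquirePermit()); // second permit in the window
        System.out.println(limiter.tryAcquirePermit()); // window exhausted
        now[0] = 1000;
        System.out.println(limiter.tryAcquirePermit()); // new window
    }
}
```

A smooth limiter would instead space permits evenly across the period; the fixed window was chosen here only because it is the shortest variant to read.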

Add async execution with CompletableFuture

Use Failsafe.with(policies).getAsync(supplier) for async execution; it returns a CompletableFuture<T> (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
For code that already returns a CompletionStage<T>, use getStageAsync(supplier) instead (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
Optionally provide a custom scheduler or executor via FailsafeExecutor.with(...) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
Chain callbacks with .thenAccept() or .thenApply() on the returned CompletableFuture (core/src/main/java/dev/failsafe/AsyncExecutionImpl.java)
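
To show what async retrying involves underneath, here is a minimal plain-JDK sketch that retries an action returning a CompletableFuture, scheduling each retry on a ScheduledExecutorService instead of blocking a thread. It is not Failsafe's AsyncExecutionImpl; AsyncRetrySketch, retryAsync, and the fixed delay are hypothetical choices made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Illustrative only: non-blocking retries over CompletableFuture, not Failsafe's implementation. */
final class AsyncRetrySketch {

    /** Returns a future that completes with the first success, or the last failure once retries run out. */
    static <T> CompletableFuture<T> retryAsync(Supplier<CompletableFuture<T>> action, int maxRetries,
                                               long delayMillis, ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(action, maxRetries, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> action, int retriesLeft, long delayMillis,
                                    ScheduledExecutorService scheduler, CompletableFuture<T> result) {
        CompletableFuture<T> stage;
        try {
            stage = action.get();
        } catch (RuntimeException e) {
            // Treat a synchronous throw the same as a failed future.
            stage = new CompletableFuture<>();
            stage.completeExceptionally(e);
        }
        stage.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft <= 0) {
                result.completeExceptionally(error);
            } else {
                // Schedule the next attempt rather than sleeping on the current thread.
                scheduler.schedule(
                        () -> attempt(action, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis, TimeUnit.MILLISECONDS);
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        try {
            String value = retryAsync(() -> {
                CompletableFuture<String> future = new CompletableFuture<>();
                if (calls.incrementAndGet() < 3) {
                    future.completeExceptionally(new IllegalStateException("transient failure"));
                } else {
                    future.complete("succeeded on call " + calls.get());
                }
                return future;
            }, 3, 10, scheduler).join();
            System.out.println(value);
        } finally {
            scheduler.shutdown();
        }
    }
}
```

The caller owns the scheduler's lifecycle in this sketch, which mirrors the usual trade-off when supplying a custom scheduler: more control, but shutdown becomes your responsibility.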

🪤Traps & gotchas

Failsafe has no external runtime dependencies, so it ships no logging or metrics integration: policies are silent unless you register event listeners. Builders require a final .build() call; forgetting it is a common mistake. Async execution runs on an executor or via completion stages, so check which executor your code will run on (for example, virtual threads on Java 21+ versus a traditional thread pool). A CircuitBreaker instance holds state that is shared by every thread using it, so concurrent calls affect each other's view of that state. The Timeout policy may use a ScheduledExecutorService; avoid blocking the scheduler thread in long-running handlers.

🏗️Architecture

💡Concepts to learn

Circuit Breaker Pattern: Core resilience pattern implemented in CircuitBreaker.java that prevents cascading failures by fast-failing requests when a service is degraded; the state transitions (Closed → Open → Half-Open) are essential to understand.
Rate Limiting (smooth and bursty): RateLimiter.java controls how many executions are permitted per time period, either spread evenly (smooth) or per window (bursty); this is how Failsafe prevents resource exhaustion and respects downstream rate limits.
Bulkhead Pattern (concurrency limiting): Bulkhead.java caps the number of concurrent executions so that one failing or slow operation cannot exhaust shared resources; critical for multi-tenant or multi-purpose systems.
Exponential Backoff with Jitter: RetryPolicy.java supports configurable retry delays; exponential backoff with jitter prevents a thundering herd when many clients retry simultaneously.
Fallback (Graceful Degradation): Fallback.java substitutes an alternative response when the primary logic fails, enabling graceful degradation instead of complete failure.
Builder Pattern: All policies use builders (RetryPolicyBuilder, CircuitBreakerBuilder, etc.) for fluent configuration; essential for understanding the API and for adding new configurable properties.
Java Module System (JPMS): Failsafe targets Java 8+ but ships a module descriptor (configured via moditect-maven-plugin), so it can also be consumed from modular Java 9+ applications.
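
Two of the concepts listed above, bulkhead and fallback, combine naturally. The sketch below caps concurrent calls with a java.util.concurrent.Semaphore and returns a fallback value when the bulkhead is full or the task throws. It is a plain-JDK illustration, not Failsafe's Bulkhead or Fallback; BulkheadSketch and callOrFallback are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Illustrative only: semaphore-based bulkhead with a fallback, not Failsafe's implementation. */
final class BulkheadSketch {
    private final Semaphore permits;

    BulkheadSketch(int maxConcurrency) {
        this.permits = new Semaphore(maxConcurrency);
    }

    /** Runs the task if a permit is free; otherwise, or if the task throws, returns the fallback value. */
    <T> T callOrFallback(Supplier<T> task, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // bulkhead full: degrade immediately instead of queueing
        }
        try {
            return task.get();
        } catch (RuntimeException e) {
            return fallback.get(); // primary logic failed: degrade gracefully
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BulkheadSketch bulkhead = new BulkheadSketch(1);
        // The inner call runs while the outer call still holds the only permit.
        String nested = bulkhead.callOrFallback(
                () -> bulkhead.callOrFallback(() -> "inner ran", () -> "inner rejected"),
                () -> "outer fallback");
        System.out.println(nested);
        System.out.println("permits free afterwards: " + bulkhead.availablePermits());
    }
}
```

Rejecting immediately with tryAcquire keeps the sketch non-blocking; a variant that waits a bounded time for a permit would trade latency for fewer rejections.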

🔗Related repos

resilience4j/resilience4j — Direct competitor providing similar fault tolerance patterns (CircuitBreaker, Retry, RateLimiter, Bulkhead) with modular design and Micrometer integration; good alternative if you need metrics out-of-the-box.
Netflix/Hystrix — Predecessor to modern Java resilience libraries; Hystrix established CircuitBreaker as standard JVM pattern; Failsafe modernizes and simplifies the approach.
failsafe-lib/failsafe-okhttp — Official companion module integrating Failsafe policies directly with OkHttp HTTP client; example of how to extend Failsafe to specific frameworks.
failsafe-lib/failsafe-retrofit — Official companion module integrating Failsafe with Retrofit HTTP client library; demonstrates policy composition with declarative API frameworks.
google/guava — Not a direct competitor but frequently used alongside Failsafe for other JVM utility patterns (caching, retry helpers via Stopwatch); complements Failsafe's resilience focus.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for policy composition scenarios

The repo has multiple resilience policies (CircuitBreaker, Retry, Timeout, Bulkhead, RateLimiter, Fallback) but lacks dedicated integration tests for complex composition scenarios. New contributors can add test cases for realistic combinations like RetryPolicy + CircuitBreaker + Timeout, which is a common production pattern. This would validate policy interaction correctness and help prevent regressions.

[ ] Create core/src/test/java/dev/failsafe/integration/PolicyCompositionTest.java
[ ] Add test cases for CircuitBreaker + RetryPolicy interaction (e.g., retry behavior when circuit is open)
[ ] Add test cases for Timeout + AsyncExecution combinations
[ ] Add test cases for Bulkhead + RateLimiter stacking with failure scenarios
[ ] Add test cases for Fallback + CircuitBreaker interaction
[ ] Document expected behavior in each test case with comments

Add event listener integration tests and documentation

The codebase has EventListener infrastructure (ExecutionAttemptedEvent, ExecutionCompletedEvent, CircuitBreakerStateChangedEvent in core/src/main/java/dev/failsafe/event/) but lacks comprehensive tests validating that all events fire correctly across different policies and scenarios. This is critical for users building observability/monitoring solutions on top of Failsafe.

[ ] Create core/src/test/java/dev/failsafe/EventListenerTest.java covering all event types
[ ] Add tests verifying ExecutionAttemptedEvent fires for each retry attempt
[ ] Add tests verifying CircuitBreakerStateChangedEvent fires on state transitions
[ ] Add tests verifying ExecutionScheduledEvent timing is accurate for delayed retries
[ ] Add tests for event listener exception handling (listener throws during event)
[ ] Add documentation in README or CONTRIBUTING.md showing event listener usage patterns

Add async execution edge case tests for AsyncExecution/AsyncExecutionImpl

The AsyncExecution and AsyncExecutionImpl classes (core/src/main/java/dev/failsafe/AsyncExecution*.java) handle asynchronous execution with policies, but test coverage for edge cases like cancellation, timeout during async work, and concurrent policy updates is likely incomplete. This is high-value for users relying on async patterns.

[ ] Create core/src/test/java/dev/failsafe/AsyncExecutionEdgeCasesTest.java
[ ] Add tests for cancelling an AsyncExecution mid-flight and verifying cleanup
[ ] Add tests for Timeout policy behavior when async task exceeds timeout
[ ] Add tests for concurrent execution attempts and race condition prevention
[ ] Add tests for AsyncExecution with CompletableFuture integration
[ ] Add tests verifying proper executor shutdown/thread handling in async scenarios

🌿Good first issues

Add integration test examples in a new examples/ folder showing Retry + CircuitBreaker + Timeout composition for common patterns (HTTP client retry, database connection pooling, gRPC resilience)—great way to verify the API and improve documentation.
Expand the Bulkhead and RateLimiter test coverage in core/src/test/ with edge-case scenarios (e.g., permit exhaustion under high concurrency, fairness of queue ordering); several policy classes lack comprehensive async tests.
Add a Metrics interface and default implementation that PolicyListeners can wire into, allowing users to expose policy events (retries, circuit opens, rate limit exceeded) to metrics libraries like Micrometer without coupling Failsafe to a specific metrics backend—design first in CONTRIBUTING.md.

⭐Top contributors

@jhalterman: 92 commits
@sullis: 3 commits
@armujahid: 1 commit
@nicky9door: 1 commit
@aalmiray: 1 commit

📝Recent commits

ed3f927 — Fix flaky test (jhalterman)
ec7e01e — Add workflow_dispatch to ci (jhalterman)
98bb496 — Doc update - fixes #384 (jhalterman)
ceb14ac — ci: update github actions (#373) (armujahid)
e8d9928 — Fix unit test typo (jhalterman)
3b3780d — Add link to Slack in README (jhalterman)
7f6f31f — Minor javadoc fixes (jhalterman)
e6124a7 — Fix unit test that was quietly failing (jhalterman)
3ad8e8b — [maven-release-plugin] prepare for next development iteration (jhalterman)
8ae344f — [maven-release-plugin] prepare release failsafe-parent-3.3.2 (jhalterman)

🔒Security observations

The Failsafe library demonstrates a strong security posture overall. As a lightweight, zero-dependency fault tolerance library, it has minimal attack surface and dependency-related vulnerabilities. No critical security issues were identified in the codebase structure, file organization, or build configuration. The primary recommendations are development/operational in nature: transitioning from SNAPSHOT to stable releases for production use and ensuring comprehensive security documentation. The modular architecture with well-separated concerns (policies, executors, event handlers) and use of custom exceptions suggests good defensive programming practices.

Low · SNAPSHOT Version in Production Build — core/pom.xml (version: 3.3.3-SNAPSHOT). The pom.xml specifies version '3.3.3-SNAPSHOT', which indicates a development/snapshot build. SNAPSHOT versions are development builds that can change without notice and should not be used in production environments. This could lead to unexpected behavior changes in production deployments. Fix: Use a stable release version (e.g., 3.3.3) for production builds. Reserve SNAPSHOT versions only for development and testing environments. Implement CI/CD controls to prevent SNAPSHOT artifacts from being deployed to production.
Low · Incomplete README Documentation — README.md. The README.md snippet is truncated, which may indicate incomplete or inadequate documentation about security features, best practices, and security considerations for users. Comprehensive documentation helps users understand security implications. Fix: Ensure complete documentation including security best practices, threat models, and security-related configuration options. Document any security limitations and recommended usage patterns.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
