failsafe-lib/failsafe
Fault tolerance and resilience patterns for the JVM
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained: safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 4mo ago
- ✓ 7 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Slowing: last commit 4mo ago
- ⚠ Single-maintainer risk: top contributor 92% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: failsafe-lib/failsafe
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/failsafe-lib/failsafe shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 4mo ago
- 7 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Slowing — last commit 4mo ago
- ⚠ Single-maintainer risk — top contributor 92% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live failsafe-lib/failsafe
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/failsafe-lib/failsafe.
What it runs against: a local clone of failsafe-lib/failsafe — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in failsafe-lib/failsafe | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 162 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of failsafe-lib/failsafe. If you don't
# have one yet, run these first:
#
# git clone https://github.com/failsafe-lib/failsafe.git
# cd failsafe
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of failsafe-lib/failsafe and re-run."
  exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "failsafe-lib/failsafe(\.git)?\b" \
  && ok "origin remote is failsafe-lib/failsafe" \
  || miss "origin remote is not failsafe-lib/failsafe (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
# Assumption: the LICENSE file is the standard Apache-2.0 text, which opens with
# an indented "Apache License" line rather than the SPDX id, so accept either form.
(grep -qiE "^\s*(Apache-2\.0|Apache License)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"
# 4. Critical files exist
test -f "core/src/main/java/dev/failsafe/Failsafe.java" \
  && ok "core/src/main/java/dev/failsafe/Failsafe.java" \
  || miss "missing critical file: core/src/main/java/dev/failsafe/Failsafe.java"
test -f "core/src/main/java/dev/failsafe/Policy.java" \
  && ok "core/src/main/java/dev/failsafe/Policy.java" \
  || miss "missing critical file: core/src/main/java/dev/failsafe/Policy.java"
test -f "core/src/main/java/dev/failsafe/FailsafeExecutor.java" \
  && ok "core/src/main/java/dev/failsafe/FailsafeExecutor.java" \
  || miss "missing critical file: core/src/main/java/dev/failsafe/FailsafeExecutor.java"
test -f "core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java" \
  && ok "core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java" \
  || miss "missing critical file: core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java"
test -f "core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java" \
  && ok "core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java" \
  || miss "missing critical file: core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 162 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~132d)"
else
  miss "last commit was $days_since_last days ago, artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures), safe to trust"
else
  echo "artifact has $fail stale claim(s), regenerate at https://repopilot.app/r/failsafe-lib/failsafe"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Failsafe is a lightweight, zero-dependency Java 8+ library for fault tolerance and resilience. It wraps executable logic with composable policies such as RetryPolicy, CircuitBreaker, RateLimiter, Timeout, Bulkhead, and Fallback, and offers a fluent API for chaining them, so applications can handle transient failures, cascading failures, and resource exhaustion in distributed systems.
Project structure: a Maven build with a parent POM (failsafe-parent) and the library code in the core module. core/src/main/java/dev/failsafe/ contains the policy implementations (CircuitBreaker, Retry, RateLimiter, Timeout, Bulkhead, Fallback), builder classes (e.g., RetryPolicyBuilder.java, CircuitBreakerBuilder.java), and execution wrappers (ExecutionImpl.java, AsyncExecutionImpl.java). Failsafe.java is the main entry point API. Tests mirror the source layout under core/src/test/.
👥Who it's for
Java backend engineers and distributed systems developers building microservices, API clients, and integrations who need to handle transient failures, rate limits, circuit breaking, and timeouts without implementing these patterns from scratch or adopting heavyweight frameworks.
🌱Maturity & risk
Production-ready and actively maintained. The project uses semantic versioning (currently 3.3.3-SNAPSHOT), has Maven Central distribution, comprehensive GitHub Actions CI/CD via .github/workflows/maven.yml, Apache 2.0 licensing, and maintains CHANGELOG.md and VERSIONING.md documentation. The codebase shows active development with well-structured policy implementations and multiple resilience patterns built in.
Low risk for production use. The library is zero-dependency, which significantly reduces supply chain risk. However, development is concentrated in one maintainer (Jonathan Halterman, 92% of recent commits), so monitor the GitHub issues and commit frequency. The 3.3.3-SNAPSHOT version describes the development branch, not a release; depend on a released version, and review CHANGELOG.md before upgrading between minor versions for potential breaking changes in the policy APIs.
Active areas of work
The project is at version 3.3.3-SNAPSHOT, indicating active development toward a release. CONTRIBUTING.md outlines guidelines for external contributions. The Maven workflow in .github/workflows/maven.yml handles CI builds automatically. No specific ongoing features are visible in the file list, so check the GitHub issues and pull requests for current work items.
🚀Get running
git clone https://github.com/failsafe-lib/failsafe.git
cd failsafe
mvn clean install -DskipTests
mvn test # Run the test suite
Daily commands:
This is a library, not an executable application. To use it in development: mvn clean package builds the JAR. To run tests: mvn test. To run against a specific test class: mvn test -Dtest=SomeTest. For publishing: the CI workflow in .github/workflows/maven.yml handles automated builds and tests on push.
🗺️Map of the codebase
- core/src/main/java/dev/failsafe/Failsafe.java: Entry point API for all fault tolerance operations; every integration begins here.
- core/src/main/java/dev/failsafe/Policy.java: Common interface implemented by all policies (Retry, CircuitBreaker, RateLimiter, Timeout, Fallback, Bulkhead); core abstraction.
- core/src/main/java/dev/failsafe/FailsafeExecutor.java: Orchestrates policy composition and execution flow; handles sync/async dispatch and result aggregation.
- core/src/main/java/dev/failsafe/internal/RetryPolicyExecutor.java: Implements retry logic, including backoff delays and jitter; most frequently used policy executor.
- core/src/main/java/dev/failsafe/internal/CircuitBreakerExecutor.java: Implements the circuit breaker state machine (closed/open/half-open); critical for cascading failure prevention.
- core/src/main/java/dev/failsafe/Execution.java: Execution context interface tracking attempts, metadata, and state; used by all policies during execution.
- core/src/main/java/dev/failsafe/event/EventListener.java: Observer pattern for policy lifecycle events; essential for monitoring and observability hooks.
🛠️How to make changes
Add a new Retry policy with custom backoff
- Start a RetryPolicy with RetryPolicy.builder() and specify which failures to handle with handle() or handleIf() (core/src/main/java/dev/failsafe/RetryPolicy.java)
- Configure the delay strategy via withDelay() or withBackoff() on RetryPolicyBuilder (core/src/main/java/dev/failsafe/RetryPolicyBuilder.java)
- Set jitter and max retries via withMaxRetries() and withJitter(), then call build() (core/src/main/java/dev/failsafe/RetryPolicyConfig.java)
- Execute with Failsafe.with(retryPolicy).get(supplier) or run(runnable) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
- Optionally register onRetry() or onFailedAttempt() listeners, which receive an ExecutionAttemptedEvent, for observability (core/src/main/java/dev/failsafe/event/ExecutionAttemptedEvent.java)
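The withBackoff() and withJitter() settings in the steps above decide how long each retry waits. As a rough illustration, and not Failsafe's actual code, the sketch below computes such a delay schedule with plain JDK classes. The class BackoffSketch, its method names, and the sample values (100 ms base, 2 s cap, factor 2, 25% jitter) are all assumptions made up for this example.

```java
import java.time.Duration;
import java.util.Random;

// Hypothetical helper, not part of Failsafe: computes the wait before retry N.
final class BackoffSketch {
    // Exponential backoff: base * factor^(attempt - 1), capped at max.
    static Duration backoff(Duration base, Duration max, double factor, int attempt) {
        double millis = base.toMillis() * Math.pow(factor, attempt - 1);
        return Duration.ofMillis((long) Math.min(millis, max.toMillis()));
    }

    // Jitter: randomly shift the delay by up to +/- (jitterFactor * delay).
    static Duration withJitter(Duration delay, double jitterFactor, Random random) {
        double offset = (random.nextDouble() * 2 - 1) * jitterFactor;
        return Duration.ofMillis(Math.round(delay.toMillis() * (1 + offset)));
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int attempt = 1; attempt <= 6; attempt++) {
            Duration delay = backoff(Duration.ofMillis(100), Duration.ofSeconds(2), 2.0, attempt);
            Duration jittered = withJitter(delay, 0.25, random);
            System.out.println("retry " + attempt + ": base delay " + delay.toMillis()
                + " ms, with jitter " + jittered.toMillis() + " ms");
        }
    }
}
```

The cap keeps late retries from waiting unboundedly long, and the random offset keeps many clients that failed at the same moment from retrying in lockstep.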
Add a CircuitBreaker to prevent cascading failures
- Start a CircuitBreaker with CircuitBreaker.builder() (core/src/main/java/dev/failsafe/CircuitBreaker.java)
- Configure failure detection via withFailureThreshold() and withSuccessThreshold() (core/src/main/java/dev/failsafe/CircuitBreakerBuilder.java)
- Set the delay before the half-open transition via withDelay(), then call build() (core/src/main/java/dev/failsafe/CircuitBreakerConfig.java)
- Compose with a retry policy: Failsafe.with(retryPolicy, circuitBreaker).get(supplier) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
- Register onOpen(), onHalfOpen(), and onClose() listeners, which receive a CircuitBreakerStateChangedEvent, to monitor state transitions (core/src/main/java/dev/failsafe/event/CircuitBreakerStateChangedEvent.java)
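To show how the failure threshold, success threshold, and delay from the steps above interact, here is a minimal closed/open/half-open state machine in plain Java. It is a teaching sketch under simplifying assumptions (consecutive-failure counting, no thread safety, caller-supplied timestamps), not a copy of CircuitBreakerExecutor; the class BreakerSketch and all of its members are hypothetical names.

```java
import java.time.Duration;

// Hypothetical minimal breaker for illustration only; not thread-safe.
final class BreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final int successThreshold;
    private final long delayNanos;
    private State state = State.CLOSED;
    private int failures;
    private int successes;
    private long openedAt;

    BreakerSketch(int failureThreshold, int successThreshold, Duration delay) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
        this.delayNanos = delay.toNanos();
    }

    // Returns true if a call may proceed at time `now` (in nanoseconds).
    boolean tryAcquire(long now) {
        if (state == State.OPEN && now - openedAt >= delayNanos) {
            state = State.HALF_OPEN; // delay elapsed: let trial calls through
            successes = 0;
        }
        return state != State.OPEN;
    }

    void recordSuccess() {
        if (state == State.HALF_OPEN) {
            if (++successes >= successThreshold) {
                state = State.CLOSED;
                failures = 0;
            }
        } else if (state == State.CLOSED) {
            failures = 0; // this sketch counts consecutive failures only
        }
    }

    void recordFailure(long now) {
        // Any failure while half-open, or too many while closed, opens the breaker.
        if (state == State.HALF_OPEN || ++failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = now;
            failures = 0;
        }
    }

    State state() { return state; }

    public static void main(String[] args) {
        BreakerSketch breaker = new BreakerSketch(2, 1, Duration.ofSeconds(1));
        long start = System.nanoTime();
        breaker.recordFailure(start);
        breaker.recordFailure(start);
        System.out.println("after two failures: " + breaker.state());
        System.out.println("call allowed immediately: " + breaker.tryAcquire(start));
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic and easy to test; a real breaker would read the time itself and guard its state against concurrent access.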
Add RateLimiting and Timeout policies
- Start a RateLimiter with RateLimiter.smoothBuilder() or RateLimiter.burstyBuilder(), passing a max execution count and a period (core/src/main/java/dev/failsafe/RateLimiter.java)
- Pick smooth or bursty limiting through the builder you start from, and optionally set withMaxWaitTime() (core/src/main/java/dev/failsafe/RateLimiterBuilder.java)
- Create a Timeout via Timeout.of(Duration) or Timeout.builder(Duration) (core/src/main/java/dev/failsafe/TimeoutBuilder.java)
- Compose all policies: Failsafe.with(retry, circuitBreaker, rateLimiter, timeout).get(supplier) (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
- Handle RateLimitExceededException and TimeoutExceededException in calling code (core/src/main/java/dev/failsafe/RateLimitExceededException.java)
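The rate limiter steps above distinguish smooth from bursty limiting. The sketch below contrasts the two ideas in plain Java: a smooth limiter spaces permits evenly, while a bursty limiter allows up to a maximum per fixed window. Both classes are hypothetical illustrations with caller-supplied timestamps and no thread safety; they are not Failsafe's implementation.

```java
import java.time.Duration;

// Hypothetical: grants one permit per (period / maxExecutions) interval.
final class SmoothLimiterSketch {
    private final long intervalNanos;
    private long nextFreeSlot; // earliest time the next permit may be taken

    SmoothLimiterSketch(long maxExecutions, Duration period) {
        this.intervalNanos = period.toNanos() / maxExecutions;
    }

    // Non-blocking: returns false instead of waiting for the next slot.
    boolean tryAcquire(long now) {
        if (now < nextFreeSlot) return false;
        nextFreeSlot = now + intervalNanos;
        return true;
    }
}

// Hypothetical: grants up to maxExecutions permits per fixed window.
final class BurstyLimiterSketch {
    private final long maxExecutions;
    private final long periodNanos;
    private long windowStart;
    private long used;

    BurstyLimiterSketch(long maxExecutions, Duration period) {
        this.maxExecutions = maxExecutions;
        this.periodNanos = period.toNanos();
    }

    boolean tryAcquire(long now) {
        if (now - windowStart >= periodNanos) {
            // Align the new window to a multiple of the period and reset the count.
            windowStart = now - (now - windowStart) % periodNanos;
            used = 0;
        }
        if (used >= maxExecutions) return false;
        used++;
        return true;
    }

    public static void main(String[] args) {
        BurstyLimiterSketch bursty = new BurstyLimiterSketch(2, Duration.ofSeconds(1));
        for (int i = 0; i < 3; i++) {
            System.out.println("bursty permit " + i + ": " + bursty.tryAcquire(0L));
        }
    }
}
```

In calling code, a rejected permit corresponds to the point where a RateLimitExceededException would need handling.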
Add async execution with CompletableFuture
- Use Failsafe.with(policies).getAsync(supplier) for async execution; it returns a CompletableFuture (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
- If the work itself already returns a CompletableFuture<T> or other CompletionStage, use getStageAsync() instead (core/src/main/java/dev/failsafe/function/AsyncSupplier.java)
- Optionally provide a custom scheduler or executor via with() on the FailsafeExecutor (core/src/main/java/dev/failsafe/FailsafeExecutor.java)
- Chain callbacks with .thenAccept() or .thenApply() on the returned CompletableFuture (core/src/main/java/dev/failsafe/AsyncExecutionImpl.java)
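For readers new to retrying asynchronous work, the following sketch shows the underlying idea with only java.util.concurrent: re-invoke a supplier of CompletableFuture when the previous attempt completes exceptionally. AsyncRetrySketch and retryAsync are hypothetical names, there is no delay between attempts, and a supplier that throws synchronously is not handled; Failsafe's own async support is considerably more complete.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of async retries; not Failsafe's implementation.
final class AsyncRetrySketch {
    // Returns a future that completes with the first successful attempt,
    // or with the last error once maxRetries extra attempts are used up.
    static <T> CompletableFuture<T> retryAsync(Supplier<CompletableFuture<T>> task, int maxRetries) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(task, maxRetries, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> task, int retriesLeft,
                                    CompletableFuture<T> result) {
        task.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                attempt(task, retriesLeft - 1, result); // try again
            } else {
                result.completeExceptionally(error);
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        CompletableFuture<String> future = AsyncRetrySketch.<String>retryAsync(() -> {
            CompletableFuture<String> attempt = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                attempt.completeExceptionally(new IllegalStateException("transient failure"));
            } else {
                attempt.complete("connected");
            }
            return attempt;
        }, 5);
        // Callbacks chain on the returned future just as they would on any CompletableFuture.
        future.thenApply(String::toUpperCase)
              .thenAccept(v -> System.out.println(v + " after " + calls.get() + " calls"));
    }
}
```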
🪤Traps & gotchas
Failsafe has no external runtime dependencies, so don't expect built-in logging or metrics integrations: policies are silent by default unless you register event listeners (for example onRetry() or onOpen() on the policy builders). Builders require a final .build() call; forgetting it is a common mistake. Async execution runs on an executor or scheduler, so make sure you understand which one your code ends up using (e.g., virtual threads in Java 21+ vs. traditional thread pools). A CircuitBreaker instance holds state that is shared by every thread using it, so reuse one instance per protected resource instead of building a new one per call. The Timeout policy may rely on a ScheduledExecutorService; verify you're not blocking the scheduler thread with long-running task handlers.
🏗️Architecture
💡Concepts to learn
- Circuit Breaker Pattern: Core resilience pattern implemented in CircuitBreaker.java that prevents cascading failures by fast-failing requests when a service is degraded; essential to understand the state transitions (Closed → Open → Half-Open).
- Rate Limiting (smooth and bursty): RateLimiter.java supports smooth limiting, which spreads executions evenly in a leaky-bucket style, and bursty limiting, which allows a maximum per fixed window; needed to understand how Failsafe prevents resource exhaustion and respects downstream rate limits.
- Bulkhead Pattern (Concurrency Limiting): Bulkhead.java caps the number of concurrent executions so one failing or slow operation cannot exhaust shared resources; critical for multi-tenant or multi-purpose systems.
- Exponential Backoff with Jitter: RetryPolicy.java supports configurable retry delays; exponential backoff with jitter prevents a thundering herd when many clients retry simultaneously.
- Fallback (Graceful Degradation): Fallback.java allows substituting an alternative response when the primary logic fails; enables graceful degradation instead of complete failure.
- Builder Pattern: All policies use builders (RetryPolicyBuilder, CircuitBreakerBuilder, etc.) for fluent configuration; essential to understand the API and how to add new configurable properties.
- Java Module System (JPMS): Failsafe provides Java 9+ module metadata (configured via moditect-maven-plugin) while still running on Java 8; important for understanding the module declaration and how Failsafe integrates into modular applications.
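The Bulkhead and Fallback concepts in the list above combine naturally, so here is a small plain-Java sketch of both: a semaphore caps concurrent calls, and a fallback value is returned when the bulkhead is full or the task throws. BulkheadSketch and callOrFallback are hypothetical names chosen for this example, and the behavior is deliberately simpler than Failsafe's Bulkhead and Fallback policies.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration: concurrency cap plus graceful degradation.
final class BulkheadSketch {
    private final Semaphore permits;

    BulkheadSketch(int maxConcurrency) {
        this.permits = new Semaphore(maxConcurrency);
    }

    // Runs the task if a permit is free; returns the fallback value when the
    // bulkhead is full or the task throws a RuntimeException.
    <T> T callOrFallback(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // full: reject immediately rather than queue
        }
        try {
            return task.get();
        } catch (RuntimeException e) {
            return fallback; // degrade instead of propagating the failure
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        BulkheadSketch bulkhead = new BulkheadSketch(1);
        // The inner call runs while the outer call still holds the only permit.
        String nested = bulkhead.callOrFallback(
            () -> bulkhead.callOrFallback(() -> "inner ran", "inner rejected"),
            "outer rejected");
        System.out.println(nested);
    }
}
```

Rejecting immediately when no permit is free keeps callers from piling up behind a slow dependency, which is the point of the pattern.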
🔗Related repos
- resilience4j/resilience4j: Direct competitor providing similar fault tolerance patterns (CircuitBreaker, Retry, RateLimiter, Bulkhead) with modular design and Micrometer integration; good alternative if you need metrics out-of-the-box.
- Netflix/Hystrix: Predecessor to modern Java resilience libraries; Hystrix established CircuitBreaker as a standard JVM pattern; Failsafe modernizes and simplifies the approach.
- failsafe-lib/failsafe-okhttp: Official companion module integrating Failsafe policies directly with the OkHttp HTTP client; example of how to extend Failsafe to specific frameworks.
- failsafe-lib/failsafe-retrofit: Official companion module integrating Failsafe with the Retrofit HTTP client library; demonstrates policy composition with declarative API frameworks.
- google/guava: Not a direct competitor but frequently used alongside Failsafe for other JVM utility patterns (caching, retry helpers via Stopwatch); complements Failsafe's resilience focus.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive integration tests for policy composition scenarios
The repo has multiple resilience policies (CircuitBreaker, Retry, Timeout, Bulkhead, RateLimiter, Fallback) but lacks dedicated integration tests for complex composition scenarios. New contributors can add test cases for realistic combinations like RetryPolicy + CircuitBreaker + Timeout, which is a common production pattern. This would validate policy interaction correctness and help prevent regressions.
- [ ] Create core/src/test/java/dev/failsafe/integration/PolicyCompositionTest.java
- [ ] Add test cases for CircuitBreaker + RetryPolicy interaction (e.g., retry behavior when circuit is open)
- [ ] Add test cases for Timeout + AsyncExecution combinations
- [ ] Add test cases for Bulkhead + RateLimiter stacking with failure scenarios
- [ ] Add test cases for Fallback + CircuitBreaker interaction
- [ ] Document expected behavior in each test case with comments
Add event listener integration tests and documentation
The codebase has EventListener infrastructure (ExecutionAttemptedEvent, ExecutionCompletedEvent, CircuitBreakerStateChangedEvent in core/src/main/java/dev/failsafe/event/) but lacks comprehensive tests validating that all events fire correctly across different policies and scenarios. This is critical for users building observability/monitoring solutions on top of Failsafe.
- [ ] Create core/src/test/java/dev/failsafe/EventListenerTest.java covering all event types
- [ ] Add tests verifying ExecutionAttemptedEvent fires for each retry attempt
- [ ] Add tests verifying CircuitBreakerStateChangedEvent fires on state transitions
- [ ] Add tests verifying ExecutionScheduledEvent timing is accurate for delayed retries
- [ ] Add tests for event listener exception handling (listener throws during event)
- [ ] Add documentation in README or CONTRIBUTING.md showing event listener usage patterns
Add async execution edge case tests for AsyncExecution/AsyncExecutionImpl
The AsyncExecution and AsyncExecutionImpl classes (core/src/main/java/dev/failsafe/AsyncExecution*.java) handle asynchronous execution with policies, but test coverage for edge cases like cancellation, timeout during async work, and concurrent policy updates is likely incomplete. This is high-value for users relying on async patterns.
- [ ] Create core/src/test/java/dev/failsafe/AsyncExecutionEdgeCasesTest.java
- [ ] Add tests for cancelling an AsyncExecution mid-flight and verifying cleanup
- [ ] Add tests for Timeout policy behavior when async task exceeds timeout
- [ ] Add tests for concurrent execution attempts and race condition prevention
- [ ] Add tests for AsyncExecution with CompletableFuture integration
- [ ] Add tests verifying proper executor shutdown/thread handling in async scenarios
🌿Good first issues
- Add integration test examples in a new examples/ folder showing Retry + CircuitBreaker + Timeout composition for common patterns (HTTP client retry, database connection pooling, gRPC resilience). This is a good way to verify the API and improve documentation.
- Expand the Bulkhead and RateLimiter test coverage in core/src/test/ with edge-case scenarios (e.g., permit exhaustion under high concurrency, fairness of queue ordering); several policy classes lack comprehensive async tests.
- Add a Metrics interface and default implementation that PolicyListeners can wire into, allowing users to expose policy events (retries, circuit opens, rate limit exceeded) to metrics libraries like Micrometer without coupling Failsafe to a specific metrics backend. Design first in CONTRIBUTING.md.
⭐Top contributors
- @jhalterman — 92 commits
- @sullis — 3 commits
- @armujahid — 1 commit
- @nicky9door — 1 commit
- @aalmiray — 1 commit
📝Recent commits
- ed3f927: Fix flaky test (jhalterman)
- ec7e01e: Add workflow_dispatch to ci (jhalterman)
- 98bb496: Doc update - fixes #384 (jhalterman)
- ceb14ac: ci: update github actions (#373) (armujahid)
- e8d9928: Fix unit test typo (jhalterman)
- 3b3780d: Add link to Slack in README (jhalterman)
- 7f6f31f: Minor javadoc fixes (jhalterman)
- e6124a7: Fix unit test that was quietly failing (jhalterman)
- 3ad8e8b: [maven-release-plugin] prepare for next development iteration (jhalterman)
- 8ae344f: [maven-release-plugin] prepare release failsafe-parent-3.3.2 (jhalterman)
🔒Security observations
The Failsafe library demonstrates a strong security posture overall. As a lightweight, zero-dependency fault tolerance library, it has minimal attack surface and dependency-related vulnerabilities. No critical security issues were identified in the codebase structure, file organization, or build configuration. The primary recommendations are development/operational in nature: transitioning from SNAPSHOT to stable releases for production use and ensuring comprehensive security documentation. The modular architecture with well-separated concerns (policies, executors, event handlers) and use of custom exceptions suggests good defensive programming practices.
- Low · SNAPSHOT Version in Production Build (core/pom.xml, version: 3.3.3-SNAPSHOT). The pom.xml specifies version '3.3.3-SNAPSHOT', which indicates a development/snapshot build. SNAPSHOT versions are development builds that can change without notice and should not be used in production environments. This could lead to unexpected behavior changes in production deployments. Fix: Use a stable release version (e.g., 3.3.3) for production builds. Reserve SNAPSHOT versions only for development and testing environments. Implement CI/CD controls to prevent SNAPSHOT artifacts from being deployed to production.
- Low · Incomplete README Documentation (README.md). The README.md snippet is truncated, which may indicate incomplete or inadequate documentation about security features, best practices, and security considerations for users. Comprehensive documentation helps users understand security implications. Fix: Ensure complete documentation including security best practices, threat models, and security-related configuration options. Document any security limitations and recommended usage patterns.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.