gilbertchen/duplicacy
A new generation cloud backup tool
Stale — last commit 1y ago
Weakest axis: non-standard license (Other); last commit was 1y ago…
Has a license and tests — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
last commit was 1y ago; no CI workflows detected
- ✓ 6 active contributors
- ✓ Other licensed
- ✓ Tests present
- ⚠Stale — last commit 1y ago
- ⚠Single-maintainer risk — top contributor 86% of recent commits
- ⚠Non-standard license (Other) — review terms
- ⚠No CI workflows detected
What would change the summary?
- → Use as dependency: Concerns → Mixed if license terms are clarified
- → Deploy as-is: Mixed → Healthy if there is 1 commit in the last 180 days
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Forkable" badge
Paste at the top of your README.md — renders inline like a shields.io badge and live-updates from the latest cached analysis.
[](https://repopilot.app/r/gilbertchen/duplicacy)
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/gilbertchen/duplicacy on X, Slack, or LinkedIn.
Onboarding: gilbertchen/duplicacy
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/gilbertchen/duplicacy shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Stale — last commit 1y ago
- 6 active contributors
- Other licensed
- Tests present
- ⚠ Stale — last commit 1y ago
- ⚠ Single-maintainer risk — top contributor 86% of recent commits
- ⚠ Non-standard license (Other) — review terms
- ⚠ No CI workflows detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live gilbertchen/duplicacy
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/gilbertchen/duplicacy.
What it runs against: a local clone of gilbertchen/duplicacy — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in gilbertchen/duplicacy | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 400 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of gilbertchen/duplicacy. If you don't
# have one yet, run these first:
#
# git clone https://github.com/gilbertchen/duplicacy.git
# cd duplicacy
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of gilbertchen/duplicacy and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "gilbertchen/duplicacy(\.git)?\b" \
  && ok "origin remote is gilbertchen/duplicacy" \
  || miss "origin remote is not gilbertchen/duplicacy (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"
# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 400 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~370d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/gilbertchen/duplicacy"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Duplicacy is a lock-free deduplicating cloud backup tool written in Go. It lets multiple computers back up to the same cloud storage without a centralized database, using content hashes as filenames for chunks and supporting cross-computer deduplication. It distinguishes itself by avoiding pack files and chunk databases entirely, instead storing each deduplicated chunk independently across multiple cloud backends (S3, Azure, GCS, Dropbox, B2, SFTP, and more).

Monolithic structure: src/ contains 50+ files implementing storage backends (duplicacy_*storage.go for S3/Azure/GCS/Dropbox/B2/SFTP/etc.), core chunk operations (duplicacy_chunk.go, duplicacy_chunkmaker.go, duplicacy_chunkdownloader.go), backup orchestration (duplicacy_backupmanager.go), and utilities (config, keyring, logging). duplicacy/duplicacy_main.go is the CLI entry point. integration_tests/ contains shell-based end-to-end tests (copy_test.sh, resume_test.sh, sparse_test.sh).
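The hash-as-filename mechanic described above can be sketched in a few lines of Go. This is a hypothetical illustration (helper names `chunkName` and `store` are ours, not Duplicacy's; the real code in src/duplicacy_chunk.go also encrypts and compresses chunks):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkName returns the content-derived filename for a chunk: the hex-encoded
// SHA-256 of its bytes. Identical chunks (even from different computers)
// map to the same name, which is what makes cross-computer deduplication
// possible without a central database.
func chunkName(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// store uploads a chunk only if no object with that name exists yet;
// the seen map stands in for a cloud bucket listing.
func store(seen map[string]bool, data []byte) (name string, uploaded bool) {
	name = chunkName(data)
	if seen[name] {
		return name, false // deduplicated: chunk already present
	}
	seen[name] = true
	return name, true
}

func main() {
	bucket := map[string]bool{}
	n1, up1 := store(bucket, []byte("hello"))
	_, up2 := store(bucket, []byte("hello")) // same content, second computer
	fmt.Println(n1[:8], up1, up2)
}
```

The second store call is a no-op: the chunk's name already exists in the bucket, so nothing is re-uploaded.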
👥Who it's for
System administrators and individual users who need reliable cross-platform cloud backups with multi-computer deduplication; developers implementing backup solutions who want a reference implementation of lock-free deduplication; DevOps engineers managing backup infrastructure across heterogeneous cloud providers.
🌱Maturity & risk
Production-ready and actively maintained. The codebase is substantial (~750K lines of Go), includes comprehensive integration tests in integration_tests/, a peer-reviewed IEEE paper documenting the algorithm, and a commercial Web GUI frontend (Duplicacy.com) built on this engine. Recent dependency updates visible in go.mod (2024 timeframe based on go-dropbox fork date) suggest active maintenance, though the single-maintainer nature should be noted.
Moderate risk factors: single maintainer (gilbertchen) creates bus factor concerns; 40+ external dependencies in go.mod with some custom forks (gilbertchen/goamz, gilbertchen/azure-sdk-for-go) that may lag upstream security patches. Large codebase (750K+ lines) with cryptographic operations (golang.org/x/crypto, AES, RSA via asymmetric encryption wiki) requires careful review. No visible GitHub Actions CI/CD in .github (only ISSUE_TEMPLATE.md), relying on manual integration tests instead.
Active areas of work
Cannot determine from repo metadata alone (no visible recent commits, PRs, or issues in provided file list). The commercial Duplicacy GUI and Vertical Backup for ESXi appear to be the active extensions. Check GitHub issues and releases for current work.
🚀Get running
git clone https://github.com/gilbertchen/duplicacy.git
cd duplicacy
go mod download
go build -o duplicacy ./duplicacy
./duplicacy --help
Daily commands:
cd duplicacy
go build -o duplicacy ./duplicacy
./duplicacy init -storage-name s3://my-bucket -repository /path/to/backup
./duplicacy backup
For testing: cd integration_tests && bash test.sh
🗺️Map of the codebase
- src/duplicacy_backupmanager.go: Core orchestrator for the backup workflow, handling file enumeration, chunking, deduplication, and chunk uploads
- src/duplicacy_chunk.go: Defines the Chunk struct and cryptographic operations (hashing, encryption, compression) that are fundamental to lock-free deduplication
- src/duplicacy_chunkmaker.go: Implements variable-length chunking algorithm (rabin fingerprinting via highwayhash) that enables content-based deduplication
- duplicacy/duplicacy_main.go: CLI entry point and command routing; start here to understand user-facing interface
- src/duplicacy_config.go: Repository configuration and credential management, critical for security and multi-computer coordination
- duplicacy_paper.pdf: IEEE-published paper explaining the lock-free deduplication algorithm and design rationale
- DESIGN.md: Architecture and design document for understanding system decisions
- integration_tests/test.sh: Entry point for integration test suite; defines how to validate backup/restore workflows
🛠️How to make changes
- Adding a new storage backend: Implement the Storage interface in src/duplicacy_newstorage.go (see src/duplicacy_s3storage.go or src/duplicacy_azurestorage.go as templates).
- Modifying chunk handling: Edit src/duplicacy_chunkmaker.go (chunking strategy) or src/duplicacy_chunk.go (chunk structure).
- Fixing backup logic: src/duplicacy_backupmanager.go orchestrates the backup flow.
- CLI changes: duplicacy/duplicacy_main.go routes commands.
- Testing: Add unit tests to *_test.go files or shell tests to integration_tests/.
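As a mental model for the backend work, here is a hypothetical, trimmed-down sketch of the shape a storage backend implements. Duplicacy's actual Storage interface has more methods (listing, deletion, moving fossils, etc.), so treat src/duplicacy_s3storage.go as the real template; these names are illustrative only:

```go
package main

import (
	"fmt"
	"sync"
)

// Storage is a hypothetical, minimal version of the backend contract:
// upload a blob under a path, download it back, and test existence.
type Storage interface {
	Upload(path string, data []byte) error
	Download(path string) ([]byte, error)
	Exists(path string) bool
}

// MemoryStorage is an in-memory backend useful as a porting template:
// replace the map operations with SDK calls for your provider. The mutex
// matters; Duplicacy drives backends from many goroutines at once.
type MemoryStorage struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

func NewMemoryStorage() *MemoryStorage {
	return &MemoryStorage{blobs: map[string][]byte{}}
}

func (s *MemoryStorage) Upload(path string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blobs[path] = append([]byte(nil), data...) // copy: caller may reuse buffer
	return nil
}

func (s *MemoryStorage) Download(path string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok := s.blobs[path]
	if !ok {
		return nil, fmt.Errorf("not found: %s", path)
	}
	return data, nil
}

func (s *MemoryStorage) Exists(path string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.blobs[path]
	return ok
}

func main() {
	var s Storage = NewMemoryStorage()
	_ = s.Upload("chunks/ab/cd", []byte("chunk bytes"))
	fmt.Println(s.Exists("chunks/ab/cd"))
}
```

An in-memory implementation like this also doubles as a test fixture for exercising backup logic without network access.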
🪤Traps & gotchas
- Lock-free coordination: Multi-computer backups rely on atomic cloud storage operations (put-if-absent semantics); ensure your storage backend supports this exactly or chunks may be duplicated or lost.
- Rabin fingerprinting tuning: src/duplicacy_chunkmaker.go uses highwayhash-based rolling hash; changing chunk size thresholds can break deduplication across snapshots.
- Keyring integration: src/duplicacy_keyring.go and duplicacy_keyring_windows.go require OS keyring access (Keychain on macOS, Credential Manager on Windows); headless/CI environments may fail silently.
- Concurrent chunk operations: Multiple goroutines access storage simultaneously; ensure your Storage implementation is thread-safe.
- Storage backend credentials: Different backends have different auth models (S3 uses AWS SDK, Azure uses connection strings, GCS uses OAuth2); check each backend's init code for required environment variables.
- Chunk hash collisions: While SHA256 is used, the system assumes hash collisions are impossible; never weaken the hash function.
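The put-if-absent requirement from the lock-free coordination gotcha can be made concrete. This sketch (hypothetical names, not Duplicacy code) shows the atomic check-and-set a backend must provide so that many writers racing on the same chunk name produce exactly one copy:

```go
package main

import (
	"fmt"
	"sync"
)

// AtomicStore demonstrates put-if-absent semantics: the existence check and
// the insert happen under one lock. A separate Exists() followed by Upload()
// would race when two computers back up the same chunk simultaneously.
type AtomicStore struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

// PutIfAbsent stores data under name unless that name already exists,
// and reports whether this call created the object.
func (s *AtomicStore) PutIfAbsent(name string, data []byte) (created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, exists := s.blobs[name]; exists {
		return false
	}
	s.blobs[name] = data
	return true
}

func main() {
	s := &AtomicStore{blobs: map[string][]byte{}}
	var wg sync.WaitGroup
	creates := make(chan bool, 100)
	for i := 0; i < 100; i++ { // 100 goroutines racing on one chunk name
		wg.Add(1)
		go func() {
			defer wg.Done()
			creates <- s.PutIfAbsent("chunk-abc", []byte("payload"))
		}()
	}
	wg.Wait()
	close(creates)
	n := 0
	for c := range creates {
		if c {
			n++
		}
	}
	fmt.Println("created exactly once:", n == 1)
}
```

On real cloud backends the same property comes from the provider's conditional-write primitives rather than a local mutex; the point is that the backend, not the caller, must guarantee atomicity.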
💡Concepts to learn
- Lock-Free Deduplication — The core innovation enabling multiple computers to back up to shared storage without coordination overhead; understanding this is essential to grasp why Duplicacy's architecture differs fundamentally from traditional backup tools
- Content-Addressable Storage (CAS) — Chunks are identified by their hash (SHA256) rather than sequential IDs; this is why Duplicacy stores chunks independently by filename instead of using pack files and manifests
- Rabin Fingerprinting / Rolling Hash — Used in src/duplicacy_chunkmaker.go to determine variable-length chunk boundaries; enables deduplication even when files shift slightly (e.g., inserted lines in source code)
- AEAD Encryption (AES-GCM) — Duplicacy encrypts chunks with AES-GCM (golang.org/x/crypto); understanding authenticated encryption is crucial for security review and explains how tampering is detected during multi-computer uploads
- Erasure Coding (Reed-Solomon) — Optional feature (klauspost/reedsolomon) for resilient data protection; allows recovery from partial storage loss without full redundancy, reducing cost on cloud backends
- Asymmetric Encryption (RSA) — Alternative encryption mode for scenarios where backup keys need separate management from restore keys; documented in the wiki, uses golang.org/x/crypto for key operations
- Fossil Collection / Garbage Collection — Duplicacy's pruning strategy (see integration_tests/fixed_test.sh, integration_tests/sparse_test.sh); understanding how orphaned chunks are identified and deleted is critical for storage efficiency and data retention policies
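The content-defined chunking concept can be illustrated with a toy Go sketch. This is not Duplicacy's actual fingerprint or parameters (the real code in src/duplicacy_chunkmaker.go uses a proper rolling window and tuned thresholds); it only shows why boundaries are derived from content rather than fixed offsets:

```go
package main

import "fmt"

// splitChunks cuts data at content-defined boundaries: a cheap running hash
// accumulates since the last cut, and we cut whenever its low bits match a
// target pattern after a minimum chunk size. Real implementations use a
// rolling hash over a fixed window so boundaries depend only on nearby
// bytes, which is what lets chunks realign (and dedupe) after an insert.
func splitChunks(data []byte, minSize, mask int) [][]byte {
	var chunks [][]byte
	start, hash := 0, 0
	for i, b := range data {
		hash = (hash*31 + int(b)) & 0xffffff // toy polynomial hash
		if i-start+1 >= minSize && hash&mask == mask {
			chunks = append(chunks, data[start:i+1])
			start, hash = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // trailing partial chunk
	}
	return chunks
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog, again and again")
	chunks := splitChunks(data, 8, 0x7)
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	// The chunking is deterministic and lossless: same input gives the same
	// boundaries, and the chunks concatenate back to the original bytes.
	fmt.Println(len(chunks) >= 1, total == len(data))
}
```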
🔗Related repos
- restic/restic — Leading alternative cloud backup tool with similar deduplication goals but uses pack files and a centralized manifest; key competitor and reference implementation
- borgbackup/borg — Another chunk-based deduplication backup tool (a fork of Attic) frequently benchmarked against Duplicacy; similar feature set but different architectural approach
- duplicacy/duplicacy-web — Official Web GUI frontend for Duplicacy built in Vue.js/Go; users deploying Duplicacy typically want this for ease of use
- torvalds/linux — Codebase used in Duplicacy's official benchmarking experiments (images/duplicacy_benchmark_*.png); reference workload for performance validation
- gilbertchen/benchmarking — Repository containing Duplicacy's comparative benchmarks against restic, Attic, and duplicity; critical for understanding performance claims
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for Azure and Google Cloud Storage backends
The integration_tests directory contains tests for copy, fixed, resume, sparse, and threaded operations, but these appear to be generic and may not cover cloud provider-specific edge cases. Azure (duplicacy_azurestorage.go) and GCS (duplicacy_gcsstorage.go) are major backends with no dedicated integration tests visible in the test suite. Adding provider-specific tests would catch authentication, rate-limiting, and API behavior issues.
- [ ] Create integration_tests/azure_test.sh with setup/teardown for Azure blob storage test containers
- [ ] Create integration_tests/gcs_test.sh with setup/teardown for Google Cloud Storage test buckets
- [ ] Source test_functions.sh for common utilities and add provider-specific helper functions
- [ ] Test chunk upload/download, deduplication, and snapshot listing for each provider
- [ ] Document required environment variables (AZURE_ACCOUNT_KEY, GCS_PROJECT_ID) in the test files
Add comprehensive unit tests for duplicacy_snapshot.go and duplicacy_snapshotmanager.go
While duplicacy_snapshotmanager_test.go exists, the snapshot management logic is core to the backup system. The duplicacy_snapshot.go file has no corresponding test file. Given the complexity of snapshot metadata, versioning, and the lock-free deduplication algorithm, robust unit tests for snapshot serialization, filtering, and retrieval would prevent regressions.
- [ ] Create src/duplicacy_snapshot_test.go with tests for snapshot marshaling/unmarshaling
- [ ] Add tests for snapshot ID generation, timestamp handling, and tag filtering
- [ ] Expand src/duplicacy_snapshotmanager_test.go to test snapshot listing across multiple revisions
- [ ] Add edge case tests: empty snapshots, malformed snapshot files, concurrent snapshot creation
- [ ] Test snapshot purge logic with various retention policies
Add GitHub Actions CI workflow for cross-platform binary builds and Go unit tests
The repo has .github/ISSUE_TEMPLATE.md but no visible workflow files (.github/workflows/). Given that Duplicacy supports Windows, macOS, and Linux with platform-specific code (duplicacy_utils_*.go, duplicacy_shadowcopy_*.go), a CI pipeline that builds and tests on all three platforms would catch platform-specific bugs early. This is especially important for the keyring and shadow copy features.
- [ ] Create .github/workflows/test.yml to run 'go test ./...' on ubuntu-latest, windows-latest, and macos-latest
- [ ] Add build matrix for Go 1.19+ to ensure compatibility
- [ ] Include steps to build binaries for linux, darwin (amd64, arm64), and windows (amd64)
- [ ] Run integration_tests/test.sh on Linux and macOS (skip shadow copy tests on Linux)
- [ ] Upload build artifacts so contributors can test binaries without local compilation
🌿Good first issues
- Add unit tests for src/duplicacy_entry.go and src/duplicacy_entrylist.go (both have _test.go files but likely incomplete coverage for symlink, permission, and metadata edge cases on Windows/macOS/Linux)
- Write documentation for the lock-free deduplication algorithm in an ALGORITHM.md file, explaining how src/duplicacy_chunkdownloader.go and multi-computer chunk uploads prevent races without locks (reference duplicacy_paper.pdf)
- Add integration tests for the Storj backend (src/duplicacy_storj* is missing from integration_tests/) and the SMB2 backend (src/duplicacy_filefabricstorage.go has no test file)
⭐Top contributors
- @gilbertchen — 86 commits
- @markfeit — 8 commits
- @northnose — 2 commits
- @sevimo123 — 2 commits
- @gorbak25 — 1 commit
📝Recent commits
- 2def016 — Bump version to 3.2.5 (gilbertchen)
- df76bd0 — OneDrive: use correct parent reference when moving files (gilbertchen)
- 065ae50 — Improve parsing logic for swift storage URLs that contain multiple '@' (gilbertchen)
- bb214b6 — Bump version to 3.2.4 (gilbertchen)
- 6bca9fc — maxCollectionNumber must be increased even in collect-only mode (gilbertchen)
- a06d925 — Remove 'incomplete_files' in deleteIncompleteSnapshot() (gilbertchen)
- 69f5d2f — Don't save the incomplete snapshot for a dry run (gilbertchen)
- d182708 — Fix zstd level name (fast -> fastest) (gilbertchen)
- f8a0964 — Save the list of verified chunks every 5 minutes (gilbertchen)
- b659456 — Don't add corrupt chunks to verified_chunks (gilbertchen)
🔒Security observations
- High · Outdated Go Version — go.mod. The project targets Go 1.19, which has reached end-of-life and no longer receives security updates; known vulnerabilities may exist in the runtime and standard library. Fix: upgrade to the latest stable Go version (1.21+).
- High · Multiple Outdated and Vulnerable Dependencies — go.mod and go.sum. Several dependencies have known vulnerabilities or are significantly outdated: golang.org/x/crypto (v0.12.0), golang.org/x/net (v0.10.0), golang.org/x/oauth2 (2020 version), github.com/golang/protobuf (v1.5.0, deprecated), github.com/dgrijalva/jwt-go (deprecated; use golang-jwt), github.com/aws/aws-sdk-go (v1.30.7, from 2020). Fix: update all dependencies to their latest versions; migrate from jwt-go to github.com/golang-jwt/jwt; update the AWS SDK to latest v1.x or v2.x; move protobuf to google.golang.org/protobuf.
- High · Use of Deprecated JWT Library — go.mod. Depends on github.com/dgrijalva/jwt-go (v3.2.0), which is no longer maintained, has known security issues, and whose maintainer recommends switching to golang-jwt/jwt. Fix: replace it with github.com/golang-jwt/jwt/v5 and update all imports accordingly.
- Medium · Insecure Keyring Storage on Windows — src/duplicacy_keyring.go and src/duplicacy_keyring_windows.go. The platform-specific keyring implementations store credentials; improper isolation would have security implications. Fix: audit the keyring implementation to ensure credentials are properly encrypted, never logged, stored in secure OS-provided keyrings, and not cached in memory longer than necessary.
- Medium · Multiple Cloud Storage Integrations Without Visible OAuth Validation — src/duplicacy_*storage.go and src/duplicacy_*client.go files. Integrations with numerous cloud providers (AWS S3, Azure, Google Cloud, Dropbox, OneDrive, etc.) carry risk of improper OAuth token handling, token expiration management, or credential exposure. Fix: audit all OAuth implementations; ensure tokens are never logged, are refreshed and stored securely, and are validated before use; monitor token expiration.
- Medium · No Visible Input Validation Framework — src/ directory broadly. The codebase accepts many user inputs (backup sources, storage paths, etc.) with no evidence of a centralized validation or sanitization framework in the file structure. Fix: implement comprehensive validation for all user-supplied paths and parameters; use allowlists where possible; validate file paths to prevent directory traversal attacks.
- Medium · Network Communications — TLS Configuration Audit Needed — src/duplicacy_*storage.go, src/duplicacy_webdavstorage.go, src/duplicacy_sftpstorage.go. Storage backends communicate with cloud services and remote servers (SFTP, WebDAV, etc.), but TLS/SSL certificate validation and cipher suite selection cannot be verified from file structure alone. Fix: audit all network communication to ensure TLS 1.2+ is enforced, certificate validation and hostname verification are enabled, strong cipher suites are configured, and legacy protocols are disabled.
- Medium · Potential Cryptographic Implementation Risks — src/duplicacy_chunk.go. Custom cryptographic code (encryption logic in duplicacy_chunk.go) combined with an outdated golang.org/x/crypto (v0.12.0) is high-risk. Fix: have the encryption paths reviewed by someone with cryptography expertise and update golang.org/x/crypto.
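For the input-validation observation, a minimal Go sketch of the directory-traversal check is below. This helper is illustrative only (Duplicacy has no such central helper visible today; the function name is ours):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether a user-supplied path, once cleaned, stays
// inside root, rejecting "../" escapes before any file operation runs.
func withinRoot(root, userPath string) bool {
	full := filepath.Join(root, userPath) // Join also runs filepath.Clean
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return false
	}
	// A path that escapes root relativizes to ".." or "../...".
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(withinRoot("/backups", "photos/2024.jpg")) // true
	fmt.Println(withinRoot("/backups", "../etc/passwd"))   // false
}
```

Cleaning first matters: a naive prefix check on the raw string would accept "a/../../etc", which Clean resolves to a path outside root.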
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.