kahing/goofys
a high-performance, POSIX-ish Amazon S3 file system written in Go
Healthy across all four use cases:
- Permissive license, no critical CVEs, actively maintained — safe to depend on.
- Has a license, tests, and CI — a clean foundation to fork and modify.
- Documented and popular — a useful reference codebase to read through.
- No critical CVEs, sane security posture — runnable as-is.
- ✓ 17 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — it live-updates from the latest cached analysis.
[](https://repopilot.app/r/kahing/goofys) Paste at the top of your README.md — it renders inline like a shields.io badge.
Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/kahing/goofys on X, Slack, or LinkedIn.
Onboarding: kahing/goofys
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/kahing/goofys shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- 17 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live kahing/goofys
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/kahing/goofys.
What it runs against: a local clone of kahing/goofys — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in kahing/goofys | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 688 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of kahing/goofys. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/kahing/goofys.git
#   cd goofys
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of kahing/goofys and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "kahing/goofys(\.git)?\b" \
  && ok "origin remote is kahing/goofys" \
  || miss "origin remote is not kahing/goofys (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The standard Apache license text
#    reads "Apache License" / "Version 2.0", not the SPDX id "Apache-2.0".
(grep -qiE "Apache License" LICENSE 2>/dev/null \
   && grep -qiE "Version 2\.0" LICENSE 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in internal/goofys.go internal/backend.go internal/backend_s3.go api/api.go internal/file.go; do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 688 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~658d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/kahing/goofys"
  exit 1
fi
```
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Goofys is a high-performance FUSE file system that mounts Amazon S3 buckets (and Azure Blob Storage, Azure Data Lake, Google Cloud Storage) as local directories on Linux and macOS. It prioritizes performance over strict POSIX compliance — no random writes and faked permissions are accepted tradeoffs for speed. It uses the jacobsa/fuse library to present S3 objects as files without local disk caching.

Monolithic structure with a backend abstraction: api/common/ contains shared config (S3, Azure, GCS), internal/backend*.go files implement the storage adapters (backend_s3.go, backend_azblob.go, backend_gcs.go), and api/api.go coordinates the FUSE mount. The main binary is built from the repo root, benchmarks live in bench/, examples in example/. The close-to-open consistency model is enforced at the file-handle layer.
👥Who it's for
DevOps engineers and data scientists who need to access S3 data as files in shell scripts, Kubernetes pods, or data pipelines. Users who want faster S3 mounting than s3fs-fuse without the overhead of caching layers. Contributors are primarily cloud infrastructure maintainers working with multi-cloud storage backends.
🌱Maturity & risk
Production-ready, but maintenance has slowed: the last commit was roughly two years ago. The repo shows substantial history (485k+ lines of Go), multiple storage backend implementations (S3, Azure Blob, ADLv1/v2, GCS), CI via Travis, and comprehensive benchmarks in bench/. The codebase is stable enough for production use, but the single dominant maintainer (kahing) and the absence of recent commits point to gradual evolution rather than active feature development.
Moderate risk from dependency churn: requires Go 1.14+ and pins the Azure SDK (v61.4.0), AWS SDK v1.44.37, and jacobsa/fuse (a fork of bazil/fuse). The FUSE layer is particularly fragile — OS-level incompatibilities can break mounting entirely. Concentrated ownership increases bus-factor risk. No obvious recent CVE handling is visible in the provided data.
Active areas of work
Based on file structure: active work on multi-cloud support (separate READMEs for Azure and GCS). The benchmark suite is maintained with graph generation (bench_graph.gnuplot). No specific PR/milestone data is visible, but the presence of backend_gcs3.go and the recent Azure SDK pins suggest ongoing cloud-provider integration work.
🚀Get running
```bash
git clone https://github.com/kahing/goofys.git
cd goofys
go build -o goofys .   # main package is at the repo root; or: go install github.com/kahing/goofys@latest (Go 1.17+)
# Put AWS credentials in ~/.aws/credentials
./goofys my-bucket /mnt/s3
```
Daily commands:
```bash
# Development build
go build -o goofys .

# Mount an S3 bucket
./goofys --stat-cache-ttl=1s --type-cache-ttl=1s my-bucket /mnt/s3

# Mount with a key prefix
./goofys my-bucket:prefix/ /mnt/s3

# Run tests
go test ./internal/

# Benchmarks
cd bench && bash bench.sh
```
🗺️Map of the codebase
- internal/goofys.go — Core FUSE file system implementation and main request dispatcher; essential for understanding how file operations are routed.
- internal/backend.go — Abstract backend interface that all cloud storage providers (S3, Azure, GCS) implement; defines the contract for all storage operations.
- internal/backend_s3.go — Primary S3 backend implementation with AWS SDK integration; the reference implementation for storage provider adapters.
- api/api.go — Entry point for the goofys binary and FUSE mount initialization; orchestrates configuration and filesystem startup.
- internal/file.go — File handle management and individual file read/write operations; critical for POSIX file semantics implementation.
- internal/dir.go — Directory listing and metadata caching; implements directory traversal on top of flat cloud storage.
- main.go — Application entry point that delegates to api.go; minimal wrapper for binary execution.
🛠️How to make changes
Add Support for a New Cloud Storage Backend
1. Create a new backend implementation file (e.g., internal/backend_newcloud.go) that implements the backend.Backend interface from internal/backend.go, including List, Get, Put, Delete, and Attributes methods (internal/backend.go).
2. Add configuration parsing for the new backend in api/common/conf_newcloud.go following the pattern of conf_s3.go, including flag parsing and credential handling (api/common/conf_s3.go).
3. Update api/common/config.go to include the new backend configuration structure in the Config union type (api/common/config.go).
4. Modify api/api.go to instantiate the new backend when its configuration is provided, adding a case in the backend selection logic (api/api.go).
5. Add a test file internal/backend_newcloud_test.go to verify the backend implementation against the interface contract (internal/backend_test.go).
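The backend contract from the checklist above can be sketched in Go. Note this is a simplification built from this document's description (List, Get, Put, Delete); the real internal/backend.go interface has more methods and different signatures, so verify against source before implementing:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Backend is a simplified sketch of the storage contract described in the
// checklist. The real interface in internal/backend.go has more methods and
// uses request/response structs — treat this as illustration only.
type Backend interface {
	List(prefix string) ([]string, error)
	Get(key string) ([]byte, error)
	Put(key string, data []byte) error
	Delete(key string) error
}

// memBackend is a stand-in "new cloud" backend backed by a map, showing the
// shape a hypothetical internal/backend_newcloud.go implementation would take.
type memBackend struct {
	objects map[string][]byte
}

func newMemBackend() *memBackend {
	return &memBackend{objects: make(map[string][]byte)}
}

func (b *memBackend) List(prefix string) ([]string, error) {
	var keys []string
	for k := range b.objects {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	return keys, nil
}

func (b *memBackend) Get(key string) ([]byte, error) {
	data, ok := b.objects[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return data, nil
}

func (b *memBackend) Put(key string, data []byte) error {
	b.objects[key] = data
	return nil
}

func (b *memBackend) Delete(key string) error {
	delete(b.objects, key)
	return nil
}

func main() {
	var be Backend = newMemBackend() // compile-time check of the contract
	_ = be.Put("prefix/a.txt", []byte("hello"))
	data, _ := be.Get("prefix/a.txt")
	fmt.Println(string(data)) // hello
}
```

The compile-time assignment `var be Backend = ...` is the same trick a real backend test would use to verify the new file satisfies the interface.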
Add a New FUSE File System Operation
1. Implement a new method on the Goofys struct in internal/goofys.go with the signature (fs *Goofys) OperationName(ctx context.Context, req *fuse.OperationRequest) (*fuse.OperationResponse, error) (internal/goofys.go).
2. If the operation involves file handles, update internal/handles.go to track any new handle state required (internal/handles.go).
3. Add test cases in internal/goofys_test.go that exercise the new operation through the FUSE interface (internal/goofys_test.go).
4. If the operation needs backend interaction, call the appropriate method on fs.backend that was defined in internal/backend.go (internal/backend.go).
Optimize Performance for a Specific Cloud Backend
1. Add batch operation methods to the backend interface in internal/backend.go (e.g., BatchGet, BatchList) to support parallel requests (internal/backend.go).
2. Implement the batch methods in the target backend file (e.g., internal/backend_s3.go) using the provider's native batch API (internal/backend_s3.go).
3. Use the ticket.Ticket concurrency limiter in internal/ticket.go to parallelize operations while respecting rate limits (internal/ticket.go).
4. Adjust buffer pool configuration in internal/buffer_pool.go to match typical request sizes for the backend (internal/buffer_pool.go).
5. Add benchmark tests in bench/bench.sh and the benchmark Dockerfiles to measure improvements (bench/bench.sh).
🔧Why these technologies
- Go — Concurrent request handling via goroutines, minimal memory overhead, single statically-linked binary for easy deployment
- FUSE (Filesystem in Userspace) — Enables user-space file system implementation without kernel modifications; abstracts cloud storage as POSIX-like filesystem
- AWS SDK Go (aws-sdk-go) — Official AWS SDK with S3 support; goofys targets S3 as primary backend with high-performance S3 implementation
- Azure SDKs (azure-sdk-for-go, azure-storage-blob-go) — Official Azure libraries enabling multi-backend support; goofys provides pluggable storage abstraction
- Google Cloud Storage Go Client — Official GCS client library; supports goofys's multi-cloud architecture
⚖️Trade-offs already made
- POSIX-ish instead of full POSIX compliance
  - Why: S3 and object storage are fundamentally incompatible with POSIX semantics (e.g., no random writes, flat key-value model)
  - Consequence: No support for random writes, permission bits are faked, consistency guarantees are relaxed; users must understand these limits
- No on-disk data cache (refer to catfs for caching)
  - Why: Prioritizes simplicity and correctness over caching complexity; avoids cache invalidation issues
  - Consequence: Every read/write goes to remote storage; higher latency for repeated access unless upper layers cache
- Directory metadata caching in memory
  - Why: Mitigates repeated directory-listing overhead to cloud storage
  - Consequence: Memory usage grows with directory size; caches are stale until invalidated; works well for read-heavy workloads
- Pluggable backend interface
  - Why: Support multiple cloud providers (S3, Azure, GCS) with a unified FUSE layer
  - Consequence: Backend implementations must handle provider-specific quirks; adds an abstraction layer that differs from each provider's native semantics
- Concurrency limiting via Ticket
  - Why: Prevents resource exhaustion from unlimited parallel cloud requests
  - Consequence: Introduces an artificial bottleneck under high concurrency; tuning required per backend and workload
🚫Non-goals (don't propose these)
- Full POSIX compliance (random writes, per-file permissions, hard links)
- Data caching (see catfs project for caching layer)
- Cross-region or multi-region failover
- Real-time consistency between local and remote state
- Support for object storage features beyond basic read/write/list/delete
- Windows FUSE support (Unix-like systems only)
🪤Traps & gotchas
- FUSE kernel module required: the mount will fail if fusermount is not in PATH or the FUSE kernel module is not loaded (modprobe fuse on Linux).
- AWS credentials: must come from ~/.aws/credentials, the AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars, or an IAM role; no hardcoding.
- Consistency model: close-to-open only — files are visible immediately to the same process but not guaranteed visible to other processes until the file handle is closed; this can break concurrent access patterns.
- Caching TTLs are critical: the default stat-cache-ttl is 1s — short for accuracy, but it kills performance on cold buckets (see benchmarks).
- Prefix mounting: the bucket:prefix syntax requires an exact match; no glob or regex.
- Permission faking: all files appear as mode 0644 (files) or 0755 (dirs); chmod fails silently.
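The close-to-open gotcha can be seen in a toy model where, as in goofys, data is only published to the remote store when the handle is closed. This is illustrative only, not goofys's actual handle code:

```go
package main

import (
	"fmt"
	"sync"
)

// remoteStore stands in for S3: only complete objects exist in it.
type remoteStore struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func (s *remoteStore) Read(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok := s.objects[key]
	return data, ok
}

// handle buffers writes locally and flushes to the store only on Close —
// the essence of close-to-open consistency: other readers see nothing
// until the writer closes its handle.
type handle struct {
	store *remoteStore
	key   string
	buf   []byte
}

func (h *handle) Write(p []byte) { h.buf = append(h.buf, p...) }

func (h *handle) Close() {
	h.store.mu.Lock()
	defer h.store.mu.Unlock()
	h.store.objects[h.key] = h.buf
}

func main() {
	store := &remoteStore{objects: make(map[string][]byte)}
	h := &handle{store: store, key: "a.txt"}
	h.Write([]byte("hello"))
	_, visible := store.Read("a.txt")
	fmt.Println("visible before close:", visible) // false
	h.Close()
	_, visible = store.Read("a.txt")
	fmt.Println("visible after close:", visible) // true
}
```

Any pattern that assumes a second process can tail a file while the first is still writing it will break under this model.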
🏗️Architecture
💡Concepts to learn
- FUSE (Filesystem in Userspace) — Goofys is entirely a FUSE daemon; understanding how it intercepts syscalls (open, read, write) and translates them to S3 API calls is fundamental to debugging mount behavior
- Close-to-open (or Release) Consistency Model — Goofys explicitly documents this as its consistency guarantee (seen in README); files are flushed to S3 only on close(), not on every write—critical for understanding data visibility across processes
- Metadata Caching with TTL (Time-To-Live) — The --stat-cache-ttl and --type-cache-ttl flags are the primary performance-tuning levers in goofys; they control how long Stat() results are cached in memory before re-fetching from S3
- S3 as an Object Store (vs. Block/File Storage) — S3's eventual consistency, lack of true directories (only key prefixes), and lack of random writes fundamentally limit what goofys can claim as POSIX-compliant; this tradeoff is explicit in the design
- Multipart Upload — goofys uses S3 multipart upload for files >5MB to speed up writes; the backend_s3.go handles upload assembly and retry logic, which differs from simpler single-PUT implementations
- Pluggable Backend Pattern — goofys supports multiple cloud providers (S3, Azure Blob, GCS, ADL) by defining a common Backend interface in backend.go; adding a new provider means implementing ~10 methods without touching FUSE or CLI code
- Environment Variable and Credential Chain — goofys follows AWS SDK credential precedence (env vars → ~/.aws/credentials → IAM role); understanding this chain is essential for troubleshooting auth failures in different deployment contexts
🔗Related repos
- s3fs-fuse/s3fs-fuse — Direct competitor: a FUSE S3 mount written in C++, slower but more POSIX-compliant; goofys benchmarks often compare against it
- kahing/catfs — Companion project by the same author: adds a persistent disk-caching layer on top of FUSE mounts to bridge the goofys no-cache tradeoff
- bazil/fuse — Upstream FUSE Go library (jacobsa/fuse is a maintained fork); understanding FUSE binding mechanics is essential to hacking on mount behavior
- aws/aws-sdk-go — Primary dependency for the S3 backend; goofys wraps AWS SDK v1 calls and depends on its error handling and retry policies
- jacobsa/fuse — The exact FUSE library used; goofys implements interfaces like FileSystem.StatFS() and Node.Attr() from this library
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for backend_gcs3.go with GCS v3 API coverage
backend_gcs3.go exists but internal/backend_gcs_test.go only tests backend_gcs.go (v2 API). The newer GCS v3 backend implementation lacks dedicated tests, creating a gap in test coverage for a significant storage backend. This is critical since goofys supports multiple cloud providers and GCS is a first-class citizen.
- [ ] Create internal/backend_gcs3_test.go with tests for GCS v3-specific operations
- [ ] Add test cases for object listing, multipart uploads, and metadata handling in v3 API
- [ ] Test compatibility between GCS v2 and v3 backends to ensure feature parity
- [ ] Reference existing backend_gcs_test.go patterns and expand for v3 API differences
Add CI workflows for Azure Blob and GCS backends using GitHub Actions
The repo has three major cloud backends (S3, Azure, GCS), but its .travis.yml likely only covers S3, and GitHub Actions would be more maintainable than Travis CI. The presence of bench/Dockerfile.azure and bench/Dockerfile.gcs suggests these backends need dedicated testing but lack automated CI pipelines.
- [ ] Create .github/workflows/test-azure.yml to run internal/backend_azblob.go and internal/backend_adlv*.go tests
- [ ] Create .github/workflows/test-gcs.yml to run backend_gcs.go and backend_gcs3.go tests
- [ ] Configure each workflow with appropriate cloud credentials/emulator setup (e.g., Azurite for Azure testing)
- [ ] Ensure parallel execution with S3 tests to keep CI fast
Extract and document configuration validation logic from api/common/config.go into separate modules
config.go consolidates configuration from three backends (conf_s3.go, conf_azure.go, conf_gcs.go) but there's no dedicated validation or error handling documentation. The current structure makes it difficult for contributors to understand cloud-specific config requirements. Adding a dedicated validation layer (e.g., api/common/config_validator.go) would clarify validation rules and improve testability.
- [ ] Create api/common/config_validator.go to extract validation logic from config.go
- [ ] Add unit tests in api/common/config_validator_test.go for each backend's required fields
- [ ] Document validation rules for S3, Azure, and GCS configs in README-*.md files (S3 rules are in README.md but Azure/GCS validation is unclear)
- [ ] Reference specific validation scenarios (e.g., AWS credential chains, Azure MSI vs service principal)
🌿Good first issues
- Add unit tests to internal/backend_gcs_test.go for resumable uploads and 404 retry logic, matching patterns in internal/backend_s3_test.go: the GCS backend exists but its test coverage is visibly thinner than S3's; a good way to learn both the GCS API and goofys error handling
- Write a docs/PERFORMANCE_TUNING.md documenting the stat-cache-ttl vs type-cache-ttl tradeoff with real benchmarks from bench/bench.data, explaining when to use --stat-cache-ttl=1s vs higher values: benchmark data exists but is only consumed by gnuplot, and new users are confused about the caching knobs (see README-azure.md for an Azure-specific example)
- Add a --watch-interval flag to api/common/config.go and implement periodic cache invalidation in internal/backend.go for the Stat() method, allowing users to tune consistency vs performance per use case: close-to-open consistency is hardcoded; parameterizing it would help users like data pipelines that tolerate eventual consistency but need faster metadata ops
⭐Top contributors
📝Recent commits
- 350ff31 — Fix FusePanicLogger.BatchForget redeclared in this block (geraudster)
- 37cf835 — Add link for Stack Overflow (gaul)
- 61d0ef6 — fmt (Daniil Tulin)
- b667ee4 — Fix stat checking for cgroup (Daniil Tulin)
- e903e56 — Implement new BatchForget op for panic logger (monthonk)
- 8228feb — update jacobsa/fuse (kahing)
- dea41e0 — 📚 Add note on the correct package name required to mount on startup (nambrosini)
- 8f80df7 — fix build (kahing)
- 5ae08c0 — Escape the special char in name correctly. As per the AWS documentation, CopyRequest is sending request param and it ad (skuppa)
- 5661a83 — change name to lower case (monthonk)
🔒Security observations
- High · Outdated Go Version — go.mod. The project specifies 'go 1.14', which reached end-of-life in 2021 and lacks security patches and bug fixes from newer Go releases. Fix: upgrade to a currently supported Go version to receive security updates and performance improvements.
- High · Multiple Outdated Dependencies with Known Vulnerabilities — go.mod (aws/aws-sdk-go, Azure/azure-sdk-for-go, Azure/go-autorest). Several dependencies have known CVEs: aws-sdk-go v1.44.37 (before v1.44.71), azure-sdk-for-go v61.4.0, and the go-autorest packages contain known vulnerabilities, including authentication bypass and credential-handling issues. Fix: update aws-sdk-go to v1.44.71+, Azure/azure-sdk-for-go to the latest version, and all Azure/go-autorest packages to their latest versions; run 'go get -u' and verify all security patches are included.
- High · Insecure OAuth2 Library Version — go.mod (golang.org/x/oauth2). The pinned golang.org/x/oauth2 v0.0.0-20210323180902-22b0adad7558 is from March 2021 and may contain vulnerabilities in token handling and authentication flows. Fix: update to the latest version with 'go get -u golang.org/x/oauth2'.
- Medium · Outdated Logging Library — go.mod (github.com/sirupsen/logrus). sirupsen/logrus v1.4.3 dates from 2019 and newer versions include security improvements; logging libraries can impact security if they inadvertently expose sensitive information. Fix: update to a recent logrus release (v1.9.0+) for improved security and stability.
- Medium · Deprecated and Unmaintained UUID Library — go.mod (github.com/satori/go.uuid). satori/go.uuid is deprecated and no longer maintained, so vulnerabilities won't be patched. Fix: replace satori/go.uuid with google/uuid (already a dependency) throughout the codebase and remove the satori dependency.
- Medium · Outdated FUSE Library — go.mod (github.com/jacobsa/fuse). jacobsa/fuse v0.0.0-20221016084658-a4cd154343d8 is a development snapshot; production systems should use stable, versioned releases for security and stability. Fix: use the latest stable tagged release of jacobsa/fuse rather than a pseudo-version timestamp.
- Medium · Potential Credential Exposure in Cloud Configuration — api/common/conf_s3.go, api/common/conf_azure.go, api/common/conf_gcs.go. All three cloud backend configurations handle credentials, which may be passed via environment variables or config files that could be inadvertently exposed. Fix: prefer IAM roles/managed identities when running in cloud environments, validate that credentials are never logged, and implement error handling that does not expose sensitive data.
- Medium · Missing Input Validation Risk — internal/goofys.go, internal/file.go, internal/dir.go. The filesystem layer handles user-provided paths and S3 object names; insufficient validation could enable path traversal or injection when translating between POSIX and S3 naming conventions. Fix: strictly validate all user-supplied paths, canonicalize them to prevent directory traversal, and check S3 key names against expected patterns.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.