evilsocket/cake
Distributed inference for mobile, desktop and server.
Single-maintainer risk — review before adopting
weakest axisnon-standard license (Other); top contributor handles 95% of recent commits
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 2w ago
- ✓3 active contributors
- ✓Other licensed
Show all 8 evidence items →Show less
- ✓CI configured
- ✓Tests present
- ⚠Small team — 3 contributors active in recent commits
- ⚠Single-maintainer risk — top contributor 95% of recent commits
- ⚠Non-standard license (Other) — review terms
What would change the summary?
- →Use as dependency Concerns → Mixed if: clarify license terms
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Forkable" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/evilsocket/cake)Paste at the top of your README.md — renders inline like a shields.io badge.
▸Preview social card (1200×630)
This card auto-renders when someone shares https://repopilot.app/r/evilsocket/cake on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: evilsocket/cake
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting
below. If any check returns
FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding. - Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/evilsocket/cake shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
WAIT — Single-maintainer risk — review before adopting
- Last commit 2w ago
- 3 active contributors
- Other licensed
- CI configured
- Tests present
- ⚠ Small team — 3 contributors active in recent commits
- ⚠ Single-maintainer risk — top contributor 95% of recent commits
- ⚠ Non-standard license (Other) — review terms
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live evilsocket/cake
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/evilsocket/cake.
What it runs against: a local clone of evilsocket/cake — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in evilsocket/cake | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 44 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of evilsocket/cake. If you don't
# have one yet, run these first:
#
# git clone https://github.com/evilsocket/cake.git
# cd cake
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of evilsocket/cake and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "evilsocket/cake(\\.git)?\\b" \\
&& ok "origin remote is evilsocket/cake" \\
|| miss "origin remote is not evilsocket/cake (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \\
|| grep -qiE "\"license\"\\s*:\\s*\"Other\"" package.json 2>/dev/null) \\
&& ok "license is Other" \\
|| miss "license drift — was Other at generation time"
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
&& ok "default branch main exists" \\
|| miss "default branch main no longer exists"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 44 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~14d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/evilsocket/cake"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Cake is a distributed multimodal AI inference server written in Rust that partitions large language models and diffusion models across heterogeneous clusters (iOS, Android, macOS, Linux, Windows) to enable inference on hardware that cannot fit entire models. It supports text generation (15 model families), image generation (Stable Diffusion, FLUX), and TTS (VibeVoice), with an OpenAI-compatible REST API and auto-detection of model architectures from HuggingFace checkpoints. Workspace monorepo (Cargo.toml members: cake-core, cake-cli, cake-mobile) where cake-core houses the distributed inference engine with backend abstraction (autoresearch/backends/cpu|cuda|metal|rocm|vulkan), cake-cli provides the command-line interface and web UI, and cake-mobile contains iOS/Android bindings (Swift, Kotlin). Kernel implementations are separated into autoresearch/kernels/ (attention, fused-ops, linear-attention) for benchmark-driven optimization.
👥Who it's for
ML engineers and researchers who need to run inference on multi-modal models across resource-constrained or heterogeneous hardware setups (old phones, Steam Decks, mixed GPU/CPU clusters); developers building OpenAI-compatible inference services who want fine-grained control over sharding and backend selection (CUDA/Metal/Vulkan/CPU).
🌱Maturity & risk
Actively developed but explicitly experimental. The README states 'This is experimental code that's being actively developed and changed very quickly.' CI/CD is configured (.github/workflows/ci.yml, release.yml), but the codebase uses thin LTO and aggressive optimizations suggesting performance iteration is ongoing. Repo has structured benchmarking infrastructure (autoresearch/ directory) but is not production-hardened.
Single maintainer (evilsocket), rapidly changing APIs (workspace version pinned at 0.1.0), and experimental status mean breaking changes are likely. Dependencies span Rust/CUDA/Metal/Vulkan/Kotlin/Swift/Python—managing compatibility across this breadth is non-trivial. Last commit recency unknown from provided data, but the autoresearch/ folder suggests active experimentation rather than stable maintenance.
Active areas of work
Performance research is active: autoresearch/ contains baseline.txt and experiments.tsv for Metal and Vulkan, suggesting ongoing kernel profiling and backend optimization. Multiple platform backends (Metal, Vulkan, CUDA, ROCm, CPU) are being tuned in parallel. Model zoo is expanding (Qwen3, FLUX.1-dev FP8 mentioned in README). Web UI and TUI clients are being developed alongside core inference.
🚀Get running
git clone https://github.com/evilsocket/cake.git
cd cake
# For CUDA (Linux/NVIDIA):
cargo build --release --features cuda
# For Metal (macOS/Apple Silicon GPU):
cargo build --release --features metal
# For CPU only (portable):
cargo build --release
# Download models from HuggingFace:
cake pull evilsocket/Qwen3-0.6B
# Run the inference server:
cake server
Daily commands:
# Development build (with default CPU backend):
cargo build
# Release with specific backend:
cargo build --release --features cuda
# Run server:
cargo run --release --features cuda -- server
# Run CLI:
cargo run --release --features cuda -- pull evilsocket/Qwen3-0.6B
cake chat
# Single-node cluster:
cake server --listen 0.0.0.0:8080
# Multi-node cluster:
# Nodes auto-discover via mDNS or manual topology in config
🗺️Map of the codebase
- Cargo.toml: Workspace root defining members (cake-core, cake-cli, cake-mobile) and shared dependencies; controls feature flags for backend selection (cuda, metal, vulkan, accelerate).
- cake-core/src/: Core inference engine; contains backend abstraction, tensor operations, model loading, and distributed clustering logic—the heart of the system (inferred from workspace structure).
- autoresearch/backends/: Backend implementations and benchmarking infrastructure; each subdirectory (cpu, cuda, metal, rocm, vulkan) contains prepare.sh, benchmark.sh, and program.md defining the backend contract and optimization opportunities.
- autoresearch/kernels/: Performance-critical kernel implementations (attention, fused-ops, linear-attention) with benchmarks; used to drive optimization across backends.
- .github/workflows/ci.yml: Continuous integration configuration; defines test, build, and release pipeline across multiple platforms and feature flags.
- cake-cli/src/: Command-line interface and REST API server (likely); exposes OpenAI-compatible endpoints and handles model lifecycle (pull, run, chat).
- cake-mobile/: iOS (Swift) and Android (Kotlin) FFI bindings; enables running inference on mobile devices as part of distributed clusters.
- README.md: Quick-start guide with exact build commands for each platform/backend; essential for understanding feature selection and first-time setup.
- .cargo/config.toml: Cargo workspace configuration; may contain target-specific settings, profile overrides, or build script customizations for backends.
- Dockerfile: Docker image definition for Linux/CUDA deployments; referenced in docker-compose cluster setup and release workflow.
🛠️How to make changes
Adding a backend: Implement traits in cake-core/src/backends/ following the pattern in autoresearch/backends/{cpu,cuda,metal}/ (prepare.sh, benchmark.sh, program.md). Adding a model family: Extend architecture auto-detection in cake-core's model loading (check HuggingFace config.json parsing). Optimizing kernels: Profile with autoresearch/ benchmarks (benchmark.sh in target backend), then implement in WGSL/CUDA/Metal under autoresearch/kernels/. Web UI changes: Modify HTML under the web server route (likely in cake-cli). Mobile bindings: Update Swift (iOS) or Kotlin (Android) in cake-mobile/ and regenerate FFI bindings.
🪤Traps & gotchas
- Feature flags are mandatory: Building without specifying a backend (--features cuda/metal/vulkan/accelerate) defaults to CPU-only, which may silently ignore hardware acceleration if not explicitly requested. 2. Model cache location: Models download to ~/.cache/huggingface/hub/ (HuggingFace standard), but this path is shared with other tools—ensure disk space and cache coherency if running multiple tools. 3. mDNS clustering requires network setup: Zero-config mDNS discovery may fail on networks with mDNS disabled (corporate VPN, restricted WiFi); manual topology config required. 4. Platform-specific build dependencies: Metal requires macOS SDK, CUDA requires NVIDIA toolkit, Vulkan requires Vulkan SDK—missing these silently fails. 5. Workspace resolver = 2 enforces Rust 1.64+: Older Rust versions will fail at the workspace level, not at individual crates. 6. Profile.release settings (panic='abort', thin LTO, codegen-units=8): These are aggressive optimizations for performance; if debugging, build in debug mode or adjust Cargo.toml [profile.release] settings.
💡Concepts to learn
- Tensor Sharding / Model Parallelism — Core to cake's distributed inference: splitting transformer blocks across devices requires understanding how to shard layers, attention heads, and communication patterns to minimize network overhead.
- Zero-Copy Serialization (likely using capnproto or similar) — Cake clusters must pass intermediate tensors between heterogeneous devices efficiently; zero-copy formats minimize serialization overhead in distributed inference.
- Backend Abstraction / Trait-Based Polymorphism — Cake supports CUDA, Metal, Vulkan, ROCm, CPU—architecture likely uses Rust traits to abstract kernel execution, enabling new backends without duplicating inference logic.
- mDNS (Multicast DNS) Service Discovery — Cake's zero-config clustering relies on mDNS to auto-discover nodes without manual IP configuration; understanding mDNS helps debug clustering failures.
- Quantization (FP8, INT8, etc.) — Models like 'evilsocket/flux1-dev FP8' use reduced precision to fit on mobile/edge hardware; understanding quantization trade-offs (speed vs. accuracy) is essential for model selection.
- Async/Await & Tokio Runtime — Cake is a networked, multi-device system; Rust async/await (likely tokio) enables efficient handling of distributed inference pipelines without thread overhead.
- Kernel Specialization (Fused Operations) — autoresearch/kernels/fused-ops indicates that cake composes multiple operations (e.g., attention + softmax + linear) into single GPU kernels; reduces memory bandwidth and latency.
🔗Related repos
ggerganov/llama.cpp— Gold-standard CPU/GPU inference engine for LLMs with quantization support; cake is a distributed, multi-platform alternative with clustering.vllm-project/vllm— High-throughput LLM serving framework with tensor parallelism; cake differs by targeting heterogeneous edge clusters rather than datacenter GPUs.huggingface/safetensors— Model serialization format that cake uses to load checkpoints from HuggingFace; critical dependency for model compatibility.tinygrad/tinygrad— Tiny neural network framework with multi-backend support (CPU, GPU, TPU); philosophical alternative emphasizing simplicity vs. cake's distributed focus.gpt4all-org/gpt4all— Desktop/mobile inference for quantized LLMs; cake overlaps in mobile targets (iOS/Android) but extends to distributed multi-device clusters.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive benchmarking CI pipeline for autoresearch experiments
The repo has extensive autoresearch infrastructure across 9+ backend/kernel/model directories (CPU, CUDA, Metal, Vulkan, ROCm, etc.), each with benchmark.sh, prepare.sh, and program.md files, but no GitHub Actions workflow to run and track these benchmarks. A new contributor could create a .github/workflows/benchmarks.yml that orchestrates benchmark runs across different backends, tracks performance regressions, and generates comparison reports. This would prevent performance degradation and provide visibility into optimization efforts.
- [ ] Create .github/workflows/benchmarks.yml with matrix strategy for backends (CPU, CUDA, Metal, Vulkan, ROCm)
- [ ] Parse benchmark outputs from autoresearch/backends/*/benchmark.sh scripts
- [ ] Store baseline results similar to existing autoresearch/backends/metal/baseline.txt and autoresearch/backends/vulkan/baseline.txt
- [ ] Add GitHub Actions job to compare current results against baselines and comment on PRs
- [ ] Document in CLAUDE.md or new docs/benchmarking.md how to interpret results
Add integration tests for multi-device distributed inference scenarios
The core value proposition of cake is 'shard models across a heterogeneous cluster of devices' (iOS, Android, macOS, Linux, Windows), but there are no visible integration tests validating this cross-platform sharding capability. A new contributor could create integration tests in a new tests/ directory that mock distributed inference scenarios, testing model splitting, device communication, and result aggregation across simulated heterogeneous devices.
- [ ] Create tests/ directory with integration test structure
- [ ] Add tests for cake-core model sharding logic with mock devices
- [ ] Create scenario tests: single-node inference, 2-node CPU+GPU split, multi-device cluster simulation
- [ ] Add tests for result aggregation and consistency validation across device boundaries
- [ ] Update Cargo.toml with [dev-dependencies] and integration test configuration if needed
Document backend-specific setup and limitations in autoresearch/backends/*/README.md files
Each backend directory (CPU, CUDA, Metal, Vulkan, ROCm, inference-primitives) has prepare.sh, benchmark.sh, and program.md, but no README.md explaining prerequisites, system requirements, expected hardware, or known limitations. This creates friction for contributors trying to run benchmarks on their hardware. A new contributor could create backend-specific documentation that clarifies setup complexity and hardware requirements.
- [ ] Create autoresearch/backends/cpu/README.md documenting CPU requirements, BLAS library setup (if needed), expected performance baseline
- [ ] Create autoresearch/backends/cuda/README.md documenting CUDA version requirements, GPU compatibility, and ROCm/inference-primitives differences
- [ ] Create autoresearch/backends/metal/README.md documenting macOS/iOS device requirements and Metal version constraints
- [ ] Create autoresearch/backends/vulkan/README.md documenting cross-platform driver requirements and hardware support matrix
- [ ] Add troubleshooting sections referencing baseline.txt and experiments.tsv files where they exist
🌿Good first issues
- Add documentation and examples for custom kernel implementations: autoresearch/kernels/ has attention, fused-ops, and linear-attention with benchmark.sh scripts, but no tutorial for contributors on how to add a new kernel variant (e.g., grouped-query-attention). Write a KERNEL_DEVELOPMENT.md guide with a working example.
- Expand model compatibility matrix: README lists '15 text model families' and '6 image model variants' but doesn't enumerate them or document which models have been tested on which backends (CUDA/Metal/Vulkan/CPU). Create docs/model_compatibility.md with a table and known issues per model-backend pair.
- Add integration tests for mDNS clustering: cake supports zero-config clustering but autoresearch/ and test structure (inferred from cargo workspace) don't show explicit clustering tests. Write tests/clustering_mdns_test.rs that spawn 2-3 local nodes and verify distributed tensor sharding across them.
⭐Top contributors
Click to expand
Top contributors
- @evilsocket — 95 commits
- @Copilot — 4 commits
- @aupadhyay — 1 commits
📝Recent commits
Click to expand
Recent commits
be522af— Merge pull request #84 from evilsocket/copilot/verify-debug-issue-79 (evilsocket)0f1eebb— fix: replace posix_madvise/POSIX_MADV_WILLNEED in disk_expert_provider.rs for Android compat (Copilot)e58cb3a— fix: resolve pre-existing CI failures - madvise on Android, missing files field on Windows (Copilot)6c8a4bc— fix: address code review - Option-typed last_err, document CONNECT_TIMEOUT value (Copilot)8881bc5— fix: iOS TCP connection failures (issue #79) - retry logic, spawn_blocking, UIBackgroundModes (Copilot)3870042— Merge pull request #78 from aupadhyay/fix/metal-coregraphics-linkage (evilsocket)b9cf69d— fix: link CoreGraphics framework for Metal device detection (aupadhyay)1e0bec4— feat: add layer_devices field to Context for future multi-GPU pipeline parallelism (evilsocket)874db56— flash-moe: parallel expert warmup + dequant optimization — 8× faster loading (evilsocket)7367dc9— fix: increase chat model loading timeout to 10 min (MoE expert pre-warming) (evilsocket)
🔒Security observations
- High · Insecure Docker Base Image Configuration —
Dockerfile (Stage 1: Chef). The Dockerfile uses nvidia/cuda:12.6.0-devel-ubuntu24.04 as a base, which is a development image. Development images are larger, contain unnecessary tools, and have a larger attack surface. Additionally, the curl installation without verification and the incomplete snippet suggest potential issues with package validation. Fix: Use nvidia/cuda:12.6.0-runtime-ubuntu24.04 instead of devel variant for production. Implement multi-stage builds to separate build-time and runtime dependencies. Verify package checksums and use specific package versions instead of latest. - High · Missing Security Headers and Network Exposure —
Dockerfile (run command) and docker-compose.yml. The Docker Compose configuration exposes the API on 0.0.0.0:8080 without any reverse proxy, TLS termination, or authentication mechanism. The topology file allows arbitrary layer distribution across workers without access control validation. Fix: Implement a reverse proxy (nginx/traefik) with TLS termination. Add authentication (API keys, mTLS) for worker-to-master communication. Use 127.0.0.1 for local development and implement network policies for cluster environments. - High · Unencrypted Inter-Node Communication —
docker-compose.yml topology configuration. The docker-compose.yml and topology configuration show worker nodes communicating on port 10128 without indication of TLS/encryption. Distributed inference workloads may transmit sensitive model data and intermediate computations unencrypted across the network. Fix: Implement mandatory TLS/mTLS for all inter-node communication. Use certificate pinning for worker authentication. Encrypt model transfer with AES-256-GCM or similar. - Medium · Incomplete Docker Build Configuration —
Dockerfile (Chef stage, curl installation). The Dockerfile snippet is incomplete (curl command is truncated at '--tlsv1.'). This incomplete state suggests the file may have been improperly reviewed or could contain security misconfigurations. Fix: Complete the Dockerfile configuration. Implement explicit version pinning for Rust toolchain (via rust-toolchain.toml). Validate all downloaded artifacts with checksums. - Medium · Missing Input Validation on Topology Configuration —
docker-compose.yml (topology file handling). The topology-docker.yml file allows arbitrary layer assignments and host specifications. There is no apparent validation of layer ranges, host accessibility, or resource constraints, allowing potential DoS or model poisoning attacks. Fix: Implement schema validation for topology files using JSON Schema or similar. Validate layer ranges against model architecture. Implement resource quotas and rate limiting per worker node. - Medium · Debug Symbols Retained in Release Build —
Cargo.toml [profile.release]. The Cargo.toml release profile setsstrip = false, retaining debug symbols. While useful for profiling, this increases binary size and may leak implementation details in a distributed environment. Fix: Setstrip = truefor production releases. Implement separate debug and release profiles. Use split-debuginfo feature for development while keeping production binaries stripped. - Medium · Unsafe LTO Configuration —
Cargo.toml [profile.release]. The release profile useslto = 'thin'which provides reduced optimization. While faster to compile, this may result in less optimized binaries with potential performance-related security implications for inference workloads. Fix: For production builds, uselto = 'fat'orlto = 'true'withcodegen-units = 1to maximize security through code optimization and reduce exploitable code patterns. - Medium · Missing SBOM and Dependency Verification —
Cargo.toml (workspace.dependencies). The workspace depends on multiple external crates (implied by the features like cuda, metal, vulkan) with no visible dependency auditing, lock file security verification, or Software Bill of Materials (SBOM) generation. Fix: Implementcargo-auditin CI/CD pipeline. Generate SBOM usingcargo-sbomor cyclonedx. Pin dependencies with exact versions in Cargo.lock. Usecargo-denyfor policy enforcement. - undefined · undefined —
undefined. undefined Fix: undefined
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.