evilsocket/cake

Item: evilsocket/cake
Rating: 3
Author: RepoPilot

Distributed inference for mobile, desktop and server.

Mixed

Single-maintainer risk — review before adopting

weakest axis

Use as dependencyConcerns

non-standard license (Other); top contributor handles 95% of recent commits

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 2w ago
✓3 active contributors
✓Other licensed

Show all 8 evidence items →

✓CI configured
✓Tests present
⚠Small team — 3 contributors active in recent commits
⚠Single-maintainer risk — top contributor 95% of recent commits
⚠Non-standard license (Other) — review terms

What would change the summary?

→Use as dependency Concerns → Mixed if: clarify license terms

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Forkable](https://repopilot.app/api/badge/evilsocket/cake?axis=fork)](https://repopilot.app/r/evilsocket/cake)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/evilsocket/cake on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: evilsocket/cake

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/evilsocket/cake shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Single-maintainer risk — review before adopting

Last commit 2w ago
3 active contributors
Other licensed
CI configured
Tests present
⚠ Small team — 3 contributors active in recent commits
⚠ Single-maintainer risk — top contributor 95% of recent commits
⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live evilsocket/cake repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/evilsocket/cake.

What it runs against: a local clone of evilsocket/cake — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in evilsocket/cake | Confirms the artifact applies here, not a fork | | 2 | License is still Other | Catches relicense before you depend on it | | 3 | Default branch main exists | Catches branch renames | | 4 | Last commit ≤ 44 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>evilsocket/cake</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of evilsocket/cake. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/evilsocket/cake.git
#   cd cake
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of evilsocket/cake and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "evilsocket/cake(\\.git)?\\b" \\
  && ok "origin remote is evilsocket/cake" \\
  || miss "origin remote is not evilsocket/cake (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Other\"" package.json 2>/dev/null) \\
  && ok "license is Other" \\
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
  && ok "default branch main exists" \\
  || miss "default branch main no longer exists"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 44 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~14d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/evilsocket/cake"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Cake is a distributed multimodal AI inference server written in Rust that partitions large language models and diffusion models across heterogeneous clusters (iOS, Android, macOS, Linux, Windows) to enable inference on hardware that cannot fit entire models. It supports text generation (15 model families), image generation (Stable Diffusion, FLUX), and TTS (VibeVoice), with an OpenAI-compatible REST API and auto-detection of model architectures from HuggingFace checkpoints. Workspace monorepo (Cargo.toml members: cake-core, cake-cli, cake-mobile) where cake-core houses the distributed inference engine with backend abstraction (autoresearch/backends/cpu|cuda|metal|rocm|vulkan), cake-cli provides the command-line interface and web UI, and cake-mobile contains iOS/Android bindings (Swift, Kotlin). Kernel implementations are separated into autoresearch/kernels/ (attention, fused-ops, linear-attention) for benchmark-driven optimization.

👥Who it's for

ML engineers and researchers who need to run inference on multi-modal models across resource-constrained or heterogeneous hardware setups (old phones, Steam Decks, mixed GPU/CPU clusters); developers building OpenAI-compatible inference services who want fine-grained control over sharding and backend selection (CUDA/Metal/Vulkan/CPU).

🌱Maturity & risk

Actively developed but explicitly experimental. The README states 'This is experimental code that's being actively developed and changed very quickly.' CI/CD is configured (.github/workflows/ci.yml, release.yml), but the codebase uses thin LTO and aggressive optimizations suggesting performance iteration is ongoing. Repo has structured benchmarking infrastructure (autoresearch/ directory) but is not production-hardened.

Single maintainer (evilsocket), rapidly changing APIs (workspace version pinned at 0.1.0), and experimental status mean breaking changes are likely. Dependencies span Rust/CUDA/Metal/Vulkan/Kotlin/Swift/Python—managing compatibility across this breadth is non-trivial. Last commit recency unknown from provided data, but the autoresearch/ folder suggests active experimentation rather than stable maintenance.

Active areas of work

Performance research is active: autoresearch/ contains baseline.txt and experiments.tsv for Metal and Vulkan, suggesting ongoing kernel profiling and backend optimization. Multiple platform backends (Metal, Vulkan, CUDA, ROCm, CPU) are being tuned in parallel. Model zoo is expanding (Qwen3, FLUX.1-dev FP8 mentioned in README). Web UI and TUI clients are being developed alongside core inference.

🚀Get running

git clone https://github.com/evilsocket/cake.git
cd cake
# For CUDA (Linux/NVIDIA):
cargo build --release --features cuda
# For Metal (macOS/Apple Silicon GPU):
cargo build --release --features metal
# For CPU only (portable):
cargo build --release
# Download models from HuggingFace:
cake pull evilsocket/Qwen3-0.6B
# Run the inference server:
cake server

Daily commands:

# Development build (with default CPU backend):
cargo build
# Release with specific backend:
cargo build --release --features cuda
# Run server:
cargo run --release --features cuda -- server
# Run CLI:
cargo run --release --features cuda -- pull evilsocket/Qwen3-0.6B
cake chat
# Single-node cluster:
cake server --listen 0.0.0.0:8080
# Multi-node cluster:
# Nodes auto-discover via mDNS or manual topology in config

🗺️Map of the codebase

Cargo.toml: Workspace root defining members (cake-core, cake-cli, cake-mobile) and shared dependencies; controls feature flags for backend selection (cuda, metal, vulkan, accelerate).
cake-core/src/: Core inference engine; contains backend abstraction, tensor operations, model loading, and distributed clustering logic—the heart of the system (inferred from workspace structure).
autoresearch/backends/: Backend implementations and benchmarking infrastructure; each subdirectory (cpu, cuda, metal, rocm, vulkan) contains prepare.sh, benchmark.sh, and program.md defining the backend contract and optimization opportunities.
autoresearch/kernels/: Performance-critical kernel implementations (attention, fused-ops, linear-attention) with benchmarks; used to drive optimization across backends.
.github/workflows/ci.yml: Continuous integration configuration; defines test, build, and release pipeline across multiple platforms and feature flags.
cake-cli/src/: Command-line interface and REST API server (likely); exposes OpenAI-compatible endpoints and handles model lifecycle (pull, run, chat).
cake-mobile/: iOS (Swift) and Android (Kotlin) FFI bindings; enables running inference on mobile devices as part of distributed clusters.
README.md: Quick-start guide with exact build commands for each platform/backend; essential for understanding feature selection and first-time setup.
.cargo/config.toml: Cargo workspace configuration; may contain target-specific settings, profile overrides, or build script customizations for backends.
Dockerfile: Docker image definition for Linux/CUDA deployments; referenced in docker-compose cluster setup and release workflow.

🛠️How to make changes

Adding a backend: Implement traits in cake-core/src/backends/ following the pattern in autoresearch/backends/{cpu,cuda,metal}/ (prepare.sh, benchmark.sh, program.md). Adding a model family: Extend architecture auto-detection in cake-core's model loading (check HuggingFace config.json parsing). Optimizing kernels: Profile with autoresearch/ benchmarks (benchmark.sh in target backend), then implement in WGSL/CUDA/Metal under autoresearch/kernels/. Web UI changes: Modify HTML under the web server route (likely in cake-cli). Mobile bindings: Update Swift (iOS) or Kotlin (Android) in cake-mobile/ and regenerate FFI bindings.

🪤Traps & gotchas

Feature flags are mandatory: Building without specifying a backend (--features cuda/metal/vulkan/accelerate) defaults to CPU-only, which may silently ignore hardware acceleration if not explicitly requested. 2. Model cache location: Models download to ~/.cache/huggingface/hub/ (HuggingFace standard), but this path is shared with other tools—ensure disk space and cache coherency if running multiple tools. 3. mDNS clustering requires network setup: Zero-config mDNS discovery may fail on networks with mDNS disabled (corporate VPN, restricted WiFi); manual topology config required. 4. Platform-specific build dependencies: Metal requires macOS SDK, CUDA requires NVIDIA toolkit, Vulkan requires Vulkan SDK—missing these silently fails. 5. Workspace resolver = 2 enforces Rust 1.64+: Older Rust versions will fail at the workspace level, not at individual crates. 6. Profile.release settings (panic='abort', thin LTO, codegen-units=8): These are aggressive optimizations for performance; if debugging, build in debug mode or adjust Cargo.toml [profile.release] settings.

💡Concepts to learn

Tensor Sharding / Model Parallelism — Core to cake's distributed inference: splitting transformer blocks across devices requires understanding how to shard layers, attention heads, and communication patterns to minimize network overhead.
Zero-Copy Serialization (likely using capnproto or similar) — Cake clusters must pass intermediate tensors between heterogeneous devices efficiently; zero-copy formats minimize serialization overhead in distributed inference.
Backend Abstraction / Trait-Based Polymorphism — Cake supports CUDA, Metal, Vulkan, ROCm, CPU—architecture likely uses Rust traits to abstract kernel execution, enabling new backends without duplicating inference logic.
mDNS (Multicast DNS) Service Discovery — Cake's zero-config clustering relies on mDNS to auto-discover nodes without manual IP configuration; understanding mDNS helps debug clustering failures.
Quantization (FP8, INT8, etc.) — Models like 'evilsocket/flux1-dev FP8' use reduced precision to fit on mobile/edge hardware; understanding quantization trade-offs (speed vs. accuracy) is essential for model selection.
Async/Await & Tokio Runtime — Cake is a networked, multi-device system; Rust async/await (likely tokio) enables efficient handling of distributed inference pipelines without thread overhead.
Kernel Specialization (Fused Operations) — autoresearch/kernels/fused-ops indicates that cake composes multiple operations (e.g., attention + softmax + linear) into single GPU kernels; reduces memory bandwidth and latency.

ggerganov/llama.cpp — Gold-standard CPU/GPU inference engine for LLMs with quantization support; cake is a distributed, multi-platform alternative with clustering.
vllm-project/vllm — High-throughput LLM serving framework with tensor parallelism; cake differs by targeting heterogeneous edge clusters rather than datacenter GPUs.
huggingface/safetensors — Model serialization format that cake uses to load checkpoints from HuggingFace; critical dependency for model compatibility.
tinygrad/tinygrad — Tiny neural network framework with multi-backend support (CPU, GPU, TPU); philosophical alternative emphasizing simplicity vs. cake's distributed focus.
gpt4all-org/gpt4all — Desktop/mobile inference for quantized LLMs; cake overlaps in mobile targets (iOS/Android) but extends to distributed multi-device clusters.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive benchmarking CI pipeline for autoresearch experiments

The repo has extensive autoresearch infrastructure across 9+ backend/kernel/model directories (CPU, CUDA, Metal, Vulkan, ROCm, etc.), each with benchmark.sh, prepare.sh, and program.md files, but no GitHub Actions workflow to run and track these benchmarks. A new contributor could create a .github/workflows/benchmarks.yml that orchestrates benchmark runs across different backends, tracks performance regressions, and generates comparison reports. This would prevent performance degradation and provide visibility into optimization efforts.

[ ] Create .github/workflows/benchmarks.yml with matrix strategy for backends (CPU, CUDA, Metal, Vulkan, ROCm)
[ ] Parse benchmark outputs from autoresearch/backends/*/benchmark.sh scripts
[ ] Store baseline results similar to existing autoresearch/backends/metal/baseline.txt and autoresearch/backends/vulkan/baseline.txt
[ ] Add GitHub Actions job to compare current results against baselines and comment on PRs
[ ] Document in CLAUDE.md or new docs/benchmarking.md how to interpret results

Add integration tests for multi-device distributed inference scenarios

The core value proposition of cake is 'shard models across a heterogeneous cluster of devices' (iOS, Android, macOS, Linux, Windows), but there are no visible integration tests validating this cross-platform sharding capability. A new contributor could create integration tests in a new tests/ directory that mock distributed inference scenarios, testing model splitting, device communication, and result aggregation across simulated heterogeneous devices.

[ ] Create tests/ directory with integration test structure
[ ] Add tests for cake-core model sharding logic with mock devices
[ ] Create scenario tests: single-node inference, 2-node CPU+GPU split, multi-device cluster simulation
[ ] Add tests for result aggregation and consistency validation across device boundaries
[ ] Update Cargo.toml with [dev-dependencies] and integration test configuration if needed

Document backend-specific setup and limitations in autoresearch/backends/*/README.md files

Each backend directory (CPU, CUDA, Metal, Vulkan, ROCm, inference-primitives) has prepare.sh, benchmark.sh, and program.md, but no README.md explaining prerequisites, system requirements, expected hardware, or known limitations. This creates friction for contributors trying to run benchmarks on their hardware. A new contributor could create backend-specific documentation that clarifies setup complexity and hardware requirements.

[ ] Create autoresearch/backends/cpu/README.md documenting CPU requirements, BLAS library setup (if needed), expected performance baseline
[ ] Create autoresearch/backends/cuda/README.md documenting CUDA version requirements, GPU compatibility, and ROCm/inference-primitives differences
[ ] Create autoresearch/backends/metal/README.md documenting macOS/iOS device requirements and Metal version constraints
[ ] Create autoresearch/backends/vulkan/README.md documenting cross-platform driver requirements and hardware support matrix
[ ] Add troubleshooting sections referencing baseline.txt and experiments.tsv files where they exist

🌿Good first issues

Add documentation and examples for custom kernel implementations: autoresearch/kernels/ has attention, fused-ops, and linear-attention with benchmark.sh scripts, but no tutorial for contributors on how to add a new kernel variant (e.g., grouped-query-attention). Write a KERNEL_DEVELOPMENT.md guide with a working example.
Expand model compatibility matrix: README lists '15 text model families' and '6 image model variants' but doesn't enumerate them or document which models have been tested on which backends (CUDA/Metal/Vulkan/CPU). Create docs/model_compatibility.md with a table and known issues per model-backend pair.
Add integration tests for mDNS clustering: cake supports zero-config clustering but autoresearch/ and test structure (inferred from cargo workspace) don't show explicit clustering tests. Write tests/clustering_mdns_test.rs that spawn 2-3 local nodes and verify distributed tensor sharding across them.

⭐Top contributors

Click to expand

@evilsocket — 95 commits
@Copilot — 4 commits
@aupadhyay — 1 commits

📝Recent commits