vosen/ZLUDA
CUDA on non-NVIDIA GPUs
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 3d ago
- ✓ 6 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Concentrated ownership — top contributor handles 50% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
`[](https://repopilot.app/r/vosen/zluda)` — paste at the top of your README.md; it renders inline like a shields.io badge.
Preview social card (1200×630): this card auto-renders when someone shares https://repopilot.app/r/vosen/zluda on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: vosen/ZLUDA
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/vosen/ZLUDA shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 3d ago
- 6 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Concentrated ownership — top contributor handles 50% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live vosen/ZLUDA
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/vosen/ZLUDA.
What it runs against: a local clone of vosen/ZLUDA — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in vosen/ZLUDA | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 33 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of vosen/ZLUDA. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/vosen/ZLUDA.git
#   cd ZLUDA
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of vosen/ZLUDA and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qiE "vosen/zluda(\.git)?$" \
  && ok "origin remote is vosen/ZLUDA" \
  || miss "origin remote is not vosen/ZLUDA (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The Apache-2.0 license text
#    starts with "Apache License", not the SPDX identifier, so match that.
(grep -qi "Apache License" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical paths exist (test -e, since several of these are directories)
for path in Cargo.toml zluda_inject/src ptx/src cuda_types/src/cuda.rs zluda_common/src; do
  test -e "$path" \
    && ok "$path" \
    || miss "missing critical path: $path"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 33 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~3d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/vosen/ZLUDA"
  exit 1
fi
```
Each check prints `ok:` or `FAIL:`. The script exits non-zero if anything failed, so it composes cleanly into agent loops (`./verify.sh || regenerate-and-retry`).
⚡TL;DR
ZLUDA is a runtime translation layer that lets unmodified CUDA applications execute on non-NVIDIA GPUs (AMD, Intel) by translating CUDA API calls and PTX bytecode to HIP/ROCm equivalents. It works as a drop-in replacement library: it intercepts CUDA calls and compiles kernels for the target GPU architecture at runtime, enabling CUDA software to run on non-NVIDIA hardware with near-native performance. The monorepo holds 40+ specialized crates: core GPU binding translation in zluda/ and zluda_inject/ (runtime interception); library adapters (zluda_blas, zluda_dnn8/9, zluda_sparse for CUDA libraries); the PTX parsing/compilation pipeline (ptx_parser, ptxas, ptx crates); and infrastructure crates (zluda_common, zluda_cache for caching compiled kernels, zluda_redirect for API routing). The build system uses a custom Rust xtask with special LLVM dev profiles (dev-llvm).
👥Who it's for
GPU application developers and data scientists who have written CUDA code but want to run it on AMD (ROCm) or Intel GPUs without source code modification. Also relevant for system integrators and OEMs supporting heterogeneous GPU ecosystems where NVIDIA hardware may not be available or cost-effective.
🌱Maturity & risk
Actively developed and production-capable: the project has a comprehensive CI/CD pipeline (pr_master.yml, nightly_tests.yml, rocm_setup_build.sh), organized issue templates (zluda_dump.yml), Discord community support, and extensive Rust codebase (11.3M lines). However, it remains GPU-vendor-dependent and requires careful environment setup, so deployment should be tested against target hardware.
Moderate-to-high complexity risk: the project spans 40+ interdependent crates with custom LLVM compilation (482K lines LLVM), relies on HIP/ROCm ecosystem stability (ext/hip_runtime-sys, ext/rocblas-sys dependencies), and involves low-level GPU driver interaction via detours-sys. Single-maintainer appearance (vosen) and heavy reliance on LLVM compilation toolchain mean build failures can be hard to diagnose; breaking changes in ROCm or HIP could cascade.
Active areas of work
Active development on CUDA library compatibility layers (cuBLAS, cuDNN 8/9, cuFFT, cuSPARSE with dedicated trace modules), kernel compilation caching (zluda_cache), and multi-platform testing (ROCm nightly tests, PR validation). Recent infrastructure focus on modular tracing (zluda_trace_* crates) suggests profiling/debugging improvements.
🚀Get running
git clone --recursive https://github.com/vosen/ZLUDA.git
cd ZLUDA
cargo build --release -p zluda
For development, use cargo build -p zluda --profile dev-llvm, which builds ZLUDA in debug while keeping LLVM optimized — debug builds of LLVM are impractically slow (faster iteration). The ROCm/HIP toolchain must be installed; see .devcontainer/Dockerfile for full environment setup.
Daily commands:
For GPU execution: `LD_PRELOAD=./target/release/libzluda.so ./your_cuda_app` (Linux/ROCm), or the Windows equivalent with zluda.dll injection. For compilation only: `cargo build -p zluda --release` produces the intercept library. Development iteration: `cargo build -p ptx_parser && cargo build -p zluda --profile dev-llvm`.
🗺️Map of the codebase
- Cargo.toml — Workspace manifest defining all 30+ member crates (compiler, CUDA type bindings, kernel redirection, library bridges); essential for understanding the monorepo's structure and build orchestration.
- zluda_inject/src — Entry point for runtime library injection and CUDA API interception; the core mechanism enabling ZLUDA to replace NVIDIA CUDA with AMD HIP transparently.
- ptx/src — PTX-to-HIP shader compilation pipeline; converts NVIDIA's parallel thread execution assembly to AMD-compatible intermediate representation.
- cuda_types/src/cuda.rs — CUDA API type definitions and struct mappings; the canonical reference for all CUDA function signatures that ZLUDA must emulate.
- zluda_common/src — Shared utilities, error handling, and platform abstractions used across all ZLUDA bridge libraries (blas, dnn, sparse, fft).
- .github/workflows/pr_master.yml — CI/CD pipeline validating commits; reveals test coverage, build targets (ROCm, HIP), and platform-specific validation gates.
- docs/src/quick_start.md — User-facing installation and usage guide; defines the supported platforms, prerequisites, and typical deployment patterns.
🛠️How to make changes
Add support for a new cuBLAS function
- Add the function signature and types to cuda_types/src/cublas.rs
- Create or update the wrapper in zluda_blas/src/lib.rs to call the rocBLAS equivalent
- Add parameter mapping and error translation (CUDA error → rocBLAS error) in zluda_common/src
- If tracing is needed, add a trace wrapper in zluda_trace_blas/src
- Test against the reference CUDA implementation and validate that rocBLAS produces identical results (.github/workflows/pr_master.yml)
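The error-translation step above can be sketched in Rust. The enum variants, values, and mapping below are illustrative stand-ins, not the real cublas/rocblas definitions (which live in cuda_types and ext/rocblas-sys):

```rust
// Hypothetical status enums for the sketch; not the real FFI definitions.
#[derive(Debug, PartialEq, Clone, Copy)]
enum RocblasStatus {
    Success,
    InvalidHandle,
    NotImplemented,
    InvalidSize,
    MemoryError,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum CublasStatus {
    Success,
    NotInitialized,
    AllocFailed,
    InvalidValue,
    NotSupported,
}

// Map a rocBLAS result back to the status a cuBLAS caller expects.
// An exhaustive match means a new backend status is a compile error,
// not a silently dropped error code.
fn to_cublas(status: RocblasStatus) -> CublasStatus {
    match status {
        RocblasStatus::Success => CublasStatus::Success,
        RocblasStatus::InvalidHandle => CublasStatus::NotInitialized,
        RocblasStatus::NotImplemented => CublasStatus::NotSupported,
        RocblasStatus::InvalidSize => CublasStatus::InvalidValue,
        RocblasStatus::MemoryError => CublasStatus::AllocFailed,
    }
}

fn main() {
    assert_eq!(to_cublas(RocblasStatus::Success), CublasStatus::Success);
    assert_eq!(to_cublas(RocblasStatus::MemoryError), CublasStatus::AllocFailed);
    println!("ok");
}
```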
Add support for a new CUDA math library (e.g., cuRAND)
- Create a new member crate: mkdir zluda_rand && cargo init --lib, and register it in Cargo.toml
- Add cuda_types for the cuRAND APIs (cuda_types/src/curand.rs, wired into cuda_types/src/lib.rs)
- Implement the HIP binding layer in zluda_rand/src/lib.rs mapping to rocRAND
- Register the new library in zluda_inject/src for DLL/SO injection
- Document the new library in docs/src/ (e.g. docs/src/quick_start.md) and add it to the CI matrix
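The shape of such a binding layer — export a cuRAND-style C ABI, validate inputs, forward to the ROCm-side implementation — might look roughly like this. GeneratorHandle, curandSetSeed, rocrand_set_seed, and the error code are all hypothetical stand-ins, not the real cuRAND/rocRAND APIs:

```rust
// Hypothetical handle type shared across the C ABI boundary.
#[repr(C)]
pub struct GeneratorHandle {
    seed: u64,
}

// Stand-in for the rocRAND-side call the wrapper forwards to.
fn rocrand_set_seed(handle: &mut GeneratorHandle, seed: u64) -> i32 {
    handle.seed = seed;
    0 // success
}

// cuRAND-shaped entry point: same symbol style a CUDA app would link against.
#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn curandSetSeed(handle: *mut GeneratorHandle, seed: u64) -> i32 {
    if handle.is_null() {
        return 101; // illustrative "not initialized"-style error code
    }
    // SAFETY: null-checked above; the caller guarantees validity, as in C.
    unsafe { rocrand_set_seed(&mut *handle, seed) }
}

fn main() {
    let mut handle = GeneratorHandle { seed: 0 };
    assert_eq!(curandSetSeed(&mut handle, 42), 0);
    assert_eq!(handle.seed, 42);
    assert_eq!(curandSetSeed(std::ptr::null_mut(), 42), 101);
    println!("ok");
}
```

In a real crate this would be built as a cdylib so the exported symbol can be injected in place of the NVIDIA library.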
Fix a PTX-to-HIP shader compilation issue
- Reproduce the issue with a minimal PTX kernel; add a test case to ptx_parser/src/tests/
- Update the PTX AST or parser in ptx_parser/src if the instruction is unrecognized
- Update code generation logic in ptx/src to emit correct HIP IR
- If it is a fatbin format issue, debug in dark_api/src/fatbin.rs
- Validate the fix in the compiler test suite (compiler/src) and add a regression test
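For a sense of what the parser-side step involves, here is a toy recognizer for one PTX instruction form ("add.s32 %r1, %r2, %r3;"). The real ptx_parser grammar is far richer; Instruction and parse_instruction are invented for this sketch:

```rust
// Toy AST node for a single parsed PTX instruction.
#[derive(Debug, PartialEq)]
struct Instruction {
    opcode: String,
    ty: String,
    operands: Vec<String>,
}

// Recognize "opcode.type op1, op2, ...;" and nothing else.
fn parse_instruction(line: &str) -> Option<Instruction> {
    let line = line.trim().strip_suffix(';')?;
    // Split "add.s32" from the operand list.
    let (head, rest) = line.split_once(char::is_whitespace)?;
    let (opcode, ty) = head.split_once('.')?;
    let operands: Vec<String> = rest.split(',').map(|s| s.trim().to_string()).collect();
    Some(Instruction {
        opcode: opcode.to_string(),
        ty: ty.to_string(),
        operands,
    })
}

fn main() {
    let insn = parse_instruction("add.s32 %r1, %r2, %r3;").unwrap();
    assert_eq!(insn.opcode, "add");
    assert_eq!(insn.ty, "s32");
    assert_eq!(insn.operands, vec!["%r1", "%r2", "%r3"]);
    println!("ok");
}
```

The point of the checklist's first step is exactly this shape: a failing input becomes a unit test against the parser before any code generation is touched.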
Optimize kernel compilation caching
- Review the current cache key strategy in zluda_cache/src
- Identify the bottleneck: hash computation, serialization, or disk I/O (see also zluda_precompile/src)
- Implement the optimization (e.g., parallel compilation, incremental caching) in zluda_cache/src
- Benchmark before/after; add metrics to zluda_common/src
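Before reaching for wall-clock benchmarks, the "before/after" comparison can start from a deterministic proxy metric: how many recompilations the cache avoids. A minimal sketch, where compile_kernel is a cheap stand-in for the real PTX-to-native pipeline:

```rust
use std::collections::HashMap;

// Stand-in for the expensive PTX -> native compilation in the ptx/ crates.
fn compile_kernel(ptx: &str) -> usize {
    ptx.len() // pretend this took milliseconds
}

fn main() {
    let ptx = ".visible .entry kernel() { ret; }";
    let mut cache: HashMap<String, usize> = HashMap::new();
    let mut compile_calls = 0u32;

    // Simulate 100 launches of the same kernel.
    for _ in 0..100 {
        if !cache.contains_key(ptx) {
            compile_calls += 1;
            let module = compile_kernel(ptx);
            cache.insert(ptx.to_string(), module);
        }
        let _module = cache[ptx];
    }

    // With the cache, 99 of 100 launches skip compilation entirely.
    assert_eq!(compile_calls, 1);
    println!("compiled {} time(s) for 100 launches", compile_calls);
}
```

Counting avoided compilations is stable across machines; timing numbers can then confirm the win on real hardware.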
🔧Why these technologies
- Rust — Systems-level memory safety without garbage collection; C FFI for GPU driver bindings
🪤Traps & gotchas
- LLVM compilation is slow: debug builds of LLVM are glacially slow; use cargo build --profile dev-llvm and the [profile.dev-llvm] config to build LLVM in release mode.
- ROCm/HIP dependency is critical: missing or mismatched HIP/rocBLAS/MIOpen versions cause silent compile failures; see .github/workflows/rocm_setup_*.sh for exact versions.
- Recursive submodules: .gitmodules may pull large external deps; use git clone --recursive.
- LD_PRELOAD ordering: on Linux, library load order matters; setting LD_PRELOAD=libzluda.so before other GPU libraries is essential.
- Multi-architecture PTX: PTX is architecture-specific (sm_60, sm_70, etc.); the target GPU's compute capability must match the compiled PTX.
- No Windows ROCm: ROCm is AMD Linux-only; Windows requires HIP-on-Windows (limited support — check CI for the actual tested platforms).
🏗️Architecture
💡Concepts to learn
- PTX (Parallel Thread Execution) — ZLUDA's input format: all CUDA kernels compile to PTX intermediate representation before being translated to target GPU ISA; understanding PTX structure is critical for debugging compilation failures
- Just-In-Time (JIT) Compilation — ZLUDA compiles PTX to native code at runtime rather than pre-compilation; zluda_cache optimizes this by caching compiled kernels to avoid recompilation overhead
- API Interception / Function Hooking — ZLUDA intercepts CUDA API calls (cudaMalloc, cudaLaunchKernel, etc.) at runtime via detours-sys (Windows) or LD_PRELOAD (Linux); critical for understanding how CUDA apps get redirected
- LLVM Intermediate Representation (IR) — ZLUDA translates PTX to LLVM IR as an intermediate step before lowering to target GPU ISA; LLVM provides platform-independent optimization and code generation
- HIP (Heterogeneous-compute Interface for Portability) — ZLUDA targets HIP as the abstraction layer, which then lowers to ROCm/AMD hardware; HIP is the 'output language' after CUDA translation
- Memory Layout and Alignment — CUDA and HIP have different memory alignment requirements (shared memory, global memory, texture cache); ZLUDA must translate kernel memory access patterns correctly to avoid silent data corruption
- Kernel Caching with Content-Addressed Storage — zluda_cache hashes compiled kernels (likely by PTX content/parameters) to avoid recompiling identical kernels across application runs; essential for interactive CUDA app performance
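The content-addressed caching idea in the last bullet can be sketched as hashing the PTX text together with the compilation parameters. This is a sketch only: a real implementation would use a stable cryptographic hash (e.g. SHA-256) rather than std's DefaultHasher, whose output is not guaranteed stable across compiler versions, and cache_key and the parameter names are invented here:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a cache key from everything that affects the compiled binary:
// the kernel source plus the compilation parameters. Omitting a
// parameter from the key risks serving a stale binary.
fn cache_key(ptx: &str, target_arch: &str, opt_level: u8) -> u64 {
    let mut h = DefaultHasher::new();
    ptx.hash(&mut h);
    target_arch.hash(&mut h);
    opt_level.hash(&mut h);
    h.finish()
}

fn main() {
    let ptx = ".visible .entry add_one() { ret; }";
    // Same inputs -> same key: the cached binary can be reused.
    assert_eq!(cache_key(ptx, "gfx1100", 3), cache_key(ptx, "gfx1100", 3));
    // Any input change -> different key: no stale binary is served.
    assert_ne!(cache_key(ptx, "gfx1100", 3), cache_key(ptx, "gfx906", 3));
    println!("ok");
}
```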
🔗Related repos
- ROCm-Developer-Tools/HIP — HIP is the underlying heterogeneous GPU abstraction layer that ZLUDA compiles CUDA to; understanding HIP APIs is essential for debugging translation issues.
- GPUOpen-Professional/MIOpen — AMD's deep learning library that ZLUDA wraps via zluda_dnn8/9; contains the actual cuDNN-compatible kernels that get called.
- ROCmSoftwarePlatform/rocBLAS — AMD's BLAS library wrapped by zluda_blas; provides the optimized linear algebra kernels that CUDA cuBLAS calls map to.
- KhronosGroup/SPIRV-LLVM-Translator — Translates between SPIR-V and LLVM IR; relevant because ZLUDA uses LLVM IR as the intermediate representation for PTX-to-native compilation.
- llvm/llvm-project — ZLUDA embeds LLVM as the core compilation backend; the 482K LLVM lines in this repo are LLVM-specific patches and custom target support for GPU ISAs.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive integration tests for CUDA API coverage across trace modules
The repo has multiple trace modules (zluda_trace_blas, zluda_trace_blaslt, zluda_trace_dnn8, zluda_trace_dnn9, zluda_trace_fft, zluda_trace_sparse, zluda_trace_nvml) that appear to be instrumentation/logging wrappers, but there are no visible integration tests validating that traced calls match non-traced behavior. This is critical for a CUDA compatibility layer to ensure tracing doesn't introduce behavioral changes.
- [ ] Create zluda_trace/tests/ directory with integration test suite
- [ ] Add tests comparing output of traced vs non-traced calls for each module (BLAS, cuDNN, FFT, cuSPARSE)
- [ ] Reference the existing test infrastructure in .github/workflows/nightly_tests.yml to understand test running patterns
- [ ] Validate that trace output format is consistent across all trace_* crates
Add validation tests for PTX parsing and LLVM compilation pipeline
The ptx_parser and ptx modules are core to ZLUDA's functionality (converting NVIDIA's PTX to HIP/AMD), but there's no visible test coverage validating that complex PTX programs parse correctly and compile to valid LLVM IR. This is a high-risk area for compatibility bugs.
- [ ] Create ptx/tests/ with test cases for various PTX instruction types (memory ops, control flow, intrinsics)
- [ ] Add ptx_parser/tests/ for edge cases in PTX syntax (unusual register patterns, complex metadata)
- [ ] Create integration tests in ptx/tests/compile/ that verify PTX→LLVM→machine code pipeline end-to-end
- [ ] Include regression tests from any reported GitHub issues involving PTX parsing failures
Add GitHub Actions workflow for cross-platform binary compatibility validation
The repo has pr_master.yml and push_master.yml workflows, but the .github/workflows/ directory shows shell scripts (rocm_setup_build.sh, rocm_setup_run.sh) that aren't referenced in any visible workflow YAML. This suggests incomplete CI coverage. New contributors should create a proper ROCm-based CI workflow to catch ABI incompatibilities early.
- [ ] Create .github/workflows/rocm_validation.yml that runs on PRs targeting master
- [ ] Integrate the existing rocm_setup_build.sh and rocm_setup_run.sh scripts into the workflow
- [ ] Add validation steps that compile with both ZLUDA and native CUDA (if available) and compare output
- [ ] Reference the nightly_tests.yml structure but add quick smoke tests for PR feedback (full nightly suite runs post-merge)
🌿Good first issues
- Add cuFFT plan cache to zluda_fft/src/ (similar to zluda_cache pattern) to avoid recomputing transform strategies for identical parameters; test with fft_benchmark.
- Expand ptx_parser/src/ to handle PTX atomic operations (atom.add, atom.cas) for multi-threaded kernels; add unit tests in ptx_parser/tests/.
- Document PTX-to-LLVM IR lowering in compiler/src/main.rs with inline comments for each instruction class (arithmetic, memory, control flow); many contributors get lost here.
- Create zluda_trace_sparse/src/ following the zluda_trace_blas pattern to enable detailed logging of cuSPARSE calls for debugging sparse matrix kernels.
- Add support for CUDA streams (cudaStreamCreate/Destroy) in zluda/src/stream.rs; currently stubs only; coordinate with zluda_cache to respect stream semantics.
⭐Top contributors
- @vosen — 50 commits
- @zluda-violet — 45 commits
- @hemangjoshi37a — 2 commits
- @Knogle — 1 commit
- @stevefan1999-personal — 1 commit
📝Recent commits
- 87531d3 — Fix typo: vec_acccess -> vector_read in emit_vector_read (#633) (hemangjoshi37a)
- 5f89388 — Update tests (#632) (vosen)
- 9854942 — Refactor emit_brev to use emit_intrinsic helper (#631) (hemangjoshi37a)
- 66b20a3 — Support vshr.u32.u32.u32.clamp.add (#629) (vosen)
- 5c75a54 — Add more cuSPARSE functions (#624) (vosen)
- 8251f1e — Initial textures support (#625) (vosen)
- e070320 — PyTorch fixes and improvements (#620) (vosen)
- dcc6bb8 — Add minimal cuSPARSE (#621) (vosen)
- 796ad6c — Support some cublaslt settings required by COEIROINK (#619) (vosen)
- a3b322f — Add various bits and pieces required by pytorch (#615) (vosen)
🔒Security observations
ZLUDA is a complex low-level systems project with a large monorepo structure (40+ crates) combining Rust with native code bindings and FFI interfaces. The primary security concerns are: (1) the inherent complexity and attack surface of the large workspace, (2) native code dependencies and FFI boundary risks, (3) system-level code injection components requiring careful validation, and (4) lack of visible security disclosure policy. The codebase appears to follow good Rust practices with workspace organization and profile management. No hardcoded credentials, obvious injection vulnerabilities, or exposed sensitive configurations were detected in the provided file structure. The project would benefit from documented security policies, regular dependency auditing, and CI/CD security scanning integration.
- Medium · Workspace uses patched crate without version pinning — Cargo.toml (patch.crates-io section). The Cargo.toml uses [patch.crates-io] for highs-sys pointing to a local path without version constraints, which could lead to unexpected behavior if the local path version diverges significantly from the published crate. Fix: either maintain strict version alignment between the patched local crate and the published version, or document the reasons for the patch; consider git dependencies pinned to specific revisions if appropriate.
- Medium · Large monorepo with multiple compiled binaries — Cargo.toml (workspace members list). The workspace contains 40+ crates, including low-level system libraries (detours-sys, hip_runtime-sys, rocblas-sys, etc.) and FFI bindings. This significantly increases the attack surface, especially for native code compilation and FFI boundary vulnerabilities. Fix: implement a strict dependency review process; regularly audit native dependencies and FFI bindings; run cargo-audit and security scanning in the CI/CD pipeline; consider generating an SBOM (Software Bill of Materials).
- Low · detours-sys bundled external dependency — detours-sys/. The detours-sys crate in ext/detours appears to bundle external code (the Microsoft Detours library), and bundled native dependencies may not receive security updates promptly. Fix: maintain a process to track security updates for the bundled Detours library; use system-provided versions when possible; document the version and source of bundled dependencies.
- Low · Development container configuration present — .devcontainer/Dockerfile, .devcontainer/devcontainer.json. The .devcontainer directory indicates support for containerized development environments; the Docker configuration should be reviewed for security best practices. Fix: ensure the Dockerfile uses minimal base images, non-root users, and up-to-date security patches; scan container images regularly with tools like Trivy; review mounted volumes and environment variables in devcontainer.json.
- Low · No SECURITY.md or security policy visible — repository root. The repository does not appear to have a SECURITY.md file or published policy for responsible disclosure of vulnerabilities. Fix: create a SECURITY.md with a vulnerability disclosure policy, supported versions, and contact information for reporting security issues.
- Low · Workspace members with broad system access potential — Cargo.toml (members: zluda_inject, zluda_redirect, detours-sys). Crates like zluda_inject, zluda_redirect, and detours-sys may perform runtime code injection or process redirection, which has elevated security implications. Fix: perform a thorough security code review of the injection and redirection mechanisms; document security assumptions; consider adding capability restrictions or sandboxing where applicable.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.