vosen/ZLUDA
CUDA on non-NVIDIA GPUs
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓ Last commit 3d ago
- ✓ 6 active contributors
- ✓ Apache-2.0 licensed
- ✓ CI configured
- ✓ Tests present
- ⚠ Concentrated ownership — top contributor handles 50% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
`[](https://repopilot.app/r/vosen/zluda)` — paste at the top of your README.md; it renders inline like a shields.io badge.
Preview social card (1200×630): this card auto-renders when someone shares https://repopilot.app/r/vosen/zluda on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: vosen/ZLUDA
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/vosen/ZLUDA shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 3d ago
- 6 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Concentrated ownership — top contributor handles 50% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live vosen/ZLUDA
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/vosen/ZLUDA.
What it runs against: a local clone of vosen/ZLUDA — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in vosen/ZLUDA | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 33 days ago | Catches sudden abandonment since generation |
```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of vosen/ZLUDA. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/vosen/ZLUDA.git
#   cd ZLUDA
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of vosen/ZLUDA and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qiE "vosen/zluda(\.git)?$" \
  && ok "origin remote is vosen/ZLUDA" \
  || miss "origin remote is not vosen/ZLUDA (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The Apache-2.0 license text
#    starts with "Apache License", not the SPDX identifier, so match that.
(grep -qi "Apache License" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical paths exist (test -e, since several of these are directories)
for path in Cargo.toml zluda_inject/src ptx/src cuda_types/src/cuda.rs zluda_common/src; do
  test -e "$path" \
    && ok "$path" \
    || miss "missing critical path: $path"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 33 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~3d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/vosen/ZLUDA"
  exit 1
fi
```
Each check prints `ok:` or `FAIL:`. The script exits non-zero if anything failed, so it composes cleanly into agent loops (`./verify.sh || regenerate-and-retry`).
⚡TL;DR
ZLUDA is a runtime translation layer that lets unmodified CUDA applications execute on non-NVIDIA GPUs (AMD, Intel) by translating CUDA API calls and PTX bytecode to HIP/ROCm equivalents. It works as a drop-in replacement library: it intercepts CUDA calls and compiles kernels for the target GPU architecture at runtime, enabling CUDA software to run on non-NVIDIA hardware with near-native performance. The monorepo holds 40+ specialized crates: core GPU binding translation in zluda/ and zluda_inject/ (runtime interception); library adapters (zluda_blas, zluda_dnn8/9, zluda_sparse for CUDA libraries); the PTX parsing/compilation pipeline (ptx_parser, ptxas, ptx crates); and infrastructure crates (zluda_common, zluda_cache for caching compiled kernels, zluda_redirect for API routing). The build system uses a custom Rust xtask with special LLVM dev profiles (dev-llvm).
👥Who it's for
GPU application developers and data scientists who have written CUDA code but want to run it on AMD (ROCm) or Intel GPUs without source code modification. Also relevant for system integrators and OEMs supporting heterogeneous GPU ecosystems where NVIDIA hardware may not be available or cost-effective.
🌱Maturity & risk
Actively developed and production-capable: the project has a comprehensive CI/CD pipeline (pr_master.yml, nightly_tests.yml, rocm_setup_build.sh), organized issue templates (zluda_dump.yml), Discord community support, and extensive Rust codebase (11.3M lines). However, it remains GPU-vendor-dependent and requires careful environment setup, so deployment should be tested against target hardware.
Moderate-to-high complexity risk: the project spans 40+ interdependent crates with custom LLVM compilation (482K lines LLVM), relies on HIP/ROCm ecosystem stability (ext/hip_runtime-sys, ext/rocblas-sys dependencies), and involves low-level GPU driver interaction via detours-sys. Single-maintainer appearance (vosen) and heavy reliance on LLVM compilation toolchain mean build failures can be hard to diagnose; breaking changes in ROCm or HIP could cascade.
Active areas of work
Active development on CUDA library compatibility layers (cuBLAS, cuDNN 8/9, cuFFT, cuSPARSE with dedicated trace modules), kernel compilation caching (zluda_cache), and multi-platform testing (ROCm nightly tests, PR validation). Recent infrastructure focus on modular tracing (zluda_trace_* crates) suggests profiling/debugging improvements.
🚀Get running
git clone --recursive https://github.com/vosen/ZLUDA.git
cd ZLUDA
cargo build --release -p zluda
For development, use cargo build -p zluda --profile dev-llvm, which builds ZLUDA in debug while keeping LLVM optimized — debug builds of LLVM are impractically slow (faster iteration). The ROCm/HIP toolchain must be installed; see .devcontainer/Dockerfile for full environment setup.
Daily commands:
For GPU execution: `LD_PRELOAD=./target/release/libzluda.so ./your_cuda_app` (Linux/ROCm), or the Windows equivalent with zluda.dll injection. For compilation only: `cargo build -p zluda --release` produces the intercept library. Development iteration: `cargo build -p ptx_parser && cargo build -p zluda --profile dev-llvm`.
🗺️Map of the codebase
- Cargo.toml — Workspace manifest defining all 30+ member crates (compiler, CUDA type bindings, kernel redirection, library bridges); essential for understanding the monorepo's structure and build orchestration.
- zluda_inject/src — Entry point for runtime library injection and CUDA API interception; the core mechanism enabling ZLUDA to replace NVIDIA CUDA with AMD HIP transparently.
- ptx/src — PTX-to-HIP shader compilation pipeline; converts NVIDIA's parallel thread execution assembly to AMD-compatible intermediate representation.
- cuda_types/src/cuda.rs — CUDA API type definitions and struct mappings; the canonical reference for all CUDA function signatures that ZLUDA must emulate.
- zluda_common/src — Shared utilities, error handling, and platform abstractions used across all ZLUDA bridge libraries (blas, dnn, sparse, fft).
- .github/workflows/pr_master.yml — CI/CD pipeline validating commits; reveals test coverage, build targets (ROCm, HIP), and platform-specific validation gates.
- docs/src/quick_start.md — User-facing installation and usage guide; defines the supported platforms, prerequisites, and typical deployment patterns.
🛠️How to make changes
Add support for a new cuBLAS function
- Add the function signature and types to cuda_types/src/cublas.rs
- Create or update the wrapper in zluda_blas/src/lib.rs to call the rocBLAS equivalent
- Add parameter mapping and error translation (CUDA error → rocBLAS error) in zluda_common/src
- If tracing is needed, add a trace wrapper in zluda_trace_blas/src
- Test against the reference CUDA implementation and validate that rocBLAS produces identical results (.github/workflows/pr_master.yml)
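The error-translation step above can be sketched in Rust. The enum variants, values, and mapping below are illustrative stand-ins, not the real cublas/rocblas definitions (which live in cuda_types and ext/rocblas-sys):

```rust
// Hypothetical status enums for the sketch; not the real FFI definitions.
#[derive(Debug, PartialEq, Clone, Copy)]
enum RocblasStatus {
    Success,
    InvalidHandle,
    NotImplemented,
    InvalidSize,
    MemoryError,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum CublasStatus {
    Success,
    NotInitialized,
    AllocFailed,
    InvalidValue,
    NotSupported,
}

// Map a rocBLAS result back to the status a cuBLAS caller expects.
// An exhaustive match means a new backend status is a compile error,
// not a silently dropped error code.
fn to_cublas(status: RocblasStatus) -> CublasStatus {
    match status {
        RocblasStatus::Success => CublasStatus::Success,
        RocblasStatus::InvalidHandle => CublasStatus::NotInitialized,
        RocblasStatus::NotImplemented => CublasStatus::NotSupported,
        RocblasStatus::InvalidSize => CublasStatus::InvalidValue,
        RocblasStatus::MemoryError => CublasStatus::AllocFailed,
    }
}

fn main() {
    assert_eq!(to_cublas(RocblasStatus::Success), CublasStatus::Success);
    assert_eq!(to_cublas(RocblasStatus::MemoryError), CublasStatus::AllocFailed);
    println!("ok");
}
```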
Add support for a new CUDA math library (e.g., cuRAND)
- Create a new member crate: mkdir zluda_rand && cargo init --lib, and register it in Cargo.toml
- Add cuda_types for the cuRAND APIs (cuda_types/src/curand.rs, wired into cuda_types/src/lib.rs)
- Implement the HIP binding layer in zluda_rand/src/lib.rs mapping to rocRAND
- Register the new library in zluda_inject/src for DLL/SO injection
- Document the new library in docs/src/ (e.g. docs/src/quick_start.md) and add it to the CI matrix
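The shape of such a binding layer — export a cuRAND-style C ABI, validate inputs, forward to the ROCm-side implementation — might look roughly like this. GeneratorHandle, curandSetSeed, rocrand_set_seed, and the error code are all hypothetical stand-ins, not the real cuRAND/rocRAND APIs:

```rust
// Hypothetical handle type shared across the C ABI boundary.
#[repr(C)]
pub struct GeneratorHandle {
    seed: u64,
}

// Stand-in for the rocRAND-side call the wrapper forwards to.
fn rocrand_set_seed(handle: &mut GeneratorHandle, seed: u64) -> i32 {
    handle.seed = seed;
    0 // success
}

// cuRAND-shaped entry point: same symbol style a CUDA app would link against.
#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn curandSetSeed(handle: *mut GeneratorHandle, seed: u64) -> i32 {
    if handle.is_null() {
        return 101; // illustrative "not initialized"-style error code
    }
    // SAFETY: null-checked above; the caller guarantees validity, as in C.
    unsafe { rocrand_set_seed(&mut *handle, seed) }
}

fn main() {
    let mut handle = GeneratorHandle { seed: 0 };
    assert_eq!(curandSetSeed(&mut handle, 42), 0);
    assert_eq!(handle.seed, 42);
    assert_eq!(curandSetSeed(std::ptr::null_mut(), 42), 101);
    println!("ok");
}
```

In a real crate this would be built as a cdylib so the exported symbol can be injected in place of the NVIDIA library.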
Fix a PTX-to-HIP shader compilation issue
- Reproduce the issue with a minimal PTX kernel; add a test case to ptx_parser/src/tests/
- Update the PTX AST or parser in ptx_parser/src if the instruction is unrecognized
- Update code generation logic in ptx/src to emit correct HIP IR
- If it is a fatbin format issue, debug in dark_api/src/fatbin.rs
- Validate the fix in the compiler test suite (compiler/src) and add a regression test
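For a sense of what the parser-side step involves, here is a toy recognizer for one PTX instruction form ("add.s32 %r1, %r2, %r3;"). The real ptx_parser grammar is far richer; Instruction and parse_instruction are invented for this sketch:

```rust
// Toy AST node for a single parsed PTX instruction.
#[derive(Debug, PartialEq)]
struct Instruction {
    opcode: String,
    ty: String,
    operands: Vec<String>,
}

// Recognize "opcode.type op1, op2, ...;" and nothing else.
fn parse_instruction(line: &str) -> Option<Instruction> {
    let line = line.trim().strip_suffix(';')?;
    // Split "add.s32" from the operand list.
    let (head, rest) = line.split_once(char::is_whitespace)?;
    let (opcode, ty) = head.split_once('.')?;
    let operands: Vec<String> = rest.split(',').map(|s| s.trim().to_string()).collect();
    Some(Instruction {
        opcode: opcode.to_string(),
        ty: ty.to_string(),
        operands,
    })
}

fn main() {
    let insn = parse_instruction("add.s32 %r1, %r2, %r3;").unwrap();
    assert_eq!(insn.opcode, "add");
    assert_eq!(insn.ty, "s32");
    assert_eq!(insn.operands, vec!["%r1", "%r2", "%r3"]);
    println!("ok");
}
```

The point of the checklist's first step is exactly this shape: a failing input becomes a unit test against the parser before any code generation is touched.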
Optimize kernel compilation caching
- Review the current cache key strategy in zluda_cache/src
- Identify the bottleneck: hash computation, serialization, or disk I/O (see also zluda_precompile/src)
- Implement the optimization (e.g., parallel compilation, incremental caching) in zluda_cache/src
- Benchmark before/after; add metrics to zluda_common/src
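Before reaching for wall-clock benchmarks, the "before/after" comparison can start from a deterministic proxy metric: how many recompilations the cache avoids. A minimal sketch, where compile_kernel is a cheap stand-in for the real PTX-to-native pipeline:

```rust
use std::collections::HashMap;

// Stand-in for the expensive PTX -> native compilation in the ptx/ crates.
fn compile_kernel(ptx: &str) -> usize {
    ptx.len() // pretend this took milliseconds
}

fn main() {
    let ptx = ".visible .entry kernel() { ret; }";
    let mut cache: HashMap<String, usize> = HashMap::new();
    let mut compile_calls = 0u32;

    // Simulate 100 launches of the same kernel.
    for _ in 0..100 {
        if !cache.contains_key(ptx) {
            compile_calls += 1;
            let module = compile_kernel(ptx);
            cache.insert(ptx.to_string(), module);
        }
        let _module = cache[ptx];
    }

    // With the cache, 99 of 100 launches skip compilation entirely.
    assert_eq!(compile_calls, 1);
    println!("compiled {} time(s) for 100 launches", compile_calls);
}
```

Counting avoided compilations is stable across machines; timing numbers can then confirm the win on real hardware.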
🔧Why these technologies
- Rust — Systems-level memory safety without garbage collection; C FFI for GPU driver bindings
🪤Traps & gotchas
- LLVM compilation is slow: debug builds of LLVM are glacially slow; use cargo build --profile dev-llvm and the [profile.dev-llvm] config to build LLVM in release mode.
- ROCm/HIP dependency is critical: missing or mismatched HIP/rocBLAS/MIOpen versions cause silent compile failures; see .github/workflows/rocm_setup_*.sh for exact versions.
- Recursive submodules: .gitmodules may pull large external deps; use git clone --recursive.
- LD_PRELOAD ordering: on Linux, library load order matters; setting LD_PRELOAD=libzluda.so before other GPU libraries is essential.
- Multi-architecture PTX: PTX is architecture-specific (sm_60, sm_70, etc.); the target GPU's compute capability must match the compiled PTX.
- No Windows ROCm: ROCm is AMD Linux-only; Windows requires HIP-on-Windows (limited support — check CI for the actual tested platforms).
🏗️Architecture
💡Concepts to learn
- PTX (Parallel Thread Execution) — ZLUDA's input format: all CUDA kernels compile to PTX intermediate representation before being translated to target GPU ISA; understanding PTX structure is critical for debugging compilation failures
- Just-In-Time (JIT) Compilation — ZLUDA compiles PTX to native code at runtime rather than pre-compilation; zluda_cache optimizes this by caching compiled kernels to avoid recompilation overhead
- API Interception / Function Hooking — ZLUDA intercepts CUDA API calls (cudaMalloc, cudaLaunchKernel, etc.) at runtime via detours-sys (Windows) or LD_PRELOAD (Linux); critical for understanding how CUDA apps get redirected
- LLVM Intermediate Representation (IR) — ZLUDA translates PTX to LLVM IR as an intermediate step before lowering to target GPU ISA; LLVM provides platform-independent optimization and code generation
- HIP (Heterogeneous-compute Interface for Portability) — ZLUDA targets HIP as the abstraction layer, which then lowers to ROCm/AMD hardware; HIP is the 'output language' after CUDA translation
- Memory Layout and Alignment — CUDA and HIP have different memory alignment requirements (shared memory, global memory, texture cache); ZLUDA must translate kernel memory access patterns correctly to avoid silent data corruption
- Kernel Caching with Content-Addressed Storage — zluda_cache hashes compiled kernels (likely by PTX content/parameters) to avoid recompiling identical kernels across application runs; essential for interactive CUDA app performance
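The content-addressed caching idea in the last bullet can be sketched as hashing the PTX text together with the compilation parameters. This is a sketch only: a real implementation would use a stable cryptographic hash (e.g. SHA-256) rather than std's DefaultHasher, whose output is not guaranteed stable across compiler versions, and cache_key and the parameter names are invented here:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a cache key from everything that affects the compiled binary:
// the kernel source plus the compilation parameters. Omitting a
// parameter from the key risks serving a stale binary.
fn cache_key(ptx: &str, target_arch: &str, opt_level: u8) -> u64 {
    let mut h = DefaultHasher::new();
    ptx.hash(&mut h);
    target_arch.hash(&mut h);
    opt_level.hash(&mut h);
    h.finish()
}

fn main() {
    let ptx = ".visible .entry add_one() { ret; }";
    // Same inputs -> same key: the cached binary can be reused.
    assert_eq!(cache_key(ptx, "gfx1100", 3), cache_key(ptx, "gfx1100", 3));
    // Any input change -> different key: no stale binary is served.
    assert_ne!(cache_key(ptx, "gfx1100", 3), cache_key(ptx, "gfx906", 3));
    println!("ok");
}
```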
🔗Related repos
- ROCm-Developer-Tools/HIP — HIP is the underlying heterogeneous GPU abstraction layer that ZLUDA compiles CUDA to; understanding HIP APIs is essential for debugging translation issues.
- GPUOpen-Professional/MIOpen — AMD's deep learning library that ZLUDA wraps via zluda_dnn8/9; contains the actual cuDNN-compatible kernels that get called.
- ROCmSoftwarePlatform/rocBLAS — AMD's BLAS library wrapped by zluda_blas; provides the optimized linear algebra kernels that CUDA cuBLAS calls map to.
- KhronosGroup/SPIRV-LLVM-Translator — Translates between SPIR-V and LLVM IR; relevant because ZLUDA uses LLVM IR as the intermediate representation for PTX-to-native compilation.
- llvm/llvm-project — ZLUDA embeds LLVM as the core compilation backend; the 482K LLVM lines in this repo are LLVM-specific patches and custom target support for GPU ISAs.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive integration tests for CUDA API coverage across trace modules
The repo has multiple trace modules (zluda_trace_blas, zluda_trace_blaslt, zluda_trace_dnn8, zluda_trace_dnn9, zluda_trace_fft, zluda_trace_sparse, zluda_trace_nvml) that appear to be instrumentation/logging wrappers, but there are no visible integration tests validating that traced calls match non-traced behavior. This is critical for a CUDA compatibility layer to ensure tracing doesn't introduce behavioral changes.
- [ ] Create zluda_trace/tests/ directory with integration test suite
- [ ] Add tests comparing output of traced vs non-traced calls for each module (BLAS, cuDNN, FFT, cuSPARSE)
- [ ] Reference the existing test infrastructure in .github/workflows/nightly_tests.yml to understand test running patterns
- [ ] Validate that trace output format is consistent across all trace_* crates
Add validation tests for PTX parsing and LLVM compilation pipeline
The ptx_parser and ptx modules are core to ZLUDA's functionality (converting NVIDIA's PTX to HIP/AMD), but there's no visible test coverage validating that complex PTX programs parse correctly and compile to valid LLVM IR. This is a high-risk area for compatibility bugs.
- [ ] Create ptx/tests/ with test cases for various PTX instruction types (memory ops, control flow, intrinsics)
- [ ] Add ptx_parser/tests/ for edge cases in PTX syntax (unusual register patterns, complex metadata)
- [ ] Create integration tests in ptx/tests/compile/ that verify PTX→LLVM→machine code pipeline end-to-end
- [ ] Include regression tests from any reported GitHub issues involving PTX parsing failures
Add GitHub Actions workflow for cross-platform binary compatibility validation
The repo has pr_master.yml and push_master.yml workflows, but the .github/workflows/ directory shows shell scripts (rocm_setup_build.sh, rocm_setup_run.sh) that aren't referenced in any visible workflow YAML. This suggests incomplete CI coverage. New contributors should create a proper ROCm-based CI workflow to catch ABI incompatibilities early.
- [ ] Create .github/workflows/rocm_validation.yml that runs on PRs targeting master
- [ ] Integrate the existing rocm_setup_build.sh and rocm_setup_run.sh scripts into the workflow
- [ ] Add validation steps that compile with both ZLUDA and native CUDA (if available) and compare output
- [ ] Reference the nightly_tests.yml structure but add quick smoke tests for PR feedback (full nightly suite runs post-merge)
🌿Good first issues
- Add cuFFT plan cache to zluda_fft/src/ (similar to zluda_cache pattern) to avoid recomputing transform strategies for identical parameters; test with fft_benchmark.
- Expand ptx_parser/src/ to handle PTX atomic operations (atom.add, atom.cas) for multi-threaded kernels; add unit tests in ptx_parser/tests/.
- Document PTX-to-LLVM IR lowering in compiler/src/main.rs with inline comments for each instruction class (arithmetic, memory, control flow); many contributors get lost here.
- Create zluda_trace_sparse/src/ following the zluda_trace_blas pattern to enable detailed logging of cuSPARSE calls for debugging sparse matrix kernels.
- Add support for CUDA streams (cudaStreamCreate/Destroy) in zluda/src/stream.rs; currently stubs only; coordinate with zluda_cache to respect stream semantics.
⭐Top contributors
- @vosen — 50 commits
- @zluda-violet — 45 commits
- @hemangjoshi37a — 2 commits
- @Knogle — 1 commit
- @stevefan1999-personal — 1 commit
📝Recent commits
- 87531d3 — Fix typo: vec_acccess -> vector_read in emit_vector_read (#633) (hemangjoshi37a)
- 5f89388 — Update tests (#632) (vosen)
- 9854942 — Refactor emit_brev to use emit_intrinsic helper (#631) (hemangjoshi37a)
- 66b20a3 — Support vshr.u32.u32.u32.clamp.add (#629) (vosen)
- 5c75a54 — Add more cuSPARSE functions (#624) (vosen)
- 8251f1e — Initial textures support (#625) (vosen)
- e070320 — PyTorch fixes and improvements (#620) (vosen)
- dcc6bb8 — Add minimal cuSPARSE (#621) (vosen)
- 796ad6c — Support some cublaslt settings required by COEIROINK (#619) (vosen)
- a3b322f — Add various bits and pieces required by pytorch (#615) (vosen)
🔒Security observations
ZLUDA is a complex low-level systems project with a large monorepo structure (40+ crates) combining Rust with native code bindings and FFI interfaces. The primary security concerns are: (1) the inherent complexity and attack surface of the large workspace, (2) native code dependencies and FFI boundary risks, (3) system-level code injection components requiring careful validation, and (4) lack of visible security disclosure policy. The codebase appears to follow good Rust practices with workspace organization and profile management. No hardcoded credentials, obvious injection vulnerabilities, or exposed sensitive configurations were detected in the provided file structure. The project would benefit from documented security policies, regular dependency auditing, and CI/CD security scanning integration.
- Medium · Workspace uses patched crate without version pinning — Cargo.toml (patch.crates-io section). The Cargo.toml uses [patch.crates-io] for highs-sys pointing to a local path without version constraints, which could lead to unexpected behavior if the local path version diverges significantly from the published crate. Fix: either maintain strict version alignment between the patched local crate and the published version, or document the reasons for the patch; consider git dependencies pinned to specific revisions if appropriate.
- Medium · Large monorepo with multiple compiled binaries — Cargo.toml (workspace members list). The workspace contains 40+ crates, including low-level system libraries (detours-sys, hip_runtime-sys, rocblas-sys, etc.) and FFI bindings. This significantly increases the attack surface, especially for native code compilation and FFI boundary vulnerabilities. Fix: implement a strict dependency review process; regularly audit native dependencies and FFI bindings; run cargo-audit and security scanning in the CI/CD pipeline; consider generating an SBOM (Software Bill of Materials).
- Low · detours-sys bundled external dependency — detours-sys/. The detours-sys crate in ext/detours appears to bundle external code (the Microsoft Detours library), and bundled native dependencies may not receive security updates promptly. Fix: maintain a process to track security updates for the bundled Detours library; use system-provided versions when possible; document the version and source of bundled dependencies.
- Low · Development container configuration present — .devcontainer/Dockerfile, .devcontainer/devcontainer.json. The .devcontainer directory indicates support for containerized development environments; the Docker configuration should be reviewed for security best practices. Fix: ensure the Dockerfile uses minimal base images, non-root users, and up-to-date security patches; scan container images regularly with tools like Trivy; review mounted volumes and environment variables in devcontainer.json.
- Low · No SECURITY.md or security policy visible — repository root. The repository does not appear to have a SECURITY.md file or published policy for responsible disclosure of vulnerabilities. Fix: create a SECURITY.md with a vulnerability disclosure policy, supported versions, and contact information for reporting security issues.
- Low · Workspace members with broad system access potential — Cargo.toml (members: zluda_inject, zluda_redirect, detours-sys). Crates like zluda_inject, zluda_redirect, and detours-sys may perform runtime code injection or process redirection, which has elevated security implications. Fix: perform a thorough security code review of the injection and redirection mechanisms; document security assumptions; consider adding capability restrictions or sandboxing where applicable.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.