RepoPilot

wang-xinyu/tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 2mo ago
  • 19 active contributors
  • Distributed ownership (top contributor 37% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/wang-xinyu/tensorrtx)](https://repopilot.app/r/wang-xinyu/tensorrtx)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/wang-xinyu/tensorrtx on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: wang-xinyu/tensorrtx

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/wang-xinyu/tensorrtx shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 2mo ago
  • 19 active contributors
  • Distributed ownership (top contributor 37% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live wang-xinyu/tensorrtx repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/wang-xinyu/tensorrtx.

What it runs against: a local clone of wang-xinyu/tensorrtx — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in wang-xinyu/tensorrtx | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit ≤ 93 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>wang-xinyu/tensorrtx</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of wang-xinyu/tensorrtx. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/wang-xinyu/tensorrtx.git
#   cd tensorrtx
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of wang-xinyu/tensorrtx and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "wang-xinyu/tensorrtx(\.git)?\b" \
  && ok "origin remote is wang-xinyu/tensorrtx" \
  || miss "origin remote is not wang-xinyu/tensorrtx (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 93 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~63d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/wang-xinyu/tensorrtx"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

TensorRTx implements popular deep learning network architectures (ResNet, YOLO, EfficientNet, Vision Transformer, etc.) using NVIDIA TensorRT's native C++ network definition API instead of ONNX/UFF parsers. It exports PyTorch/TensorFlow weights to .wts plaintext files, then builds optimized TensorRT inference engines for production deployment with full network introspection and custom layer control. Flat monorepo: each deep learning model gets its own top-level folder (alexnet/, yolo5/, centernet/, convnextv2/, etc.), each containing {CMakeLists.txt, gen_wts.py, inference .cpp/.py, logging.h, utils.h}. Shared patterns: Python script exports weights → .wts file; C++ code loads .wts, builds TensorRT INetworkDefinition, serializes engine. Custom CUDA kernels live in dedicated plugin subdirectories (dcnv2Plugin/*, convnextv2/src/).
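
The export half of that pattern can be sketched in a few lines. This is an illustrative reconstruction of the gen_wts.py convention described above (a tensor-count line, then one `name length hex…` line per flattened tensor, with big-endian float32 hex words), using plain Python lists in place of torch tensors; exact details vary per model folder.

```python
import struct

def export_wts(tensors, path):
    # Illustrative sketch of the .wts text layout used by the repo's
    # gen_wts.py scripts. The real scripts iterate model.state_dict()
    # and flatten each tensor with v.reshape(-1).cpu().numpy().
    with open(path, "w") as f:
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            words = " ".join(struct.pack(">f", float(v)).hex() for v in values)
            f.write(f"{name} {len(values)} {words}\n")

# Hypothetical two-tensor "model":
export_wts({"conv1.weight": [0.5, -1.0], "conv1.bias": [0.0]}, "demo.wts")
```

The C++ side then reads the same names back in a fixed order, which is why the name set and ordering must agree between the two halves.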

👥Who it's for

Machine learning engineers and inference optimization specialists who need production-ready TensorRT engines with architectural transparency, custom layer modifications, and integrated pre/post-processing—particularly those deploying computer vision models (object detection, face recognition, image classification) on NVIDIA GPUs at scale.

🌱Maturity & risk

Actively maintained as of Jan 2026 (recent YOLO13/Vision Transformer additions, TensorRT 7-10 SDK support). 4.3M lines of C++, 745K Python, and established CI/pre-commit pipelines indicate production-grade code. However, it's a community reference implementation repo, not an official NVIDIA library, with variable quality across 100+ network implementations.

High fragmentation risk: 100+ independent network folders (yolo*, efficientnet, arcface, etc.) with inconsistent patterns and potential API skew across TensorRT SDK versions 7-10. Single-maintainer core (wang-xinyu) with community PRs of variable quality. No comprehensive test suite visible; each network folder has ad-hoc validation. CUDA custom plugins (dcnv2Plugin, LayerNormPlugin) add compilation complexity.

Active areas of work

Recent activity (Jan-Mar 2026): YOLOv13/YOLO12/YOLO11 variants added; Vision Transformer implementation; refactor of legacy CV models to support TensorRT SDK 7-10 uniformly; first Tripy (TensorRT Python) examples for LeNet. Heavy focus on YOLO variants and object detection; convnextv2 and arcface actively refined.

🚀Get running

`git clone https://github.com/wang-xinyu/tensorrtx.git && cd tensorrtx && pip install opencv-python-headless numpy torch nvtripy`. Then `cd` into a specific model folder (e.g., `yolo5/`) and follow its README.md for PyTorch weight export (`python gen_wts.py`) and TensorRT engine build (`mkdir build && cd build && cmake .. && make && ./yolo`).

Daily commands are model-specific; pick a folder like `yolo5/`: (1) `python gen_wts.py` to export PyTorch weights to `yolo5.wts`, (2) `mkdir build && cd build && cmake .. && make` to compile the C++ engine builder, (3) `./yolo -s ../yolo5.wts yolo5.engine m` to serialize the TensorRT engine, (4) `./yolo -d yolo5.engine ../sample/input.jpg` to run inference. See individual README.md files for exact parameters.

🗺️Map of the codebase

  • alexnet/alexnet.cc: Simplest complete example: demonstrates full workflow from loading .wts weights to building INetworkDefinition to serializing TensorRT engine—ideal onboarding reference.
  • alexnet/gen_wts.py: Template for exporting PyTorch model weights to plaintext .wts format; shows OrderedDict iteration and binary serialization pattern used across all models.
  • yolo5/yolo.cc: Production-scale example with post-processing (NMS), batched inference, and multi-GPU support; demonstrates IExecutionContext usage and tensor marshaling.
  • convnextv2/src/LayerNormPlugin.h: Reference custom CUDA plugin interface; shows how to wrap unsupported layers (LayerNorm) via IPlugin API for ops not in TensorRT native set.
  • centernet/dcnv2Plugin/dcnv2Plugin.cpp: Complex CUDA plugin (Deformable Convolution v2); demonstrates kernel binding, serialization, and multi-GPU plugin lifecycle.
  • CMakeLists.txt: Boilerplate CMake configuration for TensorRT, CUDA, OpenCV discovery and linking; copy-paste template for new models.

🛠️How to make changes

Start with a reference model folder (alexnet/ or yolo5/ are simplest). (1) Modify gen_wts.py to export different PyTorch model variants. (2) Edit the .cc/.cpp file's network building code (around ILayer* layer = ...) to add/remove/modify layers. (3) Update CMakeLists.txt if new .cu files added. (4) For custom CUDA ops, create a new plugin directory (see convnextv2/src/LayerNormPlugin.cu as template) and register in the main .cpp via IPluginCreator.

🪤Traps & gotchas

(1) The .wts weight file is plaintext (a tensor-count line, then per-tensor name, length, and hex-encoded float words); gen_wts.py must match the .cc file's expected tensor names and order exactly, or inference silently produces garbage. (2) TensorRT engine files (.engine) are GPU-architecture-specific; an engine built on an RTX 3090 won't run on an A100 without a rebuild. (3) Some model folders have stale code; use the trt10 branch for TensorRT 10.x compatibility. (4) Custom CUDA plugins require a matching CUDA Compute Capability (SM_75, SM_80, etc.) set in CMakeLists.txt; a wrong value fails silently at runtime. (5) ONNX/UFF parsers are intentionally not used: this repo assumes you already have .wts weights; if you only have .onnx, you need external conversion tooling.
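
Trap (4) can at least be screened for mechanically. A hedged sketch that scans CMakeLists.txt text for compute-capability settings; the flag spellings vary across model folders, so these regexes are heuristics rather than a complete parser:

```python
import re

def cuda_archs(cmake_text):
    # Heuristic patterns for the common ways an SM target shows up in
    # this repo's CMake files: -gencode arch=compute_NN,code=sm_NN or
    # set(CMAKE_CUDA_ARCHITECTURES ...). Not exhaustive.
    pats = [r"sm_(\d+)", r"compute_(\d+)", r"CUDA_ARCHITECTURES\s+([\d;]+)"]
    found = []
    for p in pats:
        found += re.findall(p, cmake_text)
    return found

# Hypothetical one-line CMake fragment:
sample = 'set(CMAKE_CUDA_FLAGS "-gencode arch=compute_75,code=sm_75")'
archs = cuda_archs(sample)
```

Comparing the result against your GPU's actual compute capability before building turns a silent runtime failure into a loud pre-build check.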

💡Concepts to learn

  • NVIDIA/TensorRT — Official NVIDIA TensorRT repository; TensorRTx builds directly on its C++ API and serves as a reference implementation library for that API.
  • onnx/onnx-tensorrt — ONNX parser for TensorRT—the alternative approach that TensorRTx explicitly avoids; understand this to appreciate TensorRTx's flexibility tradeoff.
  • wang-xinyu/pytorchx — Sister repo by same author; provides PyTorch implementations of networks exported by gen_wts.py scripts in TensorRTx; tight coupling for weight extraction pipeline.
  • Ultralytics/yolov5 — Official YOLOv5 PyTorch repo; TensorRTx's yolo5/ folder depends on this for model definitions and weight export baseline.
  • NVIDIA/TensorRT-Incubator — Newer TensorRT experimentation (Tripy, nvtripy package); TensorRTx is adopting Tripy for future Python-native model definitions as seen in recent LeNet example.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add CMake integration tests for model build workflows

The repo has CMakeLists.txt files across multiple model directories (alexnet, arcface, convnextv2, crnn, csrnet, dbnet, etc.) but lacks automated CI validation that these builds actually succeed. Currently, .github/workflows/pre-commit.yml only checks formatting. Adding a workflow to test CMake builds for at least 2-3 representative models (small ones like alexnet and crnn to keep CI fast) would catch build breakages early. This is high-value because contributors regularly modify C++ code and CMake configs, and build failures can go unnoticed until deployment.

  • [ ] Create .github/workflows/cmake-build-test.yml to test CMake builds
  • [ ] Test at minimum: alexnet/CMakeLists.txt, crnn/CMakeLists.txt, convnextv2/CMakeLists.txt builds
  • [ ] Verify the workflow runs on pull requests and reports failures clearly
  • [ ] Document in contributing.md which models are tested in CI and why

Consolidate and document custom CUDA plugins (LayerNormPlugin, dcnv2Plugin, prelu)

The repo has custom CUDA kernels scattered across different model directories: convnextv2/src/LayerNormPlugin.cu, centernet/dcnv2Plugin/dcn_v2_im2col_cuda.cu, and arcface/prelu.cu. These are duplicated or specialized implementations that could benefit from centralized documentation and a shared plugin base class pattern. Create a new plugins/ directory with documented plugin templates and refactor existing plugins to follow consistent structure. This reduces maintenance burden and makes it easier for contributors to add new custom layers.

  • [ ] Create plugins/ directory with base plugin class and CMakeLists.txt
  • [ ] Document plugin interface requirements in plugins/README.md with examples from LayerNormPlugin and dcnv2Plugin
  • [ ] Refactor convnextv2/src/LayerNormPlugin.* and centernet/dcnv2Plugin/* to inherit from base class
  • [ ] Update arcface/prelu.cu to follow the same pattern for consistency

Add weight file format documentation and validation utility

Multiple model directories have gen_wts.py scripts (alexnet/gen_wts.py, arcface/gen_wts.py, csrnet/gen_wts.py, etc.) that export weights to .wts files, but the binary format is undocumented. This creates friction for contributors who need to understand or modify weight export/import. Create a utils/wts_format.md documenting the .wts binary format, and add a python validation script (utils/validate_wts.py) that can verify .wts file integrity and dump metadata. This reduces debugging time when weight loading fails.

  • [ ] Create utils/wts_format.md documenting the .wts file binary structure (header, data types, offsets, checksums if any)
  • [ ] Create utils/validate_wts.py script that can read/validate .wts files and report statistics
  • [ ] Test validate_wts.py against existing generated .wts files from at least 2 models (alexnet, arcface)
  • [ ] Update relevant model READMEs to reference the format documentation
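
A starting point for the proposed validation script might look like this. It assumes the text-based layout the repo's gen_wts.py scripts emit (count line, then `name length hex…` per tensor); the function name, error messages, and demo file are all hypothetical.

```python
import struct

def validate_wts(path):
    # Parse a .wts file and return {name: [float, ...]}, raising on any
    # mismatch between declared and actual counts. Assumes the text
    # layout written by the repo's gen_wts.py scripts.
    with open(path) as f:
        declared = int(f.readline())
        tensors = {}
        for line in f:
            if not line.strip():
                continue
            parts = line.split()
            name, count, words = parts[0], int(parts[1]), parts[2:]
            if len(words) != count:
                raise ValueError(f"{name}: declared {count} values, found {len(words)}")
            tensors[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    if len(tensors) != declared:
        raise ValueError(f"header declared {declared} tensors, parsed {len(tensors)}")
    return tensors

# Demo against a tiny hand-written file:
with open("sample.wts", "w") as f:
    f.write("2\nconv1.weight 2 3f000000 bf800000\nconv1.bias 1 00000000\n")
tensors = validate_wts("sample.wts")
```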

🌿Good first issues

  • Add unit tests for weight export in gen_wts.py across all model variants (alexnet, yolo5, efficientnet)—currently each folder validates manually; could write Python test cases that verify exported .wts integrity.
  • Document CUDA Compute Capability requirements and verification steps in top-level README.md and each model README—currently unclear which SM versions are supported; add a troubleshooting table.
  • Refactor shared utilities (logging.h, macros.h, utils.h) duplicated in 30+ model folders into a single tensorrtx/common/ directory and update CMakeLists.txt includes—reduces maintenance burden.
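
The third idea is easy to scope before writing any PR. A sketch that inventories identical copies of the shared headers by content hash, so the consolidation can start with the files that are byte-for-byte duplicates (the demo tree is illustrative):

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def find_duplicates(root, filenames=("logging.h", "macros.h", "utils.h")):
    # Group every copy of the named utility headers under root by
    # SHA-256 of their contents; identical copies share one bucket.
    groups = defaultdict(list)
    for name in filenames:
        for path in Path(root).rglob(name):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[(name, digest)].append(str(path))
    return groups

# Tiny throwaway demo tree: folders a/ and b/ carry identical copies.
demo = Path(tempfile.mkdtemp())
for sub in ("a", "b"):
    (demo / sub).mkdir()
    (demo / sub / "logging.h").write_text("// shared logger\n")
groups = find_duplicates(demo, filenames=("logging.h",))
```

Run from a clone root, buckets with more than one path are safe to collapse into a single `common/` copy; buckets of one need a diff first.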


📝Recent commits

  • 2990f34 — add Vision Transformer (#1709) (zgjja)
  • 076a8af — fix C++ lint with clang-tidy, improve windows user experience (#1705) (zgjja)
  • e29066e — Yolo26-Cls Added (#1704) (fazligorkembal)
  • 664f222 — Update README.md (#1703) (fazligorkembal)
  • 5390258 — Yolo26 detection and obb task support (#1701) (fazligorkembal)
  • 3276610 — Refactor multiple old CV models (#1688) (zgjja)
  • 00d67c0 — add the support of ConvNextV2 in TensorRT8 (#1689) (daydreamer0521)
  • 7886fb3 — style: fix clang-format and end-of-file issues in yolov13 (#1687) (ydk61)
  • b87f66a — Update config.h (lindsayshuo)
  • 8bf7a8f — Update README.md (lindsayshuo)

🔒Security observations

  • High · Unversioned Dependencies in requirements — Dependencies/Package file content. The dependencies file uses unpinned version specifiers (e.g., 'nvtripy>=0.1.1', 'opencv-python-headless', 'numpy', 'torch'). This allows installation of any newer version, including those with security vulnerabilities or breaking changes. The use of '>=' without upper bounds increases supply chain risk. Fix: Pin all dependencies to specific versions using '==' operator. Example: 'nvtripy==0.1.1', 'opencv-python-headless==4.8.0.74', 'numpy==1.24.3', 'torch==2.0.1'. Implement dependency scanning tools like pip-audit or safety to detect known vulnerabilities.
  • High · Custom Package Index Without Verification — Dependencies/Package file content. The requirements file specifies a custom package index (-f https://nvidia.github.io/TensorRT-Incubator/packages.html) without HTTPS certificate pinning or package signature verification. This increases vulnerability to man-in-the-middle attacks or compromised package index attacks. Fix: Use PEP 440 compatible package URLs with hash verification. Implement pip's '--require-hashes' flag. Example: 'nvtripy==0.1.1 --hash=sha256:...' Verify NVIDIA's certificate pinning and consider using a private package mirror with access controls.
  • Medium · Missing Security Policy Documentation — Repository root. No SECURITY.md or security policy document found in the repository. This makes it difficult for security researchers to responsibly disclose vulnerabilities. Fix: Create a SECURITY.md file following the GitHub security policy template. Include responsible disclosure guidelines, contact information, and security update procedures.
  • Medium · CUDA Plugin Code Lacks Input Validation — centernet/dcnv2Plugin/, convnextv2/src/LayerNormPlugin.cu, arcface/prelu.cu. Multiple CUDA plugin files (.cu files) are present but file structure suggests potential for unvalidated user input processing in compute kernels. Without visible bounds checking in dcnv2Plugin and LayerNormPlugin, buffer overflow or memory corruption vulnerabilities may exist. Fix: Implement strict input validation for all plugin parameters. Add bounds checking for tensor dimensions and sizes. Conduct memory safety analysis using tools like AddressSanitizer or MemorySanitizer on CUDA code. Document plugin API contracts and validate assumptions at runtime.
  • Medium · Pre-commit Configuration Exists But Enforcement Unclear — .pre-commit-config.yaml. A .pre-commit-config.yaml file exists, suggesting pre-commit hooks are used. However, without seeing the actual configuration content, enforcement level and completeness of security checks (linting, vulnerability scanning) cannot be verified. Fix: Ensure the pre-commit configuration includes security scanning hooks such as: bandit (Python security), truffleHog (secret detection), semgrep (SAST), clang-analyzer (C++ analysis). Document enforcement in CONTRIBUTING.md and make hooks mandatory through branch protection rules.
  • Medium · Model Weights and Data Files Not Secured — alexnet/, arcface/, centernet/, etc. (gen_wts.py and implicit .wts files). Repository structure suggests storage of model weights (.wts files implied by gen_wts.py scripts across multiple directories). Model poisoning and integrity verification mechanisms are not evident from the file structure. Fix: Implement cryptographic integrity verification for all model weight files using SHA-256 or SHA-512 hashes. Document hash values in a signed CHECKSUMS file. Use digital signatures with GPG for model releases. Add validation in loading code to verify model integrity at runtime.
  • Low · Missing .gitignore Validation — .gitignore. While a .gitignore file exists, without viewing its contents, there's risk that sensitive files (credentials, API keys, temporary files) could be accidentally committed. Fix: Review .gitignore to ensure it includes common sensitive patterns: *.env, *.key, *.pem, *.pth (model files), .onnx, .vscode/settings.json, IDE configuration files. Add rule: '.wts' if weights files are generated locally. Consider using git-secrets or detect-secrets hooks.

LLM-derived; treat as a starting point, not a security audit.
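
For the model-weight integrity observation above, a minimal streaming checksum helper is enough to get started; the CHECKSUMS layout shown in the trailing comment is hypothetical.

```python
import hashlib

def sha256_file(path, chunk=1 << 20):
    # Stream the file in 1 MiB chunks so large .wts/.engine files
    # don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Demo on a tiny throwaway file:
with open("demo.bin", "wb") as f:
    f.write(b"hello")
digest = sha256_file("demo.bin")

# Hypothetical CHECKSUMS generation:
# for p in ["yolo5.wts", "yolo5.engine"]:
#     print(f"{sha256_file(p)}  {p}")
```

Loading code can then recompute and compare the digest before deserializing, turning silent weight tampering or corruption into an explicit failure.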


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
