Tencent/ncnn

Item: Tencent/ncnn
Rating: 3
Author: RepoPilot

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Mixed

Mixed signals — read the receipts

ConcernsDependency

non-standard license (Other)

HealthyFork & modify

Has a license, tests, and CI — clean foundation to fork and modify.

HealthyLearn from

Documented and popular — useful reference codebase to read through.

HealthyDeploy as-is

No critical CVEs, sane security posture — runnable as-is.

⚠Concentrated ownership — top contributor handles 59% of recent commits
⚠Non-standard license (Other) — review terms
✓Last commit today
✓18 active contributors
✓Other licensed
✓CI configured
✓Tests present

What would improve this?

→Use as dependency Concerns → Mixed if: clarify license terms

Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Forkable" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Forkable](https://repopilot.app/api/badge/tencent/ncnn?axis=fork)](https://repopilot.app/r/tencent/ncnn)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card

This card auto-renders when someone shares https://repopilot.app/r/tencent/ncnn on X, Slack, or LinkedIn.

Ask AI about tencent/ncnn

Grounded in the actual source code. Pick a starter question or write your own.

What does this repo do, in one paragraph?How would I get started using it?What are the main alternatives?Show me the entry point.

Or write your own question →

Onboarding doc

Onboarding: Tencent/ncnn

Generated by RepoPilot · 2026-06-24 · Source

🎯Verdict

WAIT — Mixed signals — read the receipts

Last commit today
18 active contributors
Other licensed
CI configured
Tests present
⚠ Concentrated ownership — top contributor handles 59% of recent commits
⚠ Non-standard license (Other) — review terms

<sub>Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests</sub>

⚡TL;DR

ncnn is a high-performance neural network inference framework optimized for mobile CPUs, with zero third-party dependencies and cross-platform support (Android, iOS, Linux ARM/x86, WebAssembly, RISC-V). It enables efficient deployment of deep learning models on edge devices, currently powering Tencent's QQ, WeChat, and other production applications. Monolithic but modular: core inference engine in src/net.cpp and src/layer.cpp, platform-specific optimizations in src/arm/, src/x86/, src/vulkan/; tools/pnnx/ handles model conversion from PyTorch/ONNX; benchmarks and unit tests in examples/; CMake-based build system with per-platform CI in .github/workflows/.

👥Who it's for

Mobile app developers and ML engineers who need to run inference on smartphones and embedded devices without heavy dependencies. Contributors are typically C++ systems engineers optimizing for ARM/NEON, x86 SIMD, and GPU acceleration (Vulkan/Metal) across diverse hardware.

🌱Maturity & risk

Highly mature and production-grade: used in multiple Tencent applications at scale, extensive CI/CD coverage (50+ GitHub Actions workflows across platforms), active development with recent commits, comprehensive test infrastructure. This is battle-tested enterprise software.

Low risk overall: zero third-party dependencies by design reduces supply-chain concerns. However, the codebase is large (17MB C++, 12MB C), maintenance relies heavily on Tencent's internal teams, and rapid platform support additions (RISC-V, LoongArch, HarmonyOS) may create maintenance burden. Binary size comparisons and architecture-specific optimizations are actively tracked but add complexity.

Active areas of work

Active development across multiple fronts: RISC-V support (riscv32/riscv64 workflows), LoongArch64 optimization, HarmonyOS integration, Vulkan GPU inference refinement, PNNX model format standardization, and continuous binary size optimization. The separate pnnx GitHub project (mentioned in QQ groups) suggests ongoing tooling investment.

🚀Get running

Clone the repository: git clone https://github.com/Tencent/ncnn.git && cd ncnn. Build for your platform using CMake: mkdir build && cd build && cmake .. && make -j4. For Android: follow .github/workflows/android.yml or the wiki's Android build guide. Python bindings available via pip install ncnn (see requirements in python dependencies).

Daily commands: Run inference example: ./examples/squeezenet (after build). Benchmark: ./benchmark/benchncnn. Android: build APK following .github/workflows/android.yml. Python: import ncnn; net = ncnn.Net(); net.load_param('model.param'); net.load_model('model.bin'). WebAssembly: build with -DCMAKE_TOOLCHAIN_FILE=toolchain-emscripten.cmake.

🗺️Map of the codebase

CMakeLists.txt — Root build configuration defining the entire ncnn compilation process, platform targets, and dependency management across all 600 files
src/layer.h — Core abstract layer class that every neural network operation inherits from; fundamental to understanding ncnn's plugin architecture
src/net.h — Network class responsible for loading, parsing, and executing .param/.bin model files; the primary inference engine entry point
src/mat.h — Tensor/matrix data structure used throughout ncnn for all tensor operations; understanding memory layout is critical for optimization
tools/pnnx — PyTorch model converter toolchain that transforms PyTorch ONNX models to ncnn format; essential for model deployment workflow
benchmark/benchncnn.cpp — Performance benchmarking suite demonstrating how to load and run inference on various models; reference implementation for integration
cmake/ncnn_add_layer.cmake — CMake macro for registering new layers; required pattern for adding custom neural network operations

🛠️How to make changes

Add a New Layer Implementation

Create layer header in src/layer/ (e.g., src/layer/customop.h) with class inheriting from Layer, defining void forward() and int load_param() methods (src/layer/customop.h)
Implement the layer compute logic, optionally with ARM NEON variant in src/layer/customop_arm.cpp for mobile optimization (src/layer/customop_arm.cpp)
Register the layer in CMakeLists.txt using the ncnn_add_layer() macro and add to src/layer_type_enum.h (cmake/ncnn_add_layer.cmake)
Add layer instantiation logic to src/net.cpp in the create_layer() factory function (src/net.cpp)
Write a benchmark entry in benchmark/benchncnn.cpp to validate performance on target hardware (benchmark/benchncnn.cpp)

Add Platform-Specific SIMD Optimization

Create AVX/SSE source file (e.g., src/layer/convolution_x86_avx.cpp) with _mm256 intrinsics, following ncnn's naming convention (src/layer/convolution_x86_avx.cpp)
Update layer's forward() method in src/layer/convolution.cpp with CPU feature detection using cpu_support_x86_avx flag (src/cpu.cpp)
Register the optimized source in CMakeLists.txt within the appropriate platform conditional block (if(X86_OPTIMIZATIONS)) (CMakeLists.txt)
Validate with benchmark suite: cd build && ./benchncnn 10 4 0 to measure speedup against baseline (benchmark/benchncnn.cpp)

Convert a PyTorch Model to NCNN Format

Install PNNX converter: build tools/pnnx/ or use prebuilt binary, validate with pnnx --version (tools/pnnx/CMakeLists.txt)
Convert model: pnnx model.pt inputshape=[1,3,224,224] to generate model.pnnx (intermediate graph format) (tools/pnnx)
Use ncnnoptimize to transform .pnnx to ncnn: tools/ncnnoptimize model.pnnx model.param model.bin to generate .param and .bin files (tools/ncnnoptimize)
Load and run inference in C++: create ncnn::Net, call load_param() and load_model(), then Net::forward() with ncnn::Mat input (src/net.h)
Optionally quantize for mobile: tools/quantize model.param model.bin model_quant.param model_quant.bin calibration_images/ to reduce binary size by 4x (tools/quantize)

Enable GPU Inference with Vulkan Backend

Configure CMake with -DNCNN_VULKAN=ON and ensure Vulkan SDK is installed (check VulkanSDK CMake find module) (CMakeLists.txt)
Rebuild ncnn library: cd build && cmake .. -DNCNN_VULKAN=ON && make to compile Vulkan kernels (SPIR-V shaders) (cmake/ncnn_add_shader.cmake)
In inference code, create VulkanDevice context before ncnn::Net: ncnn::VulkanDevice gpu; net.opt.use_vulkan_compute = 1; (src/gpu.h)
Test GPU inference with benchmark: ./benchncnn 10 4 0 to see Vulkan compute shader dispatch overhead vs CPU (benchmark/benchncnn.cpp)

🔧Why these technologies

C++17 with SIMD intrinsics (SSE/AVX/NEON) — Achieves 10–100× speedup over scalar code on mobile/desktop CPUs; enables real-time inference (10–100 FPS) on resource-constrained devices
Custom memory allocator with arena pooling — Eliminates malloc/free overhead in inference loops; critical for millisecond-latency requirements on mobile where GC pauses are unacceptable
Vulkan GPU compute shaders — Cross-platform GPU acceleration without Vulkan driver overhead on Android; SPIR-V shaders compiled at

🪤Traps & gotchas

Platform-specific gotchas: ARM NEON intrinsics require -mfpu=neon or equivalent flags (handled in CMake but easy to miss in custom builds). Vulkan requires appropriate SDK and device support; fallback to CPU is not automatic. Model file format (.param/.bin) is binary with specific endianness—tools/pnnx/ is the only supported converter; hand-editing param files will fail silently. Threading model is single-threaded by default; multi-threaded inference requires explicit OpenMP or custom pooling. Memory layout is NCHW (not NHWC); mismatched input preprocessing causes silent numerical errors. No dynamic shapes: all input/output dimensions must be known at model load time.

🏗️Architecture

💡Concepts to learn

SIMD (Single Instruction Multiple Data) — ncnn's performance advantage on mobile comes from hand-optimized NEON (ARM), SSE/AVX (x86), and MSA (MIPS) intrinsics; understanding vector instructions is essential for contributing to src/arm/ and src/x86/.
Quantization (INT8/FP16) — ncnn supports fixed-point inference for memory and speed; .github/ISSUE_TEMPLATE/quantization.md indicates this is a supported workflow; reduces model size 4–8x on mobile.
Compute Shaders (Vulkan) — GPU inference path (src/vulkan/) uses GLSL compute shaders (1.2MB GLSL in repo); essential for leveraging Mali/Adreno/PowerVR on Android devices.
Model Serialization (.param/.bin format) — ncnn uses a custom binary model format (parsed by src/param_dict.cpp); understanding serialization is critical for model conversion, debugging, and supporting third-party tooling.
Operator Fusion — ncnn merges consecutive layers (e.g., Conv+BatchNorm+ReLU) into single kernels to reduce memory bandwidth; PNNX IR enables this via graph optimization before conversion.
Cross-Compilation — Building for ARM, MIPS, RISC-V from x86 is common in ncnn CI/CD (50+ workflows); CMake toolchains in cmake/ handle target-specific flags and ABIs.
Memory Layout (NCHW vs NHWC) — ncnn uses NCHW internally (Channels first); mismatched preprocessing or output interpretation causes silent numerical errors; critical for integrating with frameworks using NHWC.

google/mediapipe — Alternative mobile ML inference framework; uses TFLite but shares ncnn's goal of on-device ML for Android/iOS with minimal dependencies
apache/tvm — Compiler-based inference optimization; complementary approach to ncnn's hand-tuned SIMD—TVM auto-optimizes for hardware, ncnn manually optimizes per-architecture
Tencent/pnnx — Sibling project (mentioned in ncnn's issue templates and QQ groups): PyTorch-to-ncnn model converter based on MLIR intermediate representation
pytorch/pytorch — Upstream training framework; ncnn inference models originate from PyTorch (converted via PNNX), understanding PyTorch ops and export is essential
onnx/onnx — Model interchange format; PNNX tool converts ONNX to ncnn, and ONNX serves as a bridge between PyTorch and ncnn deployment

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive CI workflow for WebAssembly (WASM) build validation

The repo has a web-assembly.yml workflow file but lacks detailed build validation. Given ncnn's focus on mobile/edge deployment, WASM is a critical target platform. A contributor could enhance the workflow to include: artifact size comparison tracking (similar to compare-binary-size.yml), performance benchmarks against the benchmark/ suite, and automated testing of WASM-specific optimizations. This directly supports ncnn's mission of efficient inference on constrained environments.

[ ] Review existing .github/workflows/web-assembly.yml and compare-binary-size.yml patterns
[ ] Add WASM artifact build size tracking to .github/workflows/web-assembly.yml
[ ] Integrate benchmark/benchncnn.cpp execution for WASM targets
[ ] Add validation that WASM builds don't regress performance metrics
[ ] Document WASM-specific build options in CONTRIBUTING.md

Create unit tests for model conversion pipeline (PNNX integration)

The repo has .github/workflows/pnnx.yml indicating an active PNNX (PyTorch Neural Network eXchange) model converter, but the benchmark/ directory shows only parameter files (.param) without corresponding conversion validation tests. A contributor should add tests verifying: correct param/bin file generation from standard PyTorch models, numerical accuracy of converted models, and edge cases (quantized models, dynamic shapes). This ensures users can reliably convert their models.

[ ] Examine .github/workflows/pnnx.yml to understand current PNNX integration
[ ] Create tests/test_pnnx_conversion.cpp or Python test suite
[ ] Add standard model files (ResNet, MobileNet, YOLOv* variants) to tests/
[ ] Validate generated .param and .bin files against reference outputs
[ ] Test quantization-aware conversion paths mentioned in .github/ISSUE_TEMPLATE/quantization.md

Add cross-platform binary size regression detection with automated reports

While compare-binary-size.yml and compare-binary-size-pr-comment.yml exist, they appear PR-focused. The repo would benefit from a scheduled workflow that: tracks library size trends across all platforms (ARM, ARM64, x86, RISC-V, LoongArch, MIPS, PPC64 - all have dedicated workflows), generates weekly reports, and alerts maintainers to unexpected bloat. This is critical for mobile-first framework where every byte matters.

[ ] Review existing .github/workflows/compare-binary-size*.yml implementation details
[ ] Create new workflow: .github/workflows/binary-size-tracking.yml with cron schedule
[ ] Collect artifacts from all platform workflows (linux-arm.yml, linux-aarch64.yml, linux-x64-cpu-gcc.yml, windows.yml, macos.yml, ios.yml, etc.)
[ ] Store size metrics in GitHub Discussions or wiki (see sync-wiki.yml pattern)
[ ] Add regression detection logic (alert if single architecture grows >5%)
[ ] Document baseline expectations in CONTRIBUTING.md

🌿Good first issues

Add unit tests for underrepresented layer operations (check which layers in src/layer/ lack corresponding tests/test_*.cpp files) to improve test coverage from current ~60%.
Port example applications from examples/ to additional platforms (e.g., create WebAssembly examples using emscripten toolchain) to match the broad platform support advertised in .github/workflows/.
Document the binary .param/.bin format specification and create a reference parser in Python (currently only PNNX converter exists); this would enable third-party tooling and reduce PNNX dependency.

⭐Top contributors

Click to expand

@nihui — 59 commits
@futz12 — 10 commits
@ihb2032 — 8 commits
@dependabot[bot] — 6 commits
@crafcat7 — 4 commits

📝Recent commits

Click to expand

8775d9c — fix asan error via x86 tmp buffer alignment (#6703) (nihui)
485d173 — feat(pnnx): convert prelu[num_parameters=1] to leakyrelu, so that it can be fused with conv (#6344) (w43322)
476901a — pnnx print flops and memops (#5836) (nihui)
aa01024 — pnnx: support npy input tensors (#6700) (nihui)
0d29a8d — update pnnx torch-2.11 (#6701) (nihui)
d95679b — fix windows-arm build (#6699) (nihui)
d0d5063 — [OPT] x86: optimize PixelShuffle with SIMD block transpose (#6690) (crafcat7)
020b8b0 — Indicate correct build dependencies for RHEL/Centos (#6692) (bkmgit)
dc99678 — Bump docker/setup-qemu-action from 3 to 4 (#6575) (dependabot[bot])
47e637a — Bump actions/cache from 4 to 5 (#6596) (dependabot[bot])

🔒Security observations

The ncnn repository demonstrates a reasonably secure posture with established CI/CD workflows and GitHub security integrations (CodeQL analysis is configured). However, the primary security concern is the lack of strict version pinning in Python dependencies, which creates supply chain risks. The codebase itself appears to be a C++ neural network framework with minimal Python integration based on the visible structure. Recommendations focus on dependency management hardening, automated vulnerability scanning, and transparency measures like SBOM generation. No evidence of hardcoded secrets, SQL injection risks, or critical infrastructure misconfigurations was identified from the provided file structure.

Medium · Dependency on opencv-python without version pinning — Dependencies/Package file (opencv-python). The dependencies file lists 'opencv-python' without a specific version constraint. This could lead to unexpected behavior or security issues if a compromised or vulnerable version is installed during dependency resolution. Fix: Pin opencv-python to a specific secure version, e.g., 'opencv-python==4.8.1.78' or use a version range with known-good constraints: 'opencv-python>=4.8.0,<5.0.0'
Low · Unversioned dependencies in package file — Dependencies/Package file (all dependencies). Multiple dependencies (numpy, tqdm, requests, portalocker) lack version specifications. While these are common libraries, this practice increases supply chain risk and makes builds non-reproducible. Fix: Implement strict version pinning for all dependencies. Use a requirements.txt with exact versions: 'numpy==1.24.3', 'tqdm==4.65.0', 'requests==2.31.0', 'portalocker==2.7.0', etc.
Low · No evidence of dependency vulnerability scanning — .github/dependabot.yml, .github/workflows/. While GitHub Actions workflows are present, there is no visible integration of automated dependency scanning tools (e.g., Dependabot checks, OWASP Dependency-Check, or Snyk) in the CI/CD pipeline to catch known vulnerabilities in third-party packages. Fix: Enable Dependabot alerts in repository settings, add automated security scanning workflows, and regularly audit dependencies using 'pip audit' or similar tools.
Low · No evidence of SBOM generation — Repository root (missing SBOM files). The codebase shows no indication of Software Bill of Materials (SBOM) generation, which is increasingly important for supply chain security transparency. Fix: Integrate SBOM generation into the build pipeline using tools like 'cyclonedx-bom' or 'syft' to create CycloneDX or SPDX format SBOMs for releases.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/Tencent/ncnn shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live Tencent/ncnn repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/Tencent/ncnn.

What it runs against: a local clone of Tencent/ncnn — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in Tencent/ncnn | Confirms the artifact applies here, not a fork | | 2 | License is still Other | Catches relicense before you depend on it | | 3 | Default branch master exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>Tencent/ncnn</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of Tencent/ncnn. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/Tencent/ncnn.git
#   cd ncnn
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of Tencent/ncnn and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "Tencent/ncnn(\\.git)?\\b" \\
  && ok "origin remote is Tencent/ncnn" \\
  || miss "origin remote is not Tencent/ncnn (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Other\"" package.json 2>/dev/null) \\
  && ok "license is Other" \\
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
  && ok "default branch master exists" \\
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "CMakeLists.txt" \\
  && ok "CMakeLists.txt" \\
  || miss "missing critical file: CMakeLists.txt"
test -f "src/layer.h" \\
  && ok "src/layer.h" \\
  || miss "missing critical file: src/layer.h"
test -f "src/net.h" \\
  && ok "src/net.h" \\
  || miss "missing critical file: src/net.h"
test -f "src/mat.h" \\
  && ok "src/mat.h" \\
  || miss "missing critical file: src/mat.h"
test -f "tools/pnnx" \\
  && ok "tools/pnnx" \\
  || miss "missing critical file: tools/pnnx"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/Tencent/ncnn"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/tencent/ncnn"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>