nezhahq/nezha
:trollface: Self-hosted, lightweight server and website monitoring and O&M tool
Healthy across the board
Permissive license, no critical CVEs, actively maintained: safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 5w ago
- ✓8 active contributors
- ✓Apache-2.0 licensed
- ✓CI configured
- ✓Tests present
- ⚠Concentrated ownership — top contributor handles 55% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: nezhahq/nezha
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/nezhahq/nezha shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 5w ago
- 8 active contributors
- Apache-2.0 licensed
- CI configured
- Tests present
- ⚠ Concentrated ownership — top contributor handles 55% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live nezhahq/nezha
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/nezhahq/nezha.
What it runs against: a local clone of nezhahq/nezha — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in nezhahq/nezha | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 62 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of nezhahq/nezha. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/nezhahq/nezha.git
#   cd nezha
#
# Then paste this script. Every check is read-only, no mutations.
set +e
fail=0
ok()   { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of nezhahq/nezha and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "nezhahq/nezha(\.git)?\b" \
  && ok "origin remote is nezhahq/nezha" \
  || miss "origin remote is not nezhahq/nezha (artifact may be from a fork)"

# 2. License matches what RepoPilot saw.
# The standard Apache-2.0 text opens with an indented "Apache License" heading,
# so accept either that heading or an SPDX-style "Apache-2.0" line.
(grep -qiE "^[[:space:]]*(Apache License|Apache-2\.0)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "cmd/dashboard/main.go" \
  && ok "cmd/dashboard/main.go" \
  || miss "missing critical file: cmd/dashboard/main.go"
test -f "cmd/dashboard/controller/controller.go" \
  && ok "cmd/dashboard/controller/controller.go" \
  || miss "missing critical file: cmd/dashboard/controller/controller.go"
test -f "cmd/dashboard/rpc/rpc.go" \
  && ok "cmd/dashboard/rpc/rpc.go" \
  || miss "missing critical file: cmd/dashboard/rpc/rpc.go"
test -f "model/server.go" \
  && ok "model/server.go" \
  || miss "missing critical file: model/server.go"
test -f "model/alertrule.go" \
  && ok "model/alertrule.go" \
  || miss "missing critical file: model/alertrule.go"

# 5. Repo recency (falls back to epoch 0 if git log fails, which forces a FAIL)
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 62 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~32d)"
else
  miss "last commit was $days_since_last days ago, artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures): safe to trust"
else
  echo "artifact has $fail stale claim(s): regenerate at https://repopilot.app/r/nezhahq/nezha"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Nezha is a self-hosted, lightweight monitoring dashboard written in Go. It tracks server health (CPU, memory, disk, network) and website availability via HTTP/TCP/Ping checks, with native support for SSL certificate monitoring, alert routing to multiple channels, scheduled task execution, and web-based terminal access. It centralizes infrastructure observability without the overhead of Prometheus/Grafana or cloud-dependent SaaS monitoring.
The repo follows a classic monorepo structure: cmd/dashboard/main.go orchestrates the core server (Gin-based REST + gRPC endpoints), cmd/dashboard/controller/ houses HTTP handlers, pkg/ likely contains domain logic (referenced in imports), and cmd/dashboard/admin-dist/ and user-dist/ hold pre-built frontend assets. The database layer uses GORM with the SQLite driver (gorm.io/driver/sqlite v1.6.0).
👥Who it's for
DevOps engineers, system administrators, and small-to-medium infrastructure teams who need to monitor multiple VPS/servers and websites without complex setup, plus self-hosted enthusiasts who want their monitoring data to stay private and on-premises.
🌱Maturity & risk
Active and production-ready: the project has versioned releases via GitHub Actions (see .github/workflows/release.yml), maintained dependencies (Go 1.26, Gin 1.12, GORM 1.31), CI/CD pipelines including CodeQL security scans, and active translations via Weblate. Evidence of regular updates and a growing contributor base (naiba, UUBulb, Akkia visible in commit history) indicates healthy ongoing development.
Low-to-moderate risk: single primary maintainer (naiba) with dependency on external dashboard frontends (nezhahq/admin-frontend, hamster1963/nezha-dash) creates maintenance bottleneck; relatively lightweight dependency tree (Gin, GORM, VictoriaMetrics) reduces supply-chain risk. No visible breaking-change warnings, but tight coupling to specific frontend releases could cause deployment friction.
Active areas of work
Active development on multi-protocol monitoring expansion (HTTP/TCP/Ping), alert rule engine (alertrule.go controller), terminal feature (terminal.go), DNS/DDNS support (ddns.go), and WAF integration (waf/). GitHub workflows show recent release automation and code synchronization to Atom Git mirror, plus ongoing translation efforts.
🚀Get running
git clone https://github.com/nezhahq/nezha.git
cd nezha
go mod download
go run cmd/dashboard/main.go
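Or use the dev container configuration under .devcontainer/ for a containerized development environment. Note that docker build -f expects a Dockerfile, so the suggested docker build -f .devcontainer/build.sh . is unlikely to work as written if build.sh is a shell script; check the contents of .devcontainer/ before relying on it.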
Daily commands:
go run cmd/dashboard/main.go
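Expects config via environment variables or YAML (koanf/v2 config management). Access the dashboard at http://localhost:8008. This port is not Gin's own default (Gin listens on :8080 when no address is given), so it would have to come from Nezha's configuration; check cmd/dashboard/main.go and the config model for the exact value.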
🗺️Map of the codebase
- cmd/dashboard/main.go: Dashboard entry point that initializes the monitoring web server, routes, and RPC communication with agents.
- cmd/dashboard/controller/controller.go: Central router and middleware setup for all API endpoints; every endpoint in the dashboard flows through here.
- cmd/dashboard/rpc/rpc.go: gRPC handler for agent-to-dashboard communication; core data ingestion path for real-time server monitoring.
- model/server.go: Data model for monitored servers; defines schema for persistent storage and API serialization.
- model/alertrule.go: Alert rule model and logic; defines trigger thresholds and notification conditions.
- model/notification.go: Notification channel abstraction (email, webhook, etc.); glues alerting system to delivery mechanisms.
- go.mod: Go module definition; declares heavy dependencies (Gin, gRPC, VictoriaMetrics) that shape architecture.
🛠️How to make changes
Add a New Monitoring Alert Rule Type
- Define the alert rule model in model/alertrule.go with new threshold fields (model/alertrule.go)
- Add alert rule API endpoints (create/update/delete) in cmd/dashboard/controller/alertrule.go (cmd/dashboard/controller/alertrule.go)
- Extend the RPC trigger logic in cmd/dashboard/rpc/rpc.go to evaluate the new rule condition (cmd/dashboard/rpc/rpc.go)
- Update model/notification.go to map the rule to notification channels when triggered (model/notification.go)
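The rule-evaluation step above can be sketched with plain Go types. This is a minimal illustration, not Nezha's actual rule engine: `Rule`, `Evaluator`, and the `ForCycles` field are hypothetical names, and the real fields live in model/alertrule.go.

```go
package main

import "fmt"

// Rule is a hypothetical threshold rule; it does not mirror model/alertrule.go.
type Rule struct {
	Metric    string  // e.g. "cpu" or "memory"
	Max       float64 // a sample above Max counts as a breach
	ForCycles int     // consecutive breaches required before the rule fires
}

// Evaluator tracks consecutive breaches for one rule.
type Evaluator struct {
	rule     Rule
	breaches int
}

// Observe feeds one sample and reports whether the rule fires on it.
func (e *Evaluator) Observe(value float64) bool {
	if value > e.rule.Max {
		e.breaches++
	} else {
		e.breaches = 0 // any healthy sample resets the streak
	}
	return e.breaches > 0 && e.breaches >= e.rule.ForCycles
}

func main() {
	e := &Evaluator{rule: Rule{Metric: "cpu", Max: 90, ForCycles: 3}}
	for _, v := range []float64{95, 96, 50, 91, 92, 93} {
		fmt.Printf("cpu=%.0f fired=%v\n", v, e.Observe(v))
	}
}
```

Requiring several consecutive breaches is one common way to avoid alerting on a single spike; whether Nezha's rules use this shape should be checked against the source.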
Add a New Notification Channel (e.g., Slack, Discord)
- Create notification config model in model/notification.go for the new channel type (model/notification.go)
- Implement send logic in cmd/dashboard/controller/notification.go with the external API client (cmd/dashboard/controller/notification.go)
- Add REST endpoint to manage the channel config via cmd/dashboard/controller/controller.go (cmd/dashboard/controller/controller.go)
- Test integration by triggering alerts from cmd/dashboard/rpc/rpc.go to verify delivery (cmd/dashboard/rpc/rpc.go)
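As a rough illustration of the channel steps above, a webhook-style sender can be exercised against a local test server using only the standard library. The `Notifier` interface, `WebhookNotifier`, and the JSON payload shape are assumptions made for this sketch; Nezha's real notification types are in model/notification.go and may look quite different.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Notifier is a hypothetical abstraction over delivery channels.
type Notifier interface {
	Send(title, body string) error
}

// WebhookNotifier posts a small JSON payload to a URL (payload shape is assumed).
type WebhookNotifier struct {
	URL    string
	Client *http.Client
}

func (w WebhookNotifier) Send(title, body string) error {
	payload, err := json.Marshal(map[string]string{"title": title, "body": body})
	if err != nil {
		return err
	}
	resp, err := w.Client.Post(w.URL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned status %d", resp.StatusCode)
	}
	return nil
}

// deliverToTestServer sends one alert to a throwaway local server
// and returns the raw body that server received.
func deliverToTestServer(title, body string) (string, error) {
	var received string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		received = string(b)
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	var n Notifier = WebhookNotifier{URL: srv.URL, Client: srv.Client()}
	err := n.Send(title, body)
	return received, err
}

func main() {
	got, err := deliverToTestServer("CPU high", "server-1 above 90%")
	fmt.Println(got, err)
}
```

Pointing the sender at an `httptest` server keeps the delivery check offline, which is useful before wiring a real Slack or Discord endpoint.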
Add a New Scheduled Task Feature
- Define task model and schema in model/cron.go or create model/mytask.go (model/cron.go)
- Add REST endpoints for task CRUD in cmd/dashboard/controller/cron.go (cmd/dashboard/controller/cron.go)
- Implement task execution logic by registering a goroutine or using the cron scheduler (cmd/dashboard/controller/cron.go)
- Expose status/logs via WebSocket in cmd/dashboard/controller/ws.go for real-time UI updates (cmd/dashboard/controller/ws.go)
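The execution step above can be prototyped without a scheduler library. The sketch below is a standard-library stand-in, not the robfig/cron/v3 API the project depends on: `Task`, `Due`, and `simulate` are hypothetical names, and a fake clock is used so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Task is a hypothetical interval-based task; the real schema lives in model/cron.go.
type Task struct {
	Name    string
	Every   time.Duration
	LastRun time.Time
}

// Due returns the names of tasks whose interval has elapsed at now and
// stamps LastRun so they are not returned again until the next interval.
func Due(tasks []*Task, now time.Time) []string {
	var due []string
	for _, t := range tasks {
		if t.LastRun.IsZero() || now.Sub(t.LastRun) >= t.Every {
			due = append(due, t.Name)
			t.LastRun = now
		}
	}
	return due
}

// simulate advances a fake clock one minute at a time and logs which tasks fire.
func simulate(minutes int) []string {
	tasks := []*Task{
		{Name: "backup", Every: 3 * time.Minute},
		{Name: "ping", Every: time.Minute},
	}
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	var log []string
	for i := 0; i < minutes; i++ {
		now := start.Add(time.Duration(i) * time.Minute)
		log = append(log, fmt.Sprintf("t+%dm:%s", i, strings.Join(Due(tasks, now), ",")))
	}
	return log
}

func main() {
	for _, line := range simulate(4) {
		fmt.Println(line)
	}
}
```

Passing the clock in as a parameter, rather than calling `time.Now()` inside `Due`, is what makes the scheduling logic testable.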
Add a New API Endpoint
- Create request/response models in model/myfeature_api.go following the naming convention (model/api.go)
- Implement handler function in cmd/dashboard/controller/myfeature.go (cmd/dashboard/controller/controller.go)
- Register route with Gin router in cmd/dashboard/controller/controller.go (e.g., r.POST("/api/myfeature", myFeatureHandler); Go string literals use double quotes) (cmd/dashboard/controller/controller.go)
- Protect endpoint with JWT middleware by calling gin-jwt in cmd/dashboard/controller/jwt.go (cmd/dashboard/controller/jwt.go)
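The endpoint steps above follow a model, handler, route, middleware pattern. The sketch below shows that pattern with `net/http` only, as an assumption-laden stand-in: the real project uses Gin and gin-jwt, and the static bearer token here is a placeholder for JWT validation, not an implementation of it. All type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hypothetical request/response models (would live in model/myfeature_api.go).
type featureRequest struct {
	Name string `json:"name"`
}
type featureResponse struct {
	Message string `json:"message"`
}

// requireToken is a stand-in for JWT middleware: it only compares a static token.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// myFeature validates the JSON body and returns a JSON response.
func myFeature(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req featureRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Name == "" {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(featureResponse{Message: "hello " + req.Name})
}

// newMux registers the protected route.
func newMux(token string) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/api/myfeature", requireToken(token, http.HandlerFunc(myFeature)))
	return mux
}

// call issues one in-memory POST and returns the HTTP status code.
func call(token, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/myfeature", strings.NewReader(body))
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	newMux("secret").ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(call("secret", `{"name":"nezha"}`)) // authorized, valid body
	fmt.Println(call("", `{"name":"nezha"}`))       // missing token
	fmt.Println(call("secret", `{}`))               // missing required field
}
```

In the Gin version the same three cases (authorized, unauthenticated, invalid body) are worth covering with `httptest` before the route is merged.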
🔧Why these technologies
- Go + Gin framework — Lightweight, fast HTTP server suitable for self-hosted monitoring; minimal resource footprint for resource-constrained environments.
- gRPC + Protobuf — Efficient agent-to-dashboard communication; binary protocol reduces bandwidth for frequent telemetry updates.
- WebSocket (Gorilla) — Real-time push of alerts and metrics to browser UI without polling overhead.
- VictoriaMetrics — Time-series storage for historical monitoring data; optimized for high-cardinality metrics (many servers, many labels).
- JWT + OAuth2 — Stateless authentication for API and third-party integrations; scales without session storage.
⚖️Trade-offs already made
- Single monolithic Go binary vs. microservices
  - Why: Self-hosted use case requires minimal operational complexity; easier deployment on a single VPS.
  - Consequence: Simpler deployment and no service-to-service RPC overhead, but vertical scaling only: the alert engine or notification sender cannot be isolated into separate processes.
- gRPC for agent communication vs. REST polling
  - Why: Streaming reduces round-trip latency and bandwidth; agents can push metrics continuously.
  - Consequence: Agents must implement a gRPC client; agent availability is directly tied to dashboard uptime (no buffering).
- In-memory alert rule evaluation vs. external rule engine
  - Why: Simplicity and low latency; rules are evaluated synchronously on every metric update.
  - Consequence: Alert rule changes take effect immediately, but high-frequency alerts on many rules can spike CPU, and there is no audit trail of rule executions.
- WebSocket broadcast to all connected browsers vs. per-user queues
  - Why: Simpler implementation; all users see the same alerts in near-real-time.
  - Consequence: Reduced isolation; all users receive all alerts regardless of permissions (may leak sensitive alerts to unauthorized users if not careful).
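To make the last trade-off concrete, here is a minimal sketch of the per-user alternative: a hub that filters each alert by the subscriber's permitted servers before delivery. It is an illustration of the idea only, not how ws.go works; `Hub`, `Subscriber`, and `Alert` are hypothetical types, and buffered channels stand in for WebSocket connections.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a hypothetical message tied to one monitored server.
type Alert struct {
	ServerID int
	Text     string
}

// Subscriber stands in for one connected browser session.
type Subscriber struct {
	User    string
	allowed map[int]bool // server IDs this user may see
	Inbox   chan Alert
}

// Hub fans alerts out to subscribers under a mutex.
type Hub struct {
	mu   sync.Mutex
	subs []*Subscriber
}

// Subscribe registers a user with the server IDs they are permitted to see.
func (h *Hub) Subscribe(user string, serverIDs ...int) *Subscriber {
	s := &Subscriber{User: user, allowed: map[int]bool{}, Inbox: make(chan Alert, 8)}
	for _, id := range serverIDs {
		s.allowed[id] = true
	}
	h.mu.Lock()
	h.subs = append(h.subs, s)
	h.mu.Unlock()
	return s
}

// Publish delivers the alert only to permitted subscribers and returns the
// delivery count. A full inbox drops the alert instead of blocking the hub.
func (h *Hub) Publish(a Alert) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for _, s := range h.subs {
		if !s.allowed[a.ServerID] {
			continue
		}
		select {
		case s.Inbox <- a:
			delivered++
		default: // slow consumer: drop rather than stall other subscribers
		}
	}
	return delivered
}

func main() {
	hub := &Hub{}
	alice := hub.Subscribe("alice", 1, 2)
	bob := hub.Subscribe("bob", 2)
	n := hub.Publish(Alert{ServerID: 1, Text: "disk 95% full"})
	fmt.Println("delivered:", n, "alice:", len(alice.Inbox), "bob:", len(bob.Inbox))
}
```

The cost of this design is the extra permission lookup per subscriber per alert, which is the complexity the broadcast approach avoids.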
🚫Non-goals (don't propose these)
- Does not provide agent-side local alerting or offline task execution; agent is purely a telemetry collector.
- Does not implement distributed tracing or observability into agent networks; monitoring is one-way (agent → dashboard).
- Does not support multi-tenancy at the database level; single dashboard instance = single tenant.
🪤Traps & gotchas
- Database required: SQLite database must be created on first run; no auto-init migration is visible in config examples, so check the koanf initialization.
- Frontend split: admin UI and user UI are separate repos (nezhahq/admin-frontend, hamster1963/nezha-dash); pre-built assets must exist in admin-dist/ and user-dist/ or routes will 404.
- gRPC agent dependency: monitoring agents communicate via gRPC (google.golang.org/grpc v1.79.3); the agent repo (nezhahq/agent) must be deployed separately and configured to point to the dashboard host.
- No visible env defaults: uses koanf for config loading; requires explicit YAML or env vars, and missing config will silently fail or use insecure defaults.
- OAuth2 integration: the oauth2.go controller expects upstream provider config; incomplete setup breaks login.
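One way to guard against the silent-failure gotcha around missing configuration is to validate required settings at startup and refuse to run otherwise. The sketch below uses only the standard library rather than koanf, and the variable names `EXAMPLE_LISTEN_PORT` and `EXAMPLE_JWT_SECRET` are invented for illustration; they are not Nezha's real settings.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds two hypothetical settings.
type Config struct {
	ListenPort int
	JWTSecret  string
}

// Load reads settings through lookup (so tests can inject values) and
// reports every problem at once instead of falling back to defaults.
func Load(lookup func(string) (string, bool)) (Config, error) {
	var cfg Config
	var problems []error

	portStr, ok := lookup("EXAMPLE_LISTEN_PORT")
	if !ok {
		problems = append(problems, errors.New("EXAMPLE_LISTEN_PORT is not set"))
	} else if port, err := strconv.Atoi(portStr); err != nil || port < 1 || port > 65535 {
		problems = append(problems, fmt.Errorf("EXAMPLE_LISTEN_PORT %q is not a valid port", portStr))
	} else {
		cfg.ListenPort = port
	}

	secret, ok := lookup("EXAMPLE_JWT_SECRET")
	if !ok || len(secret) < 16 {
		problems = append(problems, errors.New("EXAMPLE_JWT_SECRET must be set and at least 16 characters"))
	} else {
		cfg.JWTSecret = secret
	}

	// errors.Join returns nil when there are no problems.
	return cfg, errors.Join(problems...)
}

func main() {
	if _, err := Load(os.LookupEnv); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("config ok")
}
```

Collecting all problems with `errors.Join` (Go 1.20+) lets an operator fix a broken config in one pass instead of one error per restart.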
💡Concepts to learn
- gRPC and Protocol Buffers — Nezha uses google.golang.org/grpc for agent-to-dashboard communication; understanding gRPC streaming, unary RPCs, and proto3 message definitions is essential to extending the monitoring protocol
- WebSocket (Gorilla WebSockets) — Real-time dashboard updates via gorilla/websocket eliminate polling overhead; critical for live metric streaming to web clients
- JWT Authentication & Token Refresh — appleboy/gin-jwt/v2 handles stateless auth for dashboard API; understanding token expiry, refresh tokens, and claim validation is necessary for API integration and security
- Time-Series Data & Metrics Storage — VictoriaMetrics (v1.134.0) provides the backend for efficient metric storage and querying; understanding scrape intervals, retention policies, and PromQL-like query language is essential for metric retention design
- Cron Jobs & Scheduled Task Execution — robfig/cron/v3 powers the cron.go controller for alert rules and maintenance tasks; required for implementing recurring checks and alert evaluations
- Configuration Management with Koanf — koanf/v2 handles multi-source config loading (env, files, YAML); understanding provider hierarchy and hot-reload is critical for deployment and operational configuration
- ORM & Database Abstraction (GORM) — gorm.io/gorm abstracts SQLite operations; understanding hooks, migrations, and query patterns is necessary for adding new data models (e.g., new alert types, server metadata)
🔗Related repos
- nezhahq/admin-frontend: Official admin dashboard UI (React/Vue) that must be built and deployed alongside the backend; critical for the management interface
- hamster1963/nezha-dash: Community-maintained user-facing frontend theme for displaying monitoring status publicly; represents the extensible theming system
- nezhahq/agent: Monitoring agent that runs on target servers and sends metrics to the dashboard via gRPC; required for any operational Nezha deployment
- VictoriaMetrics/VictoriaMetrics: Time-series database backend used by Nezha for storing and querying historical metrics; core dependency for data persistence
- prometheus/prometheus: Industry-standard alternative for metrics collection and storage; Nezha positions itself as a lighter-weight, self-contained replacement
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for JWT controller (jwt.go, jwt_test.go)
The jwt_test.go file exists but appears minimal. JWT is critical for authentication/authorization. Current test coverage likely doesn't cover edge cases like token expiration, refresh flows, invalid signatures, or role-based access control. This is high-value because auth vulnerabilities can compromise the entire monitoring system.
- [ ] Expand jwt_test.go with table-driven tests for token generation, validation, and refresh scenarios
- [ ] Add tests for edge cases: expired tokens, tampered tokens, missing claims, role validation
- [ ] Test integration with user.go controller for permission enforcement
- [ ] Ensure coverage of oauth2.go flows if JWT is used in OAuth2 exchange
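The table-driven style named in the checklist above could look roughly like the following. This sketch checks a hypothetical `Claims` struct directly and does not touch appleboy/gin-jwt; in the real jwt_test.go the same table would sit inside a `testing` function and call the project's own validation path.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Claims is a hypothetical, simplified view of a token's claims.
type Claims struct {
	Subject   string
	ExpiresAt time.Time
}

var (
	ErrNoSubject = errors.New("missing subject claim")
	ErrExpired   = errors.New("token expired")
)

// checkClaims rejects tokens with no subject or an expiry at or before now.
func checkClaims(c Claims, now time.Time) error {
	if c.Subject == "" {
		return ErrNoSubject
	}
	if !now.Before(c.ExpiresAt) {
		return ErrExpired
	}
	return nil
}

// runCases executes the table and returns the names of failing cases.
func runCases() []string {
	now := time.Date(2026, 1, 1, 12, 0, 0, 0, time.UTC)
	cases := []struct {
		name   string
		claims Claims
		want   error
	}{
		{"valid", Claims{"admin", now.Add(time.Hour)}, nil},
		{"expired", Claims{"admin", now.Add(-time.Second)}, ErrExpired},
		{"expires exactly now", Claims{"admin", now}, ErrExpired},
		{"missing subject", Claims{"", now.Add(time.Hour)}, ErrNoSubject},
	}
	var failed []string
	for _, tc := range cases {
		if got := checkClaims(tc.claims, now); !errors.Is(got, tc.want) {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failing cases:", runCases())
}
```

The boundary case (expiry equal to the current time) is the kind of edge the checklist asks for and is easy to miss without an explicit table row.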
Add missing unit tests for model configuration and validation (model/config.go, model/config_test.go)
Configuration is loaded from files and environment via koanf. The config_test.go exists but config.go handles critical settings (database, server URLs, DDNS providers, etc.). Missing tests could allow invalid configs to reach production. This is high-value for reliability.
- [ ] Add tests in model/config_test.go for config validation logic in config.go
- [ ] Test environment variable override behavior (koanf/providers/env integration)
- [ ] Test YAML/config file parsing with invalid/malformed inputs
- [ ] Test default value fallbacks for missing critical fields
Add missing HTTP integration tests for WAF controller (cmd/dashboard/controller/waf/waf.go)
The WAF controller exists with waf.go and waf.html but there are no visible test files for it. WAF rules are security-critical. Other controllers (alertrule, server, etc.) lack dedicated test files too, but WAF is highest-risk. Testing should cover rule creation, validation, and application.
- [ ] Create cmd/dashboard/controller/waf/waf_test.go with HTTP handler tests
- [ ] Test WAF rule CRUD operations with valid and invalid inputs
- [ ] Test rule precedence and conflict detection
- [ ] Test rule matching against sample requests
- [ ] Follow pattern from existing controller tests (use gin.Context, httptest)
🌿Good first issues
- Add unit tests for cmd/dashboard/controller/jwt.go—jwt_test.go exists but appears incomplete; add coverage for token refresh, expiration, and claim validation
- Expand WAF rule documentation: cmd/dashboard/controller/waf/waf.html exists but is sparse; add inline examples for common attack patterns (SQL injection, XSS) and rule syntax reference
- Create migration guide from Prometheus/Grafana to Nezha in docs/: repo lacks explicit comparison and conversion docs; would help users evaluating alternatives
⭐Top contributors
- @naiba — 55 commits
- @uubulb — 15 commits
- @github-actions[bot] — 14 commits
- @dependabot[bot] — 7 commits
- @weblate — 6 commits
📝Recent commits
- 50dc8e6: chore: upgrade frontend (naiba)
- 9acffc1: chore: bump Go to 1.26, update dependencies and frontend templates (naiba)
- 4e95135: Merge pull request #1177 from nezhahq/dependabot/go_modules/google.golang.org/grpc-1.79.3 (naiba)
- 91a636c: chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.79.3 (dependabot[bot])
- a5d4537: fix: restore the mistakenly deleted AuthCodeURL call, fixing the compile failure (naiba)
- 589563e: chore: upgrade frontend (naiba)
- d57d7b7: Fix: set the Cookie Secure attribute to strengthen security (naiba)
- 69ac37d: update contributors[no ci] (github-actions[bot])
- be8ff11: fix: upgrade CodeQL Action to v3 and generate swagger docs before build (naiba)
- c48c63c: update contributors[no ci] (github-actions[bot])
🔒Security observations
- High · Possibly outdated golang.org/x/crypto dependency (go.mod). The project uses golang.org/x/crypto v0.49.0. This is flagged as outdated against a baseline of v0.24.0+, but v0.49.0 is already the higher version number, so the flag is likely a false positive. Fix: confirm with 'go list -m -u golang.org/x/crypto' and scan for known vulnerabilities with govulncheck before changing anything.
- High · Possibly outdated golang.org/x/net dependency (go.mod). The project uses golang.org/x/net v0.52.0. No newer release or specific CVE is cited, so the "outdated" claim is unverified. This package contains networking and HTTP utilities, so stale versions can carry issues in TLS handling and HTTP client code. Fix: check with 'go list -m -u golang.org/x/net' and update only if a newer release or a reported vulnerability applies.
- Medium · JWT implementation security concerns (cmd/dashboard/controller/jwt.go). The project uses appleboy/gin-jwt v2.10.3 for JWT authentication. The presence of jwt.go and jwt_test.go suggests custom JWT handling. JWT misconfigurations are common attack vectors for authentication bypass. Fix: Review the JWT implementation for proper signature verification, secure key management, token expiration validation, and algorithm allowlisting. Ensure the 'none' algorithm is not accepted.
- Medium · OAuth2 implementation review needed (cmd/dashboard/controller/oauth2.go and model/oauth2*.go). An OAuth2 implementation is present (oauth2.go, oauth2config.go, oauth2bind.go) without visible security context. OAuth2 flows can have redirect_uri validation, state parameter, and token handling vulnerabilities. Fix: Verify PKCE support for public clients, redirect_uri validation against an allowlist, secure state parameter handling, secure token storage, and HTTPS enforcement.
- Medium · Database ORM potential SQL injection (model/*_api.go files throughout model/). The project uses GORM (gorm.io/gorm) with a SQLite backend. GORM parameterizes queries by default, but the *_api.go files in model/ indicate API-facing inputs, which could reach the database unsafely if raw SQL or string-built queries are used. Fix: Audit all database queries to ensure GORM parameterized queries are used. Avoid string concatenation for SQL. Validate and sanitize all user inputs before database operations.
- Medium · WAF and file management endpoints security (cmd/dashboard/controller/fm.go, cmd/dashboard/controller/waf/waf.go, cmd/dashboard/controller/ddns.go). The presence of fm.go (file manager), waf.go (Web Application Firewall), and ddns.go suggests file operations and system-level configuration. These are high-risk endpoints for unauthorized access. Fix: Implement strict authentication and authorization checks. Validate file paths to prevent path traversal. Implement rate limiting on configuration endpoints. Log all administrative actions.
- Medium · WebSocket implementation (ws.go) security (cmd/dashboard/controller/ws.go). A WebSocket endpoint is present (ws.go) and is used for real-time monitoring. WebSockets can be vulnerable to message injection, unauthorized subscriptions, and privilege escalation. Fix: Ensure WebSocket connections require valid authentication tokens, messages are validated and sanitized, connections are cleaned up properly, and WebSocket messages are rate limited.
- Medium · Terminal functionality (terminal.go) critical risk (cmd/dashboard/controller/terminal.go, model/terminal_api.go). A terminal API endpoint exists (terminal.go, terminal_api.go), which likely provides remote command execution. This is extremely sensitive and a common attack target. Fix: Implement multi-factor authentication for terminal access, comprehensive audit logging of all commands, connection rate limiting, IP allowlisting, and session timeouts, and consider disabling the terminal feature if it is not essential.
- Low · Docker base image, busybox security. The Dockerfile uses busybox:stable-musl as the final stage. While minimal, …
LLM-derived; treat as a starting point, not a security audit.
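The path-traversal advice for file management endpoints can be illustrated with a small helper. This is a generic sketch under stated assumptions, not code from fm.go: `safeJoin` and `ErrOutsideRoot` are hypothetical names, and the check is purely lexical, so it does not protect against symlinks that point outside the root.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrOutsideRoot is returned when a requested path would leave the root.
var ErrOutsideRoot = errors.New("path escapes the allowed root")

// safeJoin joins a user-supplied path onto root and rejects results that
// resolve outside root. It does not resolve symlinks.
func safeJoin(root, userPath string) (string, error) {
	full := filepath.Join(root, userPath) // Join also cleans "." and ".." segments
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", ErrOutsideRoot
	}
	return full, nil
}

func main() {
	for _, p := range []string{"logs/app.log", "../etc/passwd", "a/../../secret"} {
		full, err := safeJoin("/srv/files", p)
		fmt.Printf("%-16s -> %q, err=%v\n", p, full, err)
	}
}
```

Comparing the cleaned result against the root, rather than searching the input for "..", avoids both false positives on names like "a..b" and bypasses through nested segments.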
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.