nezhahq/nezha

Item: nezhahq/nezha
Rating: 5
Author: RepoPilot

:trollface: Self-hosted, lightweight server and website monitoring and O&M tool

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓ Last commit 5w ago
✓ 8 active contributors
✓ Apache-2.0 licensed

✓ CI configured
✓ Tests present
⚠ Concentrated ownership: top contributor handles 55% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/nezhahq/nezha)](https://repopilot.app/r/nezhahq/nezha)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: nezhahq/nezha

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify them against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/nezhahq/nezha shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

Last commit 5w ago
8 active contributors
Apache-2.0 licensed
CI configured
Tests present
⚠ Concentrated ownership — top contributor handles 55% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live nezhahq/nezha repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/nezhahq/nezha.

What it runs against: a local clone of nezhahq/nezha — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in nezhahq/nezha | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 62 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>nezhahq/nezha</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of nezhahq/nezha. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/nezhahq/nezha.git
#   cd nezha
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of nezhahq/nezha and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "nezhahq/nezha(\.git)?\b" \
  && ok "origin remote is nezhahq/nezha" \
  || miss "origin remote is not nezhahq/nezha (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# The stock Apache LICENSE text opens with an indented "Apache License" line
# rather than the SPDX id, so accept either form.
(grep -qiE "^[[:space:]]*(Apache-2\.0|Apache License)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "cmd/dashboard/main.go" \
  && ok "cmd/dashboard/main.go" \
  || miss "missing critical file: cmd/dashboard/main.go"
test -f "cmd/dashboard/controller/controller.go" \
  && ok "cmd/dashboard/controller/controller.go" \
  || miss "missing critical file: cmd/dashboard/controller/controller.go"
test -f "cmd/dashboard/rpc/rpc.go" \
  && ok "cmd/dashboard/rpc/rpc.go" \
  || miss "missing critical file: cmd/dashboard/rpc/rpc.go"
test -f "model/server.go" \
  && ok "model/server.go" \
  || miss "missing critical file: model/server.go"
test -f "model/alertrule.go" \
  && ok "model/alertrule.go" \
  || miss "missing critical file: model/alertrule.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 62 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~32d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/nezhahq/nezha"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Nezha is a self-hosted, lightweight monitoring dashboard written in Go that tracks server health (CPU, memory, disk, network) and website availability via HTTP/TCP/Ping checks, with native support for SSL certificate monitoring, alert routing to multiple channels, scheduled task execution, and web-based terminal access. It solves the problem of centralizing infrastructure observability without the overhead of Prometheus/Grafana or cloud-dependent SaaS monitoring.

Classic monorepo structure: cmd/dashboard/main.go orchestrates the core server (Gin-based REST + gRPC endpoints), cmd/dashboard/controller/ houses HTTP handlers, pkg/ likely contains domain logic (referenced in imports), and cmd/dashboard/admin-dist/ and user-dist/ hold pre-built frontend assets.

The database layer uses GORM with the SQLite driver (gorm.io/driver/sqlite v1.6.0).

👥Who it's for

DevOps engineers, system administrators, and small-to-medium infrastructure teams who need to monitor multiple VPS/servers and websites without complex setup, plus self-hosted enthusiasts who want their monitoring data to stay private and on-premises.

🌱Maturity & risk

Active and production-ready: the project has versioned releases via GitHub Actions (see .github/workflows/release.yml), maintained dependencies (Go 1.26, Gin 1.12, GORM 1.31), CI/CD pipelines including CodeQL security scans, and active translations via Weblate. Evidence of regular updates and a growing contributor base (naiba, UUBulb, Akkia visible in commit history) indicates healthy ongoing development.

Low-to-moderate risk: a single primary maintainer (naiba) plus a dependency on external dashboard frontends (nezhahq/admin-frontend, hamster1963/nezha-dash) creates a maintenance bottleneck, while the relatively lightweight dependency tree (Gin, GORM, VictoriaMetrics) reduces supply-chain risk. No breaking-change warnings are visible, but tight coupling to specific frontend releases could cause deployment friction.

Active areas of work

Active development on multi-protocol monitoring expansion (HTTP/TCP/Ping), alert rule engine (alertrule.go controller), terminal feature (terminal.go), DNS/DDNS support (ddns.go), and WAF integration (waf/). GitHub workflows show recent release automation and code synchronization to Atom Git mirror, plus ongoing translation efforts.

🚀Get running

git clone https://github.com/nezhahq/nezha.git
cd nezha
go mod download
go run cmd/dashboard/main.go

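Or use the dev container: the repo ships a .devcontainer/ directory (including build.sh) for a containerized development environment. Note that build.sh is a shell script rather than a Dockerfile, so it is not a valid argument to docker build -f; inspect the script and the rest of .devcontainer/ for the intended entry point, or open the folder in an editor that supports dev containers.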

Daily commands:

go run cmd/dashboard/main.go

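Expects config via environment variables or YAML (koanf/v2 config management). Access the dashboard at http://localhost:8008 (the port is an inference and is not a Gin default, since Gin's Run falls back to :8080 when no address is given; check cmd/dashboard/main.go and your config for the actual listen port).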

🗺️Map of the codebase

cmd/dashboard/main.go — Dashboard entry point that initializes the monitoring web server, routes, and RPC communication with agents.
cmd/dashboard/controller/controller.go — Central router and middleware setup for all API endpoints; every endpoint in the dashboard flows through here.
cmd/dashboard/rpc/rpc.go — gRPC handler for agent-to-dashboard communication; core data ingestion path for real-time server monitoring.
model/server.go — Data model for monitored servers; defines schema for persistent storage and API serialization.
model/alertrule.go — Alert rule model and logic; defines trigger thresholds and notification conditions.
model/notification.go — Notification channel abstraction (email, webhook, etc.); glues alerting system to delivery mechanisms.
go.mod — Go module definition; declares heavy dependencies (Gin, gRPC, VictoriaMetrics) that shape architecture.

🛠️How to make changes

Add a New Monitoring Alert Rule Type

Define the alert rule model in model/alertrule.go with new threshold fields (model/alertrule.go)
Add alert rule API endpoints (create/update/delete) in cmd/dashboard/controller/alertrule.go (cmd/dashboard/controller/alertrule.go)
Extend the RPC trigger logic in cmd/dashboard/rpc/rpc.go to evaluate the new rule condition (cmd/dashboard/rpc/rpc.go)
Update model/notification.go to map rule to notification channels when triggered (model/notification.go)
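
The threshold evaluation described in these steps can be sketched as follows. This is a minimal illustration, not the code in model/alertrule.go: the Rule type, its fields (Metric, Above, Threshold, Consecutive), and the Triggered method are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Rule is a hypothetical threshold rule; the real schema lives in
// model/alertrule.go and may look quite different.
type Rule struct {
	Metric      string  // metric name, e.g. "cpu"
	Above       bool    // true: violate when value > Threshold; false: when value < Threshold
	Threshold   float64 // boundary value
	Consecutive int     // how many trailing samples must violate before triggering
}

// violates reports whether a single sample breaks the rule.
func (r Rule) violates(v float64) bool {
	if r.Above {
		return v > r.Threshold
	}
	return v < r.Threshold
}

// Triggered reports whether the last Consecutive samples all violate the rule.
func (r Rule) Triggered(samples []float64) bool {
	n := r.Consecutive
	if n < 1 {
		n = 1
	}
	if len(samples) < n {
		return false // not enough data yet
	}
	for _, v := range samples[len(samples)-n:] {
		if !r.violates(v) {
			return false
		}
	}
	return true
}

func main() {
	rule := Rule{Metric: "cpu", Above: true, Threshold: 90, Consecutive: 3}
	fmt.Println(rule.Triggered([]float64{50, 95, 96, 97})) // true
	fmt.Println(rule.Triggered([]float64{95, 96, 40, 97})) // false
}
```

Requiring several consecutive violations is one common way to avoid alerting on a single spike; whether Nezha's rules work this way should be checked against the source.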

Add a New Notification Channel (e.g., Slack, Discord)

Create notification config model in model/notification.go for the new channel type (model/notification.go)
Implement send logic in cmd/dashboard/controller/notification.go with the external API client (cmd/dashboard/controller/notification.go)
Add REST endpoint to manage the channel config via cmd/dashboard/controller/controller.go (cmd/dashboard/controller/controller.go)
Test integration by triggering alerts from cmd/dashboard/rpc/rpc.go to verify delivery (cmd/dashboard/rpc/rpc.go)

Add a New Scheduled Task Feature

Define task model and schema in model/cron.go or create model/mytask.go (model/cron.go)
Add REST endpoints for task CRUD in cmd/dashboard/controller/cron.go (cmd/dashboard/controller/cron.go)
Implement task execution logic by registering a goroutine or using the cron scheduler (cmd/dashboard/controller/cron.go)
Expose status/logs via WebSocket in cmd/dashboard/controller/ws.go for real-time UI updates (cmd/dashboard/controller/ws.go)

Add a New API Endpoint

Create request/response models in model/myfeature_api.go following naming convention (model/api.go)
Implement handler function in cmd/dashboard/controller/myfeature.go (cmd/dashboard/controller/controller.go)
Register route with Gin router in cmd/dashboard/controller/controller.go (e.g., r.POST("/api/myfeature", handler); Go string literals use double quotes) (cmd/dashboard/controller/controller.go)
Protect endpoint with JWT middleware by calling gin-jwt in cmd/dashboard/controller/jwt.go (cmd/dashboard/controller/jwt.go)

🔧Why these technologies

Go + Gin framework — Lightweight, fast HTTP server suitable for self-hosted monitoring; minimal resource footprint for resource-constrained environments.
gRPC + Protobuf — Efficient agent-to-dashboard communication; binary protocol reduces bandwidth for frequent telemetry updates.
WebSocket (Gorilla) — Real-time push of alerts and metrics to browser UI without polling overhead.
VictoriaMetrics — Time-series storage for historical monitoring data; optimized for high-cardinality metrics (many servers, many labels).
JWT + OAuth2 — Stateless authentication for API and third-party integrations; scales without session storage.

⚖️Trade-offs already made

Single monolithic Go binary vs. microservices
- Why: Self-hosted use case requires minimal operational complexity; easier deployment on single VPS.
- Consequence: Simpler deployment and no service-to-service RPC overhead, but vertical scaling only: the alert engine or notification sender cannot be isolated into separate processes.
gRPC for agent communication vs. REST polling
- Why: Streaming reduces round-trip latency and bandwidth; agents can push metrics continuously.
- Consequence: Agents must implement gRPC client; agent availability directly tied to dashboard uptime (no buffering).
In-memory alert rule evaluation vs. external rule engine
- Why: Simplicity and low latency; rules evaluated synchronously on every metric update.
- Consequence: Alert rule changes take effect immediately; but high-frequency alerts on many rules can spike CPU; no audit trail of rule executions.
WebSocket broadcast to all connected browsers vs. per-user queues
- Why: Simpler implementation; all users see the same alerts in near-real-time.
- Consequence: Reduced isolation; all users receive all alerts regardless of permissions (may leak sensitive alerts to unauthorized users if not careful).
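
To make the last trade-off concrete, here is a sketch of a broadcast that filters by permission before delivery. It is an assumption-laden illustration of the mitigation, not how cmd/dashboard/controller/ws.go works: the Alert and Client types, the Allowed map, and the Broadcast function are all hypothetical.

```go
package main

import "fmt"

// Alert is a hypothetical message tied to one monitored server.
type Alert struct {
	ServerID uint64
	Message  string
}

// Client is a hypothetical connected browser session.
type Client struct {
	Name    string
	Allowed map[uint64]bool // server IDs this user may see
	Outbox  chan Alert      // buffered queue drained by the connection writer
}

// Broadcast delivers the alert only to clients allowed to see its server.
// A client whose outbox is full is skipped so one slow reader cannot block the rest.
func Broadcast(clients []*Client, a Alert) int {
	delivered := 0
	for _, c := range clients {
		if !c.Allowed[a.ServerID] {
			continue // permission filter
		}
		select {
		case c.Outbox <- a:
			delivered++
		default:
			// outbox full: drop for this client
		}
	}
	return delivered
}

func main() {
	admin := &Client{Name: "admin", Allowed: map[uint64]bool{1: true, 2: true}, Outbox: make(chan Alert, 4)}
	guest := &Client{Name: "guest", Allowed: map[uint64]bool{1: true}, Outbox: make(chan Alert, 4)}

	n := Broadcast([]*Client{admin, guest}, Alert{ServerID: 2, Message: "disk full"})
	fmt.Println(n, len(admin.Outbox), len(guest.Outbox)) // 1 1 0
}
```

The filter costs one map lookup per client per alert, which keeps most of the simplicity of a plain broadcast while closing the leak described above.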

🚫Non-goals (don't propose these)

Does not provide agent-side local alerting or offline task execution; agent is purely a telemetry collector.
Does not implement distributed tracing or observability into agent networks; monitoring is one-way (agent → dashboard).
Does not support multi-tenancy at the database level; single dashboard instance = single tenant.

🪤Traps & gotchas

Database required: SQLite database must be created on first run; no auto-init migration is visible in config examples, so check the koanf initialization.
Frontend split: admin UI and user UI are separate repos (nezhahq/admin-frontend, hamster1963/nezha-dash); pre-built assets must exist in admin-dist/ and user-dist/ or routes will 404.
gRPC agent dependency: monitoring agents communicate via gRPC (google.golang.org/grpc v1.79.3); the agent repo (nezhahq/agent) must be deployed separately and configured to point to the dashboard host.
No visible env defaults: uses koanf for config loading; requires explicit YAML or env vars, and missing config will silently fail or use insecure defaults.
OAuth2 integration: the oauth2.go controller expects upstream provider config; incomplete setup breaks login.
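
One way to guard against the silent-failure gotcha is to validate configuration at startup and refuse to run when a required value is absent. The sketch below illustrates that idea with the standard library only; it does not use koanf, and the Config fields, the DEMO_LISTEN_PORT and DEMO_JWT_SECRET variable names, and the LoadConfig function are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config holds a hypothetical subset of dashboard settings.
type Config struct {
	ListenPort int
	JWTSecret  string
}

// LoadConfig reads settings through lookup (os.LookupEnv has the same
// signature) and fails loudly instead of falling back to unsafe values.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{ListenPort: 8008} // explicit, documented default
	if v, ok := lookup("DEMO_LISTEN_PORT"); ok {
		p, err := strconv.Atoi(v)
		if err != nil || p < 1 || p > 65535 {
			return Config{}, fmt.Errorf("DEMO_LISTEN_PORT: invalid port %q", v)
		}
		cfg.ListenPort = p
	}
	secret, ok := lookup("DEMO_JWT_SECRET")
	if !ok || secret == "" {
		return Config{}, errors.New("DEMO_JWT_SECRET is required; refusing to start with an empty secret")
	}
	cfg.JWTSecret = secret
	return cfg, nil
}

func main() {
	env := map[string]string{"DEMO_LISTEN_PORT": "9000"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	_, err := LoadConfig(lookup)
	fmt.Println(err) // reports the missing secret

	env["DEMO_JWT_SECRET"] = "change-me"
	cfg, err := LoadConfig(lookup)
	fmt.Println(cfg.ListenPort, err) // 9000 <nil>
}
```

Injecting the lookup function keeps the loader testable without touching the real process environment.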

🏗️Architecture

💡Concepts to learn

gRPC and Protocol Buffers — Nezha uses google.golang.org/grpc for agent-to-dashboard communication; understanding gRPC streaming, unary RPCs, and proto3 message definitions is essential to extending the monitoring protocol
WebSocket (Gorilla WebSockets) — Real-time dashboard updates via gorilla/websocket eliminate polling overhead; critical for live metric streaming to web clients
JWT Authentication & Token Refresh — appleboy/gin-jwt/v2 handles stateless auth for dashboard API; understanding token expiry, refresh tokens, and claim validation is necessary for API integration and security
Time-Series Data & Metrics Storage — VictoriaMetrics (v1.134.0) provides the backend for efficient metric storage and querying; understanding scrape intervals, retention policies, and PromQL-like query language is essential for metric retention design
Cron Jobs & Scheduled Task Execution — robfig/cron/v3 powers the cron.go controller for alert rules and maintenance tasks; required for implementing recurring checks and alert evaluations
Configuration Management with Koanf — koanf/v2 handles multi-source config loading (env, files, YAML); understanding provider hierarchy and hot-reload is critical for deployment and operational configuration
ORM & Database Abstraction (GORM) — gorm.io/gorm abstracts SQLite operations; understanding hooks, migrations, and query patterns is necessary for adding new data models (e.g., new alert types, server metadata)

🔗Related repos

nezhahq/admin-frontend — Official admin dashboard UI (React/Vue) that must be built and deployed alongside the backend; critical for management interface
hamster1963/nezha-dash — Community-maintained user-facing frontend theme for displaying monitoring status publicly; represents extensible theming system
nezhahq/agent — Monitoring agent that runs on target servers and sends metrics to dashboard via gRPC; required for any operational Nezha deployment
VictoriaMetrics/VictoriaMetrics — Time-series database backend used by Nezha for storing and querying historical metrics; core dependency for data persistence
prometheus/prometheus — Industry-standard alternative for metrics collection and storage; Nezha positions itself as a lighter-weight, self-contained replacement

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for JWT controller (jwt.go, jwt_test.go)

The jwt_test.go file exists but appears minimal. JWT is critical for authentication/authorization. Current test coverage likely doesn't cover edge cases like token expiration, refresh flows, invalid signatures, or role-based access control. This is high-value because auth vulnerabilities can compromise the entire monitoring system.

[ ] Expand jwt_test.go with table-driven tests for token generation, validation, and refresh scenarios
[ ] Add tests for edge cases: expired tokens, tampered tokens, missing claims, role validation
[ ] Test integration with user.go controller for permission enforcement
[ ] Ensure coverage of oauth2.go flows if JWT is used in OAuth2 exchange

Add missing unit tests for model configuration and validation (model/config.go, model/config_test.go)

Configuration is loaded from files and environment via koanf. The config_test.go exists but config.go handles critical settings (database, server URLs, DDNS providers, etc.). Missing tests could allow invalid configs to reach production. This is high-value for reliability.

[ ] Add tests in model/config_test.go for config validation logic in config.go
[ ] Test environment variable override behavior (koanf/providers/env integration)
[ ] Test YAML/config file parsing with invalid/malformed inputs
[ ] Test default value fallbacks for missing critical fields

Add missing HTTP integration tests for WAF controller (cmd/dashboard/controller/waf/waf.go)

The WAF controller exists with waf.go and waf.html but there are no visible test files for it. WAF rules are security-critical. Other controllers (alertrule, server, etc.) lack dedicated test files too, but WAF is highest-risk. Testing should cover rule creation, validation, and application.

[ ] Create cmd/dashboard/controller/waf/waf_test.go with HTTP handler tests
[ ] Test WAF rule CRUD operations with valid and invalid inputs
[ ] Test rule precedence and conflict detection
[ ] Test rule matching against sample requests
[ ] Follow pattern from existing controller tests (use gin.Context, httptest)

🌿Good first issues

Add unit tests for cmd/dashboard/controller/jwt.go—jwt_test.go exists but appears incomplete; add coverage for token refresh, expiration, and claim validation
Expand WAF rule documentation: cmd/dashboard/controller/waf/waf.html exists but is sparse; add inline examples for common attack patterns (SQL injection, XSS) and rule syntax reference
Create migration guide from Prometheus/Grafana to Nezha in docs/: repo lacks explicit comparison and conversion docs; would help users evaluating alternatives

⭐Top contributors

@naiba — 55 commits
@uubulb — 15 commits
@github-actions[bot] — 14 commits
@dependabot[bot] — 7 commits
@weblate — 6 commits

📝Recent commits