gorse-io/gorse
AI powered open source recommender system engine supports classical/LLM rankers and multimodal content via embedding
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ⚠Concentrated ownership — top contributor handles 68% of recent commits
- ⚠No test directory detected
- ✓Last commit today
- ✓14 active contributors
- ✓Apache-2.0 licensed
- ✓CI configured
Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README — live-updates from the latest cached analysis.
[](https://repopilot.app/r/gorse-io/gorse)Paste at the top of your README.md — renders inline like a shields.io badge.
▸Preview social card
This card auto-renders when someone shares https://repopilot.app/r/gorse-io/gorse on X, Slack, or LinkedIn.
Ask AI about gorse-io/gorse
Grounded in the actual source code. Pick a starter question or write your own.
Onboarding doc
Onboarding: gorse-io/gorse
Generated by RepoPilot · 2026-06-24 · Source
🎯Verdict
GO — Healthy across all four use cases
- Last commit today
- 14 active contributors
- Apache-2.0 licensed
- CI configured
- ⚠ Concentrated ownership — top contributor handles 68% of recent commits
- ⚠ No test directory detected
<sub>Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests</sub>
⚡TL;DR
Gorse is an AI-powered open-source recommender system engine written in Go that automatically generates personalized recommendations by training on user-item interaction data. It supports multi-source recommendation strategies (collaborative filtering, item-to-item, user-to-user, latest items), multimodal content via embeddings, and both classical and LLM-based rankers, exposing recommendations via REST APIs and a web dashboard. Monolithic Go application split into specialized binaries: cmd/gorse-in-one (all-in-one playground), cmd/gorse-master (coordination), cmd/gorse-worker (model training), cmd/gorse-server (API/dashboard). The client/ directory contains SDK/test configurations; core logic lives in the root-level packages (inferred from build structure). Dashboard is embedded from github.com/gorse-io/dashboard as a dependency.
👥Who it's for
Backend engineers and ML practitioners building recommendation features into SaaS platforms, e-commerce sites, or content platforms who need a self-hosted alternative to proprietary recommendation services and want to avoid vendor lock-in.
🌱Maturity & risk
Actively developed and production-ready. The codebase is substantial (1.4M LOC in Go), has comprehensive CI/CD workflows (.github/workflows/ with build_test.yml, build_release.yml, build_docker.yml), extensive Docker support across multiple variants (CUDA, MKL, OpenBLAS), and appears to have regular releases. The repository structure and tooling (Makefile, .golangci.yml) indicate professional maintenance.
Moderate dependency footprint with cloud storage SDKs (Google Cloud, Azure), vector DB integrations (Milvus), and ML frameworks (GoMLX), which increases surface area. The single primary binary entry points (cmd/gorse-master, cmd/gorse-worker, cmd/gorse-server) suggest potential single points of failure in distributed deployment. No specific evidence of breaking API changes in visible data, but distributed system complexity warrants careful upgrade planning.
Active areas of work
Recent activity includes updates to Go module dependencies (version 1.26 target), integration of LLM rankers and multimodal embeddings, and Docker image maintenance across multiple hardware variants (CUDA, MKL, OpenBLAS, Windows). GitHub workflows are actively maintained for Docker Hub publishing and release automation.
🚀Get running
Clone the repository: git clone https://github.com/gorse-io/gorse.git && cd gorse. Run the all-in-one playground immediately: docker run -p 8088:8088 zhenghaoz/gorse-in-one --playground. For local development, review the Makefile (present in repo) and .devcontainer/ for container setup, then examine cmd/gorse-in-one/main.go as the entry point.
Daily commands:
All-in-one mode: docker run -p 8088:8088 zhenghaoz/gorse-in-one --playground (accessible at http://localhost:8088). For distributed mode, run gorse-master, gorse-worker, and gorse-server separately (Dockerfiles in cmd/gorse-master/, cmd/gorse-worker/, cmd/gorse-server/). Local Go build: go build -o gorse-in-one ./cmd/gorse-in-one/main.go (assuming Go 1.26+ installed).
🗺️Map of the codebase
cmd/gorse-master/main.go— Master node entry point orchestrating distributed recommendation training and model serving across the cluster.cmd/gorse-server/main.go— API server entry point handling user/item/interaction ingestion and serving recommendations to clients.cmd/gorse-worker/main.go— Worker node entry point executing model training jobs distributed by the master node.common/ann/ann.go— Approximate Nearest Neighbor abstraction supporting HNSW and brute-force vector search for embedding-based recommendations.common/bfloats/bfloats.go— High-performance bfloat16 vector operations with SIMD acceleration (AVX, AVX512, NEON, RVV) critical for embedding scoring.common/blas/blas.go— Linear algebra abstraction supporting MKL and OpenBLAS backends for large-scale matrix operations in model training.go.mod— Module dependencies including ML frameworks (Goptuna), cloud storage (GCS, Azure), embeddings, and observability.
🛠️How to make changes
Add a new ANN implementation
- Create a new file implementing the ANN interface (Search, Insert, Build methods) in common/ann/ (
common/ann/newindex.go) - Add a case to the NewANN factory function in common/ann/ann.go selecting your implementation (
common/ann/ann.go) - Add corresponding test cases in common/ann/ann_test.go verifying search correctness and performance (
common/ann/ann_test.go)
Add platform-specific SIMD optimization for vector operations
- Create new Go file for target architecture (e.g., bfloats_riscv64.go) in common/bfloats/ (
common/bfloats/bfloats_riscv64.go) - Implement assembly source (.s file) or C source in common/bfloats/src/ and compile via Makefile (
common/bfloats/src/Makefile) - Register fallback pure-Go implementation with appropriate build constraints (e.g., //go:build riscv64) (
common/bfloats/bfloats_noasm.go) - Add platform-specific test cases in common/bfloats/bfloats_riscv64_test.go (
common/bfloats/bfloats_riscv64_test.go)
Add a new distributed command (master/server/worker variant)
- Create cmd/gorse-newrole/main.go as entry point with configuration parsing and startup logic (
cmd/gorse-newrole/main.go) - Create Dockerfile variants (base, cuda, mkl, openblas, windows) in cmd/gorse-newrole/ (
cmd/gorse-newrole/Dockerfile) - Register build artifacts in .github/workflows/build_docker.yml to trigger image builds (
.github/workflows/build_docker.yml) - Update client integration docs in client/README.md if exposing new APIs (
client/README.md)
Integrate a new cloud storage backend
- Add dependency in go.mod for cloud provider SDK (e.g., github.com/aws/aws-sdk-go-v2) (
go.mod) - Create storage adapter (e.g., common/storage/s3.go) implementing unified interface (
common/storage/s3.go) - Add provider selector logic to instantiate storage from configuration strings (
client/config.go) - Add integration tests using cloud storage fakes (e.g., fsouza/fake-gcs-server pattern) (
client/client_test.go)
🔧Why these technologies
- Go — Compiled language with excellent concurrency primitives (goroutines) for distributed systems; small binary footprint for containerization.
- BLAS (MKL/OpenBLAS) — Pluggable high-performance linear algebra backends enabling GPU acceleration via Intel MKL or leveraging OpenBLAS for CPU-only deployments.
- bfloat16 SIMD — Reduces memory footprint of embeddings by 50% vs float32 while maintaining precision; platform-specific assembly (AVX512, NEON, RVV) maximizes throughput.
- HNSW (Hierarchical Navigable Small World) — State-of-the-art approximate nearest neighbor algorithm trading small accuracy loss for 10–100× speedup in high-dimensional embedding search.
- Goptuna — Bayesian hyperparameter optimization framework automating model tuning to maximize recommendation quality without manual grid search.
- Cloud storage SDKs (GCS, Azure Blob, S3) — Multi-cloud portability; enables model versioning, checkpointing, and data persistence independent of compute nodes.
- Docker multi-stage builds with CUDA/MKL variants — Hardware-targeted images reduce bloat; CUDA enables GPU training; users select variant matching their infrastructure.
⚖️Trade-offs already made
-
Single-master distributed architecture
- Why: Simplifies coordination of training jobs and model distribution; avoids complex consensus protocols.
- Consequence: Master becomes single point of failure for job scheduling (mitigated by persistent job queue); does not scale horizontally for metadata coordination.
-
bfloat16 encoding for embeddings
- Why: Cuts memory/network overhead of vectors by 50
- Consequence: undefined
🪤Traps & gotchas
Database requirement: PostgreSQL or MySQL must be running and configured via client/config.toml before starting any component (no embedded fallback). Playground mode data: requires internet access to download from GitRec (https://gitrec.gorse.io/); offline mode requires pre-seeded data. Distributed setup complexity: master/worker/server must coordinate via shared database; single database failure cascades across entire system. Hardware variants: Docker images are variant-specific (CUDA for GPU, MKL/OpenBLAS for CPU math acceleration); pulling wrong variant silently degrades performance. Go version: explicitly requires Go 1.26+; older versions will fail module resolution.
🏗️Architecture
💡Concepts to learn
- Collaborative Filtering (User-User & Item-Item) — Core recommendation strategy in Gorse; understanding similarity-based neighborhood selection and matrix factorization is essential to interpreting model training and ranking output
- Vector Embeddings & Multimodal Learning — Gorse uses embeddings to represent text, images, and videos uniformly for content-based recommendations; critical for understanding the Milvus integration and LLM ranker capability
- Sparse Matrix Representations (BitSet) — Gorse uses github.com/bits-and-blooms/bitset for efficient storage of user-item interaction matrices at scale; understanding sparsity is key to tuning memory usage
- Rate Limiting & Backoff (juju/ratelimit, cenkalti/backoff) — Gorse uses these libraries to prevent overwhelming downstream services (databases, cloud storage) during batch training; critical for production stability
- Distributed Task Scheduling (Master-Worker Pattern) — Gorse's architecture (gorse-master coordinating gorse-worker processes) implements distributed model training; understanding task queuing and checkpointing is essential for scaling
- Bayesian Optimization (c-bata/goptuna) — Gorse uses Goptuna for hyperparameter tuning of recommendation models; understanding the exploration-exploitation tradeoff helps explain model training convergence
- OpenTelemetry Instrumentation (XSAM/otelsql) — Gorse integrates tracing and metrics via OpenTelemetry; essential for debugging distributed deployments and understanding component latency
🔗Related repos
recommenders/recommenders— Microsoft's suite of collaborative filtering and content-based recommender algorithms; reference implementation for classical techniques Gorse supportszlpure/MachineLearning— Educational ML implementations in Go; overlaps with Gorse's algorithmic foundation for training coremilvus-io/milvus— Vector database that Gorse integrates with (milvus-sdk-go/v2 in dependencies) for multimodal embedding storage and retrievalgorse-io/dashboard— Companion frontend repo (embedded as dependency in go.mod) providing the web UI for recommendation pipeline editing and monitoringgorse-io/gorse-go— Official Go SDK/client library (gorse-go v0.5.0-alpha.3 in dependencies) for applications integrating Gorse recommendations
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for common/bfloats package across all architectures
The bfloats package has architecture-specific implementations (amd64, arm64) with existing tests only for amd64 (_amd64_test.go). This creates a gap in test coverage for arm64 implementations, which is increasingly important for deployment on edge devices and ARM-based infrastructure. New contributors can add architecture-agnostic tests and arm64-specific tests to ensure cross-platform reliability.
- [ ] Review common/bfloats/bfloats_amd64_test.go to understand existing test patterns
- [ ] Create common/bfloats/bfloats_arm64_test.go with parallel test cases for arm64-specific optimizations
- [ ] Add common/bfloats/bfloats_test.go with architecture-agnostic tests that run on all platforms
- [ ] Verify tests pass locally and update codecov.yml if needed to track bfloats coverage
Add integration tests for Docker build workflows in build_docker.yml GitHub Action
The repo has 8 different Dockerfile variants (cuda, mkl, openblas, windows) across 4 different binaries (gorse-in-one, master, server, worker) but the build_docker.yml workflow doesn't appear to have comprehensive validation. New contributors can create a test matrix that validates each Dockerfile builds successfully and basic health checks (e.g., binary executes, version flag works) to catch build breakages early.
- [ ] Review .github/workflows/build_docker.yml to understand current build process
- [ ] Create a new test workflow file that builds and tests each Dockerfile variant against a test matrix
- [ ] Add basic sanity checks (e.g., gorse-server --version, binary size validation) for built images
- [ ] Document the Docker build test process in CONTRIBUTING.md with examples for local testing
Add unit tests for client/config.go configuration parsing and validation
The client package has config.go and config.toml but client_test.go appears minimal based on the file listing. Configuration parsing is critical for the Gorse client library (gorse-go dependency). New contributors can add comprehensive unit tests for config structure unmarshaling, TOML parsing, validation rules, and error handling to ensure robust client initialization across different deployment scenarios.
- [ ] Review client/config.go to identify all config fields and validation logic
- [ ] Examine client/config.toml to create test fixtures for valid/invalid configurations
- [ ] Add test cases in client_test.go covering: valid config parsing, missing required fields, type mismatches, and edge cases
- [ ] Add fixtures directory (client/testdata/) with sample config files for different deployment scenarios (docker-compose, standalone, cluster)
🌿Good first issues
- Add integration tests for cmd/gorse-worker/main.go covering model training pipeline with mock data; currently client_test.go only covers API client, not core training loop
- Document the exact schema setup required in PostgreSQL/MySQL for cmd/gorse-master initialization (likely missing from README beyond quick-start); add SQL migrations to client/ directory
- Implement request validation tests for REST endpoint parameter bounds in cmd/gorse-server (verify API contract via go-playground/validator/v10 integration already present in dependencies but possibly incomplete coverage)
⭐Top contributors
Click to expand
Top contributors
- @zhenghaoz — 68 commits
- @zhangzhenghao — 17 commits
- @Goodnight77 — 3 commits
- @slaout — 2 commits
- @guandeng — 1 commits
📝Recent commits
Click to expand
Recent commits
ae86dfa— feat: add API event recorder interface for billing (#1266) (zhangzhenghao)d54a2ea— fix: tolerate Index already exists in Redis Cluster FT.CREATE (#1265) (guandeng)cda4171— feat: add separate access log for RESTful API (#1264) (zhangzhenghao)8ff190e— chore: upgrade GORM and database drivers (#1262) (zhangzhenghao)09bd49a— perf: optimize Sqrt function by using math32.Sqrt (#1258) (Goodnight77)efe92fd— refactor: drop redundant int64 in newrandomgenerator (#1256) (Goodnight77)a2abb9e— refactor: drop duplicate NewNormalVector method (#1257) (Goodnight77)1dac895— refactor: remove duplicate bfloat16 conversion functions from floats package (#1254) (zhangzhenghao)6822196— Reduce memory usage in dataset loading and item cache (#1252) (zhangzhenghao)bf4dd35— Store embedding vectors in item cache to BF16 (#1250) (zhangzhenghao)
🔒Security observations
- High · Hardcoded Database Credentials in docker-compose.yml —
docker-compose.yml. The docker-compose.yml file contains hardcoded credentials for MySQL database in plaintext, including MYSQL_ROOT_PASSWORD, MYSQL_USER, and MYSQL_PASSWORD. This is a significant security risk as credentials are exposed in version control and accessible to anyone with repository access. Fix: Use environment variables or Docker secrets to manage sensitive credentials. Move sensitive data to a .env file (added to .gitignore) or use Docker Compose secrets. Example: Replace MYSQL_ROOT_PASSWORD: root_pass with MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD} - High · Outdated Go Version —
go.mod. The go.mod specifies 'go 1.26', which appears to be a future/invalid version number (as of current Go release cycles). This may indicate a configuration error or version mismatch that could lead to unexpected behavior or use of unsupported tooling. Fix: Verify and use a valid, stable Go version (e.g., go 1.22 or go 1.23). Ensure the go version matches your development and production environments. - High · Multiple Outdated and Vulnerable Dependencies —
go.mod (dependencies section). Several dependencies have known security vulnerabilities or are significantly outdated: 'github.com/lib/pq v1.12.3' (PostgreSQL driver with known CVEs), 'github.com/go-sql-driver/mysql v1.9.3', and others. Outdated dependencies may contain unpatched security flaws. Fix: Run 'go get -u' to update dependencies to their latest versions. Use 'go list -u -m all' to identify outdated packages. Consider using 'go vulnerabilities' tool or Dependabot to track security updates regularly. - Medium · Potential SQL Injection Risk - Unclear Query Handling —
client/, cmd/ directories (database integration points). The codebase includes multiple database drivers (MySQL, PostgreSQL, ClickHouse, Milvus) but without visibility into the actual SQL query construction patterns in the code, there's a risk of SQL injection vulnerabilities if raw string concatenation is used instead of parameterized queries. Fix: Audit all database queries to ensure parameterized queries are used exclusively. Use prepared statements and bound parameters. Avoid string concatenation for query construction. Consider using an ORM like GORM or sqlc for safer query building. - Medium · Missing CORS and Security Headers Configuration —
cmd/gorse-server/, cmd/gorse-master/ (REST API endpoints). The project uses 'github.com/emicklei/go-restful/v3' for REST API but there's no visible configuration for CORS policies or security headers (X-Frame-Options, Content-Security-Policy, etc.) in the file structure, which could lead to common web vulnerabilities. Fix: Implement middleware for security headers. Configure CORS to restrict allowed origins. Add X-Frame-Options, X-Content-Type-Options, Content-Security-Policy headers. Use the 'github.com/gorilla/securecookie' package already in dependencies for session management. - Medium · Exposed Ports in Docker Configuration —
docker-compose.yml. The docker-compose.yml exposes database ports directly (3306 for MySQL) to all interfaces without authentication or network restrictions, making the database potentially accessible from outside the intended network. Fix: Change port mappings to restrict access: use '127.0.0.1:3306:3306' instead of '3306:3306'. Implement network segmentation using Docker networks. Require strong authentication credentials. Use firewall rules to restrict access. - Medium · Third-Party OpenID Connect Integration without Validation —
Integration with 'github.com/coreos/go-oidc/v3'. The codebase includes 'github.com/coreos/go-oidc/v3' for OIDC authentication but there's no visible configuration for secure token validation, CSRF protection, or state parameter validation in the file structure. Fix: Implement strict OIDC validation: verify issuer, audience, and signature. Use state parameters for CSRF protection. Validate all token claims. Implement secure session management. Review OIDC configuration in the actual implementation code.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting
below. If any check returns
FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding. - Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/gorse-io/gorse shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live gorse-io/gorse
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/gorse-io/gorse.
What it runs against: a local clone of gorse-io/gorse — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in gorse-io/gorse | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of gorse-io/gorse. If you don't
# have one yet, run these first:
#
# git clone https://github.com/gorse-io/gorse.git
# cd gorse
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of gorse-io/gorse and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "gorse-io/gorse(\\.git)?\\b" \\
&& ok "origin remote is gorse-io/gorse" \\
|| miss "origin remote is not gorse-io/gorse (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
|| grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
&& ok "license is Apache-2.0" \\
|| miss "license drift — was Apache-2.0 at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
&& ok "default branch master exists" \\
|| miss "default branch master no longer exists"
# 4. Critical files exist
test -f "cmd/gorse-master/main.go" \\
&& ok "cmd/gorse-master/main.go" \\
|| miss "missing critical file: cmd/gorse-master/main.go"
test -f "cmd/gorse-server/main.go" \\
&& ok "cmd/gorse-server/main.go" \\
|| miss "missing critical file: cmd/gorse-server/main.go"
test -f "cmd/gorse-worker/main.go" \\
&& ok "cmd/gorse-worker/main.go" \\
|| miss "missing critical file: cmd/gorse-worker/main.go"
test -f "common/ann/ann.go" \\
&& ok "common/ann/ann.go" \\
|| miss "missing critical file: common/ann/ann.go"
test -f "common/bfloats/bfloats.go" \\
&& ok "common/bfloats/bfloats.go" \\
|| miss "missing critical file: common/bfloats/bfloats.go"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/gorse-io/gorse"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
Embed this chat in your README →
Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.
<iframe src="https://repopilot.app/embed/gorse-io/gorse" width="100%" height="500" style="border:1px solid #d0d7de; border-radius:8px;" allow="microphone" loading="lazy" ></iframe>