go-ego/riot

Item: go-ego/riot
Rating: 3
Author: RepoPilot

An open-source, distributed, simple and efficient search engine written in Go. Warning: this is V1, a beta version with high memory consumption; V2 will rewrite all of the code.

Overall: Mixed (weakest axis: stale, last commit 6y ago)

Use as dependency: Mixed

last commit was 6y ago; top contributor handles 97% of recent commits

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓4 active contributors
✓Apache-2.0 licensed
✓CI configured

✓Tests present
⚠Stale — last commit 6y ago
⚠Small team — 4 contributors active in recent commits
⚠Single-maintainer risk — top contributor 97% of recent commits

What would change the summary?

→Use as dependency Mixed → Healthy if: 1 commit in the last 365 days

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Onboarding doc

Onboarding: go-ego/riot

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/go-ego/riot shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Stale — last commit 6y ago

4 active contributors
Apache-2.0 licensed
CI configured
Tests present
⚠ Stale — last commit 6y ago
⚠ Small team — 4 contributors active in recent commits
⚠ Single-maintainer risk — top contributor 97% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live go-ego/riot repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/go-ego/riot.

What it runs against: a local clone of go-ego/riot — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in go-ego/riot | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 2062 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>go-ego/riot</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of go-ego/riot. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/go-ego/riot.git
#   cd riot
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of go-ego/riot and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "go-ego/riot(\.git)?\b" \
  && ok "origin remote is go-ego/riot" \
  || miss "origin remote is not go-ego/riot (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# Accept either an SPDX-style first line or the standard Apache 2.0 license text.
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
   || (grep -qiE "Apache License" LICENSE 2>/dev/null \
       && grep -qiE "Version 2\.0" LICENSE 2>/dev/null)) \
  && ok "license is Apache-2.0" \
  || miss "license drift (was Apache-2.0 at generation time)"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "engine.go" \
  && ok "engine.go" \
  || miss "missing critical file: engine.go"
test -f "core/indexer.go" \
  && ok "core/indexer.go" \
  || miss "missing critical file: core/indexer.go"
test -f "core/ranker.go" \
  && ok "core/ranker.go" \
  || miss "missing critical file: core/ranker.go"
test -f "engine/engine.go" \
  && ok "engine/engine.go" \
  || miss "missing critical file: engine/engine.go"
test -f "counters.go" \
  && ok "counters.go" \
  || miss "missing critical file: counters.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 2062 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~2032d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/go-ego/riot"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Riot is a distributed full-text search engine written in Go. It indexes and searches documents with support for Chinese word segmentation, BM25 scoring, logical queries, and token proximity calculation. Reported performance is 19K search QPS at a 1.65ms response time, and indexing 1M documents (500MB of data) in 28 seconds. It is designed as a simpler alternative to Elasticsearch for Go applications.

Repository layout: core/ contains the indexer and ranker logic (core/indexer.go, core/ranker.go, core/data.go); data/ contains the distributed server implementations with heartbeat support (data/riot/, data/riot1/) and a client example (data/client/); docs/ holds implementation guides (BM25, token proximity, custom scoring). Configuration uses TOML files (data/conf/riot.toml).

👥Who it's for

Go backend developers and DevOps engineers building search features into applications who want a lightweight, distributed search engine with Chinese language support without the operational overhead of Elasticsearch; also suitable for teams needing custom scoring criteria and real-time indexing capabilities.

🌱Maturity & risk

Experimental and beta-stage: the README explicitly warns 'This is V1 and beta version, because of big memory consume, and the V2 will be rewrite all code.' CI configuration (CircleCI, Travis, AppVeyor) and test files (core/*_test.go) are present, but the planned v2 rewrite indicates that the current design has fundamental limitations. Not recommended for new production deployments.

High risk: the project states that memory consumption is a problem serious enough to trigger a complete v2 rewrite, which indicates v1 is not production-grade. It depends on stable packages (badger for storage, gse for segmentation), but the core indexing and ranking logic may have unfixed issues. The last commit was about 6 years ago, so v1 is not under active development.

Active areas of work

The repository is not under active feature development: the last commit was about 6 years ago, and the README's warning that v2 will rewrite all code suggests that any further effort is aimed at a redesign to address memory issues. No recent milestones or active PR information are visible in the provided file list.

🚀Get running

Clone the repo: git clone https://github.com/go-ego/riot.git && cd riot. Install dependencies: go mod download (uses go 1.13+). Review examples in data/client/main.go or run test suites: go test ./core/... to verify the build.

Daily commands: No explicit Makefile visible; run test suite: go test -v ./core/. For distributed example: start primary server go run data/riot/main.go and secondary go run data/riot1/main.go, then use data/client/main.go to query. Configure via data/conf/riot.toml (indexing parameters) and data/conf/log.toml (logging).

🗺️Map of the codebase

engine.go — Main entry point for the Riot search engine; defines core Engine struct and primary indexing/search APIs that all consumers use.
core/indexer.go — Core indexing logic that handles document tokenization, inverted index construction, and index persistence—fundamental to search functionality.
core/ranker.go — Ranking and scoring engine that implements BM25 and custom scoring criteria; critical for result relevance.
engine/engine.go — Extended engine implementation with distributed search capabilities and configuration management.
counters.go — Thread-safe counter management for document tracking and statistics; essential for index consistency and monitoring.
go.mod — Declares key dependencies (badger for persistent storage, gse for Chinese segmentation, grpclb for distributed features).

🧩Components & responsibilities

Engine (Go, gRPC) — User-facing API; coordinates indexing, searching, and deletion; manages document lifecycle
- Failure mode: If Engine panics, all indexing/search halts; corrupted index requires rebuild
Indexer (gse, BadgerDB) — Tokenizes documents, builds inverted indices, merges postings, persists to storage
- Failure mode: Corrupted postings or token lists cause irrelevant search results or queries to miss documents
Ranker (BM25 algorithm) — Scores candidate documents using BM25 or custom criteria; orders results by relevance
- Failure mode: Incorrect scoring weights return poor relevance order; custom criteria bugs produce nonsensical ranks
BadgerDB Storage (BadgerDB) — Persists inverted index, document metadata, and counters; provides fast key-value lookups
- Failure mode: Disk corruption or loss deletes entire index; no backup means data gone
Distributed Data Server (gRPC, heartbeat monitoring) — Replicates index across nodes; handles shard assignment and inter-node communication
- Failure mode: Network partition or node crash loses shard; no automatic failover in V1

🔀Data flow

Client → Engine — Document with fields, content, and metadata
Engine → Indexer — Parsed DocIndexData; receives tokenized terms and field values
Indexer → BadgerDB — Inverted index postings, document vectors, term frequencies, and counter state
Client → Engine — Query string or structured query parameters
Engine → Indexer — Tokenized query; receives candidate document IDs and posting lists
Indexer → Ranker — Candidate docs with term frequencies and field values
Ranker → Engine — Ranked results with BM25 or custom scores
Engine → Client — Top-K matching documents with relevance scores
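
The indexing and lookup half of this flow can be illustrated with a toy inverted index. The sketch below is not Riot's implementation: `postings`, `buildIndex`, and `lookup` are hypothetical names, tokenization is a plain whitespace split rather than gse segmentation, and nothing is persisted to BadgerDB.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// postings maps a token to the sorted IDs of the documents containing it.
// All names here are hypothetical and are not Riot's types.
type postings map[string][]int

// buildIndex tokenizes each document on whitespace and records which
// documents contain which lowercased token.
func buildIndex(docs map[int]string) postings {
	idx := postings{}
	for id, text := range docs {
		seen := map[string]bool{}
		for _, tok := range strings.Fields(strings.ToLower(text)) {
			if !seen[tok] {
				seen[tok] = true
				idx[tok] = append(idx[tok], id)
			}
		}
	}
	for tok := range idx {
		sort.Ints(idx[tok])
	}
	return idx
}

// lookup returns the sorted IDs of documents that contain every query token.
func lookup(idx postings, query string) []int {
	toks := strings.Fields(strings.ToLower(query))
	if len(toks) == 0 {
		return nil
	}
	counts := map[int]int{}
	for _, tok := range toks {
		for _, id := range idx[tok] {
			counts[id]++
		}
	}
	var out []int
	for id, n := range counts {
		if n == len(toks) {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	docs := map[int]string{
		1: "Google is experimenting with virtual reality advertising",
		2: "Google accidentally pushed a Bluetooth update",
		3: "Google is testing another search results layout",
	}
	idx := buildIndex(docs)
	fmt.Println(lookup(idx, "google is")) // prints [1 3]
}
```

In a real engine the candidate IDs returned by the lookup step would then be handed to the ranker, which is where the scores in the last two arrows come from.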

🛠️How to make changes

Add Custom Scoring Criteria

Define a struct implementing the ScoringCriteria interface with custom scoring logic (core/ranker.go)
Implement CalcScore(field, token, freq) method to compute custom relevance scores (examples/weibo/custom_scoring_criteria.go)
Pass custom criteria to engine via engine.IndexOptions or engine.SearchOptions (engine.go)
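
To show the general shape such a criteria type might take, here is a self-contained sketch. The `scorer` interface, the `matchedDoc` struct, and the `freshnessBoost` type are all hypothetical; they do not reproduce Riot's actual interface or method signature, which should be read from the source before implementing against it.

```go
package main

import (
	"fmt"
	"sort"
)

// matchedDoc is a hypothetical stand-in for a candidate returned by the indexer.
type matchedDoc struct {
	ID        string
	TermFreq  int     // occurrences of query tokens in the document
	Freshness float64 // application-defined field in the range 0..1
}

// scorer is a hypothetical scoring interface, not Riot's real one.
type scorer interface {
	Score(d matchedDoc) float64
}

// freshnessBoost weights term frequency by how recent the document is.
type freshnessBoost struct{ Weight float64 }

// Score returns the term frequency scaled up by the freshness bonus.
func (f freshnessBoost) Score(d matchedDoc) float64 {
	return float64(d.TermFreq) * (1 + f.Weight*d.Freshness)
}

// rank scores every candidate and returns a copy ordered best first.
func rank(docs []matchedDoc, s scorer) []matchedDoc {
	out := append([]matchedDoc(nil), docs...)
	sort.SliceStable(out, func(i, j int) bool {
		return s.Score(out[i]) > s.Score(out[j])
	})
	return out
}

func main() {
	docs := []matchedDoc{
		{ID: "old", TermFreq: 3, Freshness: 0.0},
		{ID: "new", TermFreq: 2, Freshness: 1.0},
	}
	for _, d := range rank(docs, freshnessBoost{Weight: 1}) {
		fmt.Println(d.ID)
	}
}
```

With a weight of 1, the fresher document scores 4 against 3 and is listed first, even though it has the lower term frequency. Keeping the scoring logic behind an interface is what lets the ranking loop stay unchanged when the criteria change.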

Index and Search a New Document Type

Create document with DocIndexData struct containing id, fields, and content (core/data.go)
Call engine.Index(doc) to tokenize and add to inverted index (engine.go)
Retrieve with engine.Search(query) using supported query syntax (engine.go)
Optionally persist to badger store via engine/engine.go data server (data/main.go)

Set Up Distributed Multi-Node Search

Configure primary riot node with peer addresses and replication settings (data/riot/main.go)
Start secondary replica nodes to distribute index shards (data/riot1/main.go)
Use gRPC client to send index and search requests across nodes (engine/engine.go)
Monitor node health via heartbeat config and metrics endpoints (data/riot/heartb/main.go)

Add Chinese Text Support

Dictionary is loaded from data/dict/dictionary.txt automatically (data/dict/dictionary.txt)
gse library (go-ego/gse) handles Chinese segmentation in core indexer (core/indexer.go)
Stop tokens can be configured in data/dict/stop_tokens.txt (data/dict/stop_tokens.txt)
Use pinyin matching examples for Romanized search (examples/pinyin/main.go)

🔧Why these technologies

BadgerDB — Embedded key-value store for fast, persistent document and inverted index storage without external dependencies
gse (go-ego/gse) — Provides efficient Chinese text segmentation and tokenization critical for CJK language support
gRPC + grpclb — Enables distributed indexing and search across multiple nodes with built-in load balancing
BM25 Ranking — Industry-standard probabilistic ranking model proven effective for information retrieval relevance

⚖️Trade-offs already made

In-process indexing vs. external index service
- Why: Simplicity and single-process deployment for smaller use cases
- Consequence: Memory usage scales with index size; V2 planned to address memory consumption issues
Badger persistence over distributed consensus
- Why: Lightweight, embeddable storage without Raft/Paxos complexity
- Consequence: Limited to single-node fault tolerance; replication via separate data servers
Custom scoring over pluggable ML models
- Why: Keep search deterministic and fast without external model inference
- Consequence: Cannot adapt to user behavior or real-time relevance feedback

🚫Non-goals (don't propose these)

Real-time distributed consensus or ACID transactions across shards
Authentication and access control
Full SQL query language or complex joins
Built-in web crawler or document ingestion from external sources
Memory-efficient indexing (V1 acknowledged as high memory consumer)

🪤Traps & gotchas

- Memory consumption is a known showstopper (the README warns that v2 will be a rewrite for this reason); don't expect v1 to handle large-scale corpora efficiently.
- Configuration expects TOML files in data/conf/; missing or malformed config will cause silent failures.
- Chinese word segmentation via gse requires dictionary files (data/dict/dictionary.txt, stop_tokens.txt); ensure they're accessible.
- Distributed mode uses heartbeat (data/riot/heartb/hb.toml) for node discovery; timing misconfigurations can cause split-brain.
- The 'beta' warning label means no stability guarantees between releases.

💡Concepts to learn

Inverted Index — The core data structure Riot uses in core/indexer.go—maps tokens to document IDs and positions, enabling fast full-text search
BM25 Ranking Algorithm — The relevance scoring method implemented in core/ranker.go; critical to understand for tuning search result quality and custom scoring in Riot
Token Proximity — Riot feature documented in docs/en/token_proximity.md that scores documents higher when query tokens appear close together; stored as token positions in the index
Chinese Word Segmentation — Chinese lacks space delimiters, so Riot uses gse to split text into tokens before indexing; critical for accurate search on Chinese documents
Distributed Indexing with Heartbeat — Riot's data/riot/ and data/riot1/ example show how to shard indexes across nodes with heartbeat-based node discovery; essential for scaling beyond single-machine limits
Persistent Storage Backends — Riot supports pluggable storage (badger, leveldb); the choice affects memory overhead and crash recovery, directly addressing v1's memory problem
Logical Query Parsing — Riot supports AND/OR/NOT queries (mentioned in docs/en/logic.md); requires parsing and evaluating query trees against the inverted index
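
As a concrete illustration of the logical-query concept, the sketch below evaluates a flat must / should / not query against a token-to-documents map. It is a simplified model with hypothetical names (`query`, `index`, `eval`, `matches`), not Riot's query representation, and it scans every document for clarity where a real engine would intersect posting lists.

```go
package main

import (
	"fmt"
	"sort"
)

// query is a hypothetical flat logical query: a document matches when it
// contains every Must token, none of the Not tokens, and at least one
// Should token if any Should tokens are given.
type query struct {
	Must, Should, Not []string
}

// index maps a token to the set of document IDs containing it.
type index map[string]map[int]bool

// eval returns the sorted IDs of all documents in universe that match q.
func eval(idx index, universe []int, q query) []int {
	var out []int
	for _, id := range universe {
		if matches(idx, id, q) {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

// matches applies the AND, NOT, and OR parts of q to a single document.
func matches(idx index, id int, q query) bool {
	for _, t := range q.Must {
		if !idx[t][id] {
			return false
		}
	}
	for _, t := range q.Not {
		if idx[t][id] {
			return false
		}
	}
	if len(q.Should) == 0 {
		return true
	}
	for _, t := range q.Should {
		if idx[t][id] {
			return true
		}
	}
	return false
}

func main() {
	idx := index{
		"go":     {1: true, 2: true, 3: true},
		"search": {1: true, 3: true},
		"beta":   {3: true},
	}
	q := query{Must: []string{"go"}, Should: []string{"search"}, Not: []string{"beta"}}
	fmt.Println(eval(idx, []int{1, 2, 3}, q)) // prints [1]
}
```

Document 2 is rejected because it lacks the should token, and document 3 because it contains the excluded token, leaving only document 1.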

🔗Related repos

go-ego/gse — Tokenizer dependency used by Riot for Chinese word segmentation; understanding this is necessary for customizing text processing
dgraph-io/badger — Embedded KV store backend used by Riot for persistent storage; Riot examples depend on understanding badger's API
blevesearch/bleve — Alternative Go full-text search library; similar use case but different design philosophy and architecture
go-ego/murmur — Hashing library used internally by Riot for token fingerprinting in the indexer
elastic/elasticsearch — Industry standard distributed search engine Riot is designed as a lighter-weight alternative to; useful for understanding feature parity and design tradeoffs

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive integration tests for distributed search in core/

The repo has docs/en/distributed_indexing_and_search.md documenting distributed capabilities, but core/ lacks integration tests. Currently only core/indexer_test.go and core/ranker_test.go exist with limited coverage. This PR would add integration tests for multi-node indexing/search scenarios, validating the distributed architecture works end-to-end.

[ ] Create core/distributed_test.go with tests for cross-node document indexing
[ ] Add test fixtures in core/test_utils.go for multi-engine setup
[ ] Test consistency between distributed and single-node ranking results
[ ] Validate heartbeat/health check mechanism (data/riot/heartb/) in tests

Add missing unit tests for BM25 ranking algorithm implementation

The repo documents BM25 scoring extensively (docs/en/bm25.md and docs/zh/bm25.md) and has core/ranker.go implementing it, but core/ranker_test.go appears minimal. This PR would add comprehensive tests validating BM25 calculation correctness with known test cases and edge cases.

[ ] Create test cases in core/ranker_test.go with known BM25 scores for validation
[ ] Test parameter variations (k1, b constants) in core/ranker.go
[ ] Add edge case tests: empty documents, single-term queries, IDF calculation
[ ] Validate token proximity scoring mentioned in docs/en/token_proximity.md

Add GitHub Actions workflow for Go module vulnerability scanning and dependency management

The repo has .github/workflows/go.yml but it lacks security-focused CI. With v1 beta status and known memory consumption issues, plus outdated dependencies (go 1.13, old badger version), a vulnerability scanner and dependency update workflow would help maintain security. Existing workflows are CircleCI/Travis/Appveyor (legacy).

[ ] Create .github/workflows/security.yml using nancy or gosec for vulnerability detection
[ ] Add dependabot.yml or renovate.json for automated dependency updates
[ ] Test against multiple Go versions (1.13+) in the workflow
[ ] Add badge to README.md for security scan status

🌿Good first issues

Add benchmarking tests for core/ranker.go similar to core/ranker_test.go pattern to establish performance baselines and catch regressions during v2 planning
Write integration tests in a new data/integration_test.go that exercise the full indexing → ranking → retrieval pipeline with sample docs from docs/en/codelab.md examples
Document the uint64.go utility module (core/uint64.go) with comments and add examples showing how it's used in indexer.go for bit-packing token positions

⭐Top contributors

@vcaesar — 97 commits
@liubog2008 — 1 commit
@DmitryOlshansky — 1 commit
@h8liu — 1 commit

📝Recent commits