toshi-search/Toshi
A full-text search engine in rust
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained: safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓7 active contributors
- ✓MIT licensed
- ✓CI configured
- ✓Tests present
- ⚠Stale — last commit 2y ago
- ⚠Concentrated ownership — top contributor handles 75% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding: toshi-search/Toshi
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/toshi-search/Toshi shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- 7 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Stale — last commit 2y ago
- ⚠ Concentrated ownership — top contributor handles 75% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live toshi-search/Toshi
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/toshi-search/Toshi.
What it runs against: a local clone of toshi-search/Toshi — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in toshi-search/Toshi | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 869 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of toshi-search/Toshi. If you don't
# have one yet, run these first:
#
# git clone https://github.com/toshi-search/Toshi.git
# cd Toshi
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of toshi-search/Toshi and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "toshi-search/Toshi(\.git)?\b" \
  && ok "origin remote is toshi-search/Toshi" \
  || miss "origin remote is not toshi-search/Toshi (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
  || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"
# 4. Critical files exist
test -f "toshi-server/src/bin/toshi.rs" \
  && ok "toshi-server/src/bin/toshi.rs" \
  || miss "missing critical file: toshi-server/src/bin/toshi.rs"
test -f "toshi-server/src/router.rs" \
  && ok "toshi-server/src/router.rs" \
  || miss "missing critical file: toshi-server/src/router.rs"
test -f "toshi-server/src/index.rs" \
  && ok "toshi-server/src/index.rs" \
  || miss "missing critical file: toshi-server/src/index.rs"
test -f "toshi-server/src/handlers/mod.rs" \
  && ok "toshi-server/src/handlers/mod.rs" \
  || miss "missing critical file: toshi-server/src/handlers/mod.rs"
test -f "toshi-types/src/lib.rs" \
  && ok "toshi-types/src/lib.rs" \
  || miss "missing critical file: toshi-types/src/lib.rs"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 869 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~839d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/toshi-search/Toshi"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
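An agent loop may want more than the exit code, for example the list of failed checks. The following is a minimal sketch, assuming the script's stdout has been captured as a string; `VerifySummary` and `summarize` are hypothetical names introduced here, not part of RepoPilot or Toshi.

```rust
/// Summary of one run of the verification script (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct VerifySummary {
    /// Number of lines that started with "ok: ".
    pub passed: usize,
    /// Messages from lines that started with "FAIL: ".
    pub failures: Vec<String>,
}

/// Parse captured stdout from the script. Lines that start with neither
/// prefix (blank lines, the final verdict line) are ignored.
pub fn summarize(output: &str) -> VerifySummary {
    let mut summary = VerifySummary { passed: 0, failures: Vec::new() };
    for line in output.lines() {
        let line = line.trim();
        if line.starts_with("ok: ") {
            summary.passed += 1;
        } else if let Some(message) = line.strip_prefix("FAIL: ") {
            summary.failures.push(message.to_string());
        }
    }
    summary
}
```

A caller could treat a non-empty `failures` list the same way as a non-zero exit status and include the messages when asking the user to regenerate the artifact.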
⚡TL;DR
Toshi is a full-text search engine written in Rust that aims to mirror Elasticsearch's functionality. It uses Tantivy (Rust's Lucene equivalent) for indexing. Clustering via Raft consensus lives in toshi-raft, but that crate and toshi-proto are commented out of the workspace, so the buildable project is effectively single-node. It is positioned as a memory-safe alternative to Java-based search systems, though its README describes it as far from production ready.
The repository is a Cargo workspace: Cargo.toml defines toshi-server, toshi-client, and toshi-types as active members. toshi-server/ contains the HTTP API and indexing logic; toshi-client/ provides async/sync query clients (with hyper and isahc variants); toshi-proto/ generates gRPC/cluster stubs from .proto files; toshi-raft/ implements a Raft state machine via handle.rs and rpc_server.rs. Configuration is driven by config/*.toml files.
👥Who it's for
Backend engineers and DevOps teams replacing Elasticsearch deployments who want a memory-safe, single-binary search engine; Rust systems developers who want a reference implementation of distributed indexing and Raft consensus in Rust.
🌱Maturity & risk
Early-stage/experimental: the README explicitly states 'far from production ready' and development is slow. CI is set up (GitHub Actions in .github/workflows/rust.yml) and codecov integration exists, but the workspace comments out the toshi-proto and toshi-raft members, suggesting core clustering features are incomplete. The last commit was about two years ago, so treat the project as dormant rather than actively developed.
High risk: core distributed features (Raft consensus in toshi-raft/, protobuf schemas in toshi-proto/) are disabled in Cargo.toml workspace, indicating incomplete/unstable implementations. Single-maintainer project ('I'm just slow'), no visible contribution guidelines, and the codebase toggles major features on/off via comments rather than feature flags, risking breaking changes. Dependency on Tantivy (external) and custom proto code generation (toshi-proto/build.rs) adds coupling.
Active areas of work
Unable to determine from file list alone; check GitHub Issues/PRs. However, code structure suggests active work on: (1) stabilizing toshi-proto protobuf compilation (toshi-proto/build.rs exists but members disabled), (2) completing Raft consensus in toshi-raft/, (3) client library variants (isahc_client.rs alongside hyper_client.rs suggests transport layer exploration). Coverage tracking via codecov.yml and CI pipeline active.
🚀Get running
git clone https://github.com/toshi-search/Toshi.git
cd Toshi
rustup update stable
cargo build --release
cargo run --bin toshi -- --config config/config.toml
Daily commands:
Server: cargo run --bin toshi -- --config config/config.toml (Cargo names the binary after its entry point, toshi-server/src/bin/toshi.rs, unless the crate manifest overrides it; config/config.toml specifies host=127.0.0.1, port=8080, data path, writer memory, merge policy). Tests: cargo test (runs across the workspace). Examples: cargo run --example bool_query --manifest-path toshi-client/Cargo.toml (see toshi-client/examples/ for sync_search, range_query, etc.).
🗺️Map of the codebase
- toshi-server/src/bin/toshi.rs: Entry point for the search engine server; initializes router, handlers, and startup logic.
- toshi-server/src/router.rs: Defines HTTP routing and request dispatching to all handler endpoints.
- toshi-server/src/index.rs: Core abstraction for managing search indexes; handles indexing and query execution.
- toshi-server/src/handlers/mod.rs: Handler module registry; documents the separation of concerns across search, bulk, index, and list operations.
- toshi-types/src/lib.rs: Shared type definitions for queries, errors, and client-server contracts; used across all crates.
- toshi-client/src/lib.rs: Client library for querying the search engine; demonstrates query construction patterns.
- Cargo.toml: Workspace manifest defining crate dependencies and compilation profiles; critical for build configuration.
🧩Components & responsibilities
- toshi-server (Tokio, hyper, Tantivy, serde) — HTTP API server; receives queries, manages indexes, executes searches, returns results
- Failure mode: If server crashes, all in-memory indexes are lost unless persisted to disk; no failover
- toshi-types (serde, serde_json) — Shared type definitions: Query, SearchResults, error types; ensures API contract consistency
- Failure mode: Type mismatch between client and server; serialization errors if types change incompatibly
- toshi-client (hyper, isahc, serde_json) — Rust client library for building and executing queries against server
- Failure mode: Network errors or server unavailability prevent queries; no built-in retry logic
🛠️How to make changes
Add a New Search Query Type
- Create a new query variant file (e.g., regex.rs) in toshi-types/src/query/ (toshi-types/src/query/regex.rs)
- Define the query struct with serde serialization support (toshi-types/src/query/regex.rs)
- Add the variant to the Query enum in toshi-types/src/query/mod.rs (toshi-types/src/query/mod.rs)
- Implement conversion to a Tantivy query in the execute_query method in toshi-server/src/index.rs (toshi-server/src/index.rs)
- Add an example in toshi-client/examples/ demonstrating the new query type (toshi-client/examples/regex_query.rs)
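The shape of these steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not Toshi's actual code: the real types derive serde traits and convert to Tantivy queries, and `RegexQuery`, the simplified `Query` enum, and `describe` are hypothetical stand-ins.

```rust
// Hypothetical stand-ins for types in toshi-types/src/query/; the real ones
// derive serde traits, which this std-only sketch leaves out.

/// New query struct (would live in toshi-types/src/query/regex.rs).
#[derive(Debug, Clone, PartialEq)]
pub struct RegexQuery {
    pub field: String,
    pub pattern: String,
}

/// Simplified query enum (would live in toshi-types/src/query/mod.rs).
#[derive(Debug, Clone, PartialEq)]
pub enum Query {
    Term { field: String, value: String },
    /// The newly added variant.
    Regex(RegexQuery),
}

/// Stand-in for the server-side conversion step. Because the match is
/// exhaustive, the compiler flags every place that must handle a new variant.
pub fn describe(query: &Query) -> String {
    match query {
        Query::Term { field, value } => format!("term {field}={value}"),
        Query::Regex(q) => format!("regex {}~/{}/", q.field, q.pattern),
    }
}
```

Keeping the match free of a catch-all `_` arm is the design choice that matters here: adding a variant then produces a compile error at each conversion site instead of a silent fallthrough.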
Add a New HTTP Handler Endpoint
- Create a new handler file in toshi-server/src/handlers/, e.g., status.rs (toshi-server/src/handlers/status.rs)
- Define an async handler function and register it in mod.rs (toshi-server/src/handlers/mod.rs)
- Add a route to router.rs mapping the HTTP method/path to the handler (toshi-server/src/router.rs)
- Return a response using types from toshi-types/src/server.rs (toshi-server/src/handlers/status.rs)
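As a rough picture of how a handler and its route fit together, here is a minimal std-only sketch. It is synchronous and uses a plain `(status, body)` tuple, whereas the real server is async and uses its own response types; `status_handler`, `not_found`, and `route` are hypothetical names.

```rust
/// (status code, body) pair standing in for a real HTTP response type.
pub type Response = (u16, String);

/// Hypothetical handler that would live in toshi-server/src/handlers/status.rs.
pub fn status_handler() -> Response {
    (200, String::from("{\"status\":\"ok\"}"))
}

/// Fallback for requests that match no route.
pub fn not_found() -> Response {
    (404, String::from("{\"error\":\"not found\"}"))
}

/// Simplified stand-in for router.rs: dispatch on (method, path).
pub fn route(method: &str, path: &str) -> Response {
    match (method, path) {
        ("GET", "/status") => status_handler(), // the newly registered route
        _ => not_found(),
    }
}
```

Matching on the method and path together means a `POST /status` falls through to the 404 branch rather than reaching a handler written for `GET`.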
Extend Configuration and Settings
- Update toshi-server/src/settings.rs to add new config struct fields with serde attributes (toshi-server/src/settings.rs)
- Add a corresponding TOML section to config/config.toml for default values (config/config.toml)
- Access settings in handlers via dependency injection or thread-local storage (toshi-server/src/bin/toshi.rs)
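The pattern of "new field plus default plus override" can be sketched without serde or a TOML parser. The block below is an assumption-laden illustration: the `Settings` struct is a simplified stand-in, `request_timeout_secs` is a hypothetical new field, and `apply_overrides` parses plain `key = value` lines rather than real TOML.

```rust
/// Simplified stand-in for the settings struct in toshi-server/src/settings.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub host: String,
    pub port: u16,
    pub writer_memory: usize,
    /// Hypothetical new field added for illustration.
    pub request_timeout_secs: u64,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            host: String::from("127.0.0.1"),
            port: 8080,
            writer_memory: 200_000_000,
            request_timeout_secs: 30, // assumed default, not from the repo
        }
    }
}

/// Apply `key = value` overrides, one per line. Blank lines and `#` comments
/// are skipped; unknown keys and unparsable values are returned as errors.
pub fn apply_overrides(mut settings: Settings, text: &str) -> Result<Settings, String> {
    let lines = text
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'));
    for line in lines {
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected key = value, got: {line}"))?;
        let (key, value) = (key.trim(), value.trim().trim_matches('"'));
        match key {
            "host" => settings.host = value.to_string(),
            "port" => {
                settings.port = value.parse::<u16>().map_err(|e| format!("port: {e}"))?
            }
            "writer_memory" => {
                settings.writer_memory =
                    value.parse::<usize>().map_err(|e| format!("writer_memory: {e}"))?
            }
            "request_timeout_secs" => {
                settings.request_timeout_secs = value
                    .parse::<u64>()
                    .map_err(|e| format!("request_timeout_secs: {e}"))?
            }
            other => return Err(format!("unknown setting: {other}")),
        }
    }
    Ok(settings)
}
```

Giving every new field a default keeps existing config files valid, which is the same effect a serde default attribute would have in the real struct.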
🔧Why these technologies
- Rust — Memory safety without garbage collection; performance critical for search engine throughput and latency
- Tantivy — Pure Rust full-text search library; avoids JVM overhead and provides fine-grained indexing control
- Tokio async runtime — Handles concurrent HTTP requests efficiently with minimal thread overhead
- Raft consensus (proto) — Enables distributed search across multiple nodes with strong consistency guarantees
⚖️Trade-offs already made
- Single-crate server (toshi-server) instead of microservices
  - Why: Simpler deployment and testing; avoids distributed system complexity during early development
  - Consequence: Limited horizontal scaling; all operations must fit within one process
- Raft clustering code commented out in workspace
  - Why: Raft implementation incomplete or unstable; focusing on single-node stability first
  - Consequence: No distributed consensus yet; must run separate instances with external coordination
- Multiple client implementations (hyper, isahc)
  - Why: Support different async runtimes and HTTP backends
  - Consequence: Additional maintenance burden; must keep both implementations in sync
🚫Non-goals (don't propose these)
- Does not provide authentication or authorization (no user/role management)
- Does not handle document versioning or temporal queries
- Does not support transactions across multiple indexes
- Not a distributed system in the current stable release (Raft disabled)
🪤Traps & gotchas
- Workspace state: toshi-proto and toshi-raft are commented out in the Cargo.toml members list. Do NOT uncomment them without understanding build.rs in toshi-proto (requires the protoc compiler).
- Config paths: the default config/config.toml uses the relative path 'data2/' for indices; running from the wrong directory causes index misses.
- Memory: writer_memory=200000000 (200 MB) in config.toml must fit in available RAM or indexing stalls; tuning is required for large datasets.
- Experimental features: experimental=true and master=true under [experimental_features] enable untested clustering; avoid in testing.
- Logging: logging.toml defines log levels; if log_level='debug' in config, verbose output may slow searches.
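The relative-path trap can be made concrete with a small sketch. This shows one way a launcher or wrapper could pin a relative data path to the config file's directory instead of the current working directory; it is a hypothetical helper (`resolve_data_path`), not something the source says Toshi does.

```rust
use std::path::{Path, PathBuf};

/// Resolve `data_path` (e.g. "data2/") against the directory that contains
/// the config file, so the result does not depend on the current working
/// directory. Absolute data paths are returned unchanged.
pub fn resolve_data_path(config_file: &Path, data_path: &str) -> PathBuf {
    let data = Path::new(data_path);
    if data.is_absolute() {
        return data.to_path_buf();
    }
    match config_file.parent() {
        Some(dir) => dir.join(data),
        None => data.to_path_buf(),
    }
}
```

With this approach the same config file points at the same index directory whether the server is started from the repository root or from elsewhere; the alternative is simply to write an absolute path into the config.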
💡Concepts to learn
- Tantivy Indexing — Toshi delegates all full-text search semantics to Tantivy; understanding Tantivy's segment-based index, tokenization, and scoring is essential to tune Toshi's search_quality and writer_memory config parameters.
- Raft Consensus — toshi-raft/ implements distributed log replication for cluster coordination; understanding Raft (leader election, log replication, commit index) is critical to enable the currently-disabled clustering features.
- Protocol Buffers (protobuf) — toshi-proto/ generates gRPC stubs from .proto files for inter-node RPC; protoc compiler and build.rs codegen are required to compile cluster features.
- Memory-Mapped I/O — Tantivy and Toshi use mmap for fast index file access; understanding memory-mapped segments explains why writer_memory tuning and data path configuration are critical for performance.
- Inverted Index — Fundamental data structure in full-text search; Tantivy's inverted index design (term → document postings) underpins how Toshi tokenizes and scores queries like BoolQuery and RangeQuery.
- Log-Structured Merge Trees (LSM) — Toshi's merge_policy in config.toml controls Tantivy's segment merging strategy; LSM concepts (min_merge_size, level_log_size) are tuning knobs for write-heavy vs. read-heavy workloads.
- Async/Await (Tokio) — toshi-client examples (sync_search.rs, hyper_example.rs) and server all use async Rust; understanding Tokio runtime, Future traits, and blocking vs non-blocking I/O is essential for extending the server or client.
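To make the inverted-index concept from the list above concrete, here is a toy sketch: a map from term to the sorted set of document ids that contain it, with a boolean AND lookup. It is a teaching illustration with hypothetical names, not Tantivy's data structure, which adds segments, compression, positions, and scoring.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy inverted index: term -> sorted set of document ids (postings).
#[derive(Default)]
pub struct InvertedIndex {
    postings: HashMap<String, BTreeSet<usize>>,
}

impl InvertedIndex {
    /// Tokenize on whitespace, lowercase each term, and record the doc id.
    pub fn add_document(&mut self, doc_id: usize, text: &str) {
        for term in text.split_whitespace() {
            self.postings
                .entry(term.to_lowercase())
                .or_default()
                .insert(doc_id);
        }
    }

    /// Ids of documents containing `term`, in ascending order.
    pub fn lookup(&self, term: &str) -> Vec<usize> {
        self.postings
            .get(&term.to_lowercase())
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Boolean AND: ids of documents containing every term in `terms`.
    pub fn lookup_all(&self, terms: &[&str]) -> Vec<usize> {
        let mut iter = terms.iter();
        let Some(first) = iter.next() else {
            return Vec::new();
        };
        let mut result: BTreeSet<usize> = self.lookup(first).into_iter().collect();
        for term in iter {
            let ids: BTreeSet<usize> = self.lookup(term).into_iter().collect();
            result = result.intersection(&ids).copied().collect();
        }
        result.into_iter().collect()
    }
}
```

Indexing "Rust search engine" as document 1 and "Java search engine" as document 2, a lookup of "search" yields both ids while an AND of "rust" and "engine" yields only document 1, which is the intersection of postings lists that a boolean query performs.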
🔗Related repos
- tantivy-search/tantivy: Core dependency for full-text indexing; Toshi is a wrapper that adds distribution and a REST API around Tantivy's index engine.
- elastic/elasticsearch-rs: Elasticsearch Rust client; provides an alternative client library for Elasticsearch if comparing protocol/API design patterns.
- etcd-io/etcd: Reference implementation of Raft consensus (in Go); toshi-raft mirrors its state machine design for distributed coordination.
- meilisearch/meilisearch: Similar Rust-based search engine with REST API; different architecture (simpler, no Raft clustering, but production-ready).
- quickwit-oss/quickwit: Modern Rust search engine emphasizing cloud-native design and log analytics; represents evolved architectural choices vs. Toshi's Tantivy-centric approach.
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for search handlers in toshi-server/src/handlers/search.rs
The search.rs handler is a critical path in the full-text search engine but has no visible test coverage in the file structure. Adding comprehensive integration tests would validate query parsing, result ranking, and edge cases (empty queries, malformed input, large result sets). This directly improves reliability of the core search functionality.
- [ ] Create toshi-server/tests/search_integration_test.rs with test fixtures
- [ ] Add tests for basic keyword search, boolean queries, and range queries (reference existing examples in toshi-client/examples/)
- [ ] Test error handling for malformed search payloads
- [ ] Run tests with cargo test and ensure CI integration in .github/workflows/rust.yml
Implement missing handlers' test coverage for bulk.rs and index.rs operations
toshi-server/src/handlers/ has bulk.rs and index.rs handlers managing critical index operations (document insertion, indexing), but no dedicated test files exist. These operations directly affect data persistence and consistency. Adding tests would catch regressions in bulk indexing and index management.
- [ ] Create toshi-server/tests/bulk_handler_test.rs with tests for batch document insertion
- [ ] Create toshi-server/tests/index_handler_test.rs for index creation/deletion/listing operations
- [ ] Test error cases: duplicate document IDs, invalid index names, concurrent operations
- [ ] Verify tests work with the workspace config in Cargo.toml and document in README.md
Add SECURITY.md vulnerability disclosure policy details and dependency audit workflow
SECURITY.md exists but the file structure shows it's minimal. The repo has dependabot.yml configured but no automated security scanning workflow in .github/workflows/. Add a GitHub Actions workflow to run cargo audit on PRs and a detailed security policy in SECURITY.md to help contributors report vulnerabilities responsibly.
- [ ] Expand SECURITY.md with reporting instructions, supported versions, and response timeline
- [ ] Create .github/workflows/security-audit.yml to run cargo audit on push/PR events
- [ ] Configure the audit workflow to fail the build on vulnerability findings (blocking merge)
- [ ] Document the new workflow in CONTRIBUTING.md or README.md for contributor awareness
🌿Good first issues
- Add integration tests for toshi-client query types (bool_query.rs, range_query.rs exist as examples but no tests in toshi-client/tests/). Create toshi-client/tests/integration_test.rs validating each example against a running toshi-server instance.
- Document the Raft consensus architecture in docs/ (currently toshi-raft/src/lib.rs and proposal.rs exist but are unlinked to main README). Write docs/raft_clustering.md explaining cluster.proto, handle.rs state machine, and rpc_server.rs flow with examples.
- Add feature flag for toshi-proto and toshi-raft in Cargo.toml (replace comments with [features] clustering = ["toshi-proto", "toshi-raft"]) and document in README how to build with clustering enabled, unblocking cluster feature development.
⭐Top contributors
- @hntd187 — 75 commits
- @dependabot[bot] — 16 commits
- @dependabot-preview[bot] — 4 commits
- @sigmaris — 2 commits
- @liov — 1 commit
📝Recent commits
- a13a518: Update Cargo.toml (hntd187)
- 3583d93: Update Cargo.toml (hntd187)
- dfef9a9: Update dependabot.yml (hntd187)
- c613027: Update dependabot.yml (hntd187)
- c2a802e: Bump tantivy from 0.19.0 to 0.19.2 (#952) (dependabot[bot])
- 3487f42: Bump hyper from 0.14.23 to 0.14.26 (#969) (dependabot[bot])
- f043301: Bump openssl from 0.10.48 to 0.10.57 (#970) (dependabot[bot])
- a9df87e: Bump webpki from 0.22.0 to 0.22.4 (#971) (dependabot[bot])
- dd70f49: Bump h2 from 0.3.15 to 0.3.17 (#968) (dependabot[bot])
- 3897910: Bump openssl from 0.10.45 to 0.10.48 (#965) (dependabot[bot])
🔒Security observations
Toshi is a Rust-based full-text search engine with moderate security posture. Key concerns include: (1) exposed security contact email in SECURITY.md, (2) incomplete security policy lacking disclosure timelines, (3) insufficient visibility into dependency security practices, and (4) missing code review enforcement mechanisms. The codebase itself
- High · Security Contact Information Exposed (SECURITY.md). The SECURITY.md file contains a personal email address (shcarman AT gmail DOT com) for vulnerability reporting. This exposes the security contact to potential social engineering, spam, and harassment attacks. Fix: Use a dedicated security email address (e.g., security@toshi-search.org) or implement a proper responsible disclosure process using a security.txt file at /.well-known/security.txt. Consider using a security contact form or a dedicated bug bounty platform.
- Medium · Incomplete Security Policy (SECURITY.md). SECURITY.md is incomplete (cut off with 'Report security issues for...'). The security policy does not clearly define supported versions, vulnerability disclosure timeline, or responsible disclosure guidelines. Fix: Complete the SECURITY.md file with: (1) a clear list of supported versions, (2) response time expectations for security reports, (3) a vulnerability disclosure timeline, (4) whether a bug bounty program exists, (5) a credit/acknowledgment policy.
- Medium · Missing Dependency Pinning in Workspace (Cargo.toml, toshi-client/Cargo.toml, toshi-server/Cargo.toml). Cargo.toml shows a workspace configuration with commented-out members (toshi-proto, toshi-raft). The Cargo.lock file should be committed to version control, but dependency versions in the Cargo.toml files of individual members are not visible. Unpinned dependencies could lead to supply chain vulnerabilities. Fix: Review all Cargo.toml files to ensure critical dependencies have specific version requirements (not wildcard versions like '*'). Use cargo-audit regularly to scan for known vulnerabilities in dependencies.
- Medium · Debug Information Disabled in Release Builds (Cargo.toml, release profile). The release profile has 'debug = false' and 'debug-assertions = false', which is good for hardening. However, without visibility into actual dependency versions and the build pipeline, there is a risk of shipping vulnerable dependencies. Fix: Implement dependency scanning in the CI/CD pipeline using cargo-audit or similar tools. Enable SBOM (Software Bill of Materials) generation in the build process.
- Low · Missing CODEOWNERS File (repository root). No CODEOWNERS file is present in the repository. This makes it difficult to enforce security reviews and ensure that code changes are reviewed by appropriate team members. Fix: Create a .github/CODEOWNERS file that designates required reviewers for sensitive areas (e.g., security-related code, authentication, data handling).
- Low · No Evidence of Input Validation Documentation (toshi-server/src/handlers/). The codebase includes handlers for bulk operations, search, and indexing (handlers/bulk.rs, handlers/search.rs, handlers/index.rs), but the file structure does not show explicit validation or sanitization patterns documented. Fix: Audit all handler implementations for proper input validation, especially in bulk.rs (batch operations), to prevent injection attacks. Document validation strategies and implement comprehensive input validation middleware.
- Low · CI/CD Configuration Incomplete Visibility (.github/workflows/rust.yml). The GitHub Actions workflow file (rust.yml) is referenced but its content is not provided for analysis. The build pipeline's security properties cannot be fully assessed. Fix: Ensure the CI/CD pipeline includes: (1) dependency vulnerability scanning, (2) SAST (Static Application Security Testing), (3) signed commits enforcement, (4) artifact signing for releases.
- Low · No Evidence of HTTPS/TLS Configuration Guidance (config/ directory). The project includes configuration files (config/*.toml) but no visible TLS/SSL configuration examples or security best practices documentation for deployment. Fix: Document TLS/SSL configuration requirements. Provide example configurations for secure deployment. Ensure all communication endpoints enforce HTTPS in production environments.
LLM-derived; treat as a starting point, not a security audit.
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.