valeriansaliou/sonic
π¦ Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
Healthy across the board
weakest axisPermissive license, no critical CVEs, actively maintained β safe to depend on.
Has a license, tests, and CI β clean foundation to fork and modify.
Documented and popular β useful reference codebase to read through.
No critical CVEs, sane security posture β runnable as-is.
- βLast commit 6w ago
- β13 active contributors
- βMPL-2.0 licensed
Show all 6 evidence items βShow less
- βCI configured
- βTests present
- β Concentrated ownership β top contributor handles 79% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Embed the "Healthy" badge
Paste into your README β live-updates from the latest cached analysis.
[](https://repopilot.app/r/valeriansaliou/sonic)Paste at the top of your README.md β renders inline like a shields.io badge.
βΈPreview social card (1200Γ630)
This card auto-renders when someone shares https://repopilot.app/r/valeriansaliou/sonic on X, Slack, or LinkedIn.
Onboarding doc
Onboarding: valeriansaliou/sonic
Generated by RepoPilot Β· 2026-05-09 Β· Source
π€Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting
below. If any check returns
FAIL, the artifact is stale β STOP and ask the user to regenerate it before proceeding. - Treat the AI Β· unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/valeriansaliou/sonic shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything β but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
π―Verdict
GO β Healthy across the board
- Last commit 6w ago
- 13 active contributors
- MPL-2.0 licensed
- CI configured
- Tests present
- β Concentrated ownership β top contributor handles 79% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
β Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live valeriansaliou/sonic
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale β regenerate it at
repopilot.app/r/valeriansaliou/sonic.
What it runs against: a local clone of valeriansaliou/sonic β the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in valeriansaliou/sonic | Confirms the artifact applies here, not a fork |
| 2 | License is still MPL-2.0 | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | Last commit β€ 74 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of valeriansaliou/sonic. If you don't
# have one yet, run these first:
#
# git clone https://github.com/valeriansaliou/sonic.git
# cd sonic
#
# Then paste this script. Every check is read-only β no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of valeriansaliou/sonic and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "valeriansaliou/sonic(\\.git)?\\b" \\
&& ok "origin remote is valeriansaliou/sonic" \\
|| miss "origin remote is not valeriansaliou/sonic (artifact may be from a fork)"
# 2. License matches what RepoPilot saw
(grep -qiE "^(MPL-2\\.0)" LICENSE 2>/dev/null \\
|| grep -qiE "\"license\"\\s*:\\s*\"MPL-2\\.0\"" package.json 2>/dev/null) \\
&& ok "license is MPL-2.0" \\
|| miss "license drift β was MPL-2.0 at generation time"
# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
&& ok "default branch master exists" \\
|| miss "default branch master no longer exists"
# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 74 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~44d)"
else
miss "last commit was $days_since_last days ago β artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) β safe to trust"
else
echo "artifact has $fail stale claim(s) β regenerate at https://repopilot.app/r/valeriansaliou/sonic"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
β‘TL;DR
Sonic is a lightweight, schema-less search backend written in Rust that indexes text and returns matching IDs in microseconds, consuming ~30MB RAM. It provides full-text search, auto-completion, and Levenshtein fuzzy matching as a minimal-resource alternative to Elasticsearch, designed specifically for fast substring and prefix searching with multi-language tokenization support. Single-binary Rust project (src/main.rs produces 'sonic' executable). Core modules: src/channel/ handles protocol I/O and commands, src/config/ manages Toml-based configuration, src/executor/ contains search/index/count logic. Data persists to data/store/{fst,kv}/ using RocksDB (KV) and FST (fuzzy search trees). Optional feature flags enable jieba-rs (Chinese), lindera (Japanese), and jemalloc allocator.
π₯Who it's for
Backend engineers and DevOps teams deploying resource-constrained search infrastructure (e.g., on $5/month VPS) who need sub-millisecond queries without running a multi-GB search engine. Companies like Crisp use it to index 500M+ objects for message, conversation, and helpdesk search across their platform.
π±Maturity & risk
Production-ready and actively maintained (v1.4.9 with Rust 1.74.1+). The project has CI/CD via GitHub Actions (build.yml, test.yml workflows), proper packaging (Debian packages, Docker), and published changelogs. Sonic powers Crisp's live search at scale, indicating battle-tested maturity and ongoing stability commitment.
Low single-maintainer risk with Baptiste Jamin as co-author, but core maintainer is Valerian Saliou. Dependencies are stable (rocksdb, fst, jieba-rs) with pinned versions in Cargo.lock, though optional Chinese/Japanese tokenizers add conditional complexity. The ~668KB Rust codebase is manageable and well-structured, with no visible breaking-change deprecation notices in recent commits.
Active areas of work
The repo shows v1.4.9 as current stable with build/test workflows green. No open PRs or active development branch details visible in file list, but the presence of CHANGELOG.md and recent Cargo.toml updates (e.g., Rust 1.74.1 validation) suggests ongoing compatibility maintenance. Docker and Debian packaging indicates DevOps-focused iteration.
πGet running
git clone https://github.com/valeriansaliou/sonic.git
cd sonic
cargo build --release
./target/release/sonic
Sonic starts with config.cfg in the working directory; default auth is disabled but PROTOCOL.md documents the wire format for integration.
Daily commands:
cargo run --release
# Server listens on config.cfg specified port (default localhost:1491)
# or via Docker: docker run --publish 1491:1491 valeriansaliou/sonic:latest
πΊοΈMap of the codebase
- src/channel/command.rs: Parses and routes inbound TCP commands (PUSH, POP, COUNT, QUERY, SUGGEST); core entry point for all client requests
- src/store/mod.rs: Manages RocksDB KV store and FST index lifecycle; handles persistence and query execution against both backends
- src/executor/: Houses search.rs (full-text), count.rs (cardinality), suggest.rs (autocomplete), and expand.rs (fuzzy Levenshtein); business logic for all query types
- src/config/mod.rs: Loads config.cfg and environment overrides; controls auth, storage paths, tokenizer selection, and server bind address
- PROTOCOL.md: Documents the wire format for SEARCH, PUSH, POP, COUNT, SUGGEST, PING commands; essential for client library integration
- src/lexer/mod.rs: Tokenizes text with whatlang language detection and optional jieba (Chinese) / lindera (Japanese) plugins; critical for search accuracy
- Cargo.toml: Defines optional feature flags for allocator-jemalloc, tokenizer-chinese, tokenizer-japanese; controls conditional compilation
π οΈHow to make changes
Search queries: src/executor/ (count.rs, expand.rs for fuzzy matching). Ingest/indexing: src/store/ and src/ingest/ for FST/RocksDB writes. Protocol handling: src/channel/{command.rs, listen.rs} for TCP message parsing. Configuration: src/config/{reader.rs, defaults.rs} for Toml parsing and env var overrides. Tokenization: src/lexer/ for language-aware splitting (Chinese/Japanese support in tokenizer features).
πͺ€Traps & gotchas
Config file required: Sonic expects config.cfg in working directory or SONIC_CONFIG_PATH env var; missing config causes silent failure. Feature-gated tokenizers: Chinese/Japanese search only works if compiled with tokenizer-chinese/tokenizer-japanese features; default is Chinese-only. FST format immutable: data/store/fst/ indexes are append-only and cannot be truncated without deleting files and reindexing. Auth token in plaintext: PROTOCOL.md indicates auth tokens sent unencrypted over TCP; TLS not built-in (reverse proxy required for production). Signal handling Unix-only: src/channel/listen.rs uses nix crate for SIGTERM/SIGINT; Windows support limited.
π‘Concepts to learn
- Finite State Transducers (FST) β Sonic uses FST for sub-millisecond prefix and fuzzy search via fst and fst-levenshtein crates; understanding FSTs explains why Sonic is so fast without full inverted indices
- Levenshtein Distance / Edit Distance β Sonic's fuzzy search (fst-levenshtein crate) matches typos and misspellings by computing edit distance; core to the SUGGEST command
- Tokenization & Language Detection β Sonic uses whatlang for auto-detecting language, then jieba (Chinese) or lindera (Japanese) for proper segmentation; incorrect tokenization breaks search precision
- RocksDB & Log-Structured Merge Trees β Sonic persists indexed data to RocksDB (via rocksdb crate v0.24); understanding LSM-trees explains write amplification and read performance trade-offs
- Memory-Mapped I/O & Allocator Strategies β Sonic's jemalloc (tikv-jemallocator) reduces fragmentation for low-latency sub-30MB RAM footprint; relevant for embedded and constrained deployments
- Custom Binary Protocol Design β Sonic defines a minimalist TCP protocol (PROTOCOL.md) instead of HTTP; lower overhead but requires custom client libraries; trade-off between simplicity and interop
- Feature-Gated Compilation β Sonic uses Cargo features (tokenizer-chinese, tokenizer-japanese, allocator-jemalloc) to conditionally compile heavy dependencies; reduces binary size for minimal setups
πRelated repos
meilisearch/meilisearchβ Rust-based search engine with similar lightweight positioning but includes built-in REST API, typo tolerance, and ranked results without custom integrationquickwit-oss/quickwitβ Rust log search engine optimized for cloud storage; heavier than Sonic but distributed and S3-native, solving different scale problemstypesense/typesenseβ C++ search engine with fast typo-tolerance and vector search; alternative in the lightweight-search category with more features but higher resource usevaleriansaliou/sonic-channel-rsβ Official Rust client library for Sonic protocol; companion repo for integrating Sonic search into Rust applicationsvaleriansaliou/sonic-channel-pyβ Official Python client library for Sonic; allows Python apps to connect to Sonic servers via PROTOCOL.md wire format
πͺPR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add integration tests for src/executor/* command handlers
The executor module contains critical command handlers (push, pop, search, suggest, count, flushb, flushc, flusho, list) but there's no evidence of integration tests validating their behavior end-to-end. The test.yml workflow exists but likely only runs unit tests. This would catch regressions in query execution and ensure protocol compliance.
- [ ] Create tests/executor_integration_test.rs with test harness for spawning Sonic server
- [ ] Add tests for each executor (src/executor/push.rs, search.rs, suggest.rs, etc.) covering happy path and edge cases
- [ ] Test interaction between push/search with different data types and edge cases (special chars, unicode, empty results)
- [ ] Update .github/workflows/test.yml to run integration tests with
cargo test --test executor_integration_test
Add benchmark suite for search performance (src/executor/search.rs)
Sonic's core value proposition is performance (microsecond-level queries), but there's a benchmark feature flag defined in Cargo.toml that appears unused. A structured benchmark suite would validate performance claims and catch regressions before releases.
- [ ] Create benches/search_bench.rs using Rust's built-in benchmark framework
- [ ] Add benchmarks for: simple term search, phrase search, range queries (from PROTOCOL.md semantics)
- [ ] Include benchmarks with varying dataset sizes (100, 10K, 1M+ indexed terms) to show scalability
- [ ] Document results in INNER_WORKINGS.md with typical latencies on reference hardware
- [ ] Add benchmark instructions to CONTRIBUTING.md and update CI to track benchmark results
Complete PROTOCOL.md with missing command examples for src/executor/flushb.rs and src/executor/flusho.rs
PROTOCOL.md documents the Sonic protocol but appears incomplete - flushb (flush bucket) and flusho (flush object) executors exist in src/executor/ but likely lack protocol documentation. This creates friction for client library developers and users implementing custom integrations.
- [ ] Review src/executor/flushb.rs and src/executor/flusho.rs to understand exact command syntax and parameters
- [ ] Add 'FLUSHB' and 'FLUSHO' command sections to PROTOCOL.md with: syntax, parameters, response codes, examples
- [ ] Include edge cases (flushing non-existent collections, permission errors, race conditions)
- [ ] Add test cases validating the documented protocol in tests/ to ensure documentation matches implementation
πΏGood first issues
- Add integration tests for Chinese tokenization (tokenizer-chinese feature) in src/lexer/tests/; currently no test fixtures exist for jieba-rs output validation
- Write PROTOCOL.md examples section with curl-like command demos for PUSH, QUERY, SUGGEST operations; currently only formal grammar documented
- Implement configurable FST compression in src/store/mod.rs to reduce index size on disk; rocksdb supports zstd but FST codec is hardcoded
βTop contributors
Click to expand
Top contributors
- @valeriansaliou β 79 commits
- @anthonyroussel β 5 commits
- @Namekuji β 4 commits
- @PovilasID β 3 commits
- @jaseemabid β 1 commits
πRecent commits
Click to expand
Recent commits
05b8ddbβ Stop GPG-signing binaries, since it consumes human time when releasing (valeriansaliou)75ec203β Merge pull request #337 from anthonyroussel/update (valeriansaliou)f41688bβ Merge pull request #338 from anthonyroussel/fix-warnings (valeriansaliou)e21bde7β Merge pull request #339 from anthonyroussel/nix-cve (valeriansaliou)c50a16cβ Update ring, zerovec, zerovec-derive to fix CVEs (anthonyroussel)e1b7878β Fix mismatched_lifetime_syntaxes warnings (anthonyroussel)28e2d2cβ Bump nix to 0.31.1 (anthonyroussel)266af2fβ Bump rocksdb to 0.24 (anthonyroussel)444cacaβ Add sonic-bun 3rd party library (valeriansaliou)722bfecβ Bump GH action version (valeriansaliou)
πSecurity observations
Sonic demonstrates a reasonably secure
- Medium Β· Potential Command Injection via Configuration File β
src/config/reader.rs, src/main.rs, Dockerfile (CMD instruction). The application reads configuration from config.cfg and passes it to command-line arguments. If the configuration file is user-writable or sourced from an untrusted location, an attacker could inject malicious commands through configuration parameters that are later processed. Fix: Implement strict validation and sanitization of all configuration file inputs. Ensure config.cfg has restricted file permissions (0600). Validate all configuration values against a whitelist of allowed values. - Medium Β· Insecure Default Feature Configuration β
Cargo.toml (features section). The default features include 'allocator-jemalloc' and 'tokenizer-chinese'. While jemalloc is generally secure, the default feature set should be minimal. Enabling tokenizer-chinese by default increases attack surface if not properly maintained. Fix: Review the necessity of default features. Consider making tokenizer features opt-in rather than default. Document the security implications of each feature. - Medium Β· Missing Input Validation in Search/Query Processing β
src/executor/search.rs, src/executor/suggest.rs, src/query/builder.rs. The file structure shows search, suggest, and query processing modules (src/executor/search.rs, src/query/builder.rs), but there's no visible evidence of comprehensive input validation for search queries which could lead to DoS attacks or unexpected behavior through malformed inputs. Fix: Implement strict input validation for all query parameters. Enforce query length limits, pattern complexity limits, and regex timeout mechanisms. Add rate limiting for search operations. - Medium Β· Exposed Port Without Authentication Enforcement β
Dockerfile (EXPOSE 1491), PROTOCOL.md. The Dockerfile exposes port 1491 for the Sonic server. The README and protocol suggest this is an open port for network access. Authentication appears to be via the SONIC protocol, but there's no indication of mandatory authentication in Docker configuration. Fix: Ensure authentication is mandatory and enforced in the channel/listen.rs module. Document the authentication requirements clearly. Consider using network-level access controls (firewall rules, network policies) in addition to application-level auth. - Low Β· Unsafe Signal Handling on Unix β
Cargo.toml (nix dependency with 'signal' feature). The codebase includes nix crate with signal handling features on Unix platforms. Signal handlers can introduce race conditions and unsafe code paths if not carefully implemented. Fix: Review signal handling implementation in the main.rs and channel modules. Ensure signal handlers are minimal and don't call non-async-safe functions. Consider using channels for signal handling instead of direct handlers. - Low Β· Third-Party Dependency Version Pinning β
Cargo.toml (dependencies section). Several dependencies use semantic versioning without specific patch version pinning (e.g., 'log = 0.4', 'toml = 0.8'). This allows minor/patch updates that could introduce vulnerabilities. Fix: Use Cargo.lock for reproducible builds (already present). Consider more restrictive version constraints for critical dependencies. Regularly run 'cargo audit' in CI/CD pipeline to detect known vulnerabilities. - Low Β· Potential Information Disclosure via Error Messages β
src/channel/handle.rs, src/executor/*.rs. Error handling across the codebase may expose sensitive information such as internal paths, system details, or search patterns in error messages that could be logged or returned to clients. Fix: Implement generic error messages for client-facing responses. Log detailed error information server-side only. Never expose internal paths or system details to API responses. - Low Β· RocksDB Persistence Security β
data/store/kv, src/main.rs (RocksDB initialization). The application uses RocksDB for persistence (data/store/kv), which stores data unencrypted on disk by default. There's no visible indication of encryption at rest. Fix: Enable RocksDB encryption at rest if handling sensitive data. Implement full-disk encryption for data storage. Document data security requirements and implement access controls to the data directory.
LLM-derived; treat as a starting point, not a security audit.
πWhere to read next
- Open issues β current backlog
- Recent PRs β what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals β see the live page for receipts. Re-run on a new commit to refresh.