ankane/searchkick
Intelligent search made easy
Healthy across all four use cases
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 3mo ago
- ✓5 active contributors
- ✓MIT licensed
- ✓CI configured
- ✓Tests present
- ⚠Single-maintainer risk — top contributor 96% of recent commits
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: ankane/searchkick
Generated by RepoPilot · 2026-05-10 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ankane/searchkick shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across all four use cases
- Last commit 3mo ago
- 5 active contributors
- MIT licensed
- CI configured
- Tests present
- ⚠ Single-maintainer risk — top contributor 96% of recent commits
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live ankane/searchkick
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/ankane/searchkick.
What it runs against: a local clone of ankane/searchkick — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in ankane/searchkick | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 110 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ankane/searchkick. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ankane/searchkick.git
#   cd searchkick
#
# Then paste this script. Every check is read-only; no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ankane/searchkick and re-run."
  exit 2
fi

# 1. Repo identity (anchored at end of URL so similarly named forks do not match)
git remote get-url origin 2>/dev/null | grep -qE 'ankane/searchkick(\.git)?$' \
  && ok "origin remote is ankane/searchkick" \
  || miss "origin remote is not ankane/searchkick (artifact may be from a fork)"

# 2. License matches what RepoPilot saw.
# LICENSE* also covers LICENSE.txt or LICENSE.md; the gemspec is the
# fallback for a Ruby gem (package.json does not apply here).
(grep -qiE "^MIT" LICENSE* 2>/dev/null \
  || grep -qE 'license[[:space:]]*=[[:space:]]*"MIT"' searchkick.gemspec 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
for f in \
  lib/searchkick.rb \
  lib/searchkick/model.rb \
  lib/searchkick/query.rb \
  lib/searchkick/index.rb \
  lib/searchkick/indexer.rb
do
  test -f "$f" && ok "$f" || miss "missing critical file: $f"
done

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 110 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~80d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/ankane/searchkick"
  exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
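For an agent loop written in Ruby rather than shell, the ok:/FAIL: convention can be consumed by parsing the script's output. The sketch below is only an illustration: `VerifySummary` and `summarize_verify_output` are hypothetical names, and the sample text stands in for real script output.

```ruby
# Hypothetical helper, not part of RepoPilot or Searchkick.
# Assumes each check prints one line starting with "ok: " or "FAIL: ".
VerifySummary = Struct.new(:passed, :failed) do
  # The artifact counts as stale as soon as one check failed.
  def stale?
    !failed.empty?
  end
end

def summarize_verify_output(output)
  passed = []
  failed = []
  output.each_line do |line|
    line = line.strip
    if line.start_with?("ok: ")
      passed << line.delete_prefix("ok: ")
    elsif line.start_with?("FAIL: ")
      failed << line.delete_prefix("FAIL: ")
    end
  end
  VerifySummary.new(passed, failed)
end

# Sample text standing in for the script's real output.
sample = <<~OUT
  ok: origin remote is ankane/searchkick
  ok: license is MIT
  FAIL: missing critical file: lib/searchkick/query.rb
OUT

summary = summarize_verify_output(sample)
puts "passed=#{summary.passed.size} failed=#{summary.failed.size} stale=#{summary.stale?}"
```

In practice the input would come from running the script (for example via backticks or `Open3.capture2`), and the exit status remains the authoritative signal.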
⚡TL;DR
Searchkick is a Ruby gem that integrates Elasticsearch or OpenSearch into Rails applications and Mongoid ODM models, providing intelligent full-text search with automatic stemming, synonym support, typo tolerance, and learning-based result ranking. It abstracts away the complexity of managing search indices and query DSL, letting developers use familiar SQL-like syntax while automatically handling special characters, misspellings, and language-specific processing.
Monolithic library structure: lib/searchkick.rb is the entry point, and lib/searchkick/ contains 30+ specialized modules (index.rb for index management, query.rb for query building, indexer.rb for bulk operations, reindex_*.rb for zero-downtime reindexing). Each major concern (indexing, querying, reranking, results) gets its own file. Tests sit in test/, examples/ has runnable demos, and benchmark/ contains performance profiling scripts.
👥Who it's for
Rails and Mongoid developers building applications that need production-grade search functionality without writing low-level Elasticsearch queries. Specifically: backend engineers at companies like Instacart who need search to improve with user behavior, and product teams who want autocomplete and 'Did you mean' features without custom infrastructure.
🌱Maturity & risk
Highly mature and production-ready. Version 6.0 was recently released (see CHANGELOG.md), and the recent commit log includes a version bump to 6.1.0. The test suite runs in GitHub Actions CI (.github/workflows/build.yml), the project is actively maintained by ankane, and it is battle-tested at Instacart at scale. The CI matrix covers ActiveRecord 7.2 and 8.0, Elasticsearch 8 and 9, and OpenSearch 2 and 3.
Low risk for a single-maintainer project. Dependencies are minimal and stable (elasticsearch/opensearch-ruby clients are official libraries). No obvious dormancy: CI runs on commits. Main risk is reliance on Elasticsearch/OpenSearch availability—if your search backend goes down, so does this functionality. ActiveRecord/Mongoid version pinning in gemfiles (gemfiles/activerecord*.gemfile) requires maintenance as Rails evolves.
Active areas of work
Active development: CHANGELOG.md shows recent work on version 6.0 features, multiple gemfiles for testing across ActiveRecord 7.2/8.0 and OpenSearch 2/3 suggest ongoing compatibility maintenance. Build workflow (.github/workflows/build.yml) indicates continuous testing. No explicit roadmap visible, but issue templates (ISSUE_TEMPLATE/) are in place for bug reports and feature requests.
🚀Get running
git clone https://github.com/ankane/searchkick.git && cd searchkick && bundle install. Requires Elasticsearch or OpenSearch running locally (brew install opensearch && brew services start opensearch for macOS). Then run tests with bundle exec rake or explore examples/ with bundle exec ruby examples/semantic.rb.
Daily commands: there is no traditional dev server because this is a library. To run tests: bundle install && bundle exec rake. To use it in a Rails app: add gem 'searchkick' to your Gemfile, run bundle install, make sure Elasticsearch or OpenSearch is running locally, call searchkick in the model, then run Model.reindex followed by Model.search('query'). The scripts in examples/ are runnable: bundle exec ruby examples/semantic.rb.
🗺️Map of the codebase
- lib/searchkick.rb: Entry point for the gem that sets up the Searchkick module and provides the main DSL for models
- lib/searchkick/model.rb: Core module that extends ActiveRecord/Mongoid models with searchkick methods and search capabilities
- lib/searchkick/query.rb: Builds and executes Elasticsearch queries; handles all search query logic and parameter processing
- lib/searchkick/index.rb: Manages Elasticsearch index lifecycle including creation, deletion, settings, and communications with the cluster
- lib/searchkick/indexer.rb: Coordinates the indexing pipeline including record data preparation and bulk operations
- lib/searchkick/results.rb: Wraps Elasticsearch response data and provides convenient access to hits, aggregations, and suggestions
- lib/searchkick/railtie.rb: Rails integration layer that sets up middleware, logging, and rake tasks
🛠️How to make changes
Add custom search filtering to a model
- Call searchkick() in your model with custom scope option (lib/searchkick/model.rb)
- Define a search_data method to customize what gets indexed (lib/searchkick/record_data.rb)
- Use where() clause in your search query to apply filters (lib/searchkick/where.rb)
- Call Model.search(query, where: {field: value}) to filter results (lib/searchkick/query.rb)
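The steps above can be sketched in plain Ruby, with no Searchkick, database, or cluster loaded. `Product` is a stand-in plain class and `build_search_options` is a hypothetical helper; only `search_data` and the `where:` option come from the steps themselves. In a Rails app the class would be a model that calls `searchkick`, and the options would be passed to `Model.search`.

```ruby
# Stand-in for a model; a real app would use an ActiveRecord or Mongoid
# class that calls `searchkick` (the gem is deliberately not loaded here).
class Product
  attr_reader :name, :price_cents, :in_stock

  def initialize(name:, price_cents:, in_stock:)
    @name = name
    @price_cents = price_cents
    @in_stock = in_stock
  end

  # Hook named in the steps above: the returned hash is the document
  # that would be indexed for this record. `on_sale` is a derived field
  # added purely for illustration.
  def search_data
    {
      name: name,
      price_cents: price_cents,
      in_stock: in_stock,
      on_sale: price_cents < 500
    }
  end
end

# Hypothetical helper: builds the options for Model.search(query, **opts),
# dropping filters whose value is nil.
def build_search_options(filters)
  { where: filters.compact }
end

product = Product.new(name: "Organic Apples", price_cents: 499, in_stock: true)
p product.search_data
p build_search_options(in_stock: true, store_id: nil)
```

Keeping the filter construction in a small function makes it easy to unit-test without a running cluster.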
Implement custom indexing logic and reindex on demand
- Add searchkick to your model class with custom settings (lib/searchkick/model.rb)
- Override search_data method to control what fields are indexed (lib/searchkick/record_data.rb)
- Call Model.reindex to trigger a full reindex with zero downtime (lib/searchkick/bulk_reindex_job.rb)
- Monitor progress via index status and check for errors (lib/searchkick/index.rb)
Add aggregations and faceted search results
- Use the aggs option in your search query to request aggregations (lib/searchkick/query.rb)
- Access aggregation results from the response object (lib/searchkick/results.rb)
- Use aggregations to build faceted navigation in your controller (lib/searchkick/relation.rb)
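As a rough illustration of the last step, the sketch below turns an aggregation hash into facet rows for a view. It assumes the hash mirrors an Elasticsearch terms-aggregation response (a "buckets" array of "key"/"doc_count" pairs); `facets_from_aggs` and the sample data are hypothetical.

```ruby
# Hypothetical helper: convert a terms-aggregation hash into facet rows,
# most frequent value first. Assumes the Elasticsearch response shape
# { field => { "buckets" => [{ "key" => ..., "doc_count" => ... }] } }.
def facets_from_aggs(aggs)
  aggs.each_with_object({}) do |(field, body), facets|
    buckets = body.fetch("buckets", [])
    facets[field] = buckets
      .sort_by { |bucket| -bucket["doc_count"] }
      .map { |bucket| { value: bucket["key"], count: bucket["doc_count"] } }
  end
end

# Sample data standing in for the aggregations of a real search response.
sample_aggs = {
  "store_id" => {
    "buckets" => [
      { "key" => 2, "doc_count" => 3 },
      { "key" => 1, "doc_count" => 12 }
    ]
  }
}

facets_from_aggs(sample_aggs).each do |field, rows|
  rows.each { |row| puts "#{field}=#{row[:value]} (#{row[:count]})" }
end
```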
Add search suggestions and autocomplete
- Configure the suggest option in the model's searchkick() call (lib/searchkick/index_options.rb)
- Pass suggest: true when searching, then read suggestions from the results object (lib/searchkick/query.rb)
- Return suggestions to your frontend for a "Did you mean" or autocomplete UI (lib/searchkick/results.rb)
🔧Why these technologies
- Elasticsearch — Provides full-text search, advanced filtering, aggregations, and learning-based ranking; the primary search backend
- ActiveRecord / Mongoid — Integrates with Rails ORMs to automatically index model changes and provide model-aware search queries
- Sidekiq / ActiveJob — Offloads bulk reindexing and queue processing to background workers to avoid blocking request cycles
- Rails Middleware & Log Subscriber — Captures request-level metrics and logs search queries for performance monitoring and debugging
⚖️Trade-offs already made
- Callback-based auto-indexing on model save
  - Why: Keeps search index synchronized with database without explicit API calls
  - Consequence: Creates overhead on every save; requires careful handling of bulk operations to avoid per-record indexing
- Zero-downtime reindex using alias swapping
  - Why: Allows reindexing large datasets without search downtime
  - Consequence: More complex index management; requires coordinating alias updates and cleanup
- In-memory index object caching
  - Why: Avoids repeated Elasticsearch cluster calls for index metadata
  - Consequence: Cache invalidation complexity; may have stale metadata if index schema changes
- Relation-style query chaining API
  - Why: Familiar DSL for Rails developers, similar to ActiveRecord querying
  - Consequence: Less expressive than raw Elasticsearch DSL for very complex queries
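The alias-swapping trade-off is easier to reason about with a toy model. The sketch below is an in-memory simulation, not Searchkick's implementation: `FakeCluster`, its methods, and the index-naming scheme are all hypothetical. It shows the three phases (build a new index, repoint the alias, clean up the old index) and why readers of the alias never see a half-built index.

```ruby
# In-memory stand-in for a search cluster; purely illustrative.
class FakeCluster
  attr_reader :indices, :aliases

  def initialize
    @indices = {} # index name => array of documents
    @aliases = {} # alias name => index name
  end

  # Searches always go through the alias, never a concrete index name.
  def search(alias_name)
    indices.fetch(aliases.fetch(alias_name))
  end

  def reindex(alias_name, documents, suffix:)
    new_index = "#{alias_name}_#{suffix}"
    indices[new_index] = documents.dup # 1. build alongside the live index
    old_index = aliases[alias_name]
    aliases[alias_name] = new_index    # 2. swap the alias in one step
    # 3. clean up the previous index once nothing points at it
    indices.delete(old_index) if old_index && old_index != new_index
    new_index
  end
end

cluster = FakeCluster.new
cluster.reindex("products", ["apple"], suffix: "v1")
cluster.reindex("products", ["apple", "pear"], suffix: "v2")
p cluster.search("products")
p cluster.indices.keys
```

The cleanup step is where the coordination cost noted above shows up: if the swap fails midway, the old index must be kept so the alias still resolves.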
🚫Non-goals (don't propose these)
- Real-time analytics dashboard (use Searchjoy for that)
- Query suggestion engine (use Autosuggest for that)
- Persistent query history or learning without Searchjoy integration
- Support for databases other than Elasticsearch/OpenSearch
- Full-text search without Elasticsearch backend
🪤Traps & gotchas
- Elasticsearch/OpenSearch must be running before any search operations; reindex will fail silently if the cluster is unavailable.
- Index aliases are used for zero-downtime reindexing (reindex_v2_job.rb); swapping aliases requires careful coordination if manually interfering.
- Background job workers (ProcessBatchJob, BulkReindexJob, etc.) expect a configured queue backend (Resque, Sidekiq, etc.); async reindexing won't work without one.
- Gemfiles in gemfiles/ are for CI matrix testing; lock your own Gemfile to compatible versions (see Elasticsearch 7 vs 8 breaking changes documented in CHANGELOG.md).
- Record data serialization happens in record_data.rb; custom object_attributes or serializers may not index as expected without explicit to_searchkick_json hooks.
- Multi-search batching (multi_search.rb) has internal batch size limits; very large result sets may require manual pagination.
🏗️Architecture
💡Concepts to learn
- Elasticsearch Query DSL — Searchkick abstracts this away but internally builds and sends JSON queries to Elasticsearch; understanding the DSL helps debug complex searches and customize behavior
- Stemming and Linguistic Analysis — Core to why Searchkick matches 'tomatoes' to 'tomato'—indexes configure language analyzers that normalize text; crucial for understanding search behavior across languages
- Index Aliases — Used by reindex_v2_job.rb for zero-downtime reindexing; a single alias points to the active index while background jobs build a new one, then swap atomically
- Bulk API — Searchkick's indexer.rb uses Elasticsearch Bulk API to insert/update thousands of records efficiently in one request instead of individual calls
- BM25 Relevance Scoring — Default ranking algorithm for full-text search; Searchkick lets you customize boosting and field weighting but relies on BM25 under the hood
- Tokenization and Analyzers — Text must be split into tokens before indexing; understanding how analyzers tokenize 'jalapeño' vs 'jalapeno' explains why special character handling works
- Reranking with Custom Scripts — reranking.rb allows Elasticsearch script scoring to reorder results post-search based on custom logic like user history; advanced feature for personalization
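To make the Bulk API concept concrete, the sketch below builds the newline-delimited JSON body that the Elasticsearch/OpenSearch `_bulk` endpoint accepts: one action line followed by one document line per record, ending with a newline. It uses only Ruby's standard library; `bulk_index_body`, the index name, and the batch size of 1000 are illustrative assumptions rather than Searchkick internals.

```ruby
require "json"

# Hypothetical helper: build an NDJSON body for the _bulk endpoint.
# Each record contributes an action line and a source line.
def bulk_index_body(index_name, records)
  lines = records.flat_map do |record|
    [
      JSON.generate("index" => { "_index" => index_name, "_id" => record.fetch(:id).to_s }),
      JSON.generate(record.reject { |key, _| key == :id })
    ]
  end
  lines.join("\n") + "\n" # the bulk body must end with a newline
end

records = [{ id: 1, name: "Apple" }, { id: 2, name: "Pear" }]

# Batching keeps each request bounded; 1000 is an arbitrary example size.
records.each_slice(1000) do |batch|
  print bulk_index_body("products_v1", batch)
end
```

Sending one such body per batch replaces thousands of single-document requests with a handful of round trips.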
🔗Related repos
- elastic/elasticsearch-ruby: Official Elasticsearch Ruby client library that Searchkick wraps; understanding its API helps debug low-level search issues
- opensearch-project/opensearch-ruby: Official OpenSearch Ruby client; drop-in alternative to elasticsearch-ruby for Searchkick backend
- ankane/searchjoy: Companion analytics gem that tracks and visualizes search queries and user clicks to power Searchkick's learning features
- ankane/autosuggest: Sibling gem providing query suggestions and autocomplete using Elasticsearch aggregations; often used alongside Searchkick
- rspec/rspec-rails: Test framework commonly used in Rails apps that adopt Searchkick; note that Searchkick's own test/ directory uses *_test.rb files (for example hybrid_test.rb and knn_test.rb) rather than RSpec specs
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive tests for lib/searchkick/reranking.rb
The reranking.rb file exists in the lib structure but has no corresponding test file in the test directory. Given that reranking is a core search quality feature, this needs dedicated test coverage for different reranking scenarios, edge cases, and integration with the search pipeline.
- [ ] Create test/reranking_test.rb with tests for reranking algorithm accuracy
- [ ] Add tests for reranking with different model types and configurations
- [ ] Test edge cases: empty results, single result, disabled reranking
- [ ] Add integration tests showing reranking improving search results quality
- [ ] Verify compatibility with both Elasticsearch and OpenSearch versions
Add missing tests for lib/searchkick/query.rb and query building logic
The query.rb file is central to Searchkick's DSL but appears to lack comprehensive test coverage in the test directory. Query construction, filtering, sorting, and aggregation building need thorough testing to prevent regressions.
- [ ] Create test/query_test.rb to test Query class initialization and configuration
- [ ] Add tests for complex query building: filters, where clauses, sorting combinations
- [ ] Test query DSL edge cases: empty queries, invalid parameters, special characters
- [ ] Add tests for query caching and optimization behavior
- [ ] Test query building with all supported option combinations from index_options
Add comprehensive documentation and tests for semantic search and hybrid search features
The repo includes semantic.rb and hybrid.rb example files and a hybrid_test.rb, but there's no semantic_test.rb. These advanced features need dedicated test files and documentation to ensure they work correctly with different embedding models and search configurations.
- [ ] Create test/semantic_test.rb with tests for semantic search integration
- [ ] Add tests for semantic search with different embedding dimensions
- [ ] Test hybrid search combining semantic + keyword results weighting
- [ ] Add tests for edge cases: missing embeddings, embedding model failures
- [ ] Document semantic/hybrid search in README with examples (if not already present)
- [ ] Test compatibility with knn features (verify knn_test.rb adequately covers semantic scenarios)
🌿Good first issues
- Add integration tests for the semantic search example (examples/semantic.rb) to ensure hybrid BM25+vector search stays compatible across Elasticsearch version upgrades.
- Document the reindex_v2_job.rb zero-downtime reindexing process with a runnable example showing index alias swapping; currently only inline comments exist.
- Add support for returning raw Elasticsearch aggregation buckets in Results class without forcing array conversion; useful for faceted search UIs that need bucket metadata.
⭐Top contributors
- @ankane — 96 commits
- @james-reading — 1 commit
- @jaredmoody — 1 commit
- @khasinski — 1 commit
- @y-yagi — 1 commit
📝Recent commits
- 1009d03: Updated changelog [skip ci] (ankane)
- 36a5b8c: Improved smart aggs behavior with _and (ankane)
- 2a6a707: Version bump to 6.1.0 [skip ci] (ankane)
- 8fb7e8a: Updated checkout action (ankane)
- 442348f: Restored previous behavior for smart_aggs [skip ci] (ankane)
- 1c8f3e2: Added smart_aggs test [skip ci] (ankane)
- b788666: Improved smart_aggs test (ankane)
- c9e3676: Improved smart_aggs test [skip ci] (ankane)
- e80c26f: Removed todo [skip ci] (ankane)
- edbdfb8: Added smart_aggs tests for _script [skip ci] (ankane)
🔒Security observations
The searchkick codebase demonstrates moderate security posture with primary concerns around query injection vulnerabilities, insufficient input validation for Elasticsearch queries, and potential information disclosure through logging. The absence of visible dependency manifests prevents complete assessment. Key recommendations include implementing strict input validation and parameterized queries for search operations, sanitizing error messages, adding authorization controls to bulk operations, and implementing proper concurrency controls. The project should adopt security scanning tools and maintain regular dependency audits.
- High · Potential Elasticsearch/OpenSearch Query Injection (lib/searchkick/query.rb, lib/searchkick/where.rb). The codebase processes user search queries and constructs Elasticsearch queries through lib/searchkick/query.rb and lib/searchkick/where.rb. Without proper input validation and sanitization, attackers could craft malicious queries to access unauthorized data or cause denial of service through query manipulation. Fix: Implement strict input validation and parameterized queries. Validate all user inputs against expected types and formats. Use Elasticsearch's query DSL safely without string concatenation. Review how query parameters are constructed and ensure all user-supplied data is properly escaped.
- Medium · Potential Information Disclosure via Error Messages (lib/searchkick/log_subscriber.rb, lib/searchkick/controller_runtime.rb). The logging and error handling mechanisms in lib/searchkick/log_subscriber.rb may expose sensitive information about database structure, query patterns, or internal system details to attackers or unauthorized users through verbose error messages. Fix: Sanitize error messages before logging or displaying to users. Implement different log levels for development vs production. Ensure sensitive details (query structure, field names) are only logged in debug mode with restricted access.
- Medium · Insecure Bulk Reindex Operations (lib/searchkick/bulk_reindex_job.rb, lib/searchkick/reindex_v2_job.rb, lib/searchkick/process_batch_job.rb). The bulk reindex functionality in lib/searchkick/bulk_reindex_job.rb and lib/searchkick/reindex_v2_job.rb performs large-scale data operations without apparent access control. Background jobs could be vulnerable to privilege escalation or unauthorized data manipulation. Fix: Implement proper authorization checks before executing reindex jobs. Verify that only authenticated and authorized users/processes can trigger bulk operations. Add audit logging for all reindex activities. Consider rate limiting on reindex operations.
- Medium · Potential Race Conditions in Reindex Queue (lib/searchkick/reindex_queue.rb). The reindex queue mechanism in lib/searchkick/reindex_queue.rb may be vulnerable to race conditions in multi-threaded or distributed environments, potentially leading to data inconsistency or duplicate processing of records. Fix: Implement proper locking mechanisms and atomic operations for queue management. Use database transactions or distributed locks to prevent race conditions. Add idempotency checks to handle duplicate processing safely.
- Low · Missing Security Headers Configuration (lib/searchkick/middleware.rb). Based on the file structure, the middleware in lib/searchkick/middleware.rb does not appear to set security-related HTTP headers such as CSP, X-Frame-Options, or X-Content-Type-Options. Fix: Add security headers middleware to set appropriate HTTP security headers. Configure Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, and other relevant headers for defense against XSS and clickjacking attacks.
- Low · Dependency Management Opacity (searchkick.gemspec, Gemfile). No dependency file content was provided (Gemfile, Gemfile.lock, gemspec) for analysis. Dependencies may contain known vulnerabilities without visibility into what versions are pinned. Fix: Regularly audit dependencies using tools like 'bundle audit' or 'dependabot'. Keep all gems updated to the latest secure versions. Pin specific versions in Gemfile.lock and review security advisories regularly.
LLM-derived; treat as a starting point, not a security audit.
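The input-validation advice in the first observation can be illustrated without claiming anything about Searchkick's internals. One defensive pattern at the application layer, sketched below under stated assumptions, is an allow-list that maps request parameters to typed filter values before they reach a `where:` option. `ALLOWED_FILTERS`, `safe_where`, and the filter names are hypothetical.

```ruby
# Hypothetical allow-list: filter name => coercion to an expected type.
# Anything not listed here is ignored rather than passed through.
ALLOWED_FILTERS = {
  "in_stock" => ->(value) { value.to_s == "true" },
  "store_id" => ->(value) { Integer(value.to_s, 10) }
}.freeze

# Builds a filter hash from untrusted request parameters.
def safe_where(params)
  params.each_with_object({}) do |(key, value), where|
    coerce = ALLOWED_FILTERS[key.to_s]
    next unless coerce # unknown keys never reach the query

    begin
      where[key.to_sym] = coerce.call(value)
    rescue ArgumentError
      # Values that cannot be coerced (e.g. "abc" for store_id) are dropped.
    end
  end
end

untrusted = { "store_id" => "42", "in_stock" => "true", "_script" => "anything" }
p safe_where(untrusted)
```

Because every accepted value is coerced to a scalar, user-supplied hashes or operator names cannot be smuggled into the filter structure.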
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.