ankane/searchkick

Item: ankane/searchkick
Rating: 5
Author: RepoPilot

Intelligent search made easy

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 3mo ago
✓5 active contributors
✓MIT licensed

✓CI configured
✓Tests present
⚠Single-maintainer risk — top contributor 96% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/ankane/searchkick)](https://repopilot.app/r/ankane/searchkick)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: ankane/searchkick

Generated by RepoPilot · 2026-05-10 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/ankane/searchkick shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

Last commit 3mo ago
5 active contributors
MIT licensed
CI configured
Tests present
⚠ Single-maintainer risk — top contributor 96% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live ankane/searchkick repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/ankane/searchkick.

What it runs against: a local clone of ankane/searchkick — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in ankane/searchkick | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 110 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>ankane/searchkick</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of ankane/searchkick. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/ankane/searchkick.git
#   cd searchkick
#
# Then paste this script. Every check is read-only — no mutations.

# Keep going after a failed check so every result is reported.
set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of ankane/searchkick and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "ankane/searchkick(\.git)?\b" \
  && ok "origin remote is ankane/searchkick" \
  || miss "origin remote is not ankane/searchkick (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
# (the LICENSE* glob also matches a LICENSE.txt file)
(grep -qiE "^(MIT)" LICENSE* 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lib/searchkick.rb" \
  && ok "lib/searchkick.rb" \
  || miss "missing critical file: lib/searchkick.rb"
test -f "lib/searchkick/model.rb" \
  && ok "lib/searchkick/model.rb" \
  || miss "missing critical file: lib/searchkick/model.rb"
test -f "lib/searchkick/query.rb" \
  && ok "lib/searchkick/query.rb" \
  || miss "missing critical file: lib/searchkick/query.rb"
test -f "lib/searchkick/index.rb" \
  && ok "lib/searchkick/index.rb" \
  || miss "missing critical file: lib/searchkick/index.rb"
test -f "lib/searchkick/indexer.rb" \
  && ok "lib/searchkick/indexer.rb" \
  || miss "missing critical file: lib/searchkick/indexer.rb"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 110 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~80d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/ankane/searchkick"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Searchkick is a Ruby gem that adds Elasticsearch- or OpenSearch-backed search to Rails applications, working with ActiveRecord and Mongoid models. It provides full-text search with automatic stemming, synonym support, typo tolerance, and result ranking that learns from user behavior. It hides the complexity of managing search indices and the query DSL, letting developers use a familiar SQL-like syntax while it handles special characters, misspellings, and language-specific processing.

The codebase is a monolithic library: lib/searchkick.rb is the entry point, and lib/searchkick/ contains 30+ specialized modules (index.rb for index management, query.rb for query building, indexer.rb for bulk operations, reindex_*.rb for zero-downtime reindexing). Each major concern (indexing, querying, reranking, results) gets its own file. Tests sit in test/, examples/ has runnable demos, and benchmark/ contains performance profiling scripts.

👥Who it's for

Rails and Mongoid developers building applications that need production-grade search functionality without writing low-level Elasticsearch queries. Specifically: backend engineers at companies like Instacart who need search to improve with user behavior, and product teams who want autocomplete and 'Did you mean' features without custom infrastructure.

🌱Maturity & risk

Highly mature and production-ready. The 6.x line is current (CHANGELOG.md covers 6.0, and the recent commits include a version bump to 6.1.0). The test suite runs in GitHub Actions CI (.github/workflows/build.yml), the project is actively maintained by ankane, and it is battle-tested at Instacart at scale. Supports ActiveRecord 7.2 and 8.0, Elasticsearch 8 and 9, and OpenSearch 2 and 3.

Low risk for a single-maintainer project. Dependencies are minimal and stable (elasticsearch/opensearch-ruby clients are official libraries). No obvious dormancy: CI runs on commits. Main risk is reliance on Elasticsearch/OpenSearch availability—if your search backend goes down, so does this functionality. ActiveRecord/Mongoid version pinning in gemfiles (gemfiles/activerecord*.gemfile) requires maintenance as Rails evolves.

Active areas of work

Active development: CHANGELOG.md shows recent work on version 6.0 features, and the multiple gemfiles for testing across ActiveRecord 7.2/8.0 and OpenSearch 2/3 suggest ongoing compatibility maintenance. The build workflow (.github/workflows/build.yml) indicates continuous testing. No explicit roadmap is visible, but issue templates (ISSUE_TEMPLATE/) are in place for bug reports and feature requests.

🚀Get running

Clone and install dependencies: `git clone https://github.com/ankane/searchkick.git && cd searchkick && bundle install`. The tests and examples need Elasticsearch or OpenSearch running locally (on macOS: `brew install opensearch && brew services start opensearch`). Then run the tests with `bundle exec rake`, or explore examples/ with `bundle exec ruby examples/semantic.rb`.

Daily commands: there is no traditional dev server because this is a library. To run tests: `bundle install && bundle exec rake`. To use it in a Rails app: add `gem "searchkick"` plus a client gem (`elasticsearch` or `opensearch-ruby`) to your Gemfile, run `bundle install`, make sure Elasticsearch or OpenSearch is running, add `searchkick` to the model, then call `Model.reindex` followed by `Model.search("query")`. The scripts in examples/ are runnable: `bundle exec ruby examples/semantic.rb`.

🗺️Map of the codebase

lib/searchkick.rb — Entry point for the gem that sets up the Searchkick module and provides the main DSL for models
lib/searchkick/model.rb — Core module that extends ActiveRecord/Mongoid models with searchkick methods and search capabilities
lib/searchkick/query.rb — Builds and executes Elasticsearch queries; handles all search query logic and parameter processing
lib/searchkick/index.rb — Manages Elasticsearch index lifecycle including creation, deletion, settings, and communications with the cluster
lib/searchkick/indexer.rb — Coordinates the indexing pipeline including record data preparation and bulk operations
lib/searchkick/results.rb — Wraps Elasticsearch response data and provides convenient access to hits, aggregations, and suggestions
lib/searchkick/railtie.rb — Rails integration layer that sets up middleware, logging, and rake tasks

🛠️How to make changes

Add custom search filtering to a model

Call searchkick() in your model with custom scope option (lib/searchkick/model.rb)
Define a search_data method to customize what gets indexed (lib/searchkick/record_data.rb)
Use where() clause in your search query to apply filters (lib/searchkick/where.rb)
Call Model.search(query, where: {field: value}) to filter results (lib/searchkick/query.rb)
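
A minimal sketch of the idea behind the last two steps above: a where hash like the one passed to Model.search(query, where: {field: value}) ends up as filter clauses in the Elasticsearch query body. The helper names where_to_filters and filtered_query, and the sample field names, are hypothetical. This is not Searchkick's actual implementation in lib/searchkick/query.rb, which handles many more operators.

```ruby
require "json"

# Hypothetical helper (not part of Searchkick): turns a simple `where` hash
# into a list of Elasticsearch filter clauses.
def where_to_filters(where)
  where.map do |field, value|
    case value
    when Array
      # Match any of several values
      { terms: { field => value } }
    when Range
      # Inclusive or exclusive upper bound, depending on the Ruby range
      upper = value.exclude_end? ? :lt : :lte
      { range: { field => { gte: value.begin, upper => value.end } } }
    else
      # Exact match on a single value
      { term: { field => value } }
    end
  end
end

# Wrap a text match and the filters in a bool query.
# The "name" field is an assumption for illustration.
def filtered_query(text, where)
  {
    query: {
      bool: {
        must: { match: { name: text } },
        filter: where_to_filters(where)
      }
    }
  }
end

puts JSON.pretty_generate(
  filtered_query("apples", { in_stock: true, store_id: [1, 2], price: 5..10 })
)
```

Filters placed under bool.filter restrict the result set without affecting relevance scores, which is why they are kept separate from the text match.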

Implement custom indexing logic and reindex on demand

Add searchkick to your model class with custom settings (lib/searchkick/model.rb)
Override search_data method to control what fields are indexed (lib/searchkick/record_data.rb)
Call Model.reindex to trigger a full reindex with zero downtime (lib/searchkick/bulk_reindex_job.rb)
Monitor progress via index status and check for errors (lib/searchkick/index.rb)
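
Zero-downtime reindexing is commonly done by building a fresh index and then repointing an alias to it. The sketch below simulates that flow in memory so it runs without a cluster. FakeCluster, reindex, and the index naming scheme are hypothetical and do not reflect Searchkick's actual classes.

```ruby
# In-memory stand-in for a search cluster. All names here are hypothetical.
class FakeCluster
  attr_reader :indices, :aliases

  def initialize
    @indices = {} # index name => array of documents
    @aliases = {} # alias name => index name
  end

  def create_index(name)
    @indices[name] = []
  end

  def bulk_index(name, docs)
    @indices.fetch(name).concat(docs)
  end

  # Repoint the alias in one step, then drop the index it used to point to.
  def swap_alias(alias_name, new_index)
    old_index = @aliases[alias_name]
    @aliases[alias_name] = new_index
    @indices.delete(old_index) if old_index
    old_index
  end

  # Searches always go through the alias, never a concrete index name.
  def search(alias_name)
    @indices.fetch(@aliases.fetch(alias_name))
  end
end

# Build a new index next to the live one, fill it, then swap the alias.
def reindex(cluster, alias_name, records, timestamp:)
  new_index = "#{alias_name}_#{timestamp}"
  cluster.create_index(new_index)
  cluster.bulk_index(new_index, records)
  cluster.swap_alias(alias_name, new_index)
  new_index
end

cluster = FakeCluster.new
reindex(cluster, "products", [{ id: 1, name: "Apple" }], timestamp: "20260101")
reindex(cluster, "products", [{ id: 1, name: "Apple" }, { id: 2, name: "Pear" }], timestamp: "20260102")
puts cluster.aliases["products"]
puts cluster.search("products").size
```

Because readers only ever query the alias, the old index keeps serving searches until the moment the alias moves.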

Add aggregations and faceted search results

Use the aggs option in your search query to request aggregations (lib/searchkick/query.rb)
Access aggregation results from the response object (lib/searchkick/results.rb)
Use aggregations to build faceted navigation in your controller (lib/searchkick/relation.rb)
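
To make the faceting step concrete, here is a small sketch that turns a terms aggregation into value/count pairs for a navigation sidebar. The response fragment follows the bucket shape Elasticsearch and OpenSearch return for terms aggregations. The field name, the counts, and the facets_for helper are made up for illustration and are not part of Searchkick.

```ruby
# Sample response fragment for a terms aggregation on a hypothetical
# "store_id" field. The numbers are invented.
SAMPLE_RESPONSE = {
  "aggregations" => {
    "store_id" => {
      "buckets" => [
        { "key" => 1, "doc_count" => 12 },
        { "key" => 2, "doc_count" => 5 }
      ]
    }
  }
}.freeze

# Hypothetical helper: flattens buckets into value/count pairs.
# Returns an empty list when the aggregation is absent.
def facets_for(response, field)
  buckets = response.dig("aggregations", field, "buckets") || []
  buckets.map { |bucket| { value: bucket["key"], count: bucket["doc_count"] } }
end

facets_for(SAMPLE_RESPONSE, "store_id").each do |facet|
  puts "store #{facet[:value]} (#{facet[:count]})"
end
```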

Add search suggestions and autocomplete

Configure the suggest option in the model's searchkick call, for example searchkick suggest: [:name] (lib/searchkick/index_options.rb)
Pass suggest: true to the search call to request suggestions (lib/searchkick/query.rb)
Read suggestions from the results object and return them to your frontend for a "Did you mean" prompt or autocomplete UI (lib/searchkick/results.rb)

🔧Why these technologies

Elasticsearch — Provides full-text search, advanced filtering, aggregations, and learning-based ranking; the primary search backend
ActiveRecord / Mongoid — Integrates with Rails ORMs to automatically index model changes and provide model-aware search queries
Sidekiq / ActiveJob — Offloads bulk reindexing and queue processing to background workers to avoid blocking request cycles
Rails Middleware & Log Subscriber — Captures request-level metrics and logs search queries for performance monitoring and debugging

⚖️Trade-offs already made

Callback-based auto-indexing on model save
- Why: Keeps search index synchronized with database without explicit API calls
- Consequence: Creates overhead on every save; requires careful handling of bulk operations to avoid per-record indexing
Zero-downtime reindex using alias swapping
- Why: Allows reindexing large datasets without search downtime
- Consequence: More complex index management; requires coordinating alias updates and cleanup
In-memory index object caching
- Why: Avoids repeated Elasticsearch cluster calls for index metadata
- Consequence: Cache invalidation complexity; may have stale metadata if index schema changes
Relation-style query chaining API
- Why: Familiar DSL for Rails developers, similar to ActiveRecord querying
- Consequence: Less expressive than raw Elasticsearch DSL for very complex queries
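
The relation-style chaining trade-off can be pictured with a small sketch: each call returns a new object carrying merged options, and nothing would be sent to the search backend until the options are consumed. SketchRelation and its methods are hypothetical names for illustration and are not the code in lib/searchkick/relation.rb.

```ruby
# Hypothetical sketch of a chainable, lazily evaluated relation.
class SketchRelation
  attr_reader :term, :options

  def initialize(term, options = {})
    @term = term
    @options = options.freeze
  end

  # Successive where calls merge their conditions.
  def where(conditions)
    spawn(where: (options[:where] || {}).merge(conditions))
  end

  def order(value)
    spawn(order: value)
  end

  def limit(value)
    spawn(limit: value)
  end

  # The accumulated arguments a query builder would receive.
  def to_search_args
    [term, options]
  end

  private

  # Return a copy instead of mutating, so partial chains can be reused.
  def spawn(extra)
    self.class.new(term, options.merge(extra))
  end
end

base = SketchRelation.new("apples")
relation = base.where(in_stock: true).where(store_id: 1).order(name: :asc).limit(10)
p relation.to_search_args
```

The chain can only express what the option hash can hold, which is the expressiveness cost named in the consequence above.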

🚫Non-goals (don't propose these)

Real-time analytics dashboard (use Searchjoy for that)
Query suggestion engine (use Autosuggest for that)
Persistent query history or learning without Searchjoy integration
Support for databases other than Elasticsearch/OpenSearch
Full-text search without Elasticsearch backend

🪤Traps & gotchas

Elasticsearch/OpenSearch must be running before any search operations; reindex will fail silently if the cluster is unavailable.
Index aliases are used for zero-downtime reindexing (reindex_v2_job.rb); swapping aliases requires careful coordination if you manage indices or aliases by hand.
Background job workers (ProcessBatchJob, BulkReindexJob, etc.) expect a configured queue backend (Resque, Sidekiq, etc.); async reindexing won't work without one.
Gemfiles in gemfiles/ are for CI matrix testing; lock your own Gemfile to compatible versions (see Elasticsearch 7 vs 8 breaking changes documented in CHANGELOG.md).
Record data serialization happens in record_data.rb; custom object_attributes or serializers may not index as expected without explicit to_searchkick_json hooks.
Multi-search batching (multi_search.rb) has internal batch size limits; very large result sets may require manual pagination.

🏗️Architecture

💡Concepts to learn

Elasticsearch Query DSL — Searchkick abstracts this away but internally builds and sends JSON queries to Elasticsearch; understanding the DSL helps debug complex searches and customize behavior
Stemming and Linguistic Analysis — Core to why Searchkick matches 'tomatoes' to 'tomato'—indexes configure language analyzers that normalize text; crucial for understanding search behavior across languages
Index Aliases — Used by reindex_v2_job.rb for zero-downtime reindexing; a single alias points to the active index while background jobs build a new one, and the alias is then swapped atomically
Bulk API — Searchkick's indexer.rb uses Elasticsearch Bulk API to insert/update thousands of records efficiently in one request instead of individual calls
BM25 Relevance Scoring — Default ranking algorithm for full-text search; Searchkick lets you customize boosting and field weighting but relies on BM25 under the hood
Tokenization and Analyzers — Text must be split into tokens before indexing; understanding how analyzers tokenize 'jalapeño' vs 'jalapeno' explains why special character handling works
Reranking with Custom Scripts — reranking.rb allows Elasticsearch script scoring to reorder results post-search based on custom logic like user history; advanced feature for personalization
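
As a concrete illustration of the Bulk API concept above, the sketch below builds a bulk request body: newline-delimited JSON in which each document is preceded by an action line, with a trailing newline at the end. The bulk_body helper, the index name, and the records are hypothetical, and this is not the code in lib/searchkick/indexer.rb.

```ruby
require "json"

# Hypothetical helper: builds a Bulk API request body as NDJSON.
def bulk_body(index_name, records)
  lines = records.flat_map do |record|
    [
      # Action line: which index and document ID the next line applies to
      JSON.generate({ index: { _index: index_name, _id: record.fetch(:id).to_s } }),
      # Source line: the document itself
      JSON.generate(record)
    ]
  end
  # The bulk body must be terminated by a newline
  lines.join("\n") + "\n"
end

records = [
  { id: 1, name: "Apple" },
  { id: 2, name: "Pear" }
]
puts bulk_body("products_development", records)
```

Sending many documents in one request like this avoids one HTTP round trip per record.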

🔗Related repos

elasticsearch/elasticsearch-ruby — Official Elasticsearch Ruby client library that Searchkick wraps; understanding its API helps debug low-level search issues
opensearch-project/opensearch-ruby — Official OpenSearch Ruby client; drop-in alternative to elasticsearch-ruby for Searchkick backend
ankane/searchjoy — Companion analytics gem that tracks and visualizes search queries and user clicks to power Searchkick's learning features
ankane/autosuggest — Sibling gem providing query suggestions and autocomplete using Elasticsearch aggregations; often used alongside Searchkick
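rspec/rspec-rails — Test framework commonly used in Rails apps that adopt Searchkick; note that Searchkick's own test/ directory uses *_test.rb files (for example hybrid_test.rb and knn_test.rb), a Minitest-style layout rather than RSpec specs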

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for lib/searchkick/reranking.rb

The reranking.rb file exists in the lib structure but has no corresponding test file in the test directory. Given that reranking is a core search quality feature, this needs dedicated test coverage for different reranking scenarios, edge cases, and integration with the search pipeline.

[ ] Create test/reranking_test.rb with tests for reranking algorithm accuracy
[ ] Add tests for reranking with different model types and configurations
[ ] Test edge cases: empty results, single result, disabled reranking
[ ] Add integration tests showing reranking improving search results quality
[ ] Verify compatibility with both Elasticsearch and OpenSearch versions

Add missing tests for lib/searchkick/query.rb and query building logic

The query.rb file is central to Searchkick's DSL but appears to lack comprehensive test coverage in the test directory. Query construction, filtering, sorting, and aggregation building need thorough testing to prevent regressions.

[ ] Create test/query_test.rb to test Query class initialization and configuration
[ ] Add tests for complex query building: filters, where clauses, sorting combinations
[ ] Test query DSL edge cases: empty queries, invalid parameters, special characters
[ ] Add tests for query caching and optimization behavior
[ ] Test query building with all supported option combinations from index_options

Add comprehensive documentation and tests for semantic search and hybrid search features

The repo includes semantic.rb and hybrid.rb example files and a hybrid_test.rb, but there's no semantic_test.rb. These advanced features need dedicated test files and documentation to ensure they work correctly with different embedding models and search configurations.

[ ] Create test/semantic_test.rb with tests for semantic search integration
[ ] Add tests for semantic search with different embedding dimensions
[ ] Test hybrid search combining semantic + keyword results weighting
[ ] Add tests for edge cases: missing embeddings, embedding model failures
[ ] Document semantic/hybrid search in README with examples (if not already present)
[ ] Test compatibility with knn features (verify knn_test.rb adequately covers semantic scenarios)

🌿Good first issues

Add integration tests for the semantic search example (examples/semantic.rb) to ensure hybrid BM25+vector search stays compatible across Elasticsearch version upgrades.
Document the reindex_v2_job.rb zero-downtime reindexing process with a runnable example showing index alias swapping; currently only inline comments exist.
Add support for returning raw Elasticsearch aggregation buckets in Results class without forcing array conversion; useful for faceted search UIs that need bucket metadata.

⭐Top contributors

@ankane — 96 commits
@james-reading — 1 commit
@jaredmoody — 1 commit
@khasinski — 1 commit
@y-yagi — 1 commit

📝Recent commits

1009d03 — Updated changelog [skip ci] (ankane)
36a5b8c — Improved smart aggs behavior with _and (ankane)
2a6a707 — Version bump to 6.1.0 [skip ci] (ankane)
8fb7e8a — Updated checkout action (ankane)
442348f — Restored previous behavior for smart_aggs [skip ci] (ankane)
1c8f3e2 — Added smart_aggs test [skip ci] (ankane)
b788666 — Improved smart_aggs test (ankane)
c9e3676 — Improved smart_aggs test [skip ci] (ankane)
e80c26f — Removed todo [skip ci] (ankane)
edbdfb8 — Added smart_aggs tests for _script [skip ci] (ankane)

🔒Security observations

The searchkick codebase shows a moderate security posture, with the primary concerns being possible query injection, insufficient input validation for Elasticsearch queries, and potential information disclosure through logging. The absence of visible dependency manifests prevents a complete assessment. Key recommendations include implementing strict input validation and parameterized queries for search operations, sanitizing error messages, adding authorization controls to bulk operations, and implementing proper concurrency controls. The project should adopt security scanning tools and maintain regular dependency audits.

High · Potential Elasticsearch/OpenSearch Query Injection — lib/searchkick/query.rb, lib/searchkick/where.rb. The codebase processes user search queries and constructs Elasticsearch queries through lib/searchkick/query.rb and lib/searchkick/where.rb. Without proper input validation and sanitization, attackers could craft malicious queries to access unauthorized data or cause denial of service through query manipulation. Fix: Implement strict input validation and parameterized queries. Validate all user inputs against expected types and formats. Use Elasticsearch's query DSL safely without string concatenation. Review how query parameters are constructed and ensure all user-supplied data is properly escaped.
Medium · Potential Information Disclosure via Error Messages — lib/searchkick/log_subscriber.rb, lib/searchkick/controller_runtime.rb. The logging and error handling mechanisms in lib/searchkick/log_subscriber.rb may expose sensitive information about database structure, query patterns, or internal system details to attackers or unauthorized users through verbose error messages. Fix: Sanitize error messages before logging or displaying to users. Implement different log levels for development vs production. Ensure sensitive details (query structure, field names) are only logged in debug mode with restricted access.
Medium · Insecure Bulk Reindex Operations — lib/searchkick/bulk_reindex_job.rb, lib/searchkick/reindex_v2_job.rb, lib/searchkick/process_batch_job.rb. The bulk reindex functionality in lib/searchkick/bulk_reindex_job.rb and lib/searchkick/reindex_v2_job.rb performs large-scale data operations without apparent access control. Background jobs could be vulnerable to privilege escalation or unauthorized data manipulation. Fix: Implement proper authorization checks before executing reindex jobs. Verify that only authenticated and authorized users/processes can trigger bulk operations. Add audit logging for all reindex activities. Consider rate limiting on reindex operations.
Medium · Potential Race Conditions in Reindex Queue — lib/searchkick/reindex_queue.rb. The reindex queue mechanism in lib/searchkick/reindex_queue.rb may be vulnerable to race conditions in multi-threaded or distributed environments, potentially leading to data inconsistency or duplicate processing of records. Fix: Implement proper locking mechanisms and atomic operations for queue management. Use database transactions or distributed locks to prevent race conditions. Add idempotency checks to handle duplicate processing safely.
Low · Missing Security Headers Configuration — lib/searchkick/middleware.rb. The middleware in lib/searchkick/middleware.rb does not appear to implement security-related HTTP headers such as CSP, X-Frame-Options, or X-Content-Type-Options based on the file structure. Fix: Add security headers middleware to set appropriate HTTP security headers. Configure Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, and other relevant headers for defense against XSS and clickjacking attacks.
Low · Dependency Management Opacity — searchkick.gemspec, Gemfile. No dependency file content was provided (Gemfile, Gemfile.lock, gemspec) for analysis. Dependencies may contain known vulnerabilities without visibility into what versions are pinned. Fix: Regularly audit dependencies using tools like 'bundle audit' or Dependabot. Keep all gems updated to the latest secure versions. Pin specific versions in Gemfile.lock and review security advisories regularly.
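
The fix suggested in the first finding, building queries as structured data rather than concatenated strings, can be illustrated with a small sketch. match_query is a hypothetical helper, not Searchkick code, and whether Searchkick itself is affected is not established here.

```ruby
require "json"

# Hypothetical helper: user input only ever appears as a value inside the
# query hash, never spliced into a JSON string by hand.
def match_query(field, user_input)
  { query: { match: { field => user_input.to_s } } }
end

# Input crafted to break out of a hand-built JSON string
hostile = 'apples" } }, "match_all": { "'

# Serialization escapes the quotes, so the structure cannot be altered.
body = JSON.generate(match_query("name", hostile))

# Parsing the body back shows the input is still one plain string value.
parsed = JSON.parse(body)
puts parsed["query"].keys.inspect
puts parsed["query"]["match"]["name"] == hostile
```

The same input interpolated into a string template would have changed the shape of the query, which is the failure mode the finding warns about.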

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
