RepoPilotOpen in app →

kanwei/algorithms

Ruby algorithms and data structures. C extensions

Healthy

Healthy across the board

Use as dependencyHealthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 2mo ago
  • 15 active contributors
  • MIT licensed
Show 3 more →
  • CI configured
  • Tests present
  • Concentrated ownership — top contributor handles 77% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/kanwei/algorithms)](https://repopilot.app/r/kanwei/algorithms)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/kanwei/algorithms on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: kanwei/algorithms

Generated by RepoPilot · 2026-05-10 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/kanwei/algorithms shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 2mo ago
  • 15 active contributors
  • MIT licensed
  • CI configured
  • Tests present
  • ⚠ Concentrated ownership — top contributor handles 77% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live kanwei/algorithms repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/kanwei/algorithms.

What it runs against: a local clone of kanwei/algorithms — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in kanwei/algorithms | Confirms the artifact applies here, not a fork | | 2 | License is still MIT | Catches relicense before you depend on it | | 3 | Default branch master exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 100 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>kanwei/algorithms</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of kanwei/algorithms. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/kanwei/algorithms.git
#   cd algorithms
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of kanwei/algorithms and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "kanwei/algorithms(\\.git)?\\b" \\
  && ok "origin remote is kanwei/algorithms" \\
  || miss "origin remote is not kanwei/algorithms (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"MIT\"" package.json 2>/dev/null) \\
  && ok "license is MIT" \\
  || miss "license drift — was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \\
  && ok "default branch master exists" \\
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lib/algorithms.rb" \\
  && ok "lib/algorithms.rb" \\
  || miss "missing critical file: lib/algorithms.rb"
test -f "lib/algorithms/sort.rb" \\
  && ok "lib/algorithms/sort.rb" \\
  || miss "missing critical file: lib/algorithms/sort.rb"
test -f "lib/containers/heap.rb" \\
  && ok "lib/containers/heap.rb" \\
  || miss "missing critical file: lib/containers/heap.rb"
test -f "ext/containers/deque/deque.c" \\
  && ok "ext/containers/deque/deque.c" \\
  || miss "missing critical file: ext/containers/deque/deque.c"
test -f "lib/containers/rb_tree_map.rb" \\
  && ok "lib/containers/rb_tree_map.rb" \\
  || miss "missing critical file: lib/containers/rb_tree_map.rb"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 100 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~70d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/kanwei/algorithms"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

A Ruby standard library for algorithms and data structures (heaps, Red-Black trees, splay trees, tries, deques, sorting/search algorithms) with optional C extensions for performance. It provides production-ready implementations of classical CS data structures that are missing from Ruby's standard library, with both pure Ruby and optimized C variants. Hybrid Ruby + C monorepo: lib/algorithms/ and lib/containers/ contain pure Ruby implementations (Containers::Heap, Algorithms::Sort.quicksort, etc.), while ext/containers/ and ext/algorithms/ provide drop-in C extensions (CDeque, CRBTreeMap, CSplayTreeMap) built via extconf.rb. Tests in spec/ validate both variants; benchmarks/ compare performance.

👥Who it's for

Ruby developers building performance-critical applications who need efficient data structures beyond Array/Hash, especially those doing competitive programming, data processing, or systems where algorithmic complexity directly impacts user experience.

🌱Maturity & risk

Actively maintained with CI/CD via GitHub Actions (.github/workflows/ci.yml), comprehensive test coverage (spec/ directory), and consistent API stability documented in CHANGELOG.markdown. Originated as a Google Summer of Code 2008 project and remains a go-to reference implementation, suggesting production-ready maturity with modest ongoing maintenance.

Single-maintainer repo (kanwei) with no visible dependency management crisis (basic Gemfile, no sprawling dependencies), but C extensions (ext/containers/, ext/algorithms/) add maintenance burden requiring C knowledge for contributor onboarding. Last activity and issue backlog not visible in provided data, so verify GitHub activity before critical adoption.

Active areas of work

No specific recent activity visible in provided file list. Review GitHub Actions workflow (ci.yml) execution history and git log to determine if actively developed or in maintenance mode. Check GitHub Issues for current discussion.

🚀Get running

git clone https://github.com/kanwei/algorithms.git && cd algorithms && bundle install && rake compile (to build C extensions)

Daily commands: No 'server' mode; this is a library. Test: rake spec or rspec spec/. Benchmark: ruby benchmarks/heap.rb (see benchmarks/ directory). Build C extensions: rake compile.

🗺️Map of the codebase

  • lib/algorithms.rb — Main entry point that requires all algorithm and container modules; every contributor must understand what gets exposed
  • lib/algorithms/sort.rb — Core sorting algorithms implementation; foundational algorithm module used across the codebase
  • lib/containers/heap.rb — Priority queue and heap implementation; critical data structure that underpins PriorityQueue and heap benchmarks
  • ext/containers/deque/deque.c — C extension for deque performance; demonstrates the C-extension pattern used throughout for optimizing critical structures
  • lib/containers/rb_tree_map.rb — Red-Black Tree Map wrapper around C extension; exemplifies how Ruby wraps native C implementations for tree structures
  • algorithms.gemspec — Gem specification defining dependencies, extensions to build, and package metadata for the entire project
  • Rakefile — Build and compilation configuration; controls C extension compilation and test execution

🛠️How to make changes

Add a new pure Ruby sorting algorithm

  1. Implement the algorithm method in lib/algorithms/sort.rb following the existing pattern (e.g., bubble_sort, insertion_sort) (lib/algorithms/sort.rb)
  2. Create corresponding test in spec/sort_spec.rb with test cases covering basic, reversed, and single-element arrays (spec/sort_spec.rb)
  3. Add benchmark comparison in benchmarks/sorts.rb to compare performance against existing sorts (benchmarks/sorts.rb)

Add a new C extension-backed container

  1. Create new directory ext/containers/{name}/ with {name}.c implementing the data structure in C (ext/containers/bst/bst.c)
  2. Create ext/containers/{name}/extconf.rb following the pattern from existing extensions (copy from ext/containers/deque/extconf.rb) (ext/containers/deque/extconf.rb)
  3. Create Ruby wrapper in lib/containers/{name}.rb that requires the C extension and provides the public API (lib/containers/rb_tree_map.rb)
  4. Add C extension to algorithms.gemspec in the extensions array (algorithms.gemspec)
  5. Create comprehensive tests in spec/{name}_spec.rb and optional spec/{name}_gc_mark_spec.rb for garbage collection (spec/deque_spec.rb)

Add a new pure Ruby container data structure

  1. Create new file lib/containers/{name}.rb with class definition and public methods (enqueue/dequeue for queue-like, push/pop for stack-like, etc.) (lib/containers/trie.rb)
  2. Require the new container in lib/algorithms.rb so it's available when the gem is loaded (lib/algorithms.rb)
  3. Create tests in spec/{name}_spec.rb covering initialization, basic operations, edge cases, and expected complexity (spec/trie_spec.rb)
  4. Optional: Add benchmark in benchmarks/{name}.rb comparing against similar containers or stdlib equivalents (benchmarks/heap.rb)

Add a new string algorithm or search algorithm

  1. For pure Ruby: implement in lib/algorithms/string.rb or lib/algorithms/search.rb; for C: implement in ext/algorithms/string/string.c (lib/algorithms/string.rb)
  2. Create test file spec/string_spec.rb or spec/search_spec.rb with test cases for typical inputs and edge cases (spec/string_spec.rb)
  3. If C extension, update ext/algorithms/string/extconf.rb and add extension to algorithms.gemspec (ext/algorithms/string/extconf.rb)

🔧Why these technologies

  • Ruby — Primary language for high-level API and pure Ruby data structures; provides clean, readable implementations that serve as reference implementations
  • C Extensions — Used for performance-critical data structures (deque, red-black trees, splay trees, string algorithms) where Ruby would be too slow for typical algorithmic workloads
  • RSpec — Standard Ruby testing framework used throughout spec/ for comprehensive test coverage and garbage collection mark testing
  • Rake — Build automation for compiling C extensions and running test suites; standard Ruby build tooling

⚖️Trade-offs already made

  • Hybrid pure Ruby + C extension architecture

    • Why: Provides both ease of understanding (Ruby code) and performance (C for hot paths); allows gradual optimization
    • Consequence: Increased maintenance burden: must keep Ruby and C implementations in sync, requires C knowledge for some contributions
  • Prioritize completeness of algorithms over 100% optimization

    • Why: Goal is to provide reference implementations and educate users on when to use different structures; not all are production-critical
    • Consequence: Some pure Ruby implementations may be slower than C; users must choose based on their constraints
  • No persistence or serialization layer

    • Why: Scope is in-memory data structures and algorithms; keeps library lightweight and focused
    • Consequence: Containers must be rebuilt from scratch each run; not suitable for long-lived distributed systems
  • Minimal dependencies (no external gems for core)

    • Why: Library should be self-contained and not introduce heavy transitive dependencies
    • Consequence: Must implement everything from scratch (e.g., heap, tree balancing); increases code volume but ensures stability

🚫Non-goals (don't propose these)

  • Does not provide thread-safe or concurrent data structures
  • Does not handle persistent/distributed storage
  • Does not provide GPU acceleration or SIMD optimization
  • Does not include

🪤Traps & gotchas

C extensions require a C compiler and Ruby development headers (ruby.h, ruby/version.h) to build via extconf.rb; build failures are common on systems without gcc/clang. Deque and other C containers (CRBTreeMap, CSplayTreeMap) have separate namespaces (e.g., Containers::CDeque vs Containers::Deque); mixing them may cause confusion. No indication of Ruby version constraints in provided data; verify Gemfile for compatibility if using modern Ruby versions.

🏗️Architecture

💡Concepts to learn

  • Red-Black Tree — Core data structure in this repo (Containers::RBTreeMap); understanding self-balancing invariants is essential to modify rb_tree_map.rb or rbtree.c correctly
  • Splay Tree — Alternative self-balancing tree in this repo (Containers::SplayTreeMap); different performance profile (amortized vs worst-case) than Red-Black trees
  • Heap (binary) — Foundation for Containers::Heap, Containers::MinHeap, Containers::MaxHeap and Algorithms::Sort.heapsort; core heap property needed to understand heap.rb
  • Knuth-Morris-Pratt algorithm — Implemented in Algorithms::Search.kmp_search (lib/algorithms/search.rb); demonstrates string pattern matching beyond naive O(nm) approach with failure function
  • Dual-Pivot Quicksort (Yaroslavskiy) — Advanced sorting algorithm in Algorithms::Sort.dualpivotquicksort; shows modern improvements over classic quicksort used in this repo's benchmarks
  • Trie (Prefix Tree) — Containers::Trie in lib/containers/trie.rb; efficient for string prefix queries and autocomplete use cases where this repo adds value
  • Suffix Array — Containers::SuffixArray in lib/containers/suffix_array.rb; space-efficient alternative to suffix trees for substring search in lib/algorithms/string.rb context
  • C extensions with Ruby FFI / native bindings — Critical to repo's performance strategy; ext/containers/ and ext/algorithms/ use extconf.rb pattern (Ruby 1.8+ C API); understanding this is essential for optimizing slow algorithms
  • ruby/set — Ruby stdlib data structure for comparison; algorithms repo fills gaps (Red-Black trees, heaps) stdlib lacks
  • evanphx/hash_ring — Ruby gem implementing consistent hashing; complements algorithms' tree/hash structures for distributed systems
  • mattheworiordan/ruby_extensions — Alternative Ruby extensions for performance; different approach to same problem of optimizing algorithmic operations
  • Pimpmychangelog/pimpmychangelog — Ecosystem: helps maintain the CHANGELOG.markdown format this repo uses consistently
  • google/BumbleBee — Inspiration/predecessor: Google Summer of Code mentoring tradition; algorithms repo is a GSOC 2008 project in same lineage

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive GC mark tests for all C extensions

The repo has GC mark tests for BST (bst_gc_mark_spec.rb) and Deque (deque_gc_mark_spec.rb), but C extensions for RBTreeMap, SplayTreeMap, and String are missing similar tests. This is critical for memory safety since Ruby's garbage collector must properly mark all reachable objects in C extensions to prevent crashes and memory leaks.

  • [ ] Create spec/rb_tree_map_gc_mark_spec.rb following the pattern in spec/bst_gc_mark_spec.rb
  • [ ] Create spec/splay_tree_map_gc_mark_spec.rb with similar structure
  • [ ] Create spec/string_gc_mark_spec.rb for ext/algorithms/string/string.c
  • [ ] Run existing GC mark tests to understand the testing pattern for marking internal Ruby object references
  • [ ] Verify all three new test files pass in CI

Add edge case and stress tests for priority_queue_spec.rb

The priority_queue is a wrapper around heap.rb but has minimal test coverage compared to other containers. The current spec/priority_queue_spec.rb lacks tests for edge cases (empty queue operations, single element, duplicate priorities) and stress tests that other data structures have. This is important for production reliability.

  • [ ] Review spec/heap_spec.rb and spec/bst_spec.rb for comprehensive test patterns
  • [ ] Add edge case tests: pop on empty queue, peek on empty queue, push/pop with duplicate priorities, with nil values
  • [ ] Add stress tests: bulk operations with 1000+ items, priority ordering under random insertions/deletions
  • [ ] Add tests for heap order property after mixed push/pop sequences
  • [ ] Verify tests pass and update ALGORITHM_INFO.md if needed with complexity notes

Complete missing benchmarks for all data structures in benchmarks/

The benchmarks directory only has 4 benchmark files (deque.rb, heap.rb, sorts.rb, treemaps.rb) but the repo includes 9+ data structures and algorithms (KDTree, Suffix Array, Trie, BST, Stacks, Queues, Search algorithms). Without benchmarks, contributors cannot validate performance improvements and users cannot understand performance characteristics.

  • [ ] Create benchmarks/bst.rb comparing Binary Search Tree operations (insert, search, delete) against hash/array
  • [ ] Create benchmarks/kd_tree.rb for nearest-neighbor and range queries using the test data in spec/kd_test_in.txt
  • [ ] Create benchmarks/trie.rb for insertion, search, and prefix lookup against alternatives
  • [ ] Create benchmarks/suffix_array.rb for pattern matching operations
  • [ ] Add these benchmarks to Rakefile and document in README.markdown how to run them

🌿Good first issues

  • Add comprehensive doctests or RDoc examples to lib/containers/suffix_array.rb and lib/containers/kd_tree.rb (no corresponding spec files visible in spec/ directory, only expected output kd_expected_out.txt)
  • Create performance regression tests in benchmarks/; add CI step to track benchmark results over commits (benchmarks/ exist but no integration with ci.yml)
  • Implement missing algorithm documentation: ALGORITHM_INFO.md is referenced but no details provided; add time/space complexity annotations and use-case guidance for each sort and search algorithm in lib/algorithms/

Top contributors

Click to expand

📝Recent commits

Click to expand
  • 72bb2c8 — Also run CI on ruby 3.4 and 4.0 (kanwei)
  • 43d5c80 — Fix RBTreeMap#delete crash on missing keys (kanwei)
  • feaa5d5 — Release 1.1.0 (kanwei)
  • cea0610 — Fix Heap comparison not working on some Objects (#60) (kanwei)
  • b9d2a15 — Replace travis with Github Actions CI (#59) (kanwei)
  • ae48381 — Update bundler + dependencies (kanwei)
  • eb97a37 — Remove Enumerable from heap and priority queue (#54) (GarrisonJ)
  • d8834b2 — Release 1.0.1 (kanwei)
  • c254c97 — Add benchmarks for Heap (#52) (kanwei)
  • 3ccba10 — Update dependencies (kanwei)

🔒Security observations

The codebase is a pure algorithms and data structures library with moderate security posture. Primary concerns are around C extension code safety without visible hardening measures, and missing dependency audit visibility. The library itself doesn't appear to handle user input in security-sensitive ways (no SQL, crypto, or network functionality evident), reducing attack surface. However, as a library with native code, memory safety vulnerabilities in the C extensions could impact dependent applications. Recommendations focus on compiler hardening, dependency auditing, and establishing a security disclosure policy.

  • Medium · C Extension Compilation Security — ext/algorithms/string/extconf.rb, ext/containers/*/extconf.rb. Multiple C extensions (BST, Deque, RBTree, SplayTree) are compiled from source without visible security hardening flags. C extensions can be vulnerable to buffer overflows, integer overflows, and memory safety issues. The extconf.rb files don't show evidence of compiler security flags like -fstack-protector-strong, -D_FORTIFY_SOURCE=2, or other hardening options. Fix: Add compiler security flags to extconf.rb files. Include: $CFLAGS += ' -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2'. Conduct security code review of C source files for buffer overflows and memory management issues.
  • Medium · Missing Dependency Lock File Content — Gemfile.lock. The Gemfile.lock is present in the file structure but its content was not provided for analysis. This prevents verification that dependencies are pinned to secure versions and don't have known vulnerabilities. Without visibility into locked versions, transitive dependency vulnerabilities cannot be assessed. Fix: Regularly audit Gemfile.lock with tools like bundler audit or bundle-audit. Pin all direct and transitive dependencies to specific versions. Review security advisories for all gems in the lock file.
  • Low · No Security Policy or Disclosure Process — Repository root. There is no visible SECURITY.md file or security policy documented for reporting vulnerabilities responsibly. Given that this is a data structures/algorithms library that could be used in security-sensitive contexts, this is a gap in project governance. Fix: Create a SECURITY.md file documenting responsible disclosure procedures. Include contact information and expected response timeframes for security vulnerabilities.
  • Low · Native Code Without Sanitizer Testing — .github/workflows/ci.yml, ext/. C extensions don't show evidence of being tested with memory sanitizers (AddressSanitizer, MemorySanitizer, UndefinedBehaviorSanitizer) in the CI configuration, which would help catch memory safety issues early. Fix: Integrate sanitizer testing in the CI pipeline. Add compilation and testing with ASAN/UBSAN flags. Document the sanitizer test results.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Healthy signals · kanwei/algorithms — RepoPilot