
roo-rb/roo

Roo provides an interface to spreadsheets of several sorts.

Healthy

Healthy across all four use cases

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 7mo ago
  • 16 active contributors
  • Distributed ownership (top contributor 48% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present
  • Slowing — last commit 7mo ago

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/roo-rb/roo)](https://repopilot.app/r/roo-rb/roo)

Paste at the top of your README.md — renders inline like a shields.io badge.



Onboarding: roo-rb/roo

Generated by RepoPilot · 2026-05-10 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/roo-rb/roo shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases


Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live roo-rb/roo repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/roo-rb/roo.

What it runs against: a local clone of roo-rb/roo — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in roo-rb/roo | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 251 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>roo-rb/roo</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of roo-rb/roo. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/roo-rb/roo.git
#   cd roo
#
# Then paste this script. Every check is read-only; nothing is modified.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of roo-rb/roo and re-run."
  exit 2
fi

# 1. Repo identity (anchored at end of line so sibling repos like roo-rb/roo-xls don't match)
git remote get-url origin 2>/dev/null | grep -qE "roo-rb/roo(\.git)?/?$" \
  && ok "origin remote is roo-rb/roo" \
  || miss "origin remote is not roo-rb/roo (artifact may be from a fork)"

# 2. License matches what RepoPilot saw (LICENSE text or gemspec metadata)
(grep -qiE "^(The )?MIT" LICENSE 2>/dev/null \
   || grep -qE "license[[:space:]]*=[[:space:]]*['\"]MIT['\"]" roo.gemspec 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "lib/roo.rb" \
  && ok "lib/roo.rb" \
  || miss "missing critical file: lib/roo.rb"
test -f "lib/roo/base.rb" \
  && ok "lib/roo/base.rb" \
  || miss "missing critical file: lib/roo/base.rb"
test -f "lib/roo/spreadsheet.rb" \
  && ok "lib/roo/spreadsheet.rb" \
  || miss "missing critical file: lib/roo/spreadsheet.rb"
test -f "lib/roo/excelx.rb" \
  && ok "lib/roo/excelx.rb" \
  || miss "missing critical file: lib/roo/excelx.rb"
test -f "lib/roo/excelx/cell/base.rb" \
  && ok "lib/roo/excelx/cell/base.rb" \
  || miss "missing critical file: lib/roo/excelx/cell/base.rb"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 251 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~221d)"
else
  miss "last commit was $days_since_last days ago; artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures); safe to trust"
else
  echo "artifact has $fail stale claim(s); regenerate at https://repopilot.app/r/roo-rb/roo"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Roo is a Ruby gem that provides a unified read interface to multiple spreadsheet formats: Excel 2007+ (xlsx, xlsm), LibreOffice/OpenOffice (ods), CSV, and legacy Excel formats (via roo-xls). It abstracts away format-specific parsing, so developers work with one consistent API regardless of the underlying file type.

The gem is a single package. lib/roo/base.rb is the abstract base class, and format-specific subclasses (Excelx, CSV, and the OpenOffice/ODS reader) build on it. The lib/roo/excelx/ directory contains specialized cell types (Boolean, Date, Time, String), a Styles handler, and a SharedStrings parser. Formatters in lib/roo/formatters/ provide CSV, Matrix, XML, and YAML output. The entry point is Roo::Spreadsheet (lib/roo/spreadsheet.rb), exposed through lib/roo.rb.

👥Who it's for

Ruby developers and data engineers who need to parse spreadsheets programmatically in Rails apps, ETL pipelines, or batch data processing scripts. They want to avoid learning format-specific libraries and need a simple interface like Roo::Spreadsheet.open(file).sheet('Info').row(1).

🌱Maturity & risk

Production-ready and maintained, though activity is slowing (last commit 7mo ago). The project runs CI via GitHub Actions (ruby.yml workflow), measures test coverage with SimpleCov (.simplecov config), and uses release-please for automated versioning. There is also a noted plan for a major v4 release with better Ruby 3.x support (issue #630), which points to some accumulated technical debt.

Low risk for stable read operations, but watch for: (1) the planned major version bump, which may bring breaking changes; (2) each format lives in its own large implementation file (e.g., lib/roo/excelx.rb, lib/roo/csv.rb), which keeps a bug in one format from affecting the others but reflects an older, monolithic per-format design; (3) reliance on companion gems (roo-xls, roo-google) for some formats, which fragments the ecosystem.

Active areas of work

Active maintenance with release-please automation (.release-please-manifest.json, .github/workflows/release-please.yml). The .github/workflows/ruby.yml indicates CI runs on push/PR. The CHANGELOG.md and GitHub issue template suggest ongoing bug fixes and feature requests. The planned major version for Ruby 3.x support is pending community feedback.

🚀Get running

```bash
git clone https://github.com/roo-rb/roo.git
cd roo
bundle install
bundle exec rake test
```

Daily commands: This is a library, not a standalone app. Use in your Ruby code via require 'roo' and Roo::Spreadsheet.open('file.xlsx'). For development, run bundle exec rake test (Rakefile available). See examples/write_me.rb and examples/roo_soap_server.rb for runnable demos.

🗺️Map of the codebase

  • lib/roo.rb — Main entry point that exposes the Roo module and Spreadsheet factory class used by all consumers.
  • lib/roo/base.rb — Core abstract base class defining the interface all spreadsheet readers must implement; foundation for all format-specific implementations.
  • lib/roo/spreadsheet.rb — Factory class that routes file paths to appropriate reader based on extension; primary API entry point for end users.
  • lib/roo/excelx.rb — XLSX/XLSM format reader implementation; handles the most common modern Excel format with complex cell type parsing and styling.
  • lib/roo/excelx/cell/base.rb — Base cell abstraction for EXCELX format defining how cell values, types, and formulas are parsed across all cell variants.
  • lib/roo/csv.rb — CSV format reader implementation; simplest reader providing baseline for understanding the pattern.
  • lib/roo/constants.rb — Shared constants and magic numbers used across readers; critical for understanding date systems, cell types, and format conventions.

🧩Components & responsibilities

  • Spreadsheet (Factory) — Routes file extensions to the matching format-specific reader class (lib/roo/spreadsheet.rb).

🛠️How to make changes

Add support for a new spreadsheet format

  1. Create a new reader class inheriting from Roo::Base (lib/roo/newformat.rb)
  2. Implement required interface methods: open, close, sheets, cell, last_row, last_column (lib/roo/base.rb)
  3. Register the new format in the factory router (lib/roo/spreadsheet.rb)
  4. Add test fixtures and specs following existing patterns in spec/lib/roo/ (spec/lib/roo/newformat_spec.rb)
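As a rough illustration of steps 1 to 3, here is a self-contained sketch. It does not use the roo gem: `MiniRoo`, `MiniRoo::Base`, `MiniRoo::Tsv`, and `READERS` are hypothetical stand-ins, and roo's real `Roo::Base` interface is larger than what is shown. It only captures the general shape the steps describe: an abstract base, a subclass that implements the reading primitives, and an extension-keyed factory with an `extension:` override.

```ruby
# Hypothetical sketch; MiniRoo is NOT the roo gem and its names are illustrative.
module MiniRoo
  # Abstract reader: subclasses must implement the reading primitives.
  class Base
    def initialize(path)
      @path = path
    end

    def sheets
      raise NotImplementedError, "#{self.class} must implement #sheets"
    end

    def cell(_row, _col, _sheet = nil)
      raise NotImplementedError, "#{self.class} must implement #cell"
    end

    def last_row(_sheet = nil)
      raise NotImplementedError, "#{self.class} must implement #last_row"
    end
  end

  # Toy reader for tab-separated text with a single implicit sheet.
  class Tsv < Base
    def sheets
      ["default"]
    end

    # Coordinates are 1-based, spreadsheet style; convert to 0-based indices.
    def cell(row, col, _sheet = nil)
      return nil if row < 1 || col < 1

      rows.dig(row - 1, col - 1)
    end

    def last_row(_sheet = nil)
      rows.size
    end

    private

    # Parsed lazily on first access, then memoized.
    def rows
      @rows ||= File.readlines(@path, chomp: true).map { |line| line.split("\t") }
    end
  end

  # Factory table: registering a format is one entry here.
  READERS = { tsv: Tsv }.freeze

  # `extension:` overrides the filename, e.g. for renamed files.
  def self.open(path, extension: nil)
    ext = (extension || File.extname(path).delete_prefix(".")).to_s.downcase.to_sym
    reader = READERS.fetch(ext) { raise ArgumentError, "unsupported extension: #{ext.inspect}" }
    reader.new(path)
  end
end
```

Keeping the extension-to-class mapping in one table makes step 3 a one-line change, and the `extension:` override mirrors the option mentioned for renamed files under Traps & gotchas.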

Add a new cell type for EXCELX format

  1. Create cell type class inheriting from Roo::Excelx::Cell::Base (lib/roo/excelx/cell/newtype.rb)
  2. Implement value coercion logic in to_s and other conversion methods (lib/roo/excelx/cell/newtype.rb)
  3. Register cell type in cell factory dispatcher (lib/roo/excelx/cell/base.rb)
  4. Add test fixtures and unit tests (test/excelx/cell/test_newtype.rb)
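A minimal sketch of the cell-type pattern from the steps above, again independent of roo: `MiniCell` and its classes are hypothetical. Dispatch here is keyed on the OOXML `t` attribute of a `<c>` element (`b` for boolean, `n` for number, which is also the default when `t` is absent, `str` for a formula string). Roo's real dispatcher may use more information; for example, OOXML dates are usually stored as numbers, so telling them apart requires the cell's number format.

```ruby
# Hypothetical sketch of per-type cell classes; MiniCell is not roo's code.
module MiniCell
  class Base
    attr_reader :raw

    def initialize(raw)
      @raw = raw # the text of the cell's <v> element
    end

    # Subclasses override #value to coerce the raw text.
    def value
      raw
    end

    def to_s
      value.to_s
    end
  end

  class Number < Base
    def value
      raw.match?(/[.eE]/) ? Float(raw) : Integer(raw, 10)
    end
  end

  class Boolean < Base
    def value
      raw == "1"
    end

    def to_s
      value ? "TRUE" : "FALSE"
    end
  end

  class Text < Base; end

  # Dispatcher keyed on the OOXML t attribute; an absent t means "n".
  TYPES = { "b" => Boolean, "n" => Number, "str" => Text }.freeze

  def self.build(type_attr, raw)
    TYPES.fetch(type_attr || "n", Text).new(raw)
  end
end
```

Unknown types fall back to `Text`, so a new cell type only needs a class and one `TYPES` entry.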

Add a new output formatter

  1. Create a formatter class inheriting from Roo::Formatters::Base (lib/roo/formatters/newformat.rb)
  2. Implement format method to transform spreadsheet data to target format (lib/roo/formatters/newformat.rb)
  3. Expose via Roo::Base#to_newformat convenience method (lib/roo/base.rb)
  4. Add tests verifying output format correctness (spec/lib/roo/formatters/newformat_spec.rb)
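The formatter steps amount to the Strategy pattern. Here is a hypothetical sketch: `MiniFormatters`, `CsvLike`, and `PipeTable` are illustrative names, not roo classes, and roo may wire its formatters into `Roo::Base` differently (for example as mixins rather than objects passed around).

```ruby
# Hypothetical Strategy-pattern sketch; these are not roo's formatter classes.
module MiniFormatters
  # Each strategy turns rows (arrays of cell values) into a String.
  class CsvLike
    def format(rows)
      rows.map { |row| row.map { |value| quote(value) }.join(",") }.join("\n")
    end

    private

    # Quote fields containing a comma, quote, or newline; double embedded quotes.
    def quote(value)
      text = value.to_s
      text.match?(/[",\n]/) ? "\"#{text.gsub('"', '""')}\"" : text
    end
  end

  class PipeTable
    def format(rows)
      rows.map { |row| "| #{row.map(&:to_s).join(' | ')} |" }.join("\n")
    end
  end

  # The caller picks a strategy; the reading code never changes.
  def self.render(rows, formatter)
    formatter.format(rows)
  end
end

rows = [%w[name note], ["Ada", "likes, commas"]]
puts MiniFormatters.render(rows, MiniFormatters::CsvLike.new)
puts MiniFormatters.render(rows, MiniFormatters::PipeTable.new)
```

Adding an output format means adding one class with a `format` method; nothing in the parsing path is touched.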

🔧Why these technologies

  • Ruby — Choice of language for the library; supports duck typing which enables polymorphic reader pattern without verbose interfaces
  • Nokogiri (XML parsing) — Efficiently parses Office Open XML and ODS format schemas
  • Rubyzip (ZIP handling) — Both XLSX and ODS are ZIP archives; Rubyzip provides transparent access to archived XML files
  • WeakRef caching — Avoids memory bloat from caching parsed sheets while allowing garbage collection under memory pressure
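To make the lazy-parsing and WeakRef ideas concrete, here is a stdlib-only sketch. `LazySheetCache` is a hypothetical name and this is not roo's implementation: it parses a sheet on first request, keeps the result behind a `WeakRef`, and re-parses if the garbage collector has reclaimed it.

```ruby
require "weakref"

# Hypothetical sketch (not roo's code): parse each sheet only on first request,
# and hold the parsed result through a WeakRef so the GC may reclaim it.
class LazySheetCache
  attr_reader :parse_count

  def initialize(&parser)
    @parser = parser # called with a sheet name, returns the parsed sheet
    @refs = {}
    @parse_count = 0
  end

  def fetch(name)
    if (ref = @refs[name])
      begin
        return ref.__getobj__ # still alive: reuse without re-parsing
      rescue WeakRef::RefError
        # Collected since last use; fall through and parse again.
      end
    end

    @parse_count += 1
    sheet = @parser.call(name)
    @refs[name] = WeakRef.new(sheet)
    sheet
  end
end

cache = LazySheetCache.new { |name| Array.new(3) { |i| "#{name} row #{i + 1}" } }
first = cache.fetch("Info")
again = cache.fetch("Info") # `first` keeps the sheet alive, so no re-parse
puts cache.parse_count
```

Because a `WeakRef` can be collected between any two lines, the sketch reads it inside `begin`/`rescue WeakRef::RefError` instead of checking `weakref_alive?` first.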

⚖️Trade-offs already made

  • Lazy cell parsing: cells parsed on-demand rather than all at once on open

    • Why: Large spreadsheets would consume excessive memory if all cells were materialized upfront
    • Consequence: First cell access in a sheet has slight overhead; trade-off favors memory efficiency over raw speed
  • Single-pass reader design: readers do not support write operations

    • Why: Simplifies implementation; avoids complexity of maintaining index consistency during mutations
    • Consequence: Read-only API; users cannot modify spreadsheets in-place (must use external tools for writes)
  • Format-specific cell type classes rather than unified cell model

    • Why: Enables precise type preservation (e.g., EXCELX distinguishes date vs datetime, CSV treats all as strings)
    • Consequence: More code duplication across readers, in exchange for fidelity to each source format's semantics
  • 1900 and 1904 date system support

    • Why: Excel's 1900 date system treats 1900 as a leap year (a bug kept for Lotus 1-2-3 compatibility), while some workbooks use the 1904 date system (historically the Mac Excel default); a reader must handle both
    • Consequence: Date parsing requires knowing workbook's date system; added constant complexity in Excelx::Styles
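The 1900/1904 trade-off is easiest to see with the arithmetic. This sketch is not roo's code; `excel_serial_to_date` is a hypothetical helper. In an OOXML workbook the 1904 flag is the `date1904` attribute of the `workbookPr` element in `workbook.xml`.

```ruby
require "date"

# Hypothetical helper (not roo's code): Excel serial number -> Date.
# 1900 system: serial 1 is 1900-01-01, and serial 60 is the non-existent
#   1900-02-29 that Excel keeps for Lotus 1-2-3 compatibility.
# 1904 system: serial 0 is 1904-01-01, with no phantom leap day.
def excel_serial_to_date(serial, date1904: false)
  days = serial.floor # the fractional part is the time of day; ignored here
  return Date.new(1904, 1, 1) + days if date1904

  raise ArgumentError, "serial 60 is Excel's phantom 1900-02-29" if days == 60
  raise ArgumentError, "1900-system serials start at 1" if days < 1

  # Before the phantom day count from 1899-12-31; after it, from 1899-12-30.
  base = days < 60 ? Date.new(1899, 12, 31) : Date.new(1899, 12, 30)
  base + days
end

puts excel_serial_to_date(25_569)                 # 1970-01-01
puts excel_serial_to_date(24_107, date1904: true) # 1970-01-01
```

1904-01-01 is serial 1462 in the 1900 system and serial 0 in the 1904 system, so for any date after 1900-02-28 the two systems differ by exactly 1,462 days.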

🚫Non-goals (don't propose these)

  • Does not support writing or modifying spreadsheets (read-only library)
  • Does not handle real-time collaborative editing or streaming updates
  • Does not provide in-memory formula evaluation (reads formula results only)
  • Does not support VBA macros or custom XML
  • Does not provide a unified ORM-style query interface across formats

🪤Traps & gotchas

(1) Cell coordinates are 1-indexed (Excel convention), not 0-indexed like Ruby Arrays—this is documented but trips developers. (2) The Roo::Excelx parser decompresses ZIP on instantiation; large xlsx files can be memory-heavy. (3) The roo gem reads only; write support requires roo-google for Google Sheets or separate XLSX/ODS gems—this asymmetry can surprise users. (4) File format detection uses extension, not magic bytes; passing a renamed file without the extension: option will fail. (5) Sheet names are case-sensitive when calling .sheet('Info') vs .sheet('info').
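For trap (1), it helps to keep spreadsheet coordinates and Ruby indices explicitly separate. This is a hypothetical sketch; the helper names are illustrative, not roo methods. It converts column letters to 1-based numbers and back, then subtracts 1 only at the point where a Ruby array is indexed.

```ruby
# Hypothetical helpers (not roo methods): keep 1-based spreadsheet
# coordinates separate from 0-based Ruby array indices.

# "A" => 1, "Z" => 26, "AA" => 27 (bijective base-26).
def column_letters_to_number(letters)
  letters.upcase.each_char.reduce(0) do |acc, ch|
    raise ArgumentError, "bad column: #{letters.inspect}" unless ch.between?("A", "Z")

    acc * 26 + (ch.ord - "A".ord + 1)
  end
end

# 1 => "A", 27 => "AA", 703 => "AAA".
def number_to_column_letters(number)
  raise ArgumentError, "columns start at 1" if number < 1

  letters = +""
  while number.positive?
    number, rem = (number - 1).divmod(26)
    letters.prepend((rem + "A".ord).chr)
  end
  letters
end

# "B3" => [3, 2]: 1-based row and column, as a spreadsheet counts them.
def parse_ref(ref)
  match = ref.match(/\A([A-Za-z]+)(\d+)\z/) or raise ArgumentError, "bad ref: #{ref.inspect}"
  [Integer(match[2], 10), column_letters_to_number(match[1])]
end

grid = [%w[name age], %w[Ada 36]] # a Ruby array: 0-based
row, col = parse_ref("B2")        # spreadsheet coordinates: 1-based
puts grid[row - 1][col - 1]       # subtract 1 only when indexing the array
```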


💡Concepts to learn

  • Office Open XML (OOXML) — Modern Excel files (.xlsx) are ZIP archives containing XML; understanding OOXML structure (workbook.xml, sheet*.xml, sharedStrings.xml, styles.xml) is essential to comprehend how lib/roo/excelx/ parses these files and why shared strings are deduplicated.
  • Shared Strings Table (SST) — OOXML stores each distinct string once in sharedStrings.xml and has cells reference it by index; lib/roo/excelx/shared_strings.rb implements this. Understanding it explains why text-heavy Excel files stay compact and why the parser must load the string table before it can resolve cell text.
  • Polymorphic Cell Types — Roo uses inheritance hierarchy (lib/roo/excelx/cell/base.rb → Boolean, Date, Number, String) to handle type-specific parsing logic; this pattern isolates format differences and makes adding new cell types straightforward.
  • ZIP-based Formats (OOXML, ODS) — Both .xlsx and .ods are ZIP archives; Roo uses rubyzip to extract them. Knowing this explains why large files decompress into memory and why the two formats share extraction logic.
  • Format Auto-detection via File Extension — Roo::Spreadsheet.open() uses file extension to route to the correct parser; unlike magic-byte detection, this is fast but can fail with renamed files—understanding this trade-off prevents integration bugs.
  • 1-indexed Spreadsheet Coordinates — Excel and Roo use 1-based indexing (row 1, column A) while Ruby uses 0-based; this mismatch is a source of off-by-one errors for Ruby developers unfamiliar with spreadsheet conventions.
  • Strategy Pattern for Output Formatters — lib/roo/formatters/ (base.rb, csv.rb, matrix.rb, xml.rb, yaml.rb) implements the Strategy pattern; each formatter encapsulates a different output algorithm, making it easy to add new export formats without modifying core parsing logic.
  • roo-rb/roo-xls — Official companion gem extending Roo to support legacy Excel 97-2003 formats (.xls, .xml) using the same API
  • roo-rb/roo-google — Official companion gem adding read/write access to Google Sheets, maintaining Roo's unified interface for cloud spreadsheets
  • ruby/csv — Ruby's standard CSV library, which Roo's CSV reader wraps; understanding Ruby's CSV behavior is essential for CSV parsing in Roo
  • ruby-zip/rubyzip — Underlying gem used to extract OOXML and ODS ZIP archives; Roo depends on this for all modern Excel and LibreOffice parsing
  • rails/rails — Roo is commonly used in Rails apps for bulk data import features; many examples and issues reference Active Record integration
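A tiny illustration of the Shared Strings Table concept from the list above, using made-up data rather than a real `sharedStrings.xml`: cells whose `t` attribute is `s` hold an index, and the text itself lives once in the table. `SharedCell` and `resolve_cell` are hypothetical names, not roo's API.

```ruby
# Made-up data illustrating OOXML shared strings; not read from a real file.
# sharedStrings.xml stores each distinct string once, in order (index 0, 1, ...).
SHARED_STRINGS = ["Region", "North"].freeze

# Hypothetical minimal cell: ref like "A2", OOXML t attribute, raw <v> text.
SharedCell = Struct.new(:ref, :type, :raw)

# t="s" means raw is an index into the shared-string table; other types are
# returned as raw text here for brevity.
def resolve_cell(cell, shared_strings)
  if cell.type == "s"
    shared_strings.fetch(Integer(cell.raw, 10))
  else
    cell.raw
  end
end

cells = [
  SharedCell.new("A1", "s", "0"),
  SharedCell.new("A2", "s", "1"),
  SharedCell.new("A3", "s", "1"), # "North" again: same index, no second copy
  SharedCell.new("B2", nil, "42") # plain number, stored inline
]
cells.each { |c| puts "#{c.ref}: #{resolve_cell(c, SHARED_STRINGS).inspect}" }
```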

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive test coverage for Roo::Excelx::Cell type detection and formatting

The lib/roo/excelx/cell/ directory has multiple cell type classes (boolean.rb, date.rb, datetime.rb, number.rb, string.rb, time.rb, empty.rb) but there's no dedicated test suite visible in the file structure for edge cases like mixed numeric/string formats, timezone handling in datetime cells, or boolean value parsing. This is critical since cell type misdetection causes downstream data integrity issues.

  • [ ] Create spec/lib/roo/excelx/cell/ directory structure mirroring lib/roo/excelx/cell/
  • [ ] Add spec/lib/roo/excelx/cell/type_detection_spec.rb with tests for boundary conditions (empty cells, malformed dates, locale-specific number formats)
  • [ ] Add spec/lib/roo/excelx/cell/datetime_spec.rb testing timezone handling and DST edge cases
  • [ ] Add spec/lib/roo/excelx/cell/number_spec.rb testing scientific notation, currency symbols, and precision loss
  • [ ] Run test suite and update spec coverage reports in .simplecov

Implement formatter consistency tests and add missing YAML/XML formatter documentation

The lib/roo/formatters/ directory has multiple formatter implementations (base.rb, csv.rb, matrix.rb, xml.rb, yaml.rb) but there's no visible integration test suite validating that all formatters handle the same edge cases (empty sheets, merged cells, special characters, large datasets). Additionally, the README mentions formatters but provides no usage examples.

  • [ ] Create spec/lib/roo/formatters/integration_spec.rb with shared examples that all formatters must pass (empty sheet, single cell, multi-sheet handling)
  • [ ] Add specific tests in spec/lib/roo/formatters/ for xml.rb and yaml.rb handling of special characters and Unicode
  • [ ] Document formatter usage in README.md with code examples for each formatter type (CSV, YAML, XML, Matrix)
  • [ ] Verify all formatters consistently handle nil/empty cell values by adding explicit tests to spec/lib/roo/formatters/base_spec.rb

Add integration tests for Google Drive and sheet format conversion workflows

The spec/fixtures/vcr_cassettes/ directory shows Google Drive integration exists (google_drive.yml, google_drive_access_token.yml, google_drive_set.yml) but there's no visible spec/lib/roo/google*.rb test file in the listing. This creates a gap where the integration layer between local spreadsheet formats and Google Drive isn't validated for format conversions, authentication flows, or error handling.

  • [ ] Create spec/lib/roo/google_drive_integration_spec.rb testing read/write round-trip workflows (Google Sheet → local format → Google Sheet)
  • [ ] Add error handling tests for invalid credentials, network timeouts, and permission denied scenarios using the existing VCR cassettes
  • [ ] Add spec/lib/roo/spreadsheet_format_conversion_spec.rb testing that Roo::Spreadsheet.open() correctly detects and converts between xlsx/ods/csv/google formats
  • [ ] Document the Google Drive workflow in README.md with authentication setup and usage examples, linking to roo-google gem

🌿Good first issues

  • Add comprehensive type conversion tests for lib/roo/excelx/cell/ classes. The Boolean, Date, Datetime, and Time cell types exist, but the file list shows no specs for them, so it is unclear whether edge cases (invalid dates, timezone handling, epoch boundaries) are covered. A contributor could add tests and fixtures for these.
  • Document the difference between Roo::Spreadsheet.open(file, extension: :xlsx) and direct class instantiation Roo::Excelx.new(file) in the README—the examples show both patterns but don't explain when to use each or performance trade-offs.
  • Add support for accessing cell formulas (not just values) in Excelx format—lib/roo/excelx/cell/base.rb currently parses cell values but the OOXML format stores <f> tags with formulas that are discarded; a new method like cell_formula(row, col) would be valuable for ETL auditing.


📝Recent commits

  • 20d424f — chore(master): release 3.0.0 (#634) (github-actions[bot])
  • 39e04c8 — fix config of release please (simonoff)
  • 4567fb4 — make it add to top (simonoff)
  • 6d56621 — Revert "Merge pull request #633 from roo-rb/release-please--branches--master--components--roo" (simonoff)
  • 8f7d433 — Merge pull request #633 from roo-rb/release-please--branches--master--components--roo (simonoff)
  • 4ec684d — Update CHANGELOG.md (simonoff)
  • 38d5d22 — chore(master): release roo 3.0.0 (github-actions[bot])
  • cc6f6d8 — Revert "Merge pull request #632 from roo-rb/release-please--branches--master" (simonoff)
  • 8cc5961 — fix release please (simonoff)
  • 0344722 — Merge pull request #632 from roo-rb/release-please--branches--master (simonoff)

🔒Security observations

The Roo spreadsheet processing library has moderate security posture. Primary concerns include potential XXE vulnerabilities in XML parsing, zip file path traversal risks during extraction, and temporary file handling. The codebase lacks visible hardcoded secrets or SQL injection risks (appropriate for a library). Most vulnerabilities are contingent on the library handling malicious input files or the calling application misusing output data. Recommendations focus on securing XML parsing, validating zip extraction paths, proper temporary file cleanup, and maintaining updated dependencies. The library's purpose (read-only spreadsheet access) inherently limits some attack vectors.

  • Medium · Potential XML External Entity (XXE) Injection in Excel/OpenOffice Processing — lib/roo/excelx/, lib/roo/open_office.rb, lib/roo/libre_office.rb. The codebase processes XML from Excel (xlsx/xlsm) and OpenOffice (ods) formats. If XML parsing is not properly configured to disable external entities, XXE attacks could be possible when parsing malicious spreadsheet files. Files like lib/roo/excelx/sheet_doc.rb and lib/roo/excelx/workbook.rb likely parse XML without visible XXE protections in the file structure. Fix: Ensure all XML parsing uses secure configurations: disable DTDs, external entities, and parameter entities. Use Nokogiri with proper security flags (e.g., Nokogiri::XML.parse(xml, nil, nil, Nokogiri::XML::ParseOptions::NONET))
  • Medium · Zip File Extraction Without Path Validation — lib/roo/excelx/extractor.rb, lib/roo/tempdir.rb. The codebase extracts files from zip archives (xlsx, ods formats). If path traversal validation is insufficient, malicious zip files could extract files outside intended directories, potentially overwriting system files or causing directory traversal attacks. Fix: Validate extracted file paths to ensure they remain within the intended extraction directory. Use methods like File.expand_path and verify paths don't contain .. or absolute paths. Consider using gems like SafeZip
  • Low · Temporary Directory Usage Without Secure Cleanup — lib/roo/tempdir.rb. lib/roo/tempdir.rb manages temporary directories for extracted spreadsheet contents. If temporary files are not securely cleaned up or use predictable paths, sensitive data from spreadsheets could be exposed to other local processes. Fix: Ensure temporary directories use secure permissions (0700), are created with Dir.mktmpdir, and are always cleaned up via ensure blocks or finalizers. Verify files are not world-readable
  • Low · Insufficient Input Validation on Cell Data — lib/roo/excelx/cell/, lib/roo/base.rb. The codebase processes cell data from various spreadsheet formats without comprehensive validation. While not directly exploitable, this could lead to issues if the output is used in vulnerable contexts (e.g., unsanitized web display). Fix: Document that cell values should be sanitized before use in web contexts. Add input validation for cell references and range operations. Implement bounds checking for coordinate operations
  • Low · VCR Test Fixture Data Exposure — spec/fixtures/vcr_cassettes/google_drive.yml, spec/fixtures/vcr_cassettes/google_drive_access_token.yml. The spec/fixtures/vcr_cassettes/ directory contains recorded HTTP interactions that may include sensitive Google Drive API tokens and credentials in plaintext YAML. Fix: Ensure VCR cassettes never contain real credentials. Use VCR's record mode 'none' in CI, mask sensitive values with VCR's filter_sensitive_data configuration, and add cassettes to .gitignore if they contain any real tokens
  • Low · Missing Dependency File for Analysis — roo.gemspec, Gemfile. The gemspec and Gemfile were not provided in the analysis context. Dependencies cannot be checked for known vulnerabilities. The gem likely depends on libraries like Nokogiri, Rubyzip, and others that could have security issues if outdated. Fix: Regularly run bundle audit to check for vulnerable dependencies. Keep dependencies updated. Review the security advisories for Nokogiri, Rubyzip, and any XML/compression libraries used
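The zip-extraction and temporary-directory recommendations above can be sketched with Ruby's standard library alone. This is a hypothetical helper, not code from lib/roo/excelx/extractor.rb or lib/roo/tempdir.rb, and whether roo already performs equivalent checks has not been verified here. It resolves each archive entry name against the extraction root with `File.expand_path` and rejects anything that escapes it; `Dir.mktmpdir` with a block removes the directory when the block exits.

```ruby
require "tmpdir"

# Raised when an archive entry would be written outside the extraction root.
class UnsafeEntryError < StandardError; end

# Hypothetical helper: resolve entry_name inside dest_dir, rejecting "../"
# segments and absolute paths that would escape it.
def safe_entry_path(dest_dir, entry_name)
  root = File.expand_path(dest_dir)
  target = File.expand_path(entry_name, root)
  unless target.start_with?(root + File::SEPARATOR)
    raise UnsafeEntryError, "blocked entry: #{entry_name.inspect}"
  end

  target
end

# Dir.mktmpdir with a block deletes the directory (and its contents) on exit,
# even if an exception is raised inside the block.
Dir.mktmpdir("roo-sketch") do |dir|
  path = safe_entry_path(dir, "xl/worksheets/sheet1.xml")
  puts "would extract to #{path}"
end
```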

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.