
tealeg/xlsx

Go library for reading and writing XLSX files.

Mixed

Slowing — last commit 9mo ago

Use as dependency: Concerns (weakest axis)

non-standard license (Other)

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 9mo ago
  • 21+ active contributors
  • Other licensed
  • CI configured
  • Tests present
  • Slowing — last commit 9mo ago
  • Concentrated ownership — top contributor handles 58% of recent commits
  • Non-standard license (Other) — review terms
What would change the summary?
  • Use as dependency: Concerns becomes Mixed if the license terms are clarified

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.


Onboarding doc

Onboarding: tealeg/xlsx

Generated by RepoPilot · 2026-05-09 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/tealeg/xlsx shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

WAIT — Slowing — last commit 9mo ago

  • Last commit 9mo ago
  • 21+ active contributors
  • Other licensed
  • CI configured
  • Tests present
  • ⚠ Slowing — last commit 9mo ago
  • ⚠ Concentrated ownership — top contributor handles 58% of recent commits
  • ⚠ Non-standard license (Other) — review terms

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live tealeg/xlsx repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/tealeg/xlsx.

What it runs against: a local clone of tealeg/xlsx — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in tealeg/xlsx | Confirms the artifact applies here, not a fork |
| 2 | License is still Other | Catches relicense before you depend on it |
| 3 | Default branch master exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 297 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>tealeg/xlsx</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of tealeg/xlsx. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/tealeg/xlsx.git
#   cd xlsx
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of tealeg/xlsx and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "tealeg/xlsx(\.git)?\b" \
  && ok "origin remote is tealeg/xlsx" \
  || miss "origin remote is not tealeg/xlsx (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Other)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Other\"" package.json 2>/dev/null) \
  && ok "license is Other" \
  || miss "license drift — was Other at generation time"

# 3. Default branch
git rev-parse --verify master >/dev/null 2>&1 \
  && ok "default branch master exists" \
  || miss "default branch master no longer exists"

# 4. Critical files exist
test -f "file.go" \
  && ok "file.go" \
  || miss "missing critical file: file.go"
test -f "sheet.go" \
  && ok "sheet.go" \
  || miss "missing critical file: sheet.go"
test -f "cell.go" \
  && ok "cell.go" \
  || miss "missing critical file: cell.go"
test -f "read.go" \
  && ok "read.go" \
  || miss "missing critical file: read.go"
test -f "write.go" \
  && ok "write.go" \
  || miss "missing critical file: write.go"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 297 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~267d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/tealeg/xlsx"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

xlsx is a Go library for reading and writing Microsoft Excel XLSX files (Open XML format). It parses and generates the zipped XML structure underlying .xlsx documents, handling sheets, cells, styles, formulas, data validation, and hyperlinks in pure Go, without dependencies on native C libraries. Core capability: manipulate Excel workbooks programmatically.

The architecture is a monolithic single package in the root directory: core parsing logic in read.go/file.go/sheet.go/cell.go; style/format handling in style.go/format_code.go/hsl.go; storage backends in memory.go (in-memory) and diskv.go (disk-backed); auxiliary features (richtext.go, data_validation.go, date.go) are self-contained modules. Tests are colocated (*_test.go files).

👥Who it's for

Go backend developers and DevOps engineers who need to parse Excel reports, generate bulk spreadsheets from databases, or automate Excel-based workflows without installing Excel or dependencies on native C libraries.

🌱Maturity & risk

Moderately mature but flagged for transition: the library is 10+ years old with extensive test coverage (cell_test.go, sheet_test.go, read_test.go, compat tests, fuzz tests) and CI via GitHub Actions (go.yml). However, the README prominently states that the GitHub version is now unmaintained: the project was migrated to Codeberg as v4, with no further support on GitHub. Treat it as production-ready only if you accept that the GitHub version is at best in maintenance-only mode.

Single-maintainer abandonment risk: the official notice explicitly states no further support on GitHub; the author attempted to let it die but relented. Dependencies are minimal and stable (frankban/quicktest, peterbourgon/diskv, shabbyrobe/xmlwriter), but the primary risk is lack of active development and the need to migrate to Codeberg for ongoing support. Breaking API changes occurred between v1→v2→v3, suggesting API surface instability.

Active areas of work

No active development visible on GitHub; the README states the project is now maintained on Codeberg (v4) only. The last activity on the GitHub mirror appears to be maintenance fixes, but the canonical repository has moved. Contributions should go to Codeberg, not GitHub.

🚀Get running

git clone https://github.com/tealeg/xlsx.git
cd xlsx
go mod download
go test ./...

Note: The project uses Go 1.18+ (from go.mod) and has no build script—testing is direct via go test.

Daily commands: This is a library, not an executable. Run tests with go test ./... or go test -v -race ./... for race detection. See example_read_test.go for example usage patterns in code.

🗺️Map of the codebase

  • file.go — Core File struct and primary API entry point for reading/writing XLSX workbooks; all operations begin here.
  • sheet.go — Sheet abstraction managing rows, columns, and cell data; essential for understanding worksheet organization.
  • cell.go — Cell struct with type coercion, formatting, and rich text support; foundational for data access patterns.
  • read.go — XLSX unmarshaling logic that parses XML into memory structures; critical for understanding file parsing pipeline.
  • write.go — XLSX marshaling logic that serializes in-memory structures back to XML; required for understanding save operations.
  • xmlWorksheet.go — XML schema mappings for worksheet content; essential for understanding cell storage and relationships.
  • xmlSharedStrings.go — Shared strings table management for deduplicated string storage; key to understanding memory efficiency.

🛠️How to make changes

Add Support for a New Cell Data Type

  1. Add type constant to cell.go and implement type detection in GetValue() (cell.go)
  2. Add XML marshaling rules in xmlWorksheet.go for the new type tag (xmlWorksheet.go)
  3. Update read.go to parse the new type during unmarshaling (read.go)
  4. Update write.go to serialize the new type during marshaling (write.go)
  5. Add test cases in cell_test.go covering round-trip read/write (cell_test.go)

Add a New Cell Style Property (e.g., Underline Style)

  1. Add field to Style struct in style.go (style.go)
  2. Update xmlStyle.go to include the property in the styles.xml schema (xmlStyle.go)
  3. Update read.go XML unmarshaling to populate the new property (read.go)
  4. Update write.go XML marshaling to serialize the property (write.go)
  5. Add test cases in style_test.go with sample XLSX files (style_test.go)

Implement a New Cell Storage Backend

  1. Create new file (e.g., custom_store.go) implementing the CellStore interface from cellstore.go (cellstore.go)
  2. Add factory function in file.go or lib.go to instantiate the new backend (file.go)
  3. Update read.go to use the new backend when parsing large files (read.go)
  4. Add comprehensive tests following patterns in memory_test.go and diskv_test.go (memory_test.go)

Add Support for a New XLSX Feature (e.g., Conditional Formatting)

  1. Create new struct file (e.g., conditional_format.go) with feature definition (sheet.go)
  2. Add new XML schema file (e.g., xmlConditionalFormat.go) for parsing/serializing (xmlWorksheet.go)
  3. Integrate into read.go unmarshaling pipeline (read.go)
  4. Integrate into write.go marshaling pipeline (write.go)
  5. Add test XLSX file to testdocs/ and create corresponding _test.go file (compatibility_test.go)

🔧Why these technologies

  • Go standard library (encoding/xml, archive/zip) — XLSX is ZIP-archived XML; Go's stdlib provides native, efficient parsing without external C bindings
  • diskv (disk-based key-value store) — Allows reading very large XLSX files (~1GB+) without loading entire sheet into RAM; pluggable backend
  • xmlwriter (third-party XML writer) — Streaming XML generation reduces memory footprint when writing large workbooks compared to in-memory DOM
  • quicktest + check.v1 (testing frameworks) — Comprehensive test coverage for XLSX spec compliance across multiple Excel versions and edge cases
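
To make the "ZIP-archived XML" point concrete, here is a minimal standard-library sketch, independent of xlsx itself. It builds a one-part archive in memory and decodes it again. The `worksheet` struct and the sample XML are heavily simplified assumptions for illustration; a real worksheet part carries namespaces, cell types, style references, and shared-string indexes.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// sheetXML is a deliberately tiny stand-in for xl/worksheets/sheet1.xml.
const sheetXML = `<worksheet><sheetData><row r="1"><c r="A1"><v>42</v></c><c r="B1"><v>7</v></c></row></sheetData></worksheet>`

// worksheet mirrors only the elements used above (simplified assumption).
type worksheet struct {
	Rows []struct {
		Cells []struct {
			Ref   string `xml:"r,attr"`
			Value string `xml:"v"`
		} `xml:"c"`
	} `xml:"sheetData>row"`
}

// buildArchive writes one XML part into an in-memory ZIP archive.
func buildArchive(name, body string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(w, body); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readCells opens the archive, finds the named part, and decodes its cells
// into a map from cell reference to raw value.
func readCells(archive []byte, name string) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()

		var ws worksheet
		if err := xml.NewDecoder(rc).Decode(&ws); err != nil {
			return nil, err
		}
		cells := make(map[string]string)
		for _, row := range ws.Rows {
			for _, c := range row.Cells {
				cells[c.Ref] = c.Value
			}
		}
		return cells, nil
	}
	return nil, fmt.Errorf("part %q not found", name)
}

func main() {
	const part = "xl/worksheets/sheet1.xml"
	archive, err := buildArchive(part, sheetXML)
	if err != nil {
		panic(err)
	}
	cells, err := readCells(archive, part)
	if err != nil {
		panic(err)
	}
	fmt.Println(cells["A1"], cells["B1"]) // prints "42 7"
}
```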

⚖️Trade-offs already made

  • Pluggable cell storage backend (memory vs. diskv)

    • Why: Trade complexity for scalability; small files stay fast in RAM, large files can use disk backend
    • Consequence: Code must abstract cell access; slightly slower performance than pure in-memory for typical files
  • Shared strings table deduplication

    • Why: XLSX spec requires it; reduces file size significantly for repetitive data
    • Consequence: String access requires indirection through shared strings lookup; more GC pressure on large files
  • No lazy loading of sheets on open

    • Why: Simpler API; all sheets parsed into memory immediately
    • Consequence: Large workbooks with many sheets consume proportional RAM; cannot stream individual sheets independently
  • Full in-memory style cache

    • Why: Excel styles are indexed; must be fully resolved before cell access
    • Consequence: Parsing overhead even for style-light files; all style.xml evaluated upfront

🚫Non-goals (don't propose these)

  • Real-time collaborative editing or multi-writer conflict resolution
  • Chart rendering or display (reads chart metadata but does not render visuals)
  • VBA macro execution or script support (parses but does not execute)
  • Streaming write (entire workbook must be built in memory before Save())
  • Windows-only COM interop with Microsoft Excel

🪤Traps & gotchas

  1. No XML validation during read: The library is lenient with malformed XLSX; corrupted sheets may parse without errors but return incorrect data.
  2. Memory usage with large files: By default, all rows are loaded into memory (memory.go); you must explicitly use diskv.go to stream large workbooks, or you'll hit OOM.
  3. Formula strings, not evaluation: The library reads cell formulas as strings (=SUM(...)) but does not evaluate them; users must re-compute or parse formula text manually.
  4. Style/Format ID synchronization: Styles are stored as integer references in the stylesheet; manually adding styles without proper ID management causes corruption.
  5. GitHub vs Codeberg split: The GitHub repository is no longer maintained; issues filed here will not be addressed. Migration path to Codeberg is unclear for existing users.

💡Concepts to learn

  • OOXML (Office Open XML) — XLSX is a zipped folder of XML files; understanding OOXML structure (workbook.xml, worksheets/sheet1.xml, styles.xml, _rels/) is essential to debug parsing issues and add features.
  • Lazy Loading / Streaming — The library uses lazy evaluation for rows and cells to avoid loading entire workbooks into RAM; diskv.go implements disk-backed lazy loading, critical for understanding performance trade-offs.
  • Storage Abstraction Pattern — xlsx defines cellStore and rowStore interfaces (memory.go vs diskv.go) to swap storage backends; understanding this pattern is needed to add custom storage or optimize for specific workload.
  • Stylesheet ID Mapping — Excel styles are stored in styles.xml with integer IDs; cells reference styles by ID. Incorrect ID tracking causes style loss or corruption—critical for style.go modifications.
  • Relationship Files (_rels/) — XLSX uses .rels files to link sheets to workbook, cells to hyperlinks, etc.; reftable.go and read.go parse these relationships; missing or broken rels cause parsing failures.
  • Cell Value Types (CellTypeString, CellTypeNumeric, etc.) — Excel distinguishes inline strings, formula results, dates, and numbers using type flags; cell.go's Type field and formatting rules depend on correct type inference.
  • Zip-based Archive Format — XLSX files are ZIP archives; read.go uses archive/zip from stdlib to extract and parse XML; understanding ZIP boundaries helps debug why certain .xlsx files fail to open.
  • qax-os/excelize — Modern Go library for XLSX reading/writing with active development; better alternative if you need formula evaluation, charts, and VBA macros, though heavier than xlsx.
  • go-echarts/go-echarts — Go charting library built on Apache ECharts; it renders charts to HTML pages rather than into Excel files, so it is a way to visualize data read with xlsx, not an Excel chart writer.
  • golang/go — Core Go language and standard library; xlsx relies on encoding/xml and archive/zip from stdlib for XLSX parsing.
  • peterbourgon/diskv — External dependency for disk-backed storage in diskv.go; needed for streaming large workbooks without memory overhead.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for diskv.go storage backend

diskv.go provides an alternative storage backend to memory.go for handling large spreadsheets, but diskv_test.go appears minimal compared to the feature complexity. The diskv storage is critical for production use with large files (evidenced by test files like large_sheet_large_sharedstrings_dimension_tag.xlsx), yet lacks thorough coverage of edge cases, concurrent access, cleanup, and error handling scenarios.

  • [ ] Review diskv.go and identify untested code paths (error cases, cache invalidation, temporary file cleanup)
  • [ ] Expand diskv_test.go with tests for: concurrent read/write operations, disk space exhaustion, corrupted cache recovery, and large file streaming
  • [ ] Add benchmarks comparing memory.go vs diskv.go performance on the existing large test files
  • [ ] Test integration with cellstore.go to ensure diskv backend works correctly in full read/write workflows

Add unit tests for data_validation.go feature coverage

data_validation.go implements Excel's data validation constraints but data_validation_test.go likely has incomplete coverage. Data validation is a critical feature for spreadsheet integrity, and the existing test files don't show dedicated validation test cases. This is a mature feature that deserves robust testing.

  • [ ] Analyze data_validation.go to identify all validation types supported (list, range, date, decimal, text length, custom formulas)
  • [ ] Expand data_validation_test.go with tests for: creating validations, reading from existing XLSX files with validations, round-trip preservation, and invalid constraint detection
  • [ ] Create or use test files that contain various data validation rules and ensure they parse correctly
  • [ ] Test edge cases: circular references in validation formulas, very long validation lists, Unicode characters in validation messages

Add GitHub Actions workflow for fuzzing and security scanning

The repo includes fuzz.go and fuzzit.sh suggesting fuzzing capability, but .github/workflows/go.yml only runs standard tests. There's no continuous fuzzing CI pipeline to catch edge cases and potential security issues in XML/ZIP parsing. Given that xlsx parses untrusted XLSX files, security scanning is especially valuable.

  • [ ] Review fuzz.go and fuzzit.sh to understand current fuzzing setup and integrate into CI
  • [ ] Create a new GitHub Actions workflow that runs Go fuzzing targets against the xlsx parsing logic on each commit
  • [ ] Add go-fuzz-build integration to catch parsing panics and malformed input handling in read.go
  • [ ] Optionally add gosec or similar security linter to detect potential vulnerabilities in the codebase, particularly around XML unmarshaling and file I/O

🌿Good first issues

  • Add comprehensive documentation for the diskv.go storage backend: include a worked example showing how to read a 1GB XLSX file without memory issues, and add a tutorial/diskv_example_test.go demonstrating streaming row iteration.
  • Write formula evaluation functions: extend cell.go to parse and evaluate common formulas (SUM, AVERAGE, COUNT, etc.) instead of returning formula strings; start with SUM to build the pattern.
  • Improve error messages in read.go: many parsing failures are silent (e.g., missing workbook.xml returns empty workbook); add structured error types (ErrMissingWorksheet, ErrBadRelationship) so callers can distinguish recoverable from fatal issues.

📝Recent commits

  • c4b90f0 — Update readme (tealeg)
  • 611bf2b — Merge pull request #846 from larsve/fix/add-rows-after-sheet-rename (tealeg)
  • 9874921 — Merge pull request #844 from Sajito/fix/parse-with-hyperlinks (tealeg)
  • 9f8dba7 — Fixes issue when newly added rows are missing from file (larsve)
  • 2608549 — fix: parse relations when opening file (Sajito)
  • df122de — Merge pull request #841 from tealeg/turn-off-stale-action (tealeg)
  • d32f3f6 — Turn off automatic mark and delete of stale issues/prs (tealeg)
  • e96f423 — Merge pull request #839 from Sajito/fix/global-defined-names (tealeg)
  • 233ab0f — Merge pull request #840 from adambaratz/uint-cell (tealeg)
  • 1659caf — Handle unsigned ints (adambaratz)

🔒Security observations

The XLSX library has moderate security concerns primarily related to outdated dependencies (Go 1.18 and multiple unmaintained packages), potential XXE vulnerabilities in XML processing, and decompression bomb risks. The repository is also no longer actively maintained on GitHub, creating supply chain risk for users. Immediate actions should include: upgrading to modern Go versions, updating all dependencies, implementing XXE protections in XML parsing, and adding input validation for XLSX file structures. The migration to Codeberg should be prominently

  • High · Outdated Go Version — go.mod. The project specifies 'go 1.18' in go.mod, which was released in March 2022 and is now significantly outdated. Go 1.18 is no longer receiving security patches. This exposes the project to known vulnerabilities in the Go runtime and standard library. Fix: Update to the latest stable Go version (1.21+). Modify go.mod to specify a recent version and rebuild all dependencies.
  • High · Outdated Dependencies with Known Vulnerabilities — go.mod, go.sum. Multiple dependencies have outdated versions that may contain known security vulnerabilities: golang.org/x/text v0.3.8 (from 2022), gopkg.in/check.v1 v1.0.0-20200902074654-038fdea0a05b (unmaintained), and other transitive dependencies. These are significantly behind current versions. Fix: Run 'go get -u ./...' to update all dependencies to their latest secure versions. Review CVE databases for these packages: github.com/rogpeppe/fastuuid, github.com/shabbyrobe/xmlwriter, gopkg.in/check.v1, and golang.org/x/text.
  • Medium · XML External Entity (XXE) Vulnerability Risk — read.go, xmlContentTypes.go, xmlWorkbook.go, xmlSharedStrings.go, xmlStyle.go. The repository is an XLSX reader/writer that parses XML files. XLSX files contain XML structures. If XML parsing is not properly configured to disable external entity resolution and DTD processing, it could be vulnerable to XXE attacks when processing malicious XLSX files. Note that Go's encoding/xml does not resolve external entities or expand DTD-defined entities, so if that is the only parser in use the practical exposure is likely low. Fix: Review all XML parsing code (likely using encoding/xml package) to confirm no other XML parser is involved and that XXE protections are in place. Test with XXE payload files.
  • Medium · Zip Bomb / Decompression Bomb Risk — read.go, file.go. XLSX files are ZIP archives containing XML. Without proper size validation during decompression, the library could be vulnerable to decompression bomb attacks where small files decompress to enormous sizes, causing DoS. Fix: Implement limits on decompressed file sizes. Set maximum thresholds for individual file sizes and total archive size during ZIP extraction. Add timeout protection for decompression operations.
  • Medium · No Input Validation on Uploaded XLSX Files — read.go, file.go, sheet.go, cell.go. The codebase includes test files (testdocs/) but there's no evidence of comprehensive input validation for XLSX file structure, sheet dimensions, cell counts, or formula content. Malformed or malicious XLSX files could cause crashes or resource exhaustion. Fix: Implement strict validation for: sheet dimensions, cell counts, string lengths, file size limits, and formula complexity. Sanitize cell formulas to prevent injection attacks if formulas are evaluated.
  • Low · Deprecated Testing Framework — go.mod. The project uses gopkg.in/check.v1 which is an older, less actively maintained testing framework. While not a direct security vulnerability, it may contain unfixed bugs. Fix: Consider migrating to the standard Go testing library or more modern alternatives like testify. This is a maintenance improvement rather than critical security fix.
  • Low · Repository Migration Notice - Potential Supply Chain Risk — README.org, GitHub repository. The README indicates the project has been migrated to Codeberg and the GitHub version is no longer maintained. Users may unknowingly depend on an unmaintained version, missing critical security updates. Fix: Archive the GitHub repository or add clear security warnings. Ensure all users are directed to the maintained version on Codeberg (v4+). Update go.mod module path if applicable.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
