apache/fesod
Fast. Easy. Done. Processing spreadsheets without worrying about large files causing OOM.
Healthy across the board
Permissive license, no critical CVEs, actively maintained: safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit 2d ago
- ✓17 active contributors
- ✓Distributed ownership (top contributor 25% of recent commits)
- ✓Apache-2.0 licensed
- ✓CI configured
- ✓Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: apache/fesod
Generated by RepoPilot · 2026-05-09 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/apache/fesod shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit 2d ago
- 17 active contributors
- Distributed ownership (top contributor 25% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live apache/fesod
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/apache/fesod.
What it runs against: a local clone of apache/fesod — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in apache/fesod | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 32 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of apache/fesod. If you don't
# have one yet, run these first:
#
# git clone https://github.com/apache/fesod.git
# cd fesod
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of apache/fesod and re-run."
exit 2
fi
# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "apache/fesod(\.git)?\b" \
  && ok "origin remote is apache/fesod" \
  || miss "origin remote is not apache/fesod (artifact may be from a fork)"
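# 2. License matches what RepoPilot saw
# The standard Apache-2.0 LICENSE text opens with an indented
# "Apache License / Version 2.0" banner, so match those phrases
# instead of an SPDX identifier at the start of a line.
(grep -qiE "Apache License" LICENSE 2>/dev/null \
  && grep -qiE "Version 2\.0" LICENSE 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift (was Apache-2.0 at generation time)"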
# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"
# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 32 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~2d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/apache/fesod"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
⚡TL;DR
Apache Fesod is a Java library for processing large spreadsheets (Excel/CSV) efficiently without causing out-of-memory errors. It uses streaming and incremental parsing strategies to handle files that would exhaust heap memory in traditional libraries like Apache POI, focusing on speed and memory safety for spreadsheet data extraction and transformation. Multi-module Maven monorepo: fesod-bom (Bill of Materials for version management), fesod-common (shared utilities), fesod-sheet (core streaming engine for spreadsheet parsing). Dependencies managed via parent pom with ${revision} for consistent versioning across modules. CI workflows in .github/workflows/ execute test suites and license checks on push.
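To make the streaming idea in the summary concrete, here is a minimal sketch that uses only the JDK. It is not fesod's API: the class `RowStreamSketch` and the method `forEachRow` are hypothetical names chosen for illustration. Each row is handed to a callback as soon as it is read, so memory use depends on the longest line rather than on the file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical illustration; not part of the fesod API.
public class RowStreamSketch {

    /**
     * Reads comma-separated rows one line at a time and passes each row to
     * the callback. Only the current line is held in memory. Quoted fields
     * and embedded newlines are deliberately not handled in this sketch.
     */
    public static long forEachRow(Reader source, Consumer<String[]> onRow) throws IOException {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                onRow.accept(line.split(",", -1)); // -1 keeps trailing empty cells
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        long[] total = {0};
        long rows = forEachRow(new StringReader("id,amount\n1,10\n2,32\n"), cells -> {
            // Skip the header row, then aggregate without storing any rows.
            if (!cells[0].equals("id")) {
                total[0] += Long.parseLong(cells[1]);
            }
        });
        System.out.println(rows + " rows, total " + total[0]);
    }
}
```

The callback aggregates as it goes and keeps no reference to earlier rows, which is the property that lets a streaming reader handle inputs larger than the heap.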
👥Who it's for
Java backend developers and data engineers who need to parse large Excel/CSV files in production systems without crashing due to heap exhaustion. Developers maintaining ETL pipelines, batch data processors, and reporting systems that ingest user-uploaded spreadsheets.
🌱Maturity & risk
Production-ready. Apache Software Foundation project with Apache 2.0 license, Maven Central artifacts (org.apache.fesod:fesod-sheet), comprehensive CI/CD via GitHub Actions (ci.yml, nightly.yml, CodeQL scanning), and clear version management via fesod-parent and fesod-bom BOM artifacts. Active maintenance indicated by multiple workflows and structured issue templates.
Low risk overall. Well-structured Maven multi-module project with BOM dependency management and explicit license tracking in /dist/licenses/. Dependencies include battle-tested POI libraries and commons utilities. Primary risk: the library is JVM-only, so adopting it ties that part of your pipeline to Java; weigh this if a move off the JVM is on your roadmap.
Active areas of work
Active Apache incubation/release cycle. Nightly tests run via nightly.yml, security scanning with CodeQL, and stale issue auto-closure via GitHub Actions. The SKILL.md migration guide suggests fastexcel-to-fesod refactoring work is underway for API standardization.
🚀Get running
git clone https://github.com/apache/fesod.git
cd fesod
./mvnw clean install
Uses the Maven wrapper (.mvn/wrapper/) so that every build runs with the same Maven version; a compatible JDK must still be installed locally. Tests run during the build, before the install phase completes.
Daily commands:
./mvnw clean test # Run all test suites
./mvnw clean install # Build and install to local Maven repo
./mvnw -pl fesod-sheet clean test # Test only core sheet module
No dedicated dev server; this is a library consumed by applications, not a standalone service.
🗺️Map of the codebase
- fesod-bom/pom.xml: Defines Bill of Materials (BOM) dependency versions for all consumers; critical for version consistency across modules
- .github/workflows/ci.yml: Main CI pipeline; run it locally via act if modifying core parsing logic to catch integration failures early
- .github/workflows/license-check.yml: Enforces Apache license headers; PRs fail if you forget license boilerplate in new Java files
- fesod-common/pom.xml: Shared dependencies and configuration for all modules; changes here affect the entire project
- .github/skills/fastexcel-to-fesod/SKILL.md: Documents migration path from fastexcel library; useful for understanding API evolution and refactoring priorities
🛠️How to make changes
New features in core parsing: src/main/java under fesod-sheet module. Common utilities: fesod-common/src/. Add tests in src/test/java alongside production code. Follow Apache license headers visible in pom.xml. Workflows validate license headers via .github/workflows/license-check.yml.
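As an illustration of the license-header requirement, the sketch below shows a new Java file that opens with the standard ASF source header. The class name `LicensedExampleSketch` is hypothetical, and the exact header form that license-check.yml accepts is an assumption here; copying the header from an existing file in the module is the safer route.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

// Hypothetical class used only to show where the header sits:
// it must be the first thing in the file, before any package or import lines.
public class LicensedExampleSketch {

    /** Returns true when the given source text starts with a block comment. */
    public static boolean startsWithBlockComment(String source) {
        return source.stripLeading().startsWith("/*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithBlockComment("/* header */ class A {}"));
    }
}
```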
🪤Traps & gotchas
Apache license headers are required in all new Java files (checked by license-check.yml; PRs fail without them). The Maven wrapper pins the Maven distribution via .mvn/wrapper/maven-wrapper.properties but does not install a JDK; a missing or mismatched local JDK may cause 'JAVA_HOME' errors. Nightly.yml suggests experimental features are tested separately; main ci.yml is the stable gate. The DISCLAIMER file hints at ASF compliance requirements, so fork carefully if modifying the distribution.
💡Concepts to learn
- Streaming/Incremental Parsing — Core to fesod's entire value proposition: the trade-off between event-based (SAX) and tree-based (DOM) parsing explains why fesod avoids loading entire files into memory the way POI's default model does
- SparseBitSet (from SparseBitSet dependency) — Used for memory-efficient tracking of row/cell presence in large sparse spreadsheets; reduces heap overhead vs. dense arrays
- Apache POI OOXML-Lite (poi-ooxml-lite.txt dependency) — Stripped-down POI dependency used by fesod instead of full POI to minimize heap footprint; understanding what's removed is key to fesod's optimization strategy
- Bill of Materials (BOM) Pattern — fesod-bom enforces version consistency across modules via Maven dependencyManagement; critical for multi-module projects to avoid version conflicts
- Memory-Mapped I/O — Potential optimization for very large file handling on disk; relevant if fesod adds file-backed buffering instead of heap-only streaming
- Apache License Header Compliance — ASF projects enforce license headers (here via license-check.yml); PRs fail without them, a common source of frustration for new contributors
- Row/Cell Streaming Contracts (SAX-like event model) — fesod likely exposes callbacks or iterators for row/cell events instead of buffering; understanding the stream consumption contract is essential for API usage and extension
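The SAX-like event model mentioned in the list above can be sketched with the JDK's own SAX parser. This is not fesod's reader: the element names `row` and `c`, the class `SaxRowEventsSketch`, and the method `readRows` are assumptions made up for this illustration. The handler sees one element at a time and emits a row when its closing tag arrives, so no document tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of push-style row events; not the fesod API.
public class SaxRowEventsSketch {

    /** Buffers the cells of the current row only and emits them on </row>. */
    static final class RowHandler extends DefaultHandler {
        private final Consumer<List<String>> onRow;
        private final List<String> currentRow = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        RowHandler(Consumer<List<String>> onRow) {
            this.onRow = onRow;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("row")) currentRow.clear();
            if (qName.equals("c")) text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per cell
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("c")) currentRow.add(text.toString());
            if (qName.equals("row")) onRow.accept(new ArrayList<>(currentRow));
        }
    }

    public static void readRows(String xml, Consumer<List<String>> onRow) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        parser.parse(new ByteArrayInputStream(bytes), new RowHandler(onRow));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheet><row><c>a</c><c>b</c></row><row><c>1</c><c>2</c></row></sheet>";
        readRows(xml, row -> System.out.println(row));
    }
}
```

The consumer receives a copy of each row, so it may keep or discard it freely; a consumer that discards rows after processing keeps heap use bounded by the widest row.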
🔗Related repos
- apache/poi: Core upstream dependency for Excel parsing; fesod optimizes POI's DOM model with streaming patterns
- easyexcel/easyexcel: Similar Java library for large file handling; direct competitor using different streaming architecture
- SheetJS/sheetjs: JavaScript alternative for browser/Node.js spreadsheet parsing; users evaluating cross-platform solutions check both
- python-excel/openpyxl: Python ecosystem equivalent; relevant for teams migrating Java ETL pipelines to Python data science stacks
- apache/commons-csv: Upstream CSV parsing library used by fesod; understanding commons-csv underpins fesod's CSV streaming implementation
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive unit tests for fesod-common utility classes
The fesod-common module contains 9 utility classes (BooleanUtils, IntUtils, IoUtils, ListUtils, MapUtils, MemberUtils, PositionUtils, StringUtils, ValidateUtils) in src/main/java/org/apache/fesod/common/util/, but the test directory structure appears minimal. These utilities are foundational for the entire project and lack comprehensive test coverage, which is critical for a spreadsheet processing library where correctness is paramount.
- [ ] Create test classes for each utility in fesod-common/src/test/java/org/apache/fesod/common/util/ matching the main classes
- [ ] Add edge case tests for StringUtils (null, empty, special characters handling)
- [ ] Add range/boundary tests for IntUtils and PositionUtils (critical for cell position calculations)
- [ ] Add tests for MapUtils and ListUtils with null/empty collections
- [ ] Ensure all tests pass in CI and achieve >80% code coverage for the common module
Implement fuzz testing for spreadsheet parsing and cell value handling
The repo has a fuzz-tests workflow (.github/workflows/fuzz-tests.yml) but no visible fuzzing test suite in the file structure. Given that fesod handles large spreadsheet files and processes cell values, fuzzing is critical to prevent crashes from malformed Excel files, corrupted data streams, or edge case numeric/string values that could cause OOM or parsing errors.
- [ ] Create a new fesod-fuzz module or fuzz-tests directory with fuzzing harnesses
- [ ] Implement fuzzing targets for the core sheet reading/parsing logic (likely in fesod-sheet module)
- [ ] Add fuzzing for cell value parsing, especially numeric and formula handling
- [ ] Configure the existing fuzz-tests.yml workflow to execute the fuzz tests
- [ ] Document fuzzing setup in CONTRIBUTING.md with reproduction steps for any bugs found
Add integration tests for the BOM (Bill of Materials) module with real spreadsheet files
The fesod-bom module manages dependencies for fesod-common and fesod-sheet but lacks visible integration tests that verify the actual dependency resolution and artifact packaging work correctly end-to-end. This is important for downstream users to ensure consistent, reproducible builds. Additionally, there are no visible test files for verifying the library works correctly with various Excel formats (xlsx, xls, csv).
- [ ] Create fesod-bom/src/test with integration tests that verify dependency resolution
- [ ] Add test files (sample .xlsx, .xls, .csv files) to fesod-bom/src/test/resources/
- [ ] Write integration tests that load sample spreadsheets using the BOM-defined dependencies
- [ ] Test with files of varying sizes (small, medium, large) to validate the OOM prevention claim
- [ ] Add a CI job in .github/workflows/ci.yml to run these integration tests against different Java versions
🌿Good first issues
- Add streaming CSV parser tests in fesod-sheet/src/test/java/—current test coverage focuses on Excel; contributors can implement CSV edge case tests (empty rows, quoted delimiters, BOM handling) using existing test infrastructure
- Document streaming memory footprint benchmarks in README.md with concrete examples (e.g., '100MB file uses 50MB heap vs. 500MB with POI')—currently claims 'OOM-free' without proof; add JMH micro-benchmarks under src/test/java/benchmarks/
- Create a fesod-cli module under .github/skills/ to expose fesod-sheet as a command-line tool (e.g., fesod convert input.xlsx output.csv); this mirrors the fastexcel-to-fesod skill pattern and provides an entry point for users not comfortable with Java APIs
⭐Top contributors
- @psxjoy — 25 commits
- @bengbengbalabalabeng — 21 commits
- @GOODBOY008 — 13 commits
- @delei — 12 commits
- @dependabot[bot] — 10 commits
📝Recent commits
- c44cc05: fix: Correct off-by-one error in CsvRow#getCell(int) (#677) (bengbengbalabalabeng)
- f673fcb: fix: correct CSV physical cell count (#907) (skytin1004)
- b340888: docs: add blog for new committer (#908) (bengbengbalabalabeng)
- 7606303: test: cover field-level converter precedence (#906) (skytin1004)
- 52d4b1f: docs: fix Maven Central artifact link (#904) (skytin1004)
- bd22a31: docs: fix fastexcel 1.1.0 sha link (#903) (skytin1004)
- 14704d6: docs: add Alibaba attribution to 5 legacy-derived example files (#897) (skytin1004)
- 3986b8b: chore: enforce frozen lockfile in CI (#895) (psxjoy)
- b00281a: docs: standardize and enhance status badges (#896) (bengbengbalabalabeng)
- bf3d23b: fix: custom converters not being inherited from parent write holders (#893) (utafrali)
🔒Security observations
The Apache Fesod project demonstrates reasonable security practices as an Apache Software Foundation project with proper licensing and governance. However, there are several areas for improvement: (1) The primary concern is incomplete POM configuration that needs verification; (2) Spreadsheet processing libraries require hardened XML parsing configurations to prevent XXE attacks; (3) A formal security policy is missing; (4) Dependency versions should be more explicitly managed. No hardcoded secrets, SQL injection risks, or XSS vulnerabilities were detected in the visible codebase. The project appears well-structured with appropriate CI/CD workflows including CodeQL scanning. Recommended immediate actions: complete the POM configuration, add a SECURITY.md file, verify XXE protections in spreadsheet parsing, and implement dependency pinning for POI libraries.
- Medium · Incomplete POM Configuration in fesod-bom (fesod-bom/pom.xml). The fesod-bom/pom.xml file appears truncated with an incomplete plugin definition. The build section contains a plugin declaration that is cut off at 'groupId>org.codehaus.mojo</groupId>' without closing tags or configuration. This could indicate misconfiguration or incomplete build setup that might lead to unintended build behavior. Fix: Complete the plugin configuration with proper closing tags and verify all plugins are correctly defined. Ensure the build configuration is valid and tested.
- Low · Missing SECURITY.md or Security Policy (repository root). The repository does not appear to have a SECURITY.md or security policy document visible in the file structure. This makes it difficult for security researchers to report vulnerabilities responsibly. Fix: Create a SECURITY.md file documenting responsible disclosure procedures and security contact information, as recommended by GitHub and the Apache Software Foundation.
- Low · Broad Dependency Management Without Version Pinning (fesod-bom/pom.xml). The fesod-bom uses ${project.version} for internal dependencies but the parent POM configuration is not shown. Without explicit version constraints or ranges, transitive dependencies could potentially introduce security vulnerabilities if updated automatically. Fix: Review and explicitly pin critical transitive dependencies to known-secure versions. Consider using dependency lock files or explicit version ranges to prevent automatic upgrades to vulnerable versions.
- Low · Potential XXE Vulnerability in Spreadsheet Processing (fesod-sheet module, inferred from project description). The project processes spreadsheets (likely Excel/XLSX files) using Apache POI. XML External Entity (XXE) attacks are a known risk when parsing XML-based spreadsheet formats. If untrusted spreadsheet files are processed without proper XXE protection, this could be exploited. Fix: Ensure all XML parsers used by Apache POI and related libraries are configured to disable external entity resolution and DTD processing. Use secure XML parsing configurations (e.g., setting the XMLConstants.ACCESS_EXTERNAL_DTD property to an empty string).
- Low · Dependency on Apache POI with Known CVE History (fesod dependencies, derived from dist/licenses/LICENSE-poi*.txt). The project depends on Apache POI libraries (poi, poi-ooxml, poi-ooxml-lite) which have had security vulnerabilities in the past. Current version information is not visible in the provided snippet. Fix: Regularly scan and update Apache POI dependencies to the latest patched versions. Use OWASP Dependency-Check or similar tools in CI/CD to detect known vulnerabilities.
LLM-derived; treat as a starting point, not a security audit.
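For the XXE item above, the following is a minimal sketch of a hardened SAX parser using standard JAXP settings. It assumes the JDK's built-in parser, which recognises the `disallow-doctype-decl` feature URI; whether and how fesod or POI apply equivalent settings is not confirmed here. The class `HardenedSaxSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not taken from the fesod or POI source.
public class HardenedSaxSketch {

    /** Builds a SAX parser that refuses DOCTYPE declarations and external entities. */
    public static SAXParser newHardenedParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE outright removes the usual XXE entry point.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth in case DOCTYPE handling is ever re-enabled.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        SAXParser parser = factory.newSAXParser();
        // An empty string means no protocol is allowed for external DTD access.
        parser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return parser;
    }

    /** Returns true if the hardened parser refuses to parse the given document. */
    public static boolean rejects(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            newHardenedParser().parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String hostile = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/hostname\">]><r>&x;</r>";
        System.out.println("hostile rejected: " + rejects(hostile));
        System.out.println("benign rejected: " + rejects("<r>ok</r>"));
    }
}
```

A check like `rejects` can serve as a regression test: feed the parser a document with an external entity and assert that parsing fails before any entity is resolved.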
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.