Onboarding: Stirling-Tools/Stirling-PDF

Item: Stirling-Tools/Stirling-PDF
Rating: 3
Author: RepoPilot

Generated by RepoPilot · 2026-05-05 · Source

Verdict

WAIT — Mixed signals — read the receipts

Last commit today
5 active contributors
Distributed ownership (top contributor 35%)
Other licensed
CI configured
Tests present
⚠ Small team — 5 top contributors
⚠ Non-standard license (Other) — review terms

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

TL;DR

Stirling-PDF is a self-hosted, open-source PDF processing platform offering 50+ tools (merge, split, OCR, redact, sign, convert, compress) that run entirely on your own infrastructure without sending documents to third-party services. The backend is Java/Spring Boot using Apache PDFBox as the core PDF engine, paired with a TypeScript-heavy frontend, and is deployable via Docker or as a desktop app. It exposes a REST API documented via SpringDoc/OpenAPI for automation and programmatic access.

The repo is a Gradle multi-module monorepo: Java Spring Boot modules live under app/ (for example app/common/src/main/java), the TypeScript frontend is a large separate layer (7.4MB TypeScript, 312KB CSS), and Python scripts (692KB) handle auxiliary tasks like translation sync. CI/CD workflows live under .github/workflows/, with devcontainer support in .devcontainer/ for reproducible dev environments.

Who it's for

DevOps engineers and self-hosters who need a private, GDPR-compliant PDF processing service; enterprise teams requiring SSO and audit logging for document workflows; and Java/TypeScript developers who want to contribute to or extend an actively maintained open-source PDF toolchain.

Maturity & risk

The project is the #1 PDF application on GitHub by stars and has a comprehensive CI/CD setup across 20+ GitHub Actions workflows covering builds, e2e tests (Gherkin/BDD), Docker compose tests, OpenAPI checks, and license verification. It targets enterprise use cases with SSO and audit support, and has active multi-contributor development with recent workflow additions like 'ai-engine.yml' and 'build-enterprise.yml'. On maturity alone it reads as production-ready and actively developed; the WAIT verdict above comes from the license and team-size flags, not from code health.

The dependency surface is large — Apache PDFBox, Spring Boot, SnakeYAML, owasp-java-html-sanitizer, Flexmark, junrar, and PostHog analytics are all pulled in via Gradle, increasing supply chain risk. The project has multiple maintainers (evidenced by .github/CODEOWNERS and config/repo_devs.json) reducing single-maintainer risk, but the sheer number of PDF processing tools means edge-case bugs in lesser-used features (e.g. CBR/RAR comic support via junrar) may go untested. The Gherkin e2e test suite in the repo provides reasonable regression coverage but may not cover all 50+ tools comprehensively.

Active areas of work

Active work includes an AI engine integration (.github/workflows/ai-engine.yml, .github/workflows/ai_pr_title_review.yml), an enterprise build pipeline (.github/workflows/build-enterprise.yml), automated PR demo deployments (.github/workflows/PR-Auto-Deploy-V2.yml), and AUR package publishing for Arch Linux desktop distribution (.github/aur/). The presence of both a 'stirling-pdf-desktop' and 'stirling-pdf-server-bin' AUR package suggests ongoing work on native desktop packaging.

Get running

git clone https://github.com/Stirling-Tools/Stirling-PDF.git && cd Stirling-PDF

Install Task runner first: https://taskfile.dev/installation/

task install

Or run directly via Docker:

docker run -p 8080:8080 docker.stirlingpdf.com/stirlingtools/stirling-pdf

Then open http://localhost:8080

Daily commands:

Using Task (recommended):

task install # installs dependencies
task run # starts the dev server

Using Gradle directly (bootRun is disabled for the root module — run from the correct submodule):

./gradlew :stirling-pdf:bootRun

Using Docker Compose:

docker compose up

Map of the codebase

app/common/src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java — Central configuration class that registers and controls which PDF tool endpoints are enabled/disabled, making it the gateway for all feature availability in the application.
CLAUDE.md — AI agent instructions file that documents build commands, code style conventions, module structure, and contribution rules — essential reading before making any changes.
CONTRIBUTING.md — Defines the contribution workflow, branching strategy, coding standards, and PR requirements that all contributors must follow.
ADDING_TOOLS.md — Step-by-step guide for adding new PDF tools to Stirling-PDF, covering controller, service, UI template, and i18n wiring — the primary extension pattern document.
app/common/build.gradle — Shared Gradle build configuration for the common module defining dependencies (PDFBox, iText, Spring Boot) used across all PDF processing services.
Taskfile.yml — Top-level task runner defining build, test, lint, and Docker targets; the authoritative entry point for all developer workflows across the monorepo.
app/common/src/main/java/org/apache/pdfbox/examples/util/DeletingRandomAccessFile.java — Custom PDFBox utility that auto-deletes temp files after PDF processing, a critical security/cleanup mechanism for all PDF operations handling user uploads.
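
The auto-delete idea behind DeletingRandomAccessFile can be illustrated with JDK classes alone. The sketch below is not the repository's implementation: SelfDeletingTempFile and its methods are hypothetical names, and it assumes only that the temp file should disappear when the handle is closed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not the class shipped in the repository.
final class SelfDeletingTempFile implements AutoCloseable {
    private final Path path;
    private final RandomAccessFile file;

    SelfDeletingTempFile(String prefix) throws IOException {
        this.path = Files.createTempFile(prefix, ".tmp");
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    Path path() {
        return path;
    }

    RandomAccessFile file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        try {
            file.close();
        } finally {
            // Remove the temp file even if closing the handle failed.
            Files.deleteIfExists(path);
        }
    }
}

final class SelfDeletingTempFileDemo {
    // Writes a few bytes, lets try-with-resources close the handle,
    // then reports whether the file is still on disk.
    static boolean existsAfterClose() {
        try {
            Path seen;
            try (SelfDeletingTempFile tmp = new SelfDeletingTempFile("upload-")) {
                tmp.file().write(new byte[] {'%', 'P', 'D', 'F'});
                seen = tmp.path();
            }
            return Files.exists(seen);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("exists after close: " + existsAfterClose());
    }
}
```

Tying deletion to close() means a try-with-resources block is enough to guarantee cleanup on both the success and the exception path.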

How to make changes

Add a new PDF processing tool

Read ADDING_TOOLS.md fully to understand the complete wiring checklist before writing any code. (ADDING_TOOLS.md)
Create a new Spring MVC @RestController under app/core (or the appropriate sub-module) with a @PostMapping endpoint accepting MultipartFile and tool-specific request parameters, following existing controller conventions. (app/common/src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java)
Register the new endpoint in EndpointConfiguration so it can be enabled/disabled via application properties and respects the feature-flag system. (app/common/src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java)
Add the new task definition to the relevant .taskfiles entry so the tool is accessible via Taskfile commands during development. (Taskfile.yml)
Ensure the CI build pipeline picks up and tests the new endpoint by verifying backend-build.yml triggers on the relevant module paths. (.github/workflows/backend-build.yml)
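
The enable/disable step in this checklist can be pictured as a lookup keyed by endpoint name. The sketch below is a plain-Java illustration only: EndpointToggles, the endpoints.disabled property key, and the endpoint names are hypothetical and are not taken from EndpointConfiguration, whose real API should be read in the source.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical feature-flag lookup; names and property key are assumptions.
final class EndpointToggles {
    private final Set<String> disabled;

    EndpointToggles(Properties props) {
        // Assumed format: a comma-separated list of endpoint names to switch off.
        String raw = props.getProperty("endpoints.disabled", "");
        this.disabled = Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toSet());
    }

    // Endpoints are enabled unless explicitly listed as disabled.
    boolean isEnabled(String endpoint) {
        return !disabled.contains(endpoint);
    }
}

final class EndpointTogglesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("endpoints.disabled", "ocr-pdf, compress-pdf");
        EndpointToggles toggles = new EndpointToggles(props);
        System.out.println("ocr-pdf enabled: " + toggles.isEnabled("ocr-pdf"));
        System.out.println("merge-pdfs enabled: " + toggles.isEnabled("merge-pdfs"));
    }
}
```

Defaulting to enabled keeps a newly added tool reachable until an operator opts out, which is one reasonable reading of the feature-flag behavior described in the steps.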

Add a new CI/CD workflow

Create a new YAML file in .github/workflows/ following the naming convention (kebab-case, descriptive action name). (.github/workflows/build.yml)
Reference the reusable setup-bot action for any workflow needing bot authentication or repo write permissions. (.github/actions/setup-bot/action.yml)
Add the new workflow to the labeler config so PRs touching it are auto-labeled correctly. (.github/labeler-config-srvaroa.yml)
Update the dependency review config if the workflow introduces new third-party actions that require license approval. (.github/config/dependency-review-config.yml)

Add a new Gradle module (sub-application)

Create the new module directory under app/ with the standard src/main/java layout mirroring app/common. (app/common/build.gradle)
Define the module's build.gradle, declaring dependencies on app/common and any module-specific libraries. (app/common/build.gradle)
Add new task targets for build, test, and run in the relevant .taskfiles YAML so the module is accessible from the top-level Taskfile. (.taskfiles/backend.yml)
Update the top-level Taskfile.yml to include the new module in the default build and test dependency chains. (Taskfile.yml)

Update or add a Docker image variant

Modify the push-docker.yml workflow to add a new matrix entry or build step for the new image variant (e.g. enterprise, ultra-lite). (.github/workflows/push-docker.yml)
Update the docker-compose-tests workflow to include a smoke-test stage for the new image variant. (.github/workflows/docker-compose-tests.yml)
Update push-docker-base.yml if the new variant requires a new base image layer to be published separately. (.github/workflows/push-docker-base.yml)
Update the .dockerignore if the new variant needs to include previously excluded build artifacts. (.dockerignore)

Why these technologies

Spring Boot (Java) — Provides a mature, well-understood web framework with embedded Tomcat, auto-configuration, and a large ecosystem — critical for a security-sensitive app handling user documents at scale.
Apache PDFBox — Apache-licensed, pure-Java PDF library with broad format support and active maintenance; allows redistribution without commercial licensing fees unlike iText AGPL.
iText (AGPL) — Used for advanced PDF operations (e.g. signing, AcroForm manipulation) where PDFBox lacks capability; AGPL is acceptable because Stirling-PDF is itself open source.
Gradle multi-module build — Enables separation of common utilities, core PDF features, and optional enterprise features into distinct modules with explicit dependency boundaries and independent versioning.
Docker + Docker Compose — Ensures reproducible deployment across all environments from personal self-hosting to enterprise servers; also drives the test matrix for CI validation.
Tauri (Rust + WebView) — Enables lightweight native desktop packaging without shipping a full browser — Tauri apps are significantly smaller than Electron equivalents, important for a downloadable tool.
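Thymeleaf HTML templates: Server-side rendered UI keeps the frontend simple and dependency-light, avoiding a full SPA. This sits uneasily with the TypeScript-heavy frontend described in the TL;DR and the Vite and MUI commits under Recent commits, so confirm which UI layer a tool actually uses before relying on this rationale.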

Traps & gotchas

The root Gradle bootRun is explicitly disabled (bootRun { enabled = false }): you cannot run the app from the root module directly and must target the correct submodule.
google-java-format 1.28.0 (used by Spotless) crashes on JDK 24/25 due to a Guava conflict. The build config suppresses this lint, but you must use JDK 21 or earlier for a smooth Spotless run.
The devcontainer (.devcontainer/init-setup.sh) runs initialization scripts that must complete before the environment is usable.
PostHog analytics SDK is bundled, so check its configuration if running in an air-gapped environment.
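
The JDK constraint in the Spotless gotcha above can be checked before invoking the formatter. This is a minimal sketch: SpotlessJdkCheck is a hypothetical helper, not part of the build, and the cutoff of 21 is taken from the text above and not from the build scripts.

```java
// Hypothetical pre-flight check; not part of the repository's build.
final class SpotlessJdkCheck {
    // Highest JDK feature release the text describes as safe for Spotless.
    static final int MAX_FEATURE = 21;

    static boolean isSupported(int featureVersion) {
        return featureVersion <= MAX_FEATURE;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major release, e.g. 21.
        int feature = Runtime.version().feature();
        if (isSupported(feature)) {
            System.out.println("JDK " + feature + " is within the range described for Spotless.");
        } else {
            System.out.println("JDK " + feature + " is newer than " + MAX_FEATURE
                    + "; switch JDKs before running Spotless.");
        }
    }
}
```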

Concepts to learn

Apache PDFBox document model — PDFBox's PDDocument, PDPage, and operator framework are the primitives used by every PDF tool in the Java backend — misunderstanding the object model causes memory leaks with unclosed documents.
Spring Boot aspect-oriented programming (AOP) — spring-boot-starter-aspectj is a declared dependency — cross-cutting concerns like audit logging and usage tracking are likely implemented as aspects rather than in each controller.
OpenAPI 3 / SpringDoc code-first spec generation — The REST API is documented via springdoc-openapi-starter-webmvc-ui with a CI check (check-openapi.yml) that fails if the generated spec drifts — contributors must annotate controllers correctly.
OWASP HTML sanitization — owasp-java-html-sanitizer is used to sanitize HTML inputs before they touch PDF content, preventing XSS/injection via malicious PDF metadata or HTML-to-PDF conversions.
Gherkin / BDD (Behavior-Driven Development) testing — 152KB of Gherkin feature files define the e2e test suite — new tools should have corresponding .feature files or they won't be covered by the e2e-stubbed.yml and e2e-live.yml CI pipelines.
PDF/A (archival PDF standard) — The pdfbox preflight module is a declared dependency, indicating Stirling-PDF validates or converts to PDF/A — contributors working on conversion tools must understand PDF/A conformance levels.
XMP metadata (Extensible Metadata Platform) — pdfbox-xmpbox is a declared dependency — PDF metadata editing features read/write XMP packets embedded in PDF files, a non-obvious Adobe standard separate from the PDF info dictionary.

Related repos

gotenberg/gotenberg — Alternative self-hosted PDF generation/conversion microservice using Docker and headless Chrome — solves similar 'no third-party PDF service' use case.
paperless-ngx/paperless-ngx: Document management system that many Stirling-PDF users pair with it for a full self-hosted document workflow.
dangerzone/dangerzone — Alternative self-hosted tool focused specifically on sanitizing potentially malicious PDFs — overlapping security-conscious user base.
apache/pdfbox — The core PDF processing engine used throughout Stirling-PDF's Java backend — understanding PDFBox internals is essential for contributing PDF tool logic.
jlesage/docker-firefox — Predecessor pattern inspiration — single-app Docker containers with web UI, the deployment model Stirling-PDF follows for its server edition.

PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add unit tests for PDF metadata extraction and manipulation utilities

The repo uses Apache PDFBox (org.apache.pdfbox:pdfbox) and a metadata extractor (com.drewnoakes:metadata-extractor) for PDF processing, but the partial file structure shows no dedicated test files for these utility paths. Adding unit tests for the core PDF manipulation logic (e.g. XMP metadata reading/writing via xmpbox, preflight validation via preflight) would catch regressions when PDFBox is upgraded (it's pinned via $pdfboxVersion), and would complement the existing e2e workflows (.github/workflows/e2e-live.yml, .github/workflows/e2e-stubbed.yml) with fast, isolated unit coverage.

[ ] Identify the Java service/util classes under src/main/java/ that wrap PDFBox calls for metadata extraction and XMP manipulation
[ ] Create corresponding test classes under src/test/java/ mirroring the package structure, using JUnit 5 and sample PDF fixtures
[ ] Write tests covering: reading existing XMP metadata, writing/overwriting XMP fields, stripping metadata, and handling malformed/encrypted PDFs gracefully
[ ] Add tests for the preflight PDF/A validation path, asserting correct conformance level detection on known-good and known-bad sample PDFs
[ ] Verify tests run in the existing backend-build.yml workflow (./gradlew test) and add a coverage report step if one is not already present

Add a dedicated Spotless/Checkstyle enforcement CI check for YAML and Gradle files

The build.gradle already configures Spotless for Java, YAML (**/*.yml, **/*.yaml), and Gradle files with trailing whitespace, tab-to-space, and newline rules. However, looking at .github/workflows/, there is a pre_commit.yml but no standalone workflow that runs ./gradlew spotlessCheck specifically for the YAML and Gradle targets in CI on every PR. This means formatting violations in workflow files or build scripts can slip through undetected. A dedicated lightweight workflow would surface these failures early without running the full backend build.

[ ] Create .github/workflows/spotless-check.yml that triggers on pull_request targeting changed **/*.yml, **/*.yaml, and **/*.gradle files using path filters
[ ] Add a job that checks out the repo, sets up JDK (matching the version used in backend-build.yml), and runs ./gradlew spotlessCheck
[ ] Scope the Gradle task with --continue so all format violations are reported at once rather than failing on the first file
[ ] Add a comment to build.gradle under the spotless block noting that CI enforcement lives in spotless-check.yml so future contributors know where to look
[ ] Test the workflow by intentionally introducing a trailing-whitespace violation in a .yml file and confirming the check fails, then revert

Extract and document the EML/MSG email-to-PDF conversion feature with dedicated integration tests

The dependency snippet includes a comment // Simple Java Mail for EML/MSG parsing (replace — this is a truncated comment indicating an in-progress or recently changed dependency for EML/MSG file support. This feature (converting email files to PDF) is non-trivial and likely lives in one or more service classes, but there is no mention of it in the README snippet, no visible test file for it in the file structure, and the truncated comment suggests it may be under active churn. Documenting and testing it now prevents silent breakage and helps onboard contributors.

[ ] Locate the EML/MSG parsing service class(es) under src/main/java/ and complete/clarify the truncated Gradle dependency comment with the correct library name and version
[ ] Create src/test/java/.../EmlToPdfServiceTest.java and `MsgToPdf

Good first issues

1. Add Gherkin e2e test scenarios for the CBR/RAR comic conversion tool (junrar integration). This is a visible gap, since exotic formats are unlikely to be covered in the existing .feature files.
2. Add OpenAPI response schema examples for the PDF split/merge endpoints. The check-openapi.yml workflow (.github/workflows/check-openapi.yml) validates the spec, but example payloads are often missing in SpringDoc annotations.
3. Improve the .github/scripts/check_language_toml.py script to report which specific translation keys are missing per language, rather than just failing. It is currently useful only for CI, not for translators.

Top contributors

@dependabot[bot] — 30 commits
@Frooodle — 29 commits
@jbrunton96 — 16 commits
@Ludy87 — 7 commits
@ConnorYoh — 4 commits

Recent commits

4ab7d3b — build(deps): bump the mui group across 1 directory with 2 updates (#6301) (dependabot[bot])
d8519a6 — build(deps): bump com.google.guava:guava from 33.5.0-jre to 33.6.0-jre (#6283) (dependabot[bot])
34c9e9b — build(deps): bump actions/setup-node from 6.3.0 to 6.4.0 (#6258) (dependabot[bot])
4663a9a — build(deps): bump org.springdoc:springdoc-openapi-starter-webmvc-ui from 3.0.2 to 3.0.3 in /app/common (#6286) (dependabot[bot])
f89f7d9 — build(deps): bump actions/upload-artifact from 7.0.0 to 7.0.1 (#6297) (dependabot[bot])
84e30cd — build(deps): bump actions/github-script from 7.1.0 to 9.0.0 (#6298) (dependabot[bot])
69236a8 — build(deps): bump gradle/actions from 5.0.1 to 6.1.0 (#6294) (dependabot[bot])
b66e12d — build(deps): bump eclipse-temurin from a051234 to b27ca47 in /docker/embedded (#6293) (dependabot[bot])
3fe8adc — Switch key areas to lazily import to improve Vite chunk size (#6278) (jbrunton96)
51f5345 — Inform AI engine which endpoints are disabled on the backend (#6251) (jbrunton96)

Security observations

High · Potentially Outdated or Vulnerable Transitive Dependencies via PDFBox — build.gradle / dependencies block. The project uses Apache PDFBox (version referenced as $pdfboxVersion, not pinned explicitly in the visible snippet). PDFBox has historically had vulnerabilities related to malformed PDF parsing, XXE, and denial-of-service. Without seeing the exact pinned version, there is a risk of running an unpatched version. Additionally, the 'preflight' module increases attack surface by parsing and validating potentially malicious PDFs. Fix: Ensure PDFBox is pinned to the latest stable patched version (3.0.x or higher). Regularly run 'gradle dependencyCheckAnalyze' (OWASP Dependency-Check) or similar SCA tools in CI to catch CVEs in transitive dependencies.
High · OWASP HTML Sanitizer Version May Not Cover All XSS Vectors — build.gradle — com.googlecode.owasp-java-html-sanitizer. The project uses 'owasp-java-html-sanitizer:20260313.1' for HTML sanitization. While this is a generally trusted library, if user-supplied HTML content is processed and rendered in PDFs or returned to the browser without strict policy configuration, gaps in the sanitizer policy could allow XSS or HTML injection. The risk depends on how strictly sanitization policies are configured in application code, which is not visible here. Fix: Audit all usages of the HTML sanitizer to ensure a restrictive whitelist policy is applied. Avoid passing sanitized HTML directly into PDF rendering engines without additional validation. Ensure Content-Security-Policy headers are set on all HTTP responses.
High · Arbitrary File Processing Risk from PDF and Archive Uploads — build.gradle — com.github.junrar:junrar, com.drewnoakes:metadata-extractor. The application processes PDFs, RAR archives (via junrar 7.5.10), and images (via metadata-extractor 2.20.0). Parsing complex binary formats from untrusted user uploads is a well-known attack vector (e.g., zip bombs, malformed archives causing DoS, path traversal in archive extraction, metadata-based XXE). junrar 7.5.10 and metadata-extractor 2.20.0 should be checked against known CVEs. Fix: Enforce strict file size and type limits before passing to parsers. Run file content validation (magic bytes). Ensure archive extraction enforces path traversal protection (check for '../' in entry names). Pin these dependencies and monitor NVD/GitHub Advisory for CVEs. Consider sandboxing file processing operations.
High · Markdown-to-HTML Conversion May Introduce XSS (build.gradle, com.vladsch.flexmark:flexmark-html2md-converter). The dependency 'flexmark-html2md-converter:0.64.8' converts HTML to Markdown and brings in the flexmark core, which renders Markdown to HTML. If Markdown content from user input is converted to HTML and rendered client-side or embedded in PDFs without strict sanitization, this can introduce stored or reflected XSS. flexmark by default can render raw HTML embedded in Markdown. Fix: Disable raw HTML rendering in flexmark configuration (set 'HtmlRenderer.ESCAPE_HTML' or disable HTML extension). Pass all flexmark output through the OWASP HTML sanitizer before any rendering. Audit all code paths that invoke flexmark.
Medium · PostHog Analytics Dependency Introduces Data Exfiltration Risk — build.gradle — com.posthog.java:posthog. The application includes 'com.posthog.java:posthog:1.2.0' for analytics/telemetry. This library may transmit user behavioral data or document metadata to external PostHog servers. For a self-hosted privacy-sensitive PDF application, this is a significant concern, especially for enterprise or regulated deployments. The version 1.2.0 should be checked for known vulnerabilities. Fix: Ensure PostHog telemetry is opt-in only with clear user disclosure. Provide a configuration flag to fully disable telemetry. Audit what data is sent to PostHog and ensure no PII or document content is transmitted. Consider allowing users to point to a self-hosted PostHog instance.
Medium · SnakeYAML Engine Usage May Allow YAML Injection. The dependency 'org.snakeyaml:snakeyaml-engine:

LLM-derived; treat as a starting point, not a security audit.
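
Two of the fixes suggested in the upload and archive finding, magic-byte validation and path traversal protection, can be sketched with the JDK alone. UploadGuards and both method names are hypothetical, nothing here is taken from the repository, and the PDF check is deliberately strict: it assumes the "%PDF-" signature sits at byte 0, which rejects files with leading junk that some readers tolerate.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers illustrating the suggested fixes; not repository code.
final class UploadGuards {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True when the buffer starts with the "%PDF-" signature.
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Resolves an archive entry name inside targetDir and rejects any entry
    // (for example "../x" or an absolute path) that would land outside it.
    static Path resolveInside(Path targetDir, String entryName) {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }
}

final class UploadGuardsDemo {
    public static void main(String[] args) {
        byte[] head = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println("looks like PDF: " + UploadGuards.looksLikePdf(head));

        Path out = Paths.get("extracted");
        System.out.println("safe entry: " + UploadGuards.resolveInside(out, "pages/001.png"));
        try {
            UploadGuards.resolveInside(out, "../outside.txt");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Comparing normalized Path objects with startsWith works on path elements, not raw strings, so a sibling directory such as "extracted-evil" does not pass as being inside "extracted".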

Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
