dotnet/Open-XML-SDK

Open XML SDK by Microsoft

Healthy

Healthy across the board

Use as dependency: Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify: Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from: Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is: Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 5d ago
  • 11 active contributors
  • Distributed ownership (top contributor 45% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

[![RepoPilot: Healthy](https://repopilot.app/api/badge/dotnet/open-xml-sdk)](https://repopilot.app/r/dotnet/open-xml-sdk)

Paste at the top of your README.md — renders inline like a shields.io badge.

Onboarding doc

Onboarding: dotnet/Open-XML-SDK

Generated by RepoPilot · 2026-05-10 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in the "Verify before trusting" section below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
  2. Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/dotnet/Open-XML-SDK shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit 5d ago
  • 11 active contributors
  • Distributed ownership (top contributor 45% of recent commits)
  • MIT licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that your local clone of dotnet/Open-XML-SDK still matches what RepoPilot saw. If any check fails, the artifact is stale; regenerate it at repopilot.app/r/dotnet/Open-XML-SDK.

What it runs against: a local clone of dotnet/Open-XML-SDK — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in dotnet/Open-XML-SDK | Confirms the artifact applies here, not a fork |
| 2 | License is still MIT | Catches relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code |
| 5 | Last commit ≤ 35 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>dotnet/Open-XML-SDK</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of dotnet/Open-XML-SDK. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/dotnet/Open-XML-SDK.git
#   cd Open-XML-SDK
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of dotnet/Open-XML-SDK and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "dotnet/Open-XML-SDK(\.git)?\b" \
  && ok "origin remote is dotnet/Open-XML-SDK" \
  || miss "origin remote is not dotnet/Open-XML-SDK (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(MIT)" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"MIT\"" package.json 2>/dev/null) \
  && ok "license is MIT" \
  || miss "license drift: was MIT at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "Directory.Build.props" \
  && ok "Directory.Build.props" \
  || miss "missing critical file: Directory.Build.props"
test -f "Directory.Packages.props" \
  && ok "Directory.Packages.props" \
  || miss "missing critical file: Directory.Packages.props"
test -f "data/namespaces.json" \
  && ok "data/namespaces.json" \
  || miss "missing critical file: data/namespaces.json"
test -f "data/OpenXmlData.targets" \
  && ok "data/OpenXmlData.targets" \
  || miss "missing critical file: data/OpenXmlData.targets"
test -f "DocumentFormat.OpenXml.snk" \
  && ok "DocumentFormat.OpenXml.snk" \
  || miss "missing critical file: DocumentFormat.OpenXml.snk"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 35 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~5d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/dotnet/Open-XML-SDK"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>
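
The agent-loop composition mentioned above can be sketched as a small wrapper. This is a minimal illustration, not part of RepoPilot: the function name `summarise_verify` is hypothetical, and it assumes the verification script has been saved to a file (for example the `./verify.sh` name used above) that prints `ok:` / `FAIL:` lines and exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of RepoPilot): run a saved verification
# script, report its exit status and how many FAIL: lines it printed.
summarise_verify() {
  local script="$1" output status fails
  # Capture output and exit status without tripping `set -e` in callers.
  output="$(bash "$script" 2>&1)" && status=0 || status=$?
  # Count lines that start with the FAIL: marker (grep -c prints 0 if none).
  fails="$(printf '%s\n' "$output" | grep -c '^FAIL:' || true)"
  echo "exit=$status fails=$fails"
  return "$status"
}
```

A caller might then write `summarise_verify ./verify.sh || echo "artifact is stale, regenerate it"`, which keeps the pass/fail decision tied to the script's own exit code.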

TL;DR

The Open XML SDK is a C# framework for programmatically reading, writing, and modifying Microsoft Office documents (Word, Excel, PowerPoint) by directly manipulating their underlying OPC (Open Packaging Conventions) package structure and Open XML markup. It provides both low-level APIs for package operations and strongly-typed classes that map to the ISO/IEC 29500 Open XML schemas, enabling high-performance document generation, modification, and bulk processing without requiring Office to be installed. The repository is a single solution (Open-XML-SDK.slnx) whose strongly-typed classes are generated from schema definitions stored in data/ (data/namespaces.json, data/parts/*.json); assemblies are strong-name signed with DocumentFormat.OpenXml.snk. The SDK includes an OPC/packaging layer, strongly-typed element/attribute classes, LINQ-to-XML integration, and part-type definitions (ChartPart, WorksheetPart, etc.), shipped as the DocumentFormat.OpenXml and DocumentFormat.OpenXml.Framework NuGet packages.
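
To make the OPC package structure concrete: an Office file is a ZIP archive whose `[Content_Types].xml` lists parts through `Override` elements with a `PartName` attribute. The sketch below is only an illustration, independent of the SDK. It assumes the package has already been extracted (for example with `unzip report.docx -d extracted`, where `report.docx` is a placeholder file name), and the function name `list_package_parts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part names declared by <Override> entries
# in an extracted package's [Content_Types].xml file.
list_package_parts() {
  local content_types="$1"
  # Pull out every PartName="..." attribute, then strip the attribute syntax.
  grep -oE 'PartName="[^"]*"' "$content_types" \
    | sed -e 's/^PartName="//' -e 's/"$//'
}
```

For an extracted Word document, `list_package_parts 'extracted/[Content_Types].xml'` would typically include a main document part such as `/word/document.xml`; these are the parts the SDK's typed part classes wrap.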

👥Who it's for

Enterprise .NET developers building document automation systems, reporting engines, or bulk document processing pipelines; software vendors integrating Office document creation into their products; developers who need programmatic control over Office file internals without relying on COM interop or Office installation.

🌱Maturity & risk

Highly mature and production-ready. The SDK is an official Microsoft library (now hosted in the dotnet GitHub organization, previously OfficeDev) with a long version history (v3.0.0 shipped breaking changes, a sign of active evolution), CI/CD pipelines covering build, code coverage, CodeQL, and AOT compatibility, and extensive GitHub Actions workflows. It has significant NuGet download volume and is actively maintained.

Low risk for stable features; moderate risk for rapid adoption of new versions. The jump to v3.0.0 introduced breaking changes requiring code updates and recompilation. The project depends heavily on correct implementation of complex ISO 29500 standards—bugs in schema mapping can silently corrupt documents. No visible single-maintainer risk (Microsoft-backed), but the fast-moving Office standards landscape means new Office features may not be immediately reflected in schema definitions.

Active areas of work

Active development targeting target framework moniker (TFM) updates via GitHub agents (tfm-updater.agent.md), automated changelog generation, and schema version alignment. Workflows show a focus on AOT (Ahead-of-Time) compilation compatibility and benchmarking. The project migrated its CI feed URLs (noted in the README for builds after 2 April 2024). The v3.0.0 milestone tracked multiple breaking changes, suggesting ongoing API modernization.

🚀Get running

git clone https://github.com/dotnet/Open-XML-SDK.git
cd Open-XML-SDK
dotnet build Open-XML-SDK.slnx
dotnet test

Daily commands:

dotnet build Open-XML-SDK.slnx
# Run unit tests
dotnet test
# Run benchmarks (if benchmark project exists)
dotnet run --project ./benchmarks -c Release

🗺️Map of the codebase

  • Directory.Build.props — Defines shared project properties, versioning, and assembly configuration for all SDK projects.
  • Directory.Packages.props — Central package management configuration specifying all NuGet dependencies across the SDK.
  • data/namespaces.json — Defines OpenXML namespace mappings essential for XML schema validation and element generation.
  • data/OpenXmlData.targets — Build targets that code-generate OpenXML schema classes from JSON schema definitions.
  • DocumentFormat.OpenXml.snk — Strong name key file required for assembly signing and NuGet package publication.
  • .github/workflows/build.yml — Primary CI/CD pipeline defining build, test, and package generation for all branches.
  • CHANGELOG.md — Historical record of breaking changes in v3.0.0 and migration guidance for users.

🧩Components & responsibilities

  • Code Generator (data/OpenXmlData.targets + schemas) (MSBuild, JSON schema, Roslyn (C# code generation)) — Synthesizes C# element, attribute, and part classes from JSON schema definitions; ensures consistency with ECMA-376 specification
    • Failure mode: Schema corruption or generator bugs produce non-compiling or semantically invalid classes; breaks downstream SDK users
  • OPC Package Layer (generated from OpenXmlPart hierarchy) (System.IO.Packaging, System.Xml, ZIP algorithms) — Manages ZIP-based container I/O, relationship XML parsing, and part lifecycle (create, delete, clone)
    • Failure mode: Corrupted package structure causes loss of document integrity; malformed relationships prevent Office

🛠️How to make changes

Add support for a new OpenXML part type

  1. Create a JSON schema file describing the new part's structure, relationships, and metadata (data/parts/NewCustomPart.json)
  2. Register the namespace URI in the centralized mapping if introducing a new XML schema (data/namespaces.json)
  3. Run the build targets to auto-generate C# classes from the schema (data/OpenXmlData.targets)
  4. Add integration tests validating part creation, serialization, and relationship handling (tests/[NewCustomPart]Tests.cs)
  5. Document the new part in CHANGELOG.md for the next release (CHANGELOG.md)
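
Before starting step 1, it can help to see which part definitions already exist so a new file does not collide with one. The sketch below assumes the `data/parts/*.json` layout described in this document; the function name `list_part_definitions` is hypothetical and not a repo script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list existing part definition names under
# <repo-root>/data/parts (layout assumed from this document).
list_part_definitions() {
  local root="${1:-.}" f
  for f in "$root"/data/parts/*.json; do
    # Skip the unexpanded glob when no files match.
    [ -e "$f" ] || continue
    basename "$f" .json
  done | sort
}
```

Running `list_part_definitions . | grep -x NewCustomPart` from the clone root would show whether the placeholder name from step 1 is already taken.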

Update OpenXML schema compatibility (namespace or element changes)

  1. Update the relevant part JSON schema in data/parts/ to reflect new elements, attributes, or cardinality rules (data/parts/[AffectedPart].json)
  2. Add or modify namespace URI mappings if XML namespace changes are needed (data/namespaces.json)
  3. Rebuild the project to trigger code generation and update generated classes (Directory.Build.props)
  4. Create tests validating backward compatibility and new functionality (tests/[FormatVersion]CompatibilityTests.cs)
  5. Update CHANGELOG.md indicating breaking changes (if any) and migration guidance (CHANGELOG.md)

Create a new Office format validator or utility

  1. Create a new utility class in src/DocumentFormat.OpenXml/ following existing conventions (src/DocumentFormat.OpenXml/NewValidatorUtility.cs)
  2. Reference the appropriate part schemas and namespace definitions (data/namespaces.json)
  3. Add comprehensive unit tests covering edge cases and Office format compliance (tests/DocumentFormat.OpenXml/NewValidatorUtilityTests.cs)
  4. Update build configuration if the utility adds new dependencies (Directory.Packages.props)

🔧Why these technologies

  • JSON Schema Definitions (data/parts/*.json) — Enables single-source-of-truth for OpenXML element structures, supporting code generation and reducing manual maintenance of 600+ auto-generated classes
  • MSBuild Targets (data/OpenXmlData.targets) — Integrates code generation seamlessly into the standard .NET build pipeline, ensuring generated code stays synchronized with schema changes
  • Strong Name Signing (DocumentFormat.OpenXml.snk) — Required for enterprise NuGet distribution and GAC installation, ensuring assembly identity and preventing tampering
  • .NET Multi-Targeting (Directory.Build.props) — Supports legacy .NET Framework and modern .NET Core consumers, maximizing SDK adoption across enterprise and greenfield projects
  • GitHub Actions Workflows — Provides CI/CD automation for building, testing, code coverage, and releasing without external dependencies

⚖️Trade-offs already made

  • Code generation over hand-written classes

    • Why: OpenXML Standard defines 600+ element types; manual maintenance would be prohibitive and error-prone
    • Consequence: Generated code is opaque; debugging requires understanding schema definitions and code generator logic
  • Schema-driven architecture via JSON files

    • Why: Single source of truth reduces inconsistency between documentation, code, and actual Office format support
    • Consequence: Schema changes trigger full regeneration; incremental updates to individual elements are not supported
  • v3.0.0 breaking changes (noted in README)

    • Why: Opportunity to modernize API, remove deprecated patterns, and align with current .NET best practices
    • Consequence: Upgrading from v2.x requires recompilation and potential code changes; migration guide necessary
  • Support for multiple Office formats (Word, Excel, PowerPoint) in single package

    • Why: Common OPC container and XML foundations allow consolidated SDK; reduces duplication
    • Consequence: Large binary size and complex namespace taxonomy; developers unfamiliar with one format must navigate unused schemas

🚫Non-goals (don't propose these)

  • Real-time collaborative editing of Office documents
  • Visual rendering of Office documents in the browser or desktop (read/write structure only, not presentation layer)
  • Automatic formula evaluation in Excel workbooks
  • Audio/video streaming support for embedded media
  • Cross-platform Office macro execution (sandbox not provided)
  • Encryption/decryption of password-protected Office documents

🪤Traps & gotchas

  • Schema generation caching: Regenerated strongly-typed classes depend on the data/parts/ JSON structure; cache busting may be required if schema files change but the build doesn't trigger regeneration.
  • Strong naming: DocumentFormat.OpenXml.snk requires matching signed/unsigned assembly references; mixing can cause runtime binding failures.
  • Standards compliance: The ISO 29500 spec is complex and non-obvious (Office implementation deviates from the standard in subtle ways); bugs in schema mapping propagate silently to generated documents.
  • Breaking changes in v3.0.0: The major version bump introduced incompatibilities; existing code may fail to compile without updates.
  • NuGet feed migration: CI builds now use a different feed URL (changed 2 April 2024); old build scripts will fail.
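
For the schema generation caching issue, one cautious approach is to list the intermediate build directories first and only then decide what to delete. This is a sketch under the assumption that stale output lives in conventional `bin/` and `obj/` folders; the function name `list_build_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print bin/ and obj/ directories under a root
# without deleting anything. Assumes conventional .NET output folder names.
list_build_outputs() {
  local root="${1:-.}"
  # -prune stops find from descending into a matched directory.
  find "$root" -type d \( -name bin -o -name obj \) -prune -print | sort
}
```

After reviewing the list, removing those directories and running `dotnet build --no-incremental Open-XML-SDK.slnx` forces a full rebuild, which should also rerun any generation steps tied to the build.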

💡Concepts to learn

  • OPC (Open Packaging Convention) — Office documents (docx, xlsx, pptx) are ZIP-based OPC packages; understanding parts, relationships, and package structure is fundamental to working with Open-XML-SDK
  • ISO 29500 Open XML Standard — The SDK is a C# implementation of ISO 29500; knowing the standard's structure, namespaces, and element definitions is essential for debugging schema mismatches and extending document types
  • Strongly-typed XML mapping via code generation — The SDK generates C# classes from JSON schemas (data/parts/*.json); understanding how schema definitions map to generated properties and inheritance hierarchies helps when adding new element types
  • LINQ to XML (XDocument/XElement) — Open-XML-SDK provides LINQ-to-XML wrappers around Office markup; proficiency in XDocument query patterns is necessary for working with document content programmatically
  • Strong name assembly signing (SNK) — DocumentFormat.OpenXml.snk enables strong naming for the assembly, ensuring binary compatibility and enabling GAC registration; critical for understanding versioning and deployment constraints
  • Semantic Versioning (SemVer) with GitVersion — GitVersion.yml automates version calculation from git history; understanding SemVer is necessary to predict breaking changes (v3.0.0 introduced incompatibilities) and communicate API evolution
  • Target Framework Monikers (TFM) and multi-targeting — Directory.Build.props orchestrates builds across multiple .NET versions (net6.0, net8.0, etc.); understanding TFM constraints is essential when adding dependencies or using platform-specific APIs

🔗Related repos

  • dotnet/roslyn — Provides the Roslyn compiler APIs used implicitly by code generation pipelines; understanding Roslyn helps debug and optimize code generation in Open-XML-SDK
  • dotnet/runtime — Core .NET runtime; critical for understanding TFM support, AOT compatibility constraints, and multi-platform assembly compatibility tested in aot.yml workflow
  • OfficeDev/Open-XML-SDK-docs — Official documentation repo for Open XML SDK; companion resource with tutorials, API reference, and standards guidance (if it exists)
  • OfficeDev/office-open-xml-docs — Office Open XML specification reference; defines the standards that the SDK implements and that data/parts/*.json schemas represent
  • dotnet/System.IO.Packaging — Provides low-level OPC (Open Packaging Convention) APIs that Open-XML-SDK wraps; understanding packaging mechanics is essential for troubleshooting part/document structure issues
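
For the Target Framework Monikers entry in the concepts list, a quick way to see which TFMs an MSBuild file declares is to read its `TargetFramework` or `TargetFrameworks` property. The sketch below is an assumption-laden illustration: it only handles literal values on a single line, it does not expand MSBuild variables such as `$(SomeProperty)`, and whether Directory.Build.props declares TFMs this way has not been verified here. The function name `list_tfms` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each literal TFM declared in an MSBuild file,
# one per line. Does not evaluate MSBuild properties or conditions.
list_tfms() {
  local file="$1"
  grep -oE '<TargetFrameworks?>[^<]*' "$file" \
    | sed 's/^<[^>]*>//' \
    | tr ';' '\n' \
    | sed '/^$/d'
}
```

For example, `list_tfms Directory.Build.props` (or any `.csproj`) prints one moniker per line, which makes it easy to check whether a new dependency supports every declared target.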

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive unit tests for Open XML Parts deserialization

The repo contains 80+ part type definitions in data/parts/*.json (AlternativeFormatImportPart, ChartPart, CustomXmlPart, etc.) but there's no visible dedicated test suite validating serialization/deserialization of each part type. This is critical for ensuring part compatibility across Office formats. New contributors can add parameterized tests covering round-trip serialization for each part definition.

  • [ ] Create test file: src/DocumentFormat.OpenXml.Tests/Parts/PartSerializationTests.cs
  • [ ] Parse all part definitions from data/parts/*.json using OpenXmlData.targets
  • [ ] Implement parameterized tests for each part type covering: instantiation, XML serialization, deserialization, and property validation
  • [ ] Add edge case tests for parts with complex relationships (e.g., ChartPart with ChartDrawingPart references)
  • [ ] Run against existing code-coverage workflow in .github/workflows/code-coverage.yml

Implement namespace validation and documentation tests against data/namespaces.json

The repo maintains data/namespaces.json as a single source of truth for XML namespaces, but there's no validation layer ensuring all OpenXML elements use correct namespaces. This prevents namespace pollution bugs. New contributors can create tests that validate every generated class/element against the namespace definitions and auto-generate namespace documentation.

  • [ ] Create test file: src/DocumentFormat.OpenXml.Tests/Validation/NamespaceValidationTests.cs
  • [ ] Load namespace definitions from data/namespaces.json
  • [ ] Implement reflection-based tests scanning all OpenXmlElement subclasses for correct namespace URIs
  • [ ] Add tests validating that elements from different namespaces cannot be illegally mixed in parent relationships
  • [ ] Generate markdown documentation from namespaces.json in data/README.md with namespace-to-feature mappings

Create GitHub Actions workflow for validating generated code against schema definitions

The repo has .github/workflows/build.yml, code-coverage.yml, and codeql-analysis.yml, but no automated validation that generated C# classes match the data/*.json schema definitions. This prevents drift between schema and implementation. New contributors can add a CI workflow that runs on each PR to verify schema-to-code consistency.

  • [ ] Create workflow file: .github/workflows/schema-validation.yml
  • [ ] Build a schema validator tool (can be simple C# console app) that parses data/parts/*.json and data/namespaces.json
  • [ ] Compare generated classes in src/DocumentFormat.OpenXml/Generated/ against schema definitions for property names, types, cardinality
  • [ ] Run on PR triggers and report mismatches as workflow failures
  • [ ] Add job to test results workflow (.github/workflows/test-results.yml) to surface validation failures alongside test results

🌿Good first issues

  • Add comprehensive XML schema validation tests for the data/parts/*.json definitions to catch schema mismatches early; currently no visible test coverage validating that JSON schemas correctly represent Office standards.
  • Expand .editorconfig rules with explicit C# 12+ feature guidance (nullability annotations, collection expressions) and document them in CONTRIBUTING.md; codebase shows no visible guidelines for modern C# patterns.
  • Create focused documentation in docs/ (or expand CONTRIBUTING.md) with real before/after examples of common modifications: adding a new XML attribute type, extending a part class, and regenerating code; current repo lacks a 'developer guide' for schema changes.

📝Recent commits

  • cfba2c5 — Bump dawidd6/action-download-artifact from 20 to 21 (#2077) (dependabot[bot])
  • 70115cc — Bump Microsoft.NET.Test.Sdk from 18.4.0 to 18.5.1 (#2078) (dependabot[bot])
  • 7d1bd90 — Potential bug in TryWriteBytes (#2049) (SimonCropp)
  • d7aceaa — use implicit conversion to convert from byte[] to Span<byte> (#2061) (mikeebowen)
  • de614cb — Bump Microsoft.Bcl.Memory and 2 others (#2068) (dependabot[bot])
  • 01decf8 — Bump danielpalme/ReportGenerator-GitHub-Action from 5.5.4 to 5.5.7 (#2072) (dependabot[bot])
  • 034d324 — remove Microsoft.SourceLink.GitHub (#2047) (SimonCropp)
  • bff3713 — avoid some string aloc by using StringBuilder (#2055) (SimonCropp)
  • bebd123 — pre size list in GetAttributes (#2045) (SimonCropp)
  • cf9e57b — Avoid some alloc by merging where and single (#2044) (SimonCropp)

🔒Security observations

The Open XML SDK by Microsoft demonstrates a strong security posture. It is an actively maintained, well-documented open-source project with proper security reporting channels established. No critical vulnerabilities were identified in the repository structure or visible configuration files. The main security considerations are: (1) ensuring XML processing includes XXE protections for users of the SDK, (2) maintaining the SECURITY.md documentation for proper vulnerability reporting, and (3) following standard practices for key management with the strong-named assembly. The project follows Microsoft's security guidelines and best practices for open-source .NET libraries.

  • Low · Strong Named Assembly with Hardcoded Key File — DocumentFormat.OpenXml.snk. The codebase uses a strong name key file (DocumentFormat.OpenXml.snk) for assembly signing. While this is a standard practice for .NET libraries, the presence of a hardcoded key file in the repository could be a concern if the private key is exposed. However, as this is an open-source project by Microsoft, this is likely intentional for public distribution. Fix: Ensure the SNK file contains only public key information or is properly secured. For production scenarios, consider using key vaults instead of repository-stored keys.
  • Low · Incomplete Security Policy Documentation — SECURITY.md. The SECURITY.md file appears to be truncated and incomplete. The Microsoft Security Response Center (MSRC) reporting endpoint URL is cut off, which could confuse users attempting to report security vulnerabilities. Fix: Complete the SECURITY.md file with full contact information for security vulnerability reporting, including the complete MSRC submission URL and expected response timeframes.
  • Low · XML Processing Library - Potential XXE Risk — Core library functionality (general exposure). The Open XML SDK is primarily an XML processing library. While the codebase itself appears well-maintained by Microsoft, XML parsing without proper safeguards could be vulnerable to XXE (XML External Entity) attacks if developers using this SDK don't properly configure the underlying XML readers. Fix: Ensure that all XML parsing operations in the SDK have XXE protection enabled by default. Document XXE security considerations in the README and provide secure usage examples.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.