
NationalSecurityAgency/ghidra

Ghidra is a software reverse engineering (SRE) framework

WAIT

Mixed signals — read the receipts

  • Last commit 4d ago
  • 5 active contributors
  • Apache-2.0 licensed
  • Tests present
  • Small team — 5 top contributors
  • Concentrated ownership — top contributor handles 73% of commits
  • No CI workflows detected

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests


Onboarding doc

Onboarding: NationalSecurityAgency/ghidra

Generated by RepoPilot · 2026-05-05

TL;DR

Ghidra is a software reverse engineering (SRE) framework built by the NSA that lets analysts disassemble, decompile, and analyze compiled binaries across dozens of processor architectures (x86, ARM, MIPS, RISC-V, etc.) and file formats (executables such as ELF, PE, and Mach-O, plus containers such as Apple DMG disk images). It addresses the problem of understanding compiled code without source and supports both interactive GUI analysis and headless, automated scripting workflows. Core capabilities include a decompiler, control-flow graphing, cross-references, symbol analysis, and a plugin/script extension model in Java and Python.

Ghidra is a Gradle multi-project monorepo: the main application lives under Ghidra/ (with Features, Framework, Processors, and Extensions subdirectories), while GPL-licensed modules are isolated under GPL/ (e.g., GPL/DMG/ for Apple DMG file support). The DMG module uses a standalone subprocess model: GPL/DMG/src/dmg/java/mobiledevices/dmg/ contains a self-contained HFS+ parser (B-tree, decmpfs, and binary reader wrappers) that Ghidra invokes over IPC through DmgServerProcessManager rather than loading it in-process.
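The subprocess model can be pictured as a small line-oriented request/response loop running in the child JVM. This is a minimal sketch under stated assumptions: the `MiniDmgServer` class and its `PING`/`LIST`/`QUIT` commands are hypothetical and do not reproduce the real DmgServer wire format. In a real child process the reader and writer would wrap `System.in` and `System.out`; taking plain `Reader`/`Writer` arguments lets the same loop be exercised with in-memory `StringReader`/`StringWriter` streams.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

// Hypothetical line-oriented protocol for illustration; NOT the real DmgServer wire format.
class MiniDmgServer {
    private final Map<String, List<String>> directories;

    MiniDmgServer(Map<String, List<String>> directories) {
        this.directories = directories;
    }

    // Reads one command per line and writes response lines until QUIT or end of input.
    // In a real child JVM, 'in' would wrap System.in and 'out' would wrap System.out.
    void serve(Reader in, Writer out) {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out, true);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals("QUIT")) {
                    writer.println("BYE");
                    break;
                } else if (line.equals("PING")) {
                    writer.println("PONG");
                } else if (line.startsWith("LIST ")) {
                    List<String> entries = directories.get(line.substring(5));
                    if (entries == null) {
                        writer.println("ERR not found");
                    } else {
                        entries.forEach(writer::println);
                        writer.println("END"); // explicit terminator for multi-line replies
                    }
                } else {
                    writer.println("ERR unknown command");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.flush();
        }
    }
}
```

On the Ghidra side, the parent would start the child with `ProcessBuilder`, write requests to `Process.getOutputStream()`, and read replies from `Process.getInputStream()`. Keeping the GPL code on the far side of that process boundary is the point of the design.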

Who it's for

Malware analysts, vulnerability researchers, CTF players, and government/defense SRE teams who need to reverse engineer binaries at scale or in team environments. Contributors are typically Java engineers familiar with reverse engineering tooling or binary format specialists adding new processor modules or file format loaders.

Maturity & risk

Ghidra was publicly released by the NSA in 2019 and has been under active development since, with thousands of GitHub stars and a large community. The codebase has extensive test infrastructure and structured module manifests (e.g., GPL/DMG/Module.manifest, GPL/DMG/certification.manifest), although no CI workflows are visible in the public repository. It is production-ready and used in professional SRE environments, though individual modules vary in polish.

The project's own README explicitly calls out known security vulnerabilities in certain versions and links to Security Advisories, which is a concrete risk for any deployment running an affected version. The codebase is predominantly Java (101M+ bytes), with significant C++ native components (~7M bytes) for the decompiler engine, which adds cross-language build complexity. The Gradle build requires JDK 21 (64-bit), Gradle 8.5+ (or the wrapper), and platform-specific native toolchains (GCC/Clang on Linux/macOS, MSVC on Windows), which makes environment setup fragile.

Active areas of work

Based on the visible repo structure, the DMG module appears to be maintained, with full B-tree record parsing (BTreeHeaderRecord, BTreeNodeDescriptor, BTreeRootNodeDescriptor) and decmpfs decompression support (DecmpfsHeader, DecmpfsCompressionTypes). The build system is kept current with Gradle wrapper support and ships prebuilt 32- and 64-bit Windows native binaries (GPL/DMG/data/os/win_x86_32/, win_x86_64/). PyGhidra integration (Python 3.9–3.14 support) appears to be a recent addition, judging by the README's launch instructions.

Get running

git clone https://github.com/NationalSecurityAgency/ghidra.git

cd ghidra

Ensure JDK 21 64-bit is installed and JAVA_HOME is set

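Ensure Gradle 8.5+ is available or use the wrapper

Fetch the build dependencies that are not available from Maven Central:

./gradlew -I gradle/support/fetchDependencies.gradle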

./gradlew buildGhidra

Output will be in build/dist/ as a zip

Extract and run:

unzip build/dist/ghidra_.zip -d /opt/

/opt/ghidra_/ghidraRun

Daily commands:

./gradlew buildGhidra # full build

./gradlew :GPL:DMG:jar # build just the DMG module

After extracting a release zip:

./ghidraRun # GUI mode

./support/analyzeHeadless <projectPath> <projectName> -import <binary> # headless mode
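For automated pipelines, the headless command can also be launched from Java. A minimal sketch, assuming an install directory laid out like the release zip (`support/analyzeHeadless`, or `analyzeHeadless.bat` on Windows); the `HeadlessCommand` helper is hypothetical and only uses the positional arguments and the `-import` flag shown above.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles an analyzeHeadless invocation for ProcessBuilder.
// The install directory, project location, and binary are caller-supplied placeholders.
class HeadlessCommand {
    static List<String> build(Path ghidraInstall, Path projectDir, String projectName,
                              Path binary, boolean windows) {
        // support/analyzeHeadless on Linux/macOS, the .bat launcher on Windows
        String script = windows ? "analyzeHeadless.bat" : "analyzeHeadless";
        List<String> cmd = new ArrayList<>();
        cmd.add(ghidraInstall.resolve("support").resolve(script).toString());
        cmd.add(projectDir.toString());   // <projectPath>
        cmd.add(projectName);             // <projectName>
        cmd.add("-import");
        cmd.add(binary.toString());       // <binary>
        return cmd;
    }
}
```

Usage would look like `new ProcessBuilder(HeadlessCommand.build(install, projects, "Demo", target, false)).inheritIO().start().waitFor()`. Passing arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.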

Map of the codebase

  • GPL/DMG/src/dmg/java/mobiledevices/dmg/reader/DmgFileReader.java — Primary entry point for reading Apple DMG disk images; orchestrates parsing of all DMG structures and is the load-bearing class for the DMG module.
  • GPL/DMG/src/dmg/java/mobiledevices/dmg/server/DmgServer.java — Standalone server process that Ghidra communicates with to extract DMG contents; defines the IPC boundary between Ghidra and the GPL DMG module.
  • GPL/DMG/src/dmg/java/mobiledevices/dmg/ghidra/GBinaryReader.java — Core binary reading abstraction used throughout the DMG parser; every structure parser depends on this class for byte-level data access.
  • GPL/DMG/src/dmg/java/mobiledevices/dmg/hfsplus/AttributesFileParser.java — Parses the HFS+ attributes file within DMG images; critical for extracting extended attributes and file metadata from Apple disk images.
  • GPL/DMG/src/dmg/java/mobiledevices/dmg/btree/BTreeRootNodeDescriptor.java — Root of the B-tree traversal hierarchy used for HFS+ catalog and attributes file navigation; all file lookups depend on correct B-tree parsing.
  • GPL/DMG/src/dmg/java/mobiledevices/dmg/reader/DmgInputStream.java — Provides streaming decompressed access to DMG partition data; bridges compressed DMG blocks to readable byte streams for higher-level parsers.
  • GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/cp-demangle.c — Core GNU C++ demangler implementation taken from GNU binutils 2.41; the central source file of the DemanglerGnu module, handling Itanium C++ ABI name demangling for Ghidra's symbol analysis.
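GBinaryReader is described above as the byte-level access layer every DMG structure parser depends on. The general shape of such a reader (a cursor plus a byte order) can be sketched with `java.nio.ByteBuffer`. `MiniBinaryReader` is hypothetical: its method names loosely echo Ghidra's reader classes, but the signatures are assumptions, not GBinaryReader's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical cursor-based reader illustrating the idea; not GBinaryReader's real API.
class MiniBinaryReader {
    private final ByteBuffer buf;

    MiniBinaryReader(byte[] data, ByteOrder order) {
        this.buf = ByteBuffer.wrap(data).order(order);
    }

    long getPointerIndex() { return buf.position(); }
    void setPointerIndex(int index) { buf.position(index); }

    // Sequential reads advance the cursor; reading past the end throws
    // BufferUnderflowException, which surfaces truncated or corrupt input.
    byte readNextByte() { return buf.get(); }
    short readNextShort() { return buf.getShort(); }
    int readNextInt() { return buf.getInt(); }
    long readNextLong() { return buf.getLong(); }

    // On-disk formats often use unsigned fields, so widen to the next larger Java type.
    int readNextUnsignedShort() { return Short.toUnsignedInt(buf.getShort()); }
    long readNextUnsignedInt() { return Integer.toUnsignedLong(buf.getInt()); }

    // Absolute read that does not move the cursor.
    int readInt(int index) { return buf.getInt(index); }
}
```

Fixing the byte order at construction time keeps individual structure parsers free of endianness logic; HFS+ structures would use `ByteOrder.BIG_ENDIAN`.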

How to make changes

Add support for a new DMG block compression type

  1. Add the new compression type constant to the compression type enum/constants file (GPL/DMG/src/dmg/java/mobiledevices/dmg/decmpfs/DecmpfsCompressionTypes.java)
  2. Implement decompression logic; add a new case to the block decompression switch in the DMG input stream (GPL/DMG/src/dmg/java/mobiledevices/dmg/reader/DmgInputStream.java)
  3. Update decmpfs header parsing if the new type requires new header fields (GPL/DMG/src/dmg/java/mobiledevices/dmg/decmpfs/DecmpfsHeader.java)
  4. Wire the new decompressor in the top-level file reader and verify it is exercised during DMG parsing (GPL/DMG/src/dmg/java/mobiledevices/dmg/reader/DmgFileReader.java)
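Steps 1 to 3 boil down to recognizing a type code in the decmpfs header and routing it to a decoder. A minimal sketch, assuming the commonly documented 16-byte little-endian decmpfs header (magic `cmpf`, a 32-bit compression type, a 64-bit uncompressed size, as in XNU's `decmpfs.h`) and type 3 meaning zlib data stored inline after the header. `DecmpfsSketch` and its methods are hypothetical and are not the module's DecmpfsHeader or DmgInputStream API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch of decmpfs header parsing plus a compression-type switch.
// Class and method names are hypothetical; only type 3 (zlib, data inline) is handled.
class DecmpfsSketch {
    static final int MAGIC_CMPF = 0x636d7066;  // 'cmpf', so the on-disk bytes read "fpmc"
    static final int HEADER_SIZE = 16;         // magic (4) + type (4) + uncompressed size (8)
    static final int TYPE_ZLIB_INLINE = 3;     // zlib data follows the header in the xattr

    static byte[] decode(byte[] xattr) {
        if (xattr.length < HEADER_SIZE) {
            throw new IllegalArgumentException("truncated decmpfs header");
        }
        ByteBuffer buf = ByteBuffer.wrap(xattr).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC_CMPF) {
            throw new IllegalArgumentException("bad decmpfs magic");
        }
        int type = buf.getInt();
        long size = buf.getLong();
        if (size < 0 || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("unsupported uncompressed size " + size);
        }
        switch (type) {
            case TYPE_ZLIB_INLINE:
                return inflateExactly(xattr, HEADER_SIZE, (int) size);
            // A new compression type gets its own case here.
            default:
                throw new UnsupportedOperationException("decmpfs type " + type);
        }
    }

    private static byte[] inflateExactly(byte[] src, int offset, int expected) {
        Inflater inflater = new Inflater(); // zlib-wrapped stream expected
        try {
            inflater.setInput(src, offset, src.length - offset);
            byte[] out = new byte[expected];
            int n = 0;
            while (n < expected && !inflater.finished()) {
                int got = inflater.inflate(out, n, expected - n);
                if (got == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated zlib stream");
                }
                n += got;
            }
            if (n != expected) {
                throw new IllegalArgumentException("size mismatch: " + n + " != " + expected);
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Checking the header's declared size against the bytes actually produced catches both truncated and mislabeled payloads before they reach higher-level parsers.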

Add a new HFS+ B-tree node record type

  1. Define a new node kind constant in the B-tree node kinds file (GPL/DMG/src/dmg/java/mobiledevices/dmg/btree/BTreeNodeKinds.java)
  2. Create a new Java class for the record type, using GBinaryReader for field parsing, following the pattern of existing record classes (GPL/DMG/src/dmg/java/mobiledevices/dmg/btree/BTreeNodeRecord.java)
  3. Register the new record type in the B-tree node descriptor dispatch logic (GPL/DMG/src/dmg/java/mobiledevices/dmg/btree/BTreeNodeDescriptor.java)
  4. Expose or process the new record data through the HFS+ attributes parser if the record carries attribute data (GPL/DMG/src/dmg/java/mobiledevices/dmg/hfsplus/AttributesFileParser.java)
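The dispatch in step 3 keys off the node kind stored in each node's descriptor. Per Apple Technical Note TN1150, every HFS+ B-tree node begins with a 14-byte big-endian descriptor: `fLink` (UInt32), `bLink` (UInt32), `kind` (SInt8), `height` (UInt8), `numRecords` (UInt16), and a reserved UInt16, with kinds -1 (leaf), 0 (index), 1 (header), and 2 (map). The sketch below parses that layout and switches on the kind; `NodeDescriptorSketch` is a hypothetical name, not Ghidra's BTreeNodeDescriptor API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of HFS+ B-tree node descriptor parsing (layout per Apple TN1150).
// Class and method names are hypothetical, not Ghidra's BTreeNodeDescriptor API.
class NodeDescriptorSketch {
    static final byte LEAF = -1, INDEX = 0, HEADER = 1, MAP = 2;
    static final int SIZE = 14;

    final long fLink;      // next node of this kind (UInt32)
    final long bLink;      // previous node of this kind (UInt32)
    final byte kind;       // SInt8 node kind
    final int height;      // UInt8 level in the tree
    final int numRecords;  // UInt16 record count

    NodeDescriptorSketch(byte[] node) {
        if (node.length < SIZE) {
            throw new IllegalArgumentException("truncated node descriptor");
        }
        ByteBuffer b = ByteBuffer.wrap(node).order(ByteOrder.BIG_ENDIAN);
        fLink = Integer.toUnsignedLong(b.getInt());
        bLink = Integer.toUnsignedLong(b.getInt());
        kind = b.get();
        height = Byte.toUnsignedInt(b.get());
        numRecords = Short.toUnsignedInt(b.getShort());
        // the final UInt16 is reserved and ignored
    }

    // Dispatch point: a new node/record kind would get its own case here.
    String describe() {
        switch (kind) {
            case LEAF:   return "leaf with " + numRecords + " records";
            case INDEX:  return "index with " + numRecords + " records";
            case HEADER: return "header node";
            case MAP:    return "map node";
            default:     throw new IllegalStateException("unknown node kind " + kind);
        }
    }
}
```

Rejecting unknown kinds loudly, rather than skipping them, makes corrupt or unexpected images fail fast during analysis.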

Add a new language demangler to the GNU Demangler module

  1. Add the new C demangling source file, following the naming convention of the existing demanglers (e.g., mylang-demangle.c; see GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/rust-demangle.c for an example)
  2. Add any required header declarations for the new demangler (GPL/DemanglerGnu/src/demangler_gnu_v2_41/headers/demangle.h)
  3. Update the cxxfilt CLI entry point to call the new demangler when the appropriate mangling scheme is detected (GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/cxxfilt.c)
  4. Add the new source file to the DemanglerGnu build so it is compiled into the native binary (GPL/DemanglerGnu/build.gradle)
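Step 3 hinges on recognizing which mangling scheme a symbol uses. The real detection lives in C inside the binutils-derived sources; the Java sketch here only illustrates the kind of prefix checks involved, based on well-known conventions: Itanium C++ ABI names start with `_Z`, Rust v0 names start with `_R`, and legacy Rust names are Itanium-style `_ZN...E` names whose last path segment is a 17-character `h` plus 16-hex-digit hash. `ManglingScheme` is a hypothetical name and this is not cxxfilt's actual logic.

```java
import java.util.regex.Pattern;

// Illustrative only: prefix-based scheme detection, not binutils' actual code.
class ManglingScheme {
    enum Scheme { ITANIUM_CXX, RUST_LEGACY, RUST_V0, UNKNOWN }

    // Legacy Rust symbols end with a length-prefixed hash segment: "17h" + 16 hex digits + "E".
    private static final Pattern RUST_LEGACY_HASH = Pattern.compile("17h[0-9a-f]{16}E$");

    static Scheme detect(String symbol) {
        if (symbol.startsWith("_R")) {
            return Scheme.RUST_V0;
        }
        // Check the more specific legacy-Rust shape before falling back to plain C++.
        if (symbol.startsWith("_ZN") && RUST_LEGACY_HASH.matcher(symbol).find()) {
            return Scheme.RUST_LEGACY;
        }
        if (symbol.startsWith("_Z")) {
            return Scheme.ITANIUM_CXX;
        }
        return Scheme.UNKNOWN;
    }
}
```

Ordering matters: legacy Rust names are also valid Itanium names, so the more specific check has to run first.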

Add a new extended attribute (xattr) type handler

  1. Add the new xattr name constant to the xattr constants file (GPL/DMG/src/dmg/java/mobiledevices/dmg/xattr/XattrConstants.java)
  2. Add parsing logic for the new xattr data in the attributes file parser, using GBinaryReader (GPL/DMG/src/dmg/java/mobiledevices/dmg/hfsplus/AttributesFileParser.java)
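The two steps above amount to mapping an attribute name to a parser. A minimal sketch, assuming a name-to-handler registry so that supporting a new xattr is one registration; `com.apple.decmpfs` and `com.apple.FinderInfo` are real Apple attribute names, but `XattrRegistry` and its handler bodies are hypothetical and do not reflect the contents of XattrConstants.java or AttributesFileParser.java.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping extended-attribute names to handlers;
// not the contents of XattrConstants.java or AttributesFileParser.java.
class XattrRegistry {
    static final String DECMPFS = "com.apple.decmpfs";
    static final String FINDER_INFO = "com.apple.FinderInfo";

    private final Map<String, Function<byte[], String>> handlers = new HashMap<>();

    XattrRegistry() {
        handlers.put(DECMPFS, data -> "decmpfs header, " + data.length + " bytes");
        // Finder info is a 32-byte blob; for files, the first 4 bytes are the type code.
        handlers.put(FINDER_INFO, data -> data.length >= 4
                ? "Finder type " + new String(data, 0, 4, StandardCharsets.US_ASCII)
                : "short Finder info");
    }

    // Supporting a new xattr type means registering one more name/handler pair.
    void register(String name, Function<byte[], String> handler) {
        handlers.put(name, handler);
    }

    String describe(String name, byte[] data) {
        Function<byte[], String> handler = handlers.get(name);
        return handler != null ? handler.apply(data) : "unhandled xattr " + name;
    }
}
```

Unknown names fall through to a neutral description instead of failing, since real volumes carry many third-party attributes.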

Traps & gotchas

  1. The DMG module's native DLLs (llio_amd64.dll, llio_i386.dll, llio_ia64.dll) are prebuilt binaries checked into the repo under GPL/DMG/data/os/; they are NOT rebuilt by Gradle and must match the platform.
  2. The GPL/ modules use a flat directory repository (data/lib/*.jar) for dependencies such as csframework, hfsx, and hfsx_dmglib. These are not on Maven Central, and the build will fail to resolve them if the jars are missing or renamed.
  3. The DMG module builds a standalone jar (DMG.jar) that runs as a separate JVM subprocess, so debugging it requires attaching to the child process rather than the main Ghidra JVM.
  4. The JDK must be 64-bit JDK 21; other versions are not supported.
  5. The buildGhidra task requires a platform-native C++ toolchain for the decompiler; a pure-Java build is not possible.
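For gotcha 2, a quick preflight check can report missing flat-directory jars before Gradle fails on them. A minimal sketch: the jar file names are assumptions taken from the security notes in this document (csframework, hfsx, hfsx_dmglib, iharder-base64), not read from build.gradle, and the exact names may differ between versions. `FlatDirPreflight` is a hypothetical helper.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Preflight sketch: report expected flat-dir jars that are missing.
// The jar names are assumptions drawn from this document, not from the build script.
class FlatDirPreflight {
    static final List<String> EXPECTED = List.of(
            "csframework.jar", "hfsx.jar", "hfsx_dmglib.jar", "iharder-base64.jar");

    static List<String> missingJars(Path libDir, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String jar : expected) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }
}
```

Typical use would be `FlatDirPreflight.missingJars(Path.of("GPL/DMG/data/lib"), FlatDirPreflight.EXPECTED)`, printing the result and aborting if it is non-empty.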

Concepts to learn

  • SLEIGH Processor Specification Language — Ghidra's ISA definitions are written in SLEIGH (.slaspec files), a domain-specific language for specifying instruction encodings and semantics — understanding it is required to add or fix any processor module.
  • HFS+ BTree Structure — Apple's HFS+ filesystem organizes catalog, extents, and attribute data in B-trees — the entire GPL/DMG/src/dmg/java/mobiledevices/dmg/btree/ package implements this on-disk format for navigating DMG contents.
  • P-Code Intermediate Representation — Ghidra translates all architectures into P-Code (a RISC-like IR) before analysis and decompilation — all cross-architecture analysis, including the decompiler, operates on P-Code rather than native instructions.
  • decmpfs Transparent Compression — Apple uses a per-file compression scheme (decmpfs) stored in extended attributes on HFS+ — files in DMGs may appear as their uncompressed size but require decompression, handled in GPL/DMG/src/dmg/java/mobiledevices/dmg/decmpfs/.
  • Subprocess Isolation for GPL Code — The DMG module runs as a separate JVM process (launched via DmgServerProcessManager) specifically to isolate GPL-licensed HFS+ library code from Ghidra's non-GPL core. The split exists for licensing reasons rather than performance, and it affects how you debug the module.
  • Function ID (FID) Fingerprinting — Ghidra uses hash-based function fingerprinting to identify library functions in stripped binaries without symbols — understanding this is key to extending Ghidra's library recognition capabilities.
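The P-Code concept above can be made concrete with a toy evaluator. In this sketch an x86-style `ADD EAX, EBX` is lowered to an `INT_ADD` followed by an `INT_EQUAL` that computes the zero flag. `COPY`, `INT_ADD`, and `INT_EQUAL` are real P-Code opcode names and "varnode" is real P-Code terminology, but the `ToyPcode` classes, string-named varnodes, and this simplified lowering are assumptions; Ghidra's actual x86 SLEIGH output also computes carry, overflow, and sign flags and uses address-space varnodes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of P-Code-style ops over named 32-bit varnodes. Not Ghidra's API.
class ToyPcode {
    enum Op { COPY, INT_ADD, INT_EQUAL }

    // out = op(in0, in1); inputs are varnode names or "const:<n>"
    record PcodeOp(Op op, String out, String in0, String in1) {}

    final Map<String, Long> state = new HashMap<>();

    long read(String varnode) {
        if (varnode.startsWith("const:")) {
            return Long.parseLong(varnode.substring(6));
        }
        return state.getOrDefault(varnode, 0L);
    }

    void run(List<PcodeOp> ops) {
        for (PcodeOp p : ops) {
            long a = read(p.in0());
            long value = switch (p.op()) {
                case COPY -> a;
                case INT_ADD -> (a + read(p.in1())) & 0xFFFFFFFFL; // wrap to 32 bits
                case INT_EQUAL -> a == read(p.in1()) ? 1L : 0L;
            };
            state.put(p.out(), value);
        }
    }
}
```

For example, running `COPY EAX, 0xFFFFFFFF` and `COPY EBX, 1`, then the two ADD ops, leaves EAX = 0 and ZF = 1, showing 32-bit wraparound. The same handful of ops would serve for any architecture, which is what lets the decompiler stay architecture-neutral.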

Related repos

  • radareorg/radare2 — Open-source reverse engineering framework solving the same binary analysis problem with a CLI-first approach and C implementation, the primary open-source alternative to Ghidra.
  • rizinorg/rizin — Radare2 fork with modernized API and Cutter GUI frontend — another direct alternative for binary analysis and disassembly.
  • angr/angr — Python binary analysis framework often used alongside Ghidra for symbolic execution and automated vulnerability discovery on the same binaries.
  • NationalSecurityAgency/ghidra-data — Companion NSA repo providing additional processor language files and FID (Function ID) databases that extend Ghidra's analysis capabilities.
  • mandiant/Ghidrathon — Ecosystem extension that replaces Ghidra's Jython Python 2 scripting with CPython 3, directly relevant to the PyGhidra integration work visible in the README.

PR ideas


Add unit tests for GPL/DMG BTree parsing classes (BTreeHeaderRecord, BTreeNodeDescriptor, etc.)

The GPL/DMG module contains several BTree-related classes (BTreeHeaderRecord.java, BTreeNodeDescriptor.java, BTreeRootNodeDescriptor.java, BTreeNodeRecord.java, BTreeMapRecord.java, BTreeUserDataRecord.java) with no visible test coverage. These classes are critical for parsing HFS+ disk images correctly; bugs here could silently misread file system structures during reverse engineering. Adding targeted unit tests would catch regressions and help new contributors understand the binary parsing logic.

  • [ ] Create a test source directory at GPL/DMG/src/test/java/mobiledevices/dmg/btree/
  • [ ] Write BTreeHeaderRecordTest.java that constructs mock byte arrays matching the HFS+ BTree header spec and asserts correct field parsing via GBinaryReader (GPL/DMG/src/dmg/java/mobiledevices/dmg/ghidra/GBinaryReader.java)
  • [ ] Write BTreeNodeDescriptorTest.java covering both leaf and index node descriptor parsing, including edge cases for corrupt or truncated data
  • [ ] Write BTreeRootNodeDescriptorTest.java verifying that root node detection and record count fields parse correctly
  • [ ] Add a test fixture DMG file or synthetic byte array builder utility to avoid depending on real disk images
  • [ ] Wire the new test source set into GPL/DMG/build.gradle under a testImplementation dependency block and register a test task
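The synthetic-fixture item in this checklist could be served by a tiny big-endian byte builder. A minimal sketch: `BigEndianBytes` is a hypothetical helper, and the example fixture encodes a node descriptor followed by only the first six BTHeaderRec fields as described in Apple TN1150 (`treeDepth`, `rootNode`, `leafRecords`, `firstLeafNode`, `lastLeafNode`, `nodeSize`); a full header record has more fields.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical test helper: builds big-endian byte arrays one field at a time,
// so tests can describe on-disk structures without real disk images.
class BigEndianBytes {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    BigEndianBytes u8(int v)    { out.write(v & 0xFF); return this; }
    BigEndianBytes u16(int v)   { return u8(v >>> 8).u8(v); }
    BigEndianBytes u32(long v)  { return u16((int) (v >>> 16)).u16((int) v); }
    byte[] build()              { return out.toByteArray(); }

    // Example fixture: node descriptor + the first BTHeaderRec fields (per Apple TN1150).
    static byte[] sampleHeaderNode() {
        return new BigEndianBytes()
                .u32(0).u32(0)      // fLink, bLink
                .u8(1).u8(0)        // kind = 1 (header node), height = 0
                .u16(3).u16(0)      // numRecords = 3, reserved
                .u16(2)             // treeDepth
                .u32(1)             // rootNode
                .u32(42)            // leafRecords
                .u32(5).u32(9)      // firstLeafNode, lastLeafNode
                .u16(4096)          // nodeSize
                .build();
    }
}
```

Because each field is written with an explicit width and a comment, a failing parser test points straight at the offset that was misread.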

Refactor GPL/DMG/src/dmg/java/mobiledevices/dmg/ghidra/ shim layer to reduce code duplication between GDataConverterBE.java and GDataConverterLE.java

The dmg/ghidra/ package contains hand-rolled duplicates of Ghidra's internal utility classes (GBinaryReader, GByteProvider, GDataConverter, GDataConverterBE, GDataConverterLE, GConv, etc.) to allow the DMG server process to run standalone. GDataConverterBE and GDataConverterLE likely share large amounts of mirrored byte-swapping logic. Extracting a shared abstract base class or common static utility methods would reduce maintenance burden and the risk of divergent bug fixes, while keeping the standalone nature of the module intact.

  • [ ] Audit GPL/DMG/src/dmg/java/mobiledevices/dmg/ghidra/GDataConverterBE.java and GDataConverterLE.java line-by-line to catalogue duplicated methods
  • [ ] Create an abstract class AbstractGDataConverter.java in the same package that implements all methods whose logic differs only by byte-order, using a constructor-injected boolean or ByteOrder field
  • [ ] Refactor GDataConverterBE and GDataConverterLE to extend AbstractGDataConverter, removing all duplicated method bodies
  • [ ] Verify GConv.java and GBinaryReader.java still compile and function correctly after the refactor
  • [ ] Run a smoke test by building GPL/DMG with './gradlew :GPL:DMG:jar' and confirming that the build completes and still produces DMG.jar
  • [ ] Update GPL/DMG/README.md to document the standalone shim layer design and why it mirrors core Ghidra classes
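The proposed base class can be sketched as a single byte-assembly loop whose direction is chosen by a `ByteOrder` field. `AbstractGDataConverter` is the name suggested in the checklist; the `getShort`/`getInt`/`getLong` signatures are assumptions and may not match the real GDataConverter interface, and the `...Sketch` subclass names are hypothetical stand-ins for the refactored BE/LE classes.

```java
import java.nio.ByteOrder;

// Sketch of the proposed base class; method names and signatures are assumptions.
abstract class AbstractGDataConverter {
    private final boolean bigEndian;

    protected AbstractGDataConverter(ByteOrder order) {
        this.bigEndian = order == ByteOrder.BIG_ENDIAN;
    }

    // Assemble 'size' bytes starting at 'offset', most significant byte first.
    // Only the index calculation depends on byte order.
    protected long getValue(byte[] b, int offset, int size) {
        long value = 0;
        for (int i = 0; i < size; i++) {
            int index = bigEndian ? offset + i : offset + size - 1 - i;
            value = (value << 8) | (b[index] & 0xFF);
        }
        return value;
    }

    public short getShort(byte[] b, int offset) { return (short) getValue(b, offset, 2); }
    public int getInt(byte[] b, int offset)     { return (int) getValue(b, offset, 4); }
    public long getLong(byte[] b, int offset)   { return getValue(b, offset, 8); }
}

// The byte-order-specific subclasses shrink to a constructor each.
final class GDataConverterBESketch extends AbstractGDataConverter {
    GDataConverterBESketch() { super(ByteOrder.BIG_ENDIAN); }
}

final class GDataConverterLESketch extends AbstractGDataConverter {
    GDataConverterLESketch() { super(ByteOrder.LITTLE_ENDIAN); }
}
```

With the loop shared, a bug fix in `getValue` applies to both byte orders at once, which is the divergence risk the refactor targets.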

Add a GitHub Actions CI workflow to build and test the GPL/DMG standalone module on every PR

The repository's .github/ directory currently contains only ISSUE_TEMPLATE files; there is no visible CI workflow file (no .github/workflows/ directory shown in the file tree). The GPL/DMG module has its own build.gradle and is a self-contained Java project, making it an ideal first target for a focused CI workflow. Without CI, regressions in the DMG parsing pipeline (decmpfs decompression, xattr reading, HFS+ B-tree traversal) can silently land on main.

Good first issues

  1. Add unit tests for BTreeHeaderRecord.java and BTreeMapRecord.java in GPL/DMG/; there appear to be no test source sets defined in build.gradle for the dmg sourceSet.
  2. Add Javadoc to GBinaryReader.java and GByteProvider.java in GPL/DMG/src/dmg/java/mobiledevices/dmg/ghidra/; these classes are undocumented but critical for understanding how the standalone DMG server process reads binary data.
  3. The DecmpfsCompressionTypes.java and DecmpfsStates.java enums may lack constants for newer Apple compression algorithms (e.g., LZVN, and LZFSE from OS X 10.11); adding and documenting those constants would be a concrete, bounded contribution.
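For item 3, the new constants might take the shape of a code-carrying enum. The numeric values below (zlib 3/4, LZVN 7/8, LZFSE 11/12, with odd codes storing data inline in the xattr and even codes in the resource fork) are as commonly documented by third-party tools such as afsctool; they have not been checked against the module and should be verified before use. `DecmpfsTypeSketch` is a hypothetical name, not the existing DecmpfsCompressionTypes enum.

```java
// Sketch of additional decmpfs type constants. The numeric values are as commonly
// documented by third-party tools and should be verified before use.
enum DecmpfsTypeSketch {
    ZLIB_XATTR(3), ZLIB_RESOURCE_FORK(4),
    LZVN_XATTR(7), LZVN_RESOURCE_FORK(8),
    LZFSE_XATTR(11), LZFSE_RESOURCE_FORK(12);

    final int code;

    DecmpfsTypeSketch(int code) { this.code = code; }

    // Odd codes keep compressed data inline in the xattr; even codes use the resource fork.
    boolean inResourceFork() { return code % 2 == 0; }

    static DecmpfsTypeSketch fromCode(int code) {
        for (DecmpfsTypeSketch t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown decmpfs type " + code);
    }
}
```

Keeping the on-disk code inside each constant, rather than relying on `ordinal()`, means new algorithms can be added without renumbering existing ones.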


Recent commits

  • b3eef59 — Merge remote-tracking branch 'origin/GP-0-dragonmacher-test-fixes-4-30-26' (ryanmkurtz)
  • a47246a — Test fixes (dragonmacher)
  • 4e83ee7 — Merge remote-tracking branch 'origin/Ghidra_12.1' (ryanmkurtz)
  • e115cd1 — GP-6754: Throw Error instead of System.exit() (ryanmkurtz)
  • b337505 — Merge remote-tracking branch 'origin/Ghidra_12.1' (ryanmkurtz)
  • dbce976 — Merge remote-tracking branch 'origin/GP-0_ghidorahrex_8051_fix' into Ghidra_12.1 (ryanmkurtz)
  • 9e18f94 — GP-0: Fixed missing define statement (GhidorahRex)
  • 7118ef9 — Merge remote-tracking branch 'origin/GP-6667_dev747368_rust_dwarf_empty_func_params' (ryanmkurtz)
  • e1c3134 — Merge remote-tracking branch 'origin/GP-6761_ghidragon_append_graph_bug--SQUASHED' (ryanmkurtz)
  • f44a52e — Merge remote-tracking branch 'origin/GP-6692-dragonmacher-too-many-symbols--SQUASHED' (ryanmkurtz)

Security observations

  • High · Use of Unversioned/Unverified Local JAR Dependencies — GPL/DMG/build.gradle, GPL/DMG/data/lib/. The build.gradle file references local JAR files (csframework.jar, hfsx.jar, hfsx_dmglib.jar, iharder-base64.jar) from a flat directory repository ('data/lib'). These JARs have no integrity verification (no checksums, no signatures), are bundled directly in the repository, and have unclear provenance. The hfsx library is a third-party HFS+ parser which, if outdated or tampered with, could introduce vulnerabilities in the DMG parsing pipeline. The version 'hfsexplorer-0_21' suggests a very old release (0.21). Fix: Migrate dependencies to a proper dependency management system (e.g., Maven Central or a private Nexus/Artifactory) with explicit version pinning and checksum/signature verification. Remove bundled JARs from the repository. Upgrade hfsexplorer to a current, maintained version and audit all third-party libraries for known CVEs.
  • High · Inclusion of Pre-compiled Binary DLLs in Source Repository — GPL/DMG/data/os/win_x86_32/, GPL/DMG/data/os/win_x86_64/. Multiple pre-compiled Windows DLL files (llio_amd64.dll, llio_i386.dll, llio_ia64.dll) are committed directly to the repository under GPL/DMG/data/os/. Binary blobs in source repositories cannot be audited for malicious code, backdoors, or vulnerabilities through standard static analysis. These DLLs are loaded at runtime and could execute arbitrary native code with the privileges of the Ghidra process. Fix: Build all native binaries from source as part of the CI/CD pipeline rather than committing pre-compiled binaries. If pre-built binaries must be included, provide cryptographic checksums (SHA-256) and code-signing certificates. Implement integrity checks before loading any native library.
  • High · Arbitrary File Parsing Attack Surface in DMG/HFS+ Parser — GPL/DMG/src/dmg/java/mobiledevices/dmg/reader/DmgFileReader.java, GPL/DMG/src/dmg/java/mobiledevices/dmg/hfsplus/AttributesFileParser.java, GPL/DMG/src/dmg/java/mobiledevices/dmg/zlib/ZLIB.java. The codebase implements a full DMG/HFS+ file format parser (DmgFileReader.java, AttributesFileParser.java, BTree* classes, DecmpfsHeader.java, ZLIB.java). Parsing complex binary formats from untrusted sources is high-risk. Because both this parser and the hfsx library are written in Java, crafted input is more likely to trigger unchecked exceptions, integer overflows in size and offset calculations, excessive memory allocation, or decompression bombs than classic buffer-overflow memory corruption; the bundled native llio DLLs are a separate concern. The ZLIB (ZLIB.java) and decmpfs decompression paths are particularly exposed when processing maliciously crafted DMG files. Fix: Apply strict input validation and bounds checking on all parsed fields before use. Run the DMG parsing component in a sandboxed process with minimal privileges (principle of least privilege). Fuzz test the parser with a coverage-guided fuzzer (for Java code, a JVM fuzzer such as Jazzer). Enforce resource limits (maximum decompressed size, maximum recursion depth).
  • Medium · DmgServer Running as Externally Callable Process Without Apparent Authentication — GPL/DMG/src/dmg/java/mobiledevices/dmg/server/DmgServer.java. The presence of DmgServer.java and DmgServerProcessManager (referenced in comments) indicates that the DMG parsing functionality is exposed as a server process that Ghidra communicates with. If this server binds to a network socket without proper authentication or access controls, it could be accessible to unauthorized local or network clients, enabling arbitrary DMG file parsing or potential exploitation. Fix: Ensure the DmgServer only listens on loopback (127.0.0.1) and not on all interfaces. Implement authentication/authorization for all inter-process communication. Use Unix domain sockets or named pipes instead of TCP sockets where possible. Validate and sanitize all inputs received by the server.
  • Medium · Potential Path Traversal in File Utility Methods — GFileUtilityMethods.java and GRandomAccessFile.java provide file access abstractions. If file paths are constructed from user-supplied or DMG-derived data
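The DMG/HFS+ observation recommends enforcing a maximum decompressed size. A minimal sketch of that control using only `java.util.zip.Inflater`: output is produced in fixed-size chunks and decoding aborts as soon as the running total would exceed the cap, which bounds the damage a crafted "decompression bomb" can do. `BoundedInflate` is a hypothetical helper, and the limit is a caller-chosen value rather than anything the module defines.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch: inflate a zlib stream but refuse to produce more than maxBytes of output.
class BoundedInflate {
    static byte[] inflate(byte[] compressed, int maxBytes) {
        Inflater inflater = new Inflater();   // expects a zlib header by default
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                // Check before buffering so oversized output is never accumulated.
                if ((long) out.size() + n > maxBytes) {
                    throw new IllegalArgumentException("output exceeds limit of " + maxBytes + " bytes");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();                   // release native zlib resources
        }
    }
}
```

When a header declares the uncompressed size (as decmpfs does), a natural cap is that declared size, itself checked against a global ceiling.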

LLM-derived; treat as a starting point, not a security audit.
