pandas-dev/pandas

Item: pandas-dev/pandas
Rating: 5
Author: RepoPilot

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Healthy

Healthy across the board

HealthyDependency

Permissive license, no critical CVEs, actively maintained — safe to depend on.

HealthyFork & modify

Has a license, tests, and CI — clean foundation to fork and modify.

HealthyLearn from

Documented and popular — useful reference codebase to read through.

HealthyDeploy as-is

No critical CVEs, sane security posture — runnable as-is.

⚠Concentrated ownership — top contributor handles 71% of recent commits
✓Last commit today
✓23+ active contributors
✓BSD-3-Clause licensed
✓CI configured
✓Tests present

Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests, cross-checked against OpenSSF Scorecard

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Want this for your own repo?

Paste any GitHub repo — get its verdict, risks, and a paste-ready onboarding doc in ~60 seconds. Free, no sign-up.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Healthy](https://repopilot.app/api/badge/pandas-dev/pandas)](https://repopilot.app/r/pandas-dev/pandas)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card

This card auto-renders when someone shares https://repopilot.app/r/pandas-dev/pandas on X, Slack, or LinkedIn.

Ask AI about pandas-dev/pandas

Grounded in the actual source code. Pick a starter question or write your own.

What does this repo do, in one paragraph?How would I get started using it?What are the main alternatives?Show me the entry point.

Or write your own question →

Onboarding doc

Onboarding: pandas-dev/pandas

Generated by RepoPilot · 2026-06-27 · Source

🎯Verdict

GO — Healthy across the board

Last commit today
23+ active contributors
BSD-3-Clause licensed
CI configured
Tests present
⚠ Concentrated ownership — top contributor handles 71% of recent commits

<sub>Computed from maintenance signals — commit recency, contributor breadth, bus factor, license, CI, tests, cross-checked against OpenSSF Scorecard</sub>

⚡TL;DR

pandas is a Python data manipulation library providing labeled data structures (DataFrames and Series) with fast I/O, statistical functions, and reshaping tools. It solves the core problem of working with tabular, time-series, and heterogeneous data in Python by offering R-like data.frame functionality with superior performance and expressiveness. Monolithic structure with pandas/ as the core module containing: pandas/core/ (DataFrame/Series engines and methods), pandas/io/ (readers for CSV/Excel/SQL/HDF5), pandas/tseries/ (time-series utilities), and pandas/_libs/ (Cython-accelerated routines). asv_bench/ holds ASV benchmarks; build system uses Meson (meson_python backend) and pixi for dependency management.

👥Who it's for

Data scientists, financial analysts, and Python developers performing exploratory data analysis, data cleaning, and statistical computation. Contributors are typically experienced Python developers comfortable with Cython and C extensions who maintain performance-critical data manipulation code.

🌱Maturity & risk

Highly mature and production-ready. The project has 42,000+ GitHub stars, spans 25+ years of development, includes comprehensive CI/CD via GitHub Actions (.github/workflows/unit-tests.yml, code-checks.yml), extensive test coverage tracked via codecov, and is backed by NumFOCUS. Regular releases and active maintenance are evident from the pixi-based build system and automated workflows.

Standard open source risks apply.

Active areas of work

Active development on performance optimization, type system improvements, and API stability. Recent infrastructure changes visible: migration to Meson build system, pixi for reproducible builds (.github/actions/setup-pixi/), and CodeQL security scanning. Ongoing work on nullable dtypes, extension arrays, and deprecation tracking (see .github/workflows/deprecation-tracking-bot.yml).

🚀Get running

git clone https://github.com/pandas-dev/pandas.git && cd pandas && pip install -e . --no-build-isolation (or use pixi: pixi install && pixi run develop). For testing: pytest pandas/tests/ or use the GitHub Actions workflow runner locally.

Daily commands: Development workflow: Clone repo, install with pip install -e . --no-build-isolation (handles Meson), run tests with pytest pandas/tests/frame/test_api.py -v. For performance work: python asv_bench/benchmarks/algorithms.py. CI/CD runs automatically on PR via .github/workflows/unit-tests.yml.

🗺️Map of the codebase

pandas/__init__.py — Main entry point that exports public API; defines what users import when they use pandas.
pandas/core/frame.py — Core DataFrame class—the central abstraction for tabular data manipulation in pandas.
pandas/core/series.py — Core Series class—the 1D labeled array abstraction; foundation for most operations.
pandas/core/generic.py — Base NDFrame class containing shared logic for DataFrame and Series; defines common methods.
pandas/core/indexes/base.py — Index base class managing row/column labels; critical for alignment, grouping, and selection.
pandas/core/groupby/groupby.py — GroupBy engine implementing split-apply-combine pattern; one of pandas' most-used APIs.
.github/workflows/unit-tests.yml — Continuous integration test pipeline; ensures all contributions pass before merge.

🧩Components & responsibilities

DataFrame & Series (Python, NumPy) — High-level labeled array interfaces exposing public methods for selection, transformation, and I/O.
- Failure mode: Type coercion errors, alignment mismatches, memory overflow on large data.
Index subsystem (Python, NumPy, hash tables) — Row/column label management, fast lookup, alignment during operations, and hierarchical indexing.
- Failure mode: Label collision, alignment deadlock in MultiIndex, performance cliffs with large cardinality.
BlockManager (Python, NumPy) — Internal columnar storage: maps index -> blocks; handles consolidation, coercion, and memory efficiency.
- Failure mode: Fragmented blocks cause memory bloat; dtype coercion downgrades precision unexpectedly.
GroupBy engine (Cython, NumPy) — Factorization and aggregation for split-apply-combine; compiles operations into efficient loops.
- Failure mode: Out-of-memory on huge group counts; incorrect aggregation with NaN handling edge cases.
I/O subsystem (Python, NumPy, external libraries (openpyxl, sqlalchemy, etc.)) — Parsers and writers for CSV, JSON, HDF, SQL, Excel, Stata, SAS; handles dtype inference and missing data.
- Failure mode: Encoding errors, dtype inference fails on ambiguous data, memory bloat on huge files.
Time series subsystem (Cython, NumPy, pytz, dateutil) — DatetimeIndex, TimedeltaIndex, PeriodIndex, timezone handling, resampling, and offset logic.
- Failure mode: DST ambiguity, timezone misalignment, offset arithmetic overflow, resampling with irregular gaps.
Reshape & merge (Python, NumPy, hash tables) — Pivoting, unstacking, concatenation, and joining on indices; alignment and duplicate handling.
- Failure mode: Cartesian explosion on cross joins, performance cliffs with large cardinality, memory bloat.

🔀Data flow

User code → pandas API (pd.read_csv, df.groupby) — High-level operations invoked via method calls.
pandas API → I/O parsers or Core engines (BlockManager, GroupBy) — API routes work to internal storage and computation layer.
Core engines → Index & BlockManager — Engines manipulate blocks and use Index for alignment and lookups.
BlockManager & Index → NumPy arrays — Underlying storage of all data; Cython operations directly access.
NumPy arrays (Cython) → Result objects (DataFrame, Series, Index) — Computed results wrapped back into high-level containers.

🛠️How to make changes

Add a new DataFrame method

Implement the method in pandas/core/frame.py (or pandas/core/generic.py if shared with Series) (pandas/core/frame.py)
Add docstring with Parameters, Returns, Examples sections following NumPy style (pandas/core/frame.py)
Create tests in pandas/tests/frame/methods/ or pandas/tests/generic/ (pandas/tests/frame/methods/test_<method_name>.py)
Export in pandas/init.py if a public API addition (pandas/__init__.py)
Update release notes (doc/source/whatsnew/vX.Y.Z.rst) (doc/source/whatsnew/vX.Y.Z.rst)

Add a new I/O format reader/writer

Implement reader/writer in pandas/io/<format>.py following existing pattern (e.g., csv, json) (pandas/io/<format>.py)
Register in pandas/io/init.py and expose via pd.read_<format>() (pandas/io/__init__.py)
Add comprehensive tests in pandas/tests/io/<format>/ (pandas/tests/io/<format>/test_<format>.py)
Add performance benchmarks in asv_bench/benchmarks/io/<format>.py (asv_bench/benchmarks/io/<format>.py)

Add a new Index type

Create class inheriting from Index in pandas/core/indexes/base.py or dedicated file (pandas/core/indexes/base.py)
Implement required abstract methods (_simple_new, _shallow_copy, _format_attrs, etc.) (pandas/core/indexes/<indextype>.py)
Add comprehensive tests in pandas/tests/indexes/test_<indextype>.py (pandas/tests/indexes/test_<indextype>.py)
Export in pandas/core/indexes/init.py (pandas/core/indexes/__init__.py)

Add a groupby aggregation function

Implement aggregation logic in pandas/core/groupby/groupby.py or ops module (pandas/core/groupby/groupby.py)
Register in groupby._aggregate_multiple_funcs() and _NumericGroupby if numeric-only (pandas/core/groupby/groupby.py)
Create tests in pandas/tests/groupby/aggregate/ (pandas/tests/groupby/aggregate/test_<function>.py)
Add benchmarks to asv_bench/benchmarks/groupby.py (asv_bench/benchmarks/groupby.py)

🔧Why these technologies

NumPy — Underlying array storage and vectorized numerical operations for performance.
Cython — Performance-critical algorithms (groupby, reductions, rolling windows) compiled to C.
Python — High-level API and declarative interface for data manipulation; lower performance-sensitive code.
ASV (Airspeed Velocity) — Continuous performance regression detection across commits and Python versions.

⚖️Trade-offs already made

Column-oriented BlockManager for internal storage
- Why: Efficient columnar operations and memory layout for large datasets; aligns with typical analytical workloads.
- Consequence: Row-wise operations less efficient; transposition cost higher than row-oriented stores.
Copy-on-write semantics gradually replacing implicit copy
- Why: Avoid silent data aliasing bugs and reduce memory overhead; clearer semantics.
- Consequence: Breaking change for some chained operations; requires version migration path.
Lazy evaluation avoided in favor of eager execution
- Why: Simplicity and predictability; matches NumPy/Python conventions.
- Consequence: Large operations can exhaust memory; no query optimization for complex chains.
Support for mixed dtypes in DataFrames
- Why: Real-world data is heterogeneous; needed for practical analytics.
- Consequence: Lower performance than homogeneous arrays; complexity in type inference and coercion.

🚫Non-goals (don't propose these)

Real-time streaming data processing (batch-oriented)
Distributed computing across clusters (single-machine focus; see pandas on Dask/Spark)
Graph or hierarchical database semantics
ACID transactions or data consistency guarantees
GPU acceleration (NumPy-compatible CPU-only by default)
Automatic data type inference from raw bytes without schema

⚠️Anti-patterns to avoid

Implicit copy vs. view ambiguity (High) — pandas/core/frame.py, pandas/core/series.py: Chained indexing (df[col][row]) may copy or view depending on underlying storage; copy-on-write migration incomplete.
Dtype coercion surprises (Medium) — pandas/core/dtypes/: Implicit upcasting to object or float64 on mixed operations; can silently lose precision or performance.
NaN handling inconsistency (Medium) — pandas/core/groupby/groupby.py, pandas/core/ops/: Some methods skip NaN, others include; behavior varies by operation and dtype.
Memory bloat from block consolidation (Medium) — pandas/core/internals/managers.py: Fragmented BlockManager can consume 2–3× expected memory until consolidation is forced.
Mutation during iteration (Low) — pandas/core/frame.py: User code can modify a Series/DataFrame while iterating; no guard; produces undefined behavior.

🔥Performance hotspots

pandas/core/groupby/groupby.py (CPU-bound) — GroupBy aggregation with many groups or high-cardinality keys; factorization and hashing dominate.
pandas/core/reshape/merge.py (CPU-bound + memory allocation) — Join/merge on large DataFrames with many columns; index alignment and sorting are O(n log n).
pandas/io/parsers/c_parser_engine.pyx (I/O + CPU) — CSV parsing with dtype inference; C engine still required for large files; memory copy per column.
undefined (undefined) — undefined

🪤Traps & gotchas

Cython modules require recompilation after any .pyx change; standard pip install may skip this—use pip install -e . --no-build-isolation with Meson. The BlockManager internal storage model is non-obvious; most changes affecting data layout need careful testing. Timezone-aware operations depend on pytz but prefer dateutil's zoneinfo; mixing creates subtle bugs. Tests require pytest and extensive test data; some tests are slow and marked @slow or use parametrize extensively. Documentation is built from docstrings; changes to public APIs require docstring updates. The repo uses deprecation cycles (FutureWarning paths); see pandas.errors.FutureWarning and deprecation decorators throughout.

🏗️Architecture

💡Concepts to learn

BlockManager — Understanding BlockManager—pandas' internal columnar storage organizing data into blocks of homogeneous dtype—is essential for optimizing memory layout, understanding copy semantics, and debugging strange performance cliffs
MultiIndex (Hierarchical Indexing) — MultiIndex enables labeled data manipulation with multiple index levels, critical for time-series and financial data; its implementation in pandas/core/indexes/multi.py reveals subtle performance gotchas with alignment
ExtensionDtype & ExtensionArray — The ExtensionDtype system (pandas/core/arrays/base.py) allows custom dtype implementations beyond NumPy; understanding this unlocks nullable integers, string dtypes, and categorical arrays
Indexing & Alignment — pandas' automatic label-based alignment during binary ops, reindexing, and join is powerful but non-intuitive; bugs often hide in edge cases like misaligned DatetimeIndex or categorical joins
Copy-on-Write (CoW) Semantics — Recent pandas versions (2.0+) introduced CoW to make copy/mutation semantics predictable; this is a major shift affecting SettingWithCopyWarning and memory efficiency
Groupby Split-Apply-Combine Pattern — The groupby engine (pandas/core/groupby/) implements the split-apply-combine paradigm; its fast paths and aggregate signature system are central to high-level API but complex to extend
Cython Compiled Hot Paths — pandas/_libs/ (lib.pyx, algos.pyx, hashtable.pyx) contains Cython-compiled critical sections for factorization, sorting, and deduplication; changes here require profiling and benchmark validation via asv_bench/

pandas-dev/pandas-stubs — Official type stub repository providing .pyi files for pandas public API; essential for type-checking pandas code
pandas-dev/pandas-vet — Linter package that enforces pandas best practices and anti-patterns in user code
numpy/numpy — Core numerical dependency; pandas DataFrames wrap NumPy arrays internally and depend on ndarray semantics
pydata/xarray — Companion library extending pandas concepts to multi-dimensional labeled arrays; shares contributor base and design philosophy
dask/dask — Parallel computing library providing Dask DataFrame that mimics pandas API for out-of-core data; popular for scaling pandas workflows

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive benchmarking suite for DataFrame indexing operations

The repo has asv_bench/benchmarks/indexing.py and asv_bench/benchmarks/indexing_engines.py, but these appear incomplete based on the partial file listing. Modern pandas users heavily rely on .loc, .iloc, and .at operations. Adding detailed benchmarks for edge cases (multi-index hierarchical access, boolean indexing with NA values, large sparse DataFrames) would help catch performance regressions early and provide contributors with baseline metrics.

[ ] Expand asv_bench/benchmarks/indexing.py with benchmarks for .loc[] with MultiIndex on 100K+ row DataFrames
[ ] Add benchmarks for mixed-type boolean indexing with NA values in asv_bench/benchmarks/boolean.py
[ ] Create new asv_bench/benchmarks/indexing_edge_cases.py covering NA propagation during fancy indexing
[ ] Run benchmarks against current main branch to establish baseline metrics

Implement missing dtype inference benchmarks for common data loading scenarios

The file asv_bench/benchmarks/inference.py exists but dtype inference is critical for CSV/JSON loading performance. Contributors could add realistic benchmarks for inferring dtypes on datasets with mixed types, nullable integers, and datetime parsing—directly addressing real-world performance bottlenecks.

[ ] Add benchmarks in asv_bench/benchmarks/inference.py for dtype inference on 10K+ rows with 50+ columns of mixed types
[ ] Create test cases for nullable integer (Int64) inference vs standard int64
[ ] Add datetime parsing inference benchmarks with timezone-aware and naive timestamps
[ ] Reference the benchmark results in updated documentation for dtype handling best practices

Add missing GitHub Actions workflow for dependency compatibility testing

The .github/workflows/ directory has unit-tests.yml, code-checks.yml, and wheels.yml, but there's no dedicated workflow for testing against multiple versions of key dependencies (NumPy, PyArrow, SciPy). This would catch upstream breaking changes early. New contributors can create a workflow that tests against NumPy stable, development, and NumPy 2.0+ versions.

[ ] Create .github/workflows/dependency-compat.yml testing pandas against NumPy stable, rc, and dev versions
[ ] Add matrix configuration for PyArrow versions (7.0, 10.0, 13.0+) since it's increasingly critical for dtype support
[ ] Include SciPy compatibility checks for stats and interpolation methods
[ ] Configure the workflow to run on schedule (weekly) and on dependency-related file changes (pyproject.toml, setup.py)

🌿Good first issues

Add missing docstring examples in pandas/core/series.py for recently added methods—many methods lack executable example code in their docstrings, blocking documentation generation.
Extend test coverage for pandas/io/excel/ Excel format readers (xlsx, xls, ods) by adding edge-case tests for malformed files, missing sheets, and formula preservation.
Implement type stub improvements in pandas/_typing.py for IndexLabel and DtypeObj union types; many internal APIs still lack precise stubs, affecting static analysis downstream.

⭐Top contributors

Click to expand

@jbrockmendel — 71 commits
@jorisvandenbossche — 5 commits
@rhshadrach — 2 commits
@gautamvarmadatla — 2 commits
@dependabot[bot] — 2 commits

📝Recent commits

Click to expand

b6de6c4 — ENH: fallback to zoneinfo Python API (for correct tz localization) for distant dates (#65733) (jorisvandenbossche)
871d627 — BUG: escape SQL identifiers in ADBCDatabase to prevent injection (#65066) (betoalien)
c665ed2 — BUG: refs not set in DataFrame constructor from Series/Index with dtype="str" (infer_string=False) (#65943) (jbrockmendel)
8ba3684 — BUG: track CoW references through Index.where (#65846) (jbrockmendel)
febf4ae — STYLE: enforce "|".join([...]) for message-regex alternations in tests (#65756) (jbrockmendel)
43d7f99 — DOC: clarify that stack does not sort, only unstack (#65955) (jbrockmendel)
1fe4ff4 — CLN: remove genuinely-unreachable dead code (GH#27396) (#66045) (jbrockmendel)
f52d478 — BUG: Fix MultiIndex.equals regression regarding code bit-width (#65700) (#65701) (anzinmhd)
ec7e85a — BUG: raise a proper exception when str.rsplit is passed a regex and clarify the docs (#62035) (hamdanal)
1eaed84 — PERF: Frequency determination when concatting datetimes (#65920) (rhshadrach)

🔒Security observations

The pandas codebase is a well-established open-source project with generally good security practices, but there are several areas of concern related to file format parsing and SQL operations. The main risks stem from: (1) Support for inherently unsafe formats like pickle, (2) Multiple file parsers that could be vulnerable to malicious inputs, (3) SQL I/O functionality that requires careful parameterization, and (4) Lack of visible centralized input validation. The project appears to follow standard security practices for an open-source library, with CI/CD pipelines and code quality checks in place. However, more explicit security documentation and validation frameworks would strengthen the security posture. No hardcoded credentials were detected in the file structure, and no obvious Docker misconfigurations were found.

Medium · Potential SQL Injection in IO Module — asv_bench/benchmarks/io/sql.py and pandas/io/sql.py (inferred). The codebase includes SQL I/O functionality (sql.py in io/). Without examining the actual implementation, SQL operations in pandas could be vulnerable to injection attacks if user input is not properly parameterized or sanitized. Fix: Ensure all SQL queries use parameterized statements with bound parameters. Never concatenate user input directly into SQL queries. Use ORM or prepared statements consistently.
Medium · Multiple File Format Parsers — asv_bench/benchmarks/io/ (csv.py, excel.py, hdf.py, json.py, pickle.py, sas.py, stata.py). The codebase includes parsers for various file formats (CSV, Excel, HDF, JSON, Pickle, SAS, Stata). These parsers could be vulnerable to arbitrary code execution or denial-of-service attacks if malicious files are processed without proper validation. Fix: Implement strict file format validation before parsing. Use safe deserialization methods (avoid pickle for untrusted data). Set resource limits on file parsing (max file size, recursion depth, etc.). Document security risks of each format.
Medium · Pickle Deserialization Risk — asv_bench/benchmarks/io/pickle.py and pandas/io/pickle.py (inferred). Pickle format support (pickle.py) is inherently unsafe as it can execute arbitrary Python code during deserialization. This is a known security risk when processing untrusted data. Fix: Document security warnings about pickle format. Recommend alternatives (JSON, Parquet, HDF5) for untrusted data. Consider adding security warnings in documentation and function docstrings.
Low · No Apparent Input Validation Framework — Repository root - overall architecture. The file structure does not show evidence of centralized input validation or sanitization utilities. This could lead to inconsistent security practices across different modules. Fix: Implement a centralized input validation framework. Create security utilities for common validation tasks (file path traversal prevention, type checking, range validation). Use throughout the codebase.
Low · Potential Path Traversal in File Operations — asv_bench/benchmarks/io/ (all file format modules). File I/O operations (CSV, Excel, HDF, SAS, Stata) may be vulnerable to path traversal attacks if file paths are not properly validated before use. Fix: Validate and sanitize all file paths before file operations. Use os.path.abspath() and ensure paths are within expected directories. Implement allowlist-based path validation.

LLM-derived; treat as a starting point, not a security audit.

👉Where to read next

Open issues — current backlog
Recent PRs — what's actively shipping
Source on GitHub

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/pandas-dev/pandas shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live pandas-dev/pandas repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/pandas-dev/pandas.

What it runs against: a local clone of pandas-dev/pandas — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in pandas-dev/pandas | Confirms the artifact applies here, not a fork | | 2 | License is still BSD-3-Clause | Catches relicense before you depend on it | | 3 | Default branch main exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>pandas-dev/pandas</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of pandas-dev/pandas. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/pandas-dev/pandas.git
#   cd pandas
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of pandas-dev/pandas and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "pandas-dev/pandas(\\.git)?\\b" \\
  && ok "origin remote is pandas-dev/pandas" \\
  || miss "origin remote is not pandas-dev/pandas (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(BSD-3-Clause)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"BSD-3-Clause\"" package.json 2>/dev/null) \\
  && ok "license is BSD-3-Clause" \\
  || miss "license drift — was BSD-3-Clause at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
  && ok "default branch main exists" \\
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "pandas/__init__.py" \\
  && ok "pandas/__init__.py" \\
  || miss "missing critical file: pandas/__init__.py"
test -f "pandas/core/frame.py" \\
  && ok "pandas/core/frame.py" \\
  || miss "missing critical file: pandas/core/frame.py"
test -f "pandas/core/series.py" \\
  && ok "pandas/core/series.py" \\
  || miss "missing critical file: pandas/core/series.py"
test -f "pandas/core/generic.py" \\
  && ok "pandas/core/generic.py" \\
  || miss "missing critical file: pandas/core/generic.py"
test -f "pandas/core/indexes/base.py" \\
  && ok "pandas/core/indexes/base.py" \\
  || miss "missing critical file: pandas/core/indexes/base.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/pandas-dev/pandas"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Similar Python repos

Other healthy-signal Python repos by stars.

Embed this chat in your README →

Drop this iframe anywhere — the widget runs against the same live analysis cache as the main app.

<iframe
  src="https://repopilot.app/embed/pandas-dev/pandas"
  width="100%" height="500"
  style="border:1px solid #d0d7de; border-radius:8px;"
  allow="microphone"
  loading="lazy"
></iframe>