RepoPilot

simonw/datasette

An open source multi-tool for exploring and publishing data

Healthy

Healthy across all four use cases

Use as dependency — Healthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modify — Healthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn from — Healthy

Documented and popular — useful reference codebase to read through.

Deploy as-is — Healthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit 1d ago
  • 5 active contributors
  • Apache-2.0 licensed
  • CI configured
  • Tests present
  • Single-maintainer risk — top contributor 80% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Healthy](https://repopilot.app/api/badge/simonw/datasette)](https://repopilot.app/r/simonw/datasette)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/simonw/datasette on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: simonw/datasette

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/simonw/datasette shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across all four use cases

  • Last commit 1d ago
  • 5 active contributors
  • Apache-2.0 licensed
  • CI configured
  • Tests present
  • ⚠ Single-maintainer risk — top contributor 80% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live simonw/datasette repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/simonw/datasette.

What it runs against: a local clone of simonw/datasette — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in simonw/datasette | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches a relicense before you depend on it |
| 3 | Default branch main exists | Catches branch renames |
| 4 | Last commit ≤ 31 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>simonw/datasette</code></summary>

```bash
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of simonw/datasette. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/simonw/datasette.git
#   cd datasette
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of simonw/datasette and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "simonw/datasette(\.git)?$" \
  && ok "origin remote is simonw/datasette" \
  || miss "origin remote is not simonw/datasette (artifact may be from a fork)"

# 2. License matches what RepoPilot saw. The standard Apache LICENSE text
# reads "Apache License ... Version 2.0" rather than the SPDX id
# "Apache-2.0", so match either spelling.
(grep -qiE "apache license|apache-2\.0" LICENSE 2>/dev/null \
   || grep -qiE "\"license\"\s*:\s*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \
  && ok "default branch main exists" \
  || miss "default branch main no longer exists"

# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 31 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~1d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/simonw/datasette"
  exit 1
fi
```

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

Datasette is a multi-tool for exploring, analyzing, and publishing SQLite databases as interactive web applications and APIs. It turns raw SQLite databases into explorable websites with built-in faceting, full-text search, and REST APIs, letting data journalists, archivists, and researchers publish data without writing backend code. Monolithic Python package under datasette/ with modular separation: app.py (core ASGI application), database.py (SQLite abstraction), cli.py (command-line interface), default_permissions/ (auth system), facets.py and filters.py (query features). Static assets (JS/CSS/HTML) live in datasette/static/. Extensibility comes from a pluggy-based plugin system (datasette/plugins.py and datasette/hookspecs.py); _pytest_plugin.py is a pytest helper, not the plugin mechanism.

👥Who it's for

Data journalists, museum curators, archivists, local government officials, and researchers who have SQL databases and want to publish them as interactive web interfaces and APIs without building custom backends. Contributors are typically Python developers interested in data tooling and web frameworks.

🌱Maturity & risk

Highly mature and production-ready. The project has extensive CI/CD workflows (13+ GitHub Actions pipelines), comprehensive test coverage tracked via codecov, Docker distribution, ReadTheDocs documentation, and active maintenance visible in stable versioning (PyPI badge present). The codebase is a substantial, clearly organized Python package whose module layout reflects years of production use.

Risk is low despite the project being effectively single-maintainer (simonw). The Python dependency surface appears well-managed, though that is inferred from packaging patterns rather than a full audit. Primary risks: tight coupling to SQLite capabilities, and breaking changes in major versions require migration effort for published databases. Strong test coverage and a documented API surface mitigate these concerns.

Active areas of work

Active development is visible across multiple CI workflows, including test-pyodide (WebAssembly support), Docker tagging, documentation deployment, and Prettier code-formatting checks. Judging from workflow diversity and recent commits, work spans permissions, column type handling, and API improvements.

🚀Get running

git clone https://github.com/simonw/datasette.git
cd datasette
pip install -e '.[dev]'
datasette serve --help

Daily commands: for development, pip install -e '.[dev]', then datasette serve path/to/db.db. The app listens on port 8001 by default. Use datasette --help to see all CLI options. For frontend changes, run npm run prettier after editing JavaScript in datasette/static/.

🗺️Map of the codebase

  • datasette/app.py: Main ASGI application class that routes all HTTP requests, manages database lifecycle, and coordinates permission checks
  • datasette/database.py: Abstraction layer for SQLite interaction; handles connection pooling, query execution, and transaction management
  • datasette/cli.py: Command-line interface entry point; parses arguments like serve, publish, and database file paths
  • datasette/default_permissions/config.py: Permission system configuration and authorization logic that determines who can access which tables and execute which queries
  • datasette/facets.py: Implements faceted search/filtering—a core differentiator allowing users to explore data by dimension without writing SQL
  • datasette/filters.py: Query filter parsing and construction; converts URL query parameters into parameterized SQL WHERE clauses
  • datasette/static/: Frontend JavaScript and CSS; contains CodeMirror SQL editor integration and API client code bundled with Rollup
  • .github/workflows/test.yml: Primary test suite CI; defines Python version matrix, coverage reporting, and test command execution
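The filters.py entry above is the heart of Datasette's URL-to-SQL translation. A minimal sketch of the general pattern (illustrative only; the `build_where` helper and operator table are ours, not Datasette's actual API) — the key point is that values are always bound as parameters, never interpolated into the SQL string:

```python
# Illustrative sketch of URL-parameter-to-WHERE-clause translation.
# Hypothetical operator table; the real filters.py supports many more lookups.
OPERATORS = {
    "exact": "=",
    "gt": ">",
    "lt": "<",
    "contains": "LIKE",
}

def build_where(params):
    """Turn {'age__gt': '30', 'name__contains': 'sim'} into (sql, args)."""
    clauses, args = [], []
    for key, value in params.items():
        column, _, op = key.partition("__")
        op = op or "exact"
        if op == "contains":
            value = f"%{value}%"
        clauses.append(f'"{column}" {OPERATORS[op]} ?')
        args.append(value)
    sql = " AND ".join(clauses) if clauses else "1=1"
    return sql, args

sql, args = build_where({"age__gt": "30", "name__contains": "sim"})
# sql  -> '"age" > ? AND "name" LIKE ?'
# args -> ['30', '%sim%']
```

Binding through `?` placeholders is what keeps user-controlled filter values from becoming injectable SQL.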

🛠️How to make changes

New database feature: extend datasette/database.py and add tests in tests/. New UI component: add HTML templates in datasette/templates/, JavaScript in datasette/static/ (follow Rollup/CodeMirror patterns). New permission: add logic to datasette/default_permissions/ and register in datasette/default_permissions/config.py. CLI commands: modify datasette/cli.py. Plugin hooks: see the hook specifications in datasette/hookspecs.py.
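For plugin hooks specifically, Datasette's prepare_connection hook receives the raw SQLite connection so a plugin can register custom SQL functions. The sketch below mirrors that idea using only the stdlib sqlite3 module so it runs standalone; in a real plugin you would decorate the function with @hookimpl from the datasette package instead.

```python
# Standalone illustration of what a prepare_connection-style hook does:
# register a Python function as a SQL function on the connection.
import sqlite3

def prepare_connection(conn):
    # Expose a Python function to SQL: select reverse_text(name) from t
    conn.create_function("reverse_text", 1, lambda s: s[::-1] if s else s)

conn = sqlite3.connect(":memory:")
prepare_connection(conn)
result = conn.execute("select reverse_text('datasette')").fetchone()[0]
# result -> 'ettesatad'
```

This is the same mechanism Datasette uses to let plugins add SQL-level functionality without touching core query code.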

🪤Traps & gotchas

  1. Plugin loading order: _pytest_plugin.py hooks must be registered before database initialization in tests.
  2. SQLite version sensitivity: some SQL functions (JSON, window functions) require SQLite 3.25+; test-sqlite-support.yml catches version mismatches.
  3. ASGI app instance: app.py creates a singleton instance on import; custom initialization must happen before that or via plugin hooks.
  4. Permission caching: default permissions may be cached; changes require an app restart.
  5. Static file bundling: JavaScript edits in datasette/static/ must run through Rollup (configured in rollup.config.js, not visible in the file list but required); raw .js won't be served in production.
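Gotcha 2 in code form: gate window-function SQL behind a version check instead of letting it fail at query time. SQLite 3.25 is the real release that introduced window functions; the helper name below is ours, not Datasette's.

```python
# Guard window-function queries against old SQLite builds.
import sqlite3

def supports_window_functions(version=None):
    """True if the linked SQLite is new enough for OVER (...) queries."""
    if version is None:
        version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))
    return version >= (3, 25, 0)

if supports_window_functions():
    sql = "select name, row_number() over (order by name) from t"
else:
    sql = "select name from t"  # degraded query for old SQLite builds
```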

💡Concepts to learn

  • coleifer/peewee — ORM for SQLite and other databases; often used alongside Datasette by developers building custom backends instead of publishing raw databases
  • dbt-labs/dbt — Data transformation and modeling tool that produces normalized SQL database outputs that can be published via Datasette
  • simonw/datasette-plugins — Official plugin registry and examples for extending Datasette with custom features
  • simonw/dogsheep — Collection of tools to export personal data into SQLite databases designed to be published with Datasette
  • apache/superset — Alternative data exploration and visualization tool for SQL databases; heavier but offers more charting options than Datasette's faceting

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive tests for datasette/default_permissions module

The datasette/default_permissions/ directory contains six modules (config.py, defaults.py, helpers.py, restrictions.py, root.py, tokens.py) that handle critical permission logic, but there's no dedicated test file visible in the structure. This is a security-sensitive area that needs high test coverage. A new contributor could create tests/test_default_permissions.py covering all permission scenarios.

  • [ ] Review datasette/default_permissions/*.py to understand permission logic flow
  • [ ] Create tests/test_default_permissions.py with test classes for each module
  • [ ] Add tests for token validation, restriction enforcement, and root permission checks
  • [ ] Ensure coverage reaches >90% for the default_permissions package
  • [ ] Run existing test suite to verify no regressions
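A sketch of the table-driven structure such tests could take. The check_token_restrictions function below is a stand-in stub invented for illustration; the real logic in datasette/default_permissions/restrictions.py has a different signature and semantics.

```python
# Hypothetical stub standing in for the real restrictions logic, used only
# to show the shape of the tests a contributor would write.
def check_token_restrictions(restrictions, action, resource):
    """Allow an action only if the token's restriction dict grants it."""
    if restrictions is None:          # unrestricted token
        return True
    allowed = restrictions.get("r", {}).get(resource, [])
    return action in allowed

def test_unrestricted_token_allows_everything():
    assert check_token_restrictions(None, "view-table", "mydb/mytable")

def test_restricted_token_denies_unlisted_action():
    r = {"r": {"mydb/mytable": ["view-table"]}}
    assert check_token_restrictions(r, "view-table", "mydb/mytable")
    assert not check_token_restrictions(r, "insert-row", "mydb/mytable")
```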

Add GitHub Action workflow for testing plugin compatibility

Datasette is a plugin-based system (datasette/plugins.py and datasette/hookspecs.py exist), but there's no dedicated CI workflow testing plugin integration. Currently, workflows test core functionality but not plugins. A new contributor could create .github/workflows/test-plugins.yml to test against a handful of popular plugins from the Datasette ecosystem to catch breaking changes early.

  • [ ] Review datasette/plugins.py and datasette/hookspecs.py to understand plugin system
  • [ ] Create .github/workflows/test-plugins.yml workflow file
  • [ ] Add matrix jobs to test against 3-5 popular official plugins
  • [ ] Configure workflow to run on pull requests and push to main
  • [ ] Document expected behavior in CONTRIBUTING guidelines

Add unit tests for datasette/column_types.py and datasette/default_column_types.py

The column type system (datasette/column_types.py and datasette/default_column_types.py) is core to data rendering but lacks visible dedicated test coverage. These files handle type detection and formatting which directly impact user experience. A contributor could create comprehensive tests/test_column_types.py covering type detection, custom type registration, and rendering edge cases.

  • [ ] Analyze datasette/column_types.py and datasette/default_column_types.py implementations
  • [ ] Create tests/test_column_types.py with test classes for built-in column types
  • [ ] Add tests for type detection with various data values (NULL, numbers, dates, JSON)
  • [ ] Test custom column type registration via plugin system
  • [ ] Add tests for edge cases (empty values, unicode, very large numbers)
  • [ ] Verify >85% coverage of both column_types modules
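To make the idea concrete, here is a hypothetical sketch of the kind of type-detection logic such tests would pin down. The function and category names are ours, not the actual column_types API.

```python
# Illustrative value classifier covering the edge cases named in the
# checklist: NULL, numbers, dates, JSON, text, and binary blobs.
import json
from datetime import date

def detect_type(value):
    """Classify a raw SQLite cell value for rendering purposes."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
            if isinstance(parsed, (dict, list)):
                return "json"
        except ValueError:
            pass
        try:
            date.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    return "blob"

# detect_type(None) -> 'null'; detect_type('{"a": 1}') -> 'json'
# detect_type("2026-05-07") -> 'date'; detect_type("hello") -> 'text'
```

Tests against a classifier like this are cheap to write exhaustively, which is why the PR idea targets >85% coverage.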

🌿Good first issues

  • Add unit tests for datasette/blob_renderer.py (file exists but test coverage is sparse); create tests/test_blob_renderer.py covering different MIME types and edge cases.
  • Improve error messaging in datasette/filters.py by adding specific exception classes (currently generic exceptions); document filter syntax in docstrings for user-facing filter parsing.
  • Extend datasette/column_types.py with a new column type handler (e.g., UUID, IP address); add corresponding tests and update default_column_types.py to register it.

📝Recent commits

  • aa84fe0 — Fix for column actions on Mobile Safari, closes #2708 (simonw)
  • 0dc7bb1 — Table headers and column options visible for 0 rows (simonw)
  • b15ce18 — TokenRestrictions.abbreviated(datasette) utility method for creating _r dicts (#2696) (simonw)
  • a6031c9 — Release 1.0a28 (simonw)
  • 1cd53e1 — Datasette.close() method, plus pytest plugin to automatically call it during tests (simonw)
  • 630e557 — Ran black (simonw)
  • b3001c1 — Drop redundant _ds_client global now that ds_client is session-scoped (simonw)
  • c9a7dc9 — Declare ds_client as session-scoped so auto-close plugin spares it (simonw)
  • 03eeeb9 — Docs: auto-close plugin now handles function-scoped fixtures (simonw)
  • ede942a — Fix ruff lints in close-related tests (simonw)

🔒Security observations

  • High · Insecure Docker Build - Arbitrary Version Installation — Dockerfile, line: RUN pip install https://github.com/simonw/datasette/archive/refs/tags/${VERSION}.zip. The Dockerfile uses a build ARG VERSION that is directly interpolated into a GitHub URL for pip install without validation. An attacker who can control the VERSION build argument could install a malicious version of datasette or trigger a Man-in-the-Middle attack. Fix: Validate the VERSION argument against a whitelist of known releases, use signed releases with cryptographic verification, or use pip's built-in hash checking with --require-hashes flag.
  • High · Potential SQL Injection via Raw SQL Queries — datasette/filters.py, datasette/sql_functions.py, datasette/database.py. The codebase includes files like 'datasette/filters.py', 'datasette/sql_functions.py', and SQL-related processing. Without visible ORM usage or parameterized query patterns in the file structure, there's risk of SQL injection if user input is directly concatenated into SQL queries. Fix: Ensure all SQL queries use parameterized queries/prepared statements. Audit files for dynamic SQL construction and replace with parameterized alternatives. Use an ORM or query builder that enforces parameterization.
  • High · Potential Cross-Site Scripting (XSS) via Template Rendering — datasette/templates/ (all .html files). The presence of multiple template files (*.html) that render user-controlled data without obvious escaping mechanisms poses XSS risk. Files like '_table.html', '_facet_results.html' likely render database content that could contain malicious JavaScript. Fix: Ensure all template variables are properly escaped using the template engine's auto-escaping features. Use Content-Security-Policy headers. Audit templates for unescaped variable interpolation, especially {{ }} without filters.
  • Medium · Exposed Port in Docker Container — Dockerfile, line: EXPOSE 8001. The Dockerfile exposes port 8001 without specifying it as internal-only. If this container is deployed without proper network isolation, the service is directly accessible from the internet. Fix: Document port exposure clearly in deployment guides. Use network policies, firewalls, or reverse proxies to restrict access. Consider running behind a reverse proxy with authentication and SSL/TLS termination.
  • Medium · Outdated Python Base Image — Dockerfile, line: FROM python:3.11.0-slim-bullseye as build. The Dockerfile uses python:3.11.0-slim-bullseye, which is significantly outdated (released September 2022). This base image likely contains known vulnerabilities in both Python and system libraries. Fix: Update to the latest stable Python 3.11.x or 3.12.x slim image. Regularly rebuild Docker images to incorporate security patches. Use an image-scanning tool in the CI/CD pipeline.
  • Medium · Missing Security Headers Configuration — datasette/app.py (likely location for middleware configuration). No visible configuration for security headers (Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security, etc.) in the codebase structure. These are critical for preventing XSS, clickjacking, and MIME-sniffing attacks. Fix: Implement security headers middleware. Add CSP policy, set X-Frame-Options: DENY, X-Content-Type-Options: nosniff, and enable HSTS for HTTPS connections. Provide configurable security header options.
  • Medium · CSRF Protection Token Handling — datasette/csrf.py, datasette/app.py. While 'datasette/csrf.py' exists suggesting CSRF protection is implemented, the file structure doesn't show evidence of CSRF token validation in all state-changing operations. Without comprehensive CSRF protection, the application is vulnerable to cross-site request forgery. Fix: Audit all POST, PUT, DELETE endpoints to ensure CSRF token validation. Use SameSite cookie attribute as a secondary defense. Document CSRF protection strategy in security docs.
  • Medium · NPM Dependency Version Ranges — package.json. The package.json uses caret (^) version specifiers (e.g., ^3.0.0), which allow unreviewed minor and patch upgrades of frontend tooling to be pulled in automatically. Fix: pin exact versions or commit a lockfile so installs are reproducible.
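The parameterized-query fix recommended in the SQL-injection observation above, demonstrated with the stdlib sqlite3 module (the users table and payload are made up for the demo):

```python
# Contrast string interpolation with parameter binding against a classic
# injection payload.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.execute("insert into users values ('alice'), ('bob')")

user_input = "alice' OR '1'='1"   # classic injection payload

# UNSAFE: interpolation lets the payload rewrite the query.
unsafe = conn.execute(
    f"select count(*) from users where name = '{user_input}'"
).fetchone()[0]

# SAFE: the ? placeholder binds the payload as a plain value.
safe = conn.execute(
    "select count(*) from users where name = ?", (user_input,)
).fetchone()[0]

# unsafe -> 2 (every row matched), safe -> 0 (no user has that literal name)
```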

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.
