apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

WAIT

Mixed signals — read the receipts

  • Last commit today
  • 5 active contributors
  • Apache-2.0 licensed
  • CI configured
  • Tests present
  • Small team — 5 top contributors
  • Concentrated ownership — top contributor handles 64% of commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Onboarding: apache/superset

Generated by RepoPilot · 2026-05-05 · Source

TL;DR

Apache Superset is a self-hosted business intelligence and data exploration platform built on Flask (Python) and React/TypeScript. It lets users connect to dozens of SQL databases, build interactive charts via a no-code UI or SQL Lab, and assemble those charts into shareable dashboards. The core problem it solves is giving organizations an open-source alternative to Tableau/Looker with full control over their data stack. The repository is a monorepo: a Python Flask/FAB backend lives under superset/ (models, views, APIs, CLI) and a React/TypeScript frontend lives under superset-frontend/src/ (Redux store, chart plugins under packages/superset-ui-*, Explore and Dashboard views). Chart plugin packages live in superset-frontend/packages/ as independently versioned @superset-ui/* packages.

Who it's for

Data engineers, BI developers, and data analysts at companies who need to self-host a Tableau/Looker-style tool on their own infrastructure. Contributors are typically Python/Flask backend engineers or TypeScript/React frontend engineers working on the visualization layer, chart plugins, or database connectors.

Maturity & risk

Superset is a top-level Apache Software Foundation project with 60,000+ GitHub stars, an extensive CI pipeline defined in .github/workflows with shared composite actions in .github/actions (Python unit tests, frontend tests, Docker builds), and active commit history visible in the repo. It is considered production-ready and is deployed at companies like Airbnb, Twitter, and Netflix. The SECURITY.md and detailed CODEOWNERS file indicate mature governance.

The repo has a large open issue backlog, typical of high-profile OSS projects, and a broad surface area across 50+ database connectors that creates maintenance burden. The dual-stack architecture (heavy Python backend plus a large TypeScript frontend, 18M+ bytes of TypeScript) means changes can require coordinated work across both layers. Dependence on the SQLAlchemy dialect ecosystem means database connector breakage is a recurring risk.

Active areas of work

Active work is visible on a JS-to-TS migration project documented at .claude/projects/js-to-ts/PROJECT.md and .claude/commands/js-to-ts.md, converting remaining JavaScript files to TypeScript. The .cursor/rules/dev-standard.mdc suggests active enforcement of new coding standards. The .devcontainer/with-mcp/ directory indicates recent work integrating MCP (Model Context Protocol) tooling into the dev environment.

Get running

git clone https://github.com/apache/superset.git && cd superset

Backend

python -m venv venv && source venv/bin/activate
pip install -e '.[development]'
superset db upgrade && superset init
superset load-examples  # optional sample data

Frontend (separate terminal)

cd superset-frontend && npm install && npm run dev-server

Or use devcontainer: open in VS Code, reopen in container via .devcontainer/devcontainer.json

Daily commands:

Dev server (frontend hot reload on :9000, proxies API to Flask on :8088)

cd superset-frontend && npm run dev-server

Flask backend

FLASK_ENV=development superset run -p 8088 --with-threads --reload --debugger

Or via Docker Compose (full stack):

docker compose -f docker-compose-non-dev.yml up

Map of the codebase

  • .github/CODEOWNERS — Defines ownership boundaries for the entire repo, dictating who must review every PR touching any given file or directory.
  • .pre-commit-config.yaml — Enforces all code quality gates (linting, formatting, type checks) that every contributor's changes must pass before merge.
  • .devcontainer/devcontainer.json — Primary dev container configuration that bootstraps the full Superset development environment for all contributors.
  • .devcontainer/setup-dev.sh — Shell script that installs all Python and Node dependencies and initializes the database, making it the canonical dev setup entrypoint.
  • .github/copilot-instructions.md — Documents repo-specific coding conventions and architecture patterns that all AI-assisted and human contributors must follow.
  • .cursor/rules/dev-standard.mdc — Cursor IDE rules encoding the project's development standards and patterns, serving as living architectural documentation.
  • AGENTS.md — Describes agent/automation conventions for this repo, critical for understanding how agentic workflows interact with the codebase.

How to make changes

Add a new GitHub Actions CI workflow

  1. Create a new YAML workflow file following the naming convention of existing workflows (e.g., superset-<feature>.yml). (.github/workflows/superset-python-unittest.yml)
  2. Reference the reusable setup-backend action to install Python dependencies consistently. (.github/actions/setup-backend/action.yml)
  3. Add the new workflow to CODEOWNERS so the appropriate team is notified of changes. (.github/CODEOWNERS)
  4. If the workflow needs Docker, use the setup-docker composite action for consistent image building. (.github/actions/setup-docker/action.yml)

Add a new pre-commit code quality hook

  1. Add the new hook repository and hook id to the repos list in .pre-commit-config.yaml following existing hook patterns. (.pre-commit-config.yaml)
  2. If the hook is Python-specific, verify it aligns with PyLint rules to avoid conflicts. (.pylintrc)
  3. Update .codecov.yml if the hook affects coverage calculation or reporting thresholds. (.codecov.yml)

Onboard a new developer to the project

  1. Open the devcontainer using VS Code Remote Containers; the devcontainer.json defines all required services and extensions. (.devcontainer/devcontainer.json)
  2. The setup-dev.sh script runs automatically to install Python/Node deps and initialize the database. (.devcontainer/setup-dev.sh)
  3. Run start-superset.sh to launch the development server after environment setup. (.devcontainer/start-superset.sh)
  4. Review dev-standard.mdc for coding conventions and architectural patterns before writing code. (.cursor/rules/dev-standard.mdc)

Add a new release artifact or release step

  1. Add the new release step to the release.yml workflow, following the existing job dependency chain. (.github/workflows/release.yml)
  2. If the release involves tagging, update tag-release.yml to handle the new artifact type. (.github/workflows/tag-release.yml)
  3. Update CHANGELOG.md conventions if the new artifact type requires a new changelog section. (CHANGELOG.md)
  4. Update CODEOWNERS to assign the release team as owners of any new release-related files. (.github/CODEOWNERS)

Why these technologies

  • Python / Flask — Flask's lightweight, extensible architecture suits Superset's plugin-heavy database connector model, and Python's data science ecosystem (SQLAlchemy, pandas) enables native SQL execution and data processing.
  • React / TypeScript — React's component model supports Superset's complex interactive chart and dashboard UI, while TypeScript (being actively migrated to from JS) provides type safety for the large frontend codebase.
  • GitHub Actions — Native GitHub integration with matrix builds enables parallel testing across Python versions and database backends without external CI infrastructure.
  • Docker / Dev Containers — Containerized development eliminates environment inconsistencies across the large contributor base and enables reproducible CI builds matching local development.
  • Apache Software Foundation governance — ASF provides legal, branding, and infrastructure support for enterprise adoption while mandating Apache 2.0 licensing that protects both contributors and users.

Trade-offs already made

  • Incremental JS-to-TypeScript migration via automated agents

    • Why: The frontend codebase was originally JavaScript; migrating all at once would be too disruptive, so a phased agent-assisted approach was chosen.
    • Consequence: Coexistence of .js and .ts files creates mixed-type imports and inconsistent IDE experiences during the transition period.
  • DevContainer-first developer experience

    • Why: Eliminates 'works on my machine' problems across a global contributor base with diverse OS environments.
    • Consequence: Contributors without Docker or VS Code face a higher barrier to entry; initial container build time is 3-5 minutes.
  • Pre-commit hooks for code quality enforcement

    • Why: Catches issues locally before CI, reducing wasted CI minutes and speeding up the review cycle.

Traps & gotchas

  1. The frontend dev server (port 9000) must proxy to a running Flask instance (port 8088); running only one side gives you a blank or broken UI.
  2. superset init must be run after superset db upgrade to create default roles and permissions, or login will fail.
  3. The MAPBOX_API_KEY env var is required for map-based chart types to render.
  4. Jinja2 templating in SQL queries (superset/jinja_context.py) means SQL in SQL Lab is rendered as a template before it is executed, so raw SQL can behave differently than expected.
  5. Flask-AppBuilder's role/permission system auto-generates permissions on startup; adding a new API endpoint without re-running superset init to sync role definitions can leave it inaccessible. (superset sync-tags refreshes tag metadata and does not touch permissions.)

Concepts to learn

  • Flask-AppBuilder (FAB) RBAC — All authentication, authorization, roles, and permissions in Superset are managed by FAB — understanding its permission model is essential before touching any API endpoint or view.
  • SQLAlchemy Engine Specs — Superset abstracts database-specific SQL behavior through engine spec classes in superset/db_engine_specs/ — this is the pattern for adding or fixing database connector behavior.
  • Jinja2 SQL Templating — Superset allows Jinja2 macros inside SQL queries (e.g., {{ filter_values() }}) which introduces a server-side template rendering step before query execution — a common source of confusion and security considerations.
  • Superset Chart Plugin Architecture — Each chart type is a self-contained @superset-ui/* plugin package with a defined transformProps contract between the query response and the React component — understanding this interface is required to build or modify any chart.
  • Async Query Execution with Celery — Long-running SQL queries are offloaded to Celery workers and results cached in Redis — this distributed execution model affects how query state, errors, and results are tracked in the frontend.
  • Row Level Security (RLS) — Superset supports data-level access control by injecting WHERE clause filters per user/role via RLS rules — critical for multi-tenant deployments and a non-obvious layer between the UI and actual query execution.
  • Apache ECharts declarative chart spec — Modern Superset charts render via ECharts' JSON option object rather than imperative D3 — transformProps functions in chart plugins produce this spec, so understanding ECharts' option schema is essential for chart work.
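The engine-spec idea in the list above (one class per database, overriding only what differs) can be sketched in a few lines. This is a simplified illustration, not Superset's actual code: `BaseEngineSpec` here is a toy stand-in, and `ExampleDbEngineSpec`, `type_code_map`, and `spec_for` are hypothetical names. The real classes under superset/db_engine_specs/ have a much larger interface.

```python
from typing import Optional


class BaseEngineSpec:
    """Toy stand-in for a per-database spec class (not Superset's real one)."""

    engine = "base"
    # Maps driver-level integer type codes to readable type names.
    type_code_map: dict[int, str] = {}

    @classmethod
    def get_datatype(cls, type_code: object) -> Optional[str]:
        """Translate a driver type code into a type name, if known."""
        if isinstance(type_code, str) and type_code:
            return type_code.upper()
        if isinstance(type_code, int):
            return cls.type_code_map.get(type_code)
        return None


class ExampleDbEngineSpec(BaseEngineSpec):
    """Hypothetical connector that only overrides its type mapping."""

    engine = "exampledb"
    type_code_map = {1: "INTEGER", 2: "VARCHAR"}


def spec_for(engine: str) -> type[BaseEngineSpec]:
    """Look up the spec whose `engine` matches, falling back to the base."""
    for spec in BaseEngineSpec.__subclasses__():
        if spec.engine == engine:
            return spec
    return BaseEngineSpec
```

With this shape, `spec_for("exampledb").get_datatype(2)` resolves through the subclass mapping, while an unknown engine falls back to the base behavior. In this sketch, adding a type mapping for an underserved database amounts to filling in one dictionary.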

Related repos

  • metabase/metabase — Direct open-source alternative BI tool in the same self-hosted analytics space, built on Clojure/JavaScript instead of Python/React.
  • getredash/redash — Another open-source data visualization and SQL querying tool that solves the same self-hosted BI problem with a simpler feature set.
  • apache-superset/superset-ui — The former home of the @superset-ui/* chart plugin packages before they were consolidated into the main monorepo; useful for historical context on plugin architecture.
  • preset-io/superset-embedded-sdk — Official SDK for embedding Superset dashboards in external applications, a common use case for Superset deployments.
  • apache/airflow — Both Superset and Airflow originated at Airbnb and are now Apache projects; Airflow is commonly used alongside Superset in the same data stack for pipeline orchestration feeding Superset dashboards.

PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Convert .claude/projects/js-to-ts/AGENT.md automation to a GitHub Actions workflow

The repo has a .claude/projects/js-to-ts/ directory with AGENT.md, COORDINATOR.md, and PROJECT.md indicating an ongoing JS-to-TS migration effort. There is no corresponding .github/workflows/ file to automate enforcement (e.g., blocking new .js files in superset-frontend/src, reporting migration progress). Adding a CI workflow would make the migration self-reinforcing and trackable for all contributors.

  • [ ] Read .claude/projects/js-to-ts/PROJECT.md and AGENT.md to understand the migration scope and rules
  • [ ] Create .github/workflows/js-to-ts-guard.yml that runs on pull_request targeting files in superset-frontend/src/**
  • [ ] Add a step using git diff --name-only to detect any newly added .js or .jsx files (excluding test fixtures and config files like babel.config.js)
  • [ ] Fail the workflow with a descriptive error message linking to .claude/projects/js-to-ts/PROJECT.md if new JS files are introduced
  • [ ] Add an optional step that posts a PR comment summarizing the current .js file count via a simple find superset-frontend/src -name '*.js' | wc -l badge/metric
  • [ ] Update .github/PULL_REQUEST_TEMPLATE.md with a checkbox item confirming new files are TypeScript
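The detection step in this checklist reduces to filtering a list of added paths. Below is a minimal sketch of that filter, assuming the workflow feeds it the output of something like `git diff --name-only --diff-filter=A` against the base branch. The names `new_js_files`, `GUARDED_ROOT`, and `ALLOWED_NAMES` are hypothetical, and the test-fixture exclusion from the checklist is left out because the fixture locations are not specified here.

```python
from pathlib import PurePosixPath

GUARDED_ROOT = "superset-frontend/src/"
JS_SUFFIXES = {".js", ".jsx"}
# Assumption: config files that are allowed to stay JavaScript.
ALLOWED_NAMES = {"babel.config.js"}


def new_js_files(added_paths: list[str]) -> list[str]:
    """Filter paths of newly added files down to disallowed JS/JSX sources."""
    offenders = []
    for raw in added_paths:
        path = PurePosixPath(raw.strip())
        # Only files under the guarded source tree are checked.
        if not str(path).startswith(GUARDED_ROOT):
            continue
        if path.suffix not in JS_SUFFIXES or path.name in ALLOWED_NAMES:
            continue
        offenders.append(str(path))
    return offenders
```

A workflow step could exit non-zero whenever the returned list is non-empty and print the offending paths next to a link to the migration project document.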

Add integration tests for the ephemeral environment PR workflows

The repo has .github/workflows/ephemeral-env.yml and .github/workflows/ephemeral-env-pr-close.yml but .github/workflows/github-action-validator.yml only references a shell script (github-action-validator.sh). The ephemeral env workflows involve ECS task definitions (.github/workflows/ecs-task-definition.json) and are high-risk — a misconfiguration could leak credentials or leave cloud resources running. Adding a validation job using action-validator or actionlint specifically targeting these files would catch regressions early.

  • [ ] Inspect .github/workflows/github-action-validator.sh to understand what is currently validated and what is excluded
  • [ ] Install actionlint (https://github.com/rhysd/actionlint) in the existing .github/workflows/github-action-validator.yml workflow
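  • [ ] Add explicit actionlint checks for ephemeral-env.yml and ephemeral-env-pr-close.yml with shellcheck integration active (actionlint runs shellcheck on run: steps when the binary is available; its -shellcheck option sets the executable path)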
  • [ ] Add a JSON schema validation step for .github/workflows/ecs-task-definition.json using ajv-cli against the official AWS ECS task definition schema
  • [ ] Add a test in .github/workflows/github-action-validator.yml that dry-runs the cancel_duplicates.yml and ephemeral-env-pr-close.yml trigger conditions to verify if: expressions are syntactically valid
  • [ ] Document the validation approach in .github/workflows/ with a short README.md

Add a check-db-migration-conflict pre-commit hook and document it in the devcontainer setup

The repo already has .github/workflows/check_db_migration_confict.yml (note the existing typo in the filename) as a CI check, but new contributors running locally via .devcontainer/ will only discover migration conflicts after pushing. The .devcontainer/setup-dev.sh and .devcontainer/start-superset.sh scripts don't install or mention a pre-commit hook for this. Adding a local hook and fixing the workflow filename typo would prevent a common contributor pain point.

  • [ ] Read .github/workflows/check_db_migration_confict.yml to extract the exact commands used to detect Alemb

Good first issues

  1. Many files in superset-frontend/src/ still have .js or .jsx extensions per the active JS-to-TS project at .claude/projects/js-to-ts/. Pick any unconverted component file and migrate it with proper TypeScript types.
  2. The db_engine_specs/ directory has engine specs with missing or incomplete get_datatype() implementations. Adding type mappings for an underserved database (e.g., Trino, Dremio) is a well-scoped backend contribution.
  3. Several .github/ISSUE_TEMPLATE/ entries are in older markdown format while others use the newer .yml schema. Converting cosmetic.md to a structured yml template with dropdowns would be a clean, low-risk improvement.

Recent commits

  • d618837 — chore(deps): bump docusaurus-theme-openapi-docs from 5.0.1 to 5.0.2 in /docs (#39846) (dependabot[bot])
  • 2edae16 — chore(deps): bump baseline-browser-mapping from 2.10.24 to 2.10.27 in /docs (#39848) (dependabot[bot])
  • e802072 — chore(deps-dev): bump eslint from 10.2.1 to 10.3.0 in /superset-websocket (#39843) (dependabot[bot])
  • 7695501 — chore: bump shillelagh to 1.4.4 (#39870) (betodealmeida)
  • 5325b87 — fix(clickhouse): prevent expensive table scan (#39867) (betodealmeida)
  • e763186 — fix(helm): allow chart to work out-of-the-box with legacy Bitnami images (#39839) (hainenber)
  • c2725e8 — fix(markdown): Allow "target" attribute (#39868) (sfirke)
  • 2f60572 — chore(deps-dev): bump globals from 17.5.0 to 17.6.0 in /superset-websocket (#39844) (dependabot[bot])
  • ebb02d0 — chore(deps): bump @swc/core from 1.15.32 to 1.15.33 in /docs (#39845) (dependabot[bot])
  • 319b8a1 — chore(deps-dev): bump globals from 17.5.0 to 17.6.0 in /docs (#39847) (dependabot[bot])
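The ownership-concentration signal in the verdict is a share-of-commits calculation, and the same arithmetic can be applied to the list above. In this sample, dependabot[bot] authors six of the ten commits, a 60% share. The headline 64% figure presumably comes from a different window of history. The sketch below assumes each line ends with the author in parentheses, as in this list; `author_share` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the trailing "(author)" on lines like "abc1234 - subject (#123) (author)".
AUTHOR_RE = re.compile(r"\(([^()]+)\)\s*$")


def author_share(lines: list[str]) -> list[tuple[str, float]]:
    """Return (author, fraction of commits) pairs, most active first."""
    authors = [m.group(1) for line in lines if (m := AUTHOR_RE.search(line))]
    counts = Counter(authors)
    total = sum(counts.values())
    # most_common() is empty when nothing matched, so no division by zero.
    return [(name, n / total) for name, n in counts.most_common()]
```

Bot accounts dominate this particular window, so a real bus-factor estimate would likely filter them out before computing shares.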

Security observations

  • High · Jinja2 Template Injection Risk — Dependencies file (jinja2 listed), superset SQL templating engine. The dependency file includes 'jinja2', which is a powerful templating engine. Apache Superset is known to use Jinja2 for SQL templating (e.g., in SQL Lab and dataset queries). If user-supplied input is rendered via Jinja2 without proper sandboxing or escaping, it can lead to Server-Side Template Injection (SSTI), allowing arbitrary code execution on the server. Fix: Ensure Jinja2 is used in a sandboxed environment (SandboxedEnvironment). Strictly validate and sanitize all user inputs before passing them to Jinja2 templates. Audit all template rendering paths to confirm untrusted input is never directly interpolated.
  • High · SQL Injection via Dynamic SQL Templating — SQL Lab / Dataset query engine, Jinja2 SQL templates. Apache Superset supports Jinja2-based SQL templating in queries and datasets. If user-controlled parameters are interpolated directly into SQL queries without parameterization, this introduces SQL injection vulnerabilities. Fix: Use parameterized queries and prepared statements wherever possible. Enforce strict allowlisting of template variables. Limit what Jinja2 macros and filters are available in SQL templates. Audit all database query construction paths.
  • High · Potential XSS via Dashboard/Chart Rendering — Frontend components (superset-frontend/), Markdown chart types, dashboard rendering. Superset is a data visualization platform that renders user-defined chart titles, dataset names, custom HTML in markdown components, and dashboard metadata. If user-supplied content is rendered without proper sanitization (e.g., via dangerouslySetInnerHTML in React components), this could lead to stored or reflected Cross-Site Scripting (XSS) attacks. Fix: Audit all React components for use of 'dangerouslySetInnerHTML'. Use a robust HTML sanitization library (e.g., DOMPurify) before rendering any user-supplied HTML. Implement a strict Content Security Policy (CSP) header.
  • High · Docker Compose Used for Potentially Production-Like Deployments — docker-compose.yml. The docker-compose.yml file contains a comment explicitly warning 'We don't support docker compose for production environments,' yet the file exists and is available. Developers or operators may still use it in production or staging environments, potentially exposing insecure defaults such as default credentials, exposed ports, missing TLS, and no resource limits. Fix: Clearly document and enforce that docker-compose is for development only. Remove any default credentials or hardcoded secrets from docker-compose files. Add explicit warnings in CI/CD pipelines to prevent docker-compose deployments to production. Ensure production deployments use hardened Helm charts or equivalent.
  • Medium · Cherrytree Dependency - Unverified Security Posture — Dependencies file (cherrytree). The 'cherrytree' package is listed as a dependency. It is a niche release-management tool for building releases from cherry-picked commits, not a tree data structure library, so it is unlikely to sit on any query-handling path. As with any little-used package, its maintenance status and potential for supply chain vulnerabilities are harder to assess than for widely used libraries. Fix: Verify that 'cherrytree' is actively maintained and has no known CVEs. Pin the dependency to a specific version. Run regular audits using tools like 'pip-audit' or 'safety'. Consider limiting it to development or release-tooling dependency groups if it is not needed at runtime.
  • Medium · Missing or Incomplete Content Security Policy (CSP) — Application-wide HTTP response headers. A data visualization platform like Superset, which renders user-defined content, charts, and embeds external resources, is at high risk of XSS and data injection without a strong Content Security Policy. There is no evidence in the reviewed file structure of enforced CSP headers, which could allow execution of injected scripts. Fix: Implement a strict Content Security Policy header via Flask-Talisman or equivalent middleware. Restrict 'script-src', 'object-src', and 'base-uri' directives. Avoid 'unsafe-inline' and 'unsafe-eval'. Regularly test CSP coverage.

LLM-derived; treat as a starting point, not a security audit.
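The "use parameterized queries" advice in the SQL injection item can be illustrated with a small, self-contained sketch. It uses the standard library's sqlite3 as a stand-in (Superset itself goes through SQLAlchemy), and the `dashboards` table, its columns, and both function names are hypothetical.

```python
import sqlite3


def find_dashboards(conn: sqlite3.Connection, owner: str) -> list[str]:
    """Look up dashboard titles by owner using a bound parameter."""
    rows = conn.execute(
        # The "?" placeholder keeps `owner` out of the SQL text entirely.
        "SELECT title FROM dashboards WHERE owner = ? ORDER BY title",
        (owner,),
    )
    return [title for (title,) in rows]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dashboards (title TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO dashboards VALUES (?, ?)",
        [("Sales", "alice"), ("Ops", "bob")],
    )
    return conn
```

Passing a classic payload such as `alice' OR '1'='1` as `owner` returns no rows, because the whole string is compared as a literal value and is never parsed as SQL. Template-rendered SQL does not get this protection automatically, which is why the allowlisting advice above matters for Jinja2 macros.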

Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.