RepoPilotOpen in app →

voxel51/fiftyone

Refine high-quality datasets and visual AI models

Healthy

Healthy across the board

weakest axis
Use as dependencyHealthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

  • Last commit today
  • 17 active contributors
  • Distributed ownership (top contributor 29% of recent commits)
Show all 6 evidence items →
  • Apache-2.0 licensed
  • CI configured
  • Tests present

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:
RepoPilot: Healthy
[![RepoPilot: Healthy](https://repopilot.app/api/badge/voxel51/fiftyone)](https://repopilot.app/r/voxel51/fiftyone)

Paste at the top of your README.md — renders inline like a shields.io badge.

Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/voxel51/fiftyone on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: voxel51/fiftyone

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

  1. Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
  2. Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
  3. Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/voxel51/fiftyone shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

  • Last commit today
  • 17 active contributors
  • Distributed ownership (top contributor 29% of recent commits)
  • Apache-2.0 licensed
  • CI configured
  • Tests present

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live voxel51/fiftyone repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/voxel51/fiftyone.

What it runs against: a local clone of voxel51/fiftyone — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in voxel51/fiftyone | Confirms the artifact applies here, not a fork | | 2 | License is still Apache-2.0 | Catches relicense before you depend on it | | 3 | Default branch develop exists | Catches branch renames | | 4 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>voxel51/fiftyone</code></summary>
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of voxel51/fiftyone. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/voxel51/fiftyone.git
#   cd fiftyone
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of voxel51/fiftyone and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "voxel51/fiftyone(\\.git)?\\b" \\
  && ok "origin remote is voxel51/fiftyone" \\
  || miss "origin remote is not voxel51/fiftyone (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
  && ok "license is Apache-2.0" \\
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify develop >/dev/null 2>&1 \\
  && ok "default branch develop exists" \\
  || miss "default branch develop no longer exists"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/voxel51/fiftyone"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

TL;DR

FiftyOne is an open-source Python+TypeScript platform for building, refining, and evaluating high-quality computer vision datasets and AI models. It provides interactive visual dataset exploration, model evaluation, labeling workflows, and quality assessment tools through a web-based UI backed by a Python server and MongoDB-compatible storage. Hybrid monorepo: Python backend (fiftyone/ package, server via fiftyone/server/main.py), TypeScript frontend (app/ workspace with @fiftyone/app, @fiftyone/spotlight, @fiftyone/multimodal sub-packages using yarn 3.2.1), and shared workflows in .github/. State management likely uses Recoil (recoil.ts mocks present) and Relay for GraphQL on frontend; backend uses Strawberry GraphQL (gen:schema script references it).

👥Who it's for

Machine learning engineers and data scientists who need to visualize large image/video datasets, identify labeling errors, evaluate model predictions, and iterate on dataset quality before training vision models. Also used by annotation teams and researchers building production CV pipelines.

🌱Maturity & risk

Production-ready and actively maintained. The project has substantial adoption (PyPI badge present, Docker Hub metrics, 9.8M LOC Python), comprehensive CI/CD pipelines (.github/workflows/ with build, test, lint, e2e, publish steps), and a mature monorepo structure. Recently actively developed with organized GitHub issue templates and documented contribution guidelines.

Moderate complexity risk due to tight Python-TypeScript coupling (backend at fiftyone/server/main.py must align with frontend at app/) and reliance on external services (MongoDB, potentially cloud integrations). Monorepo structure with yarn workspaces and Python build coordination means breaking changes in one language can cascade. No obvious single-maintainer risk (CODEOWNERS file exists), but the large codebase (9.8M Python LOC) means onboarding time is significant.

Active areas of work

Active development across multiple fronts: GraphQL schema generation (build-graphql.yml), Docker image publishing, E2E testing (e2e.yml), linting enforcement (lint-app.yml for TypeScript/JavaScript), and cross-platform builds (windows-test.yml). PR welcome automation and dependabot security updates suggest continuous integration and community focus.

🚀Get running

git clone https://github.com/voxel51/fiftyone.git
cd fiftyone
pip install -e .
cd app && yarn install && yarn dev:wpy

This installs the Python package in editable mode, installs frontend dependencies via yarn, and runs both the TypeScript dev server and Python backend concurrently.

Daily commands: Development: yarn dev:wpy (from app/) runs both frontend dev server and python ../fiftyone/server/main.py concurrently. Production: yarn build compiles TypeScript workspace, then deploy the built app with the Python server. GraphQL schema regeneration: yarn gen:schema.

🗺️Map of the codebase

  • fiftyone/server/main.py: Entry point for the Python backend server; handles HTTP requests, GraphQL API, and dataset/model logic
  • app/package.json: Monorepo root configuration; defines all frontend workspace dependencies and yarn scripts for dev/build/test
  • app/packages/@fiftyone/app: Main React application; contains UI components, Relay queries, and Recoil state for dataset visualization
  • .github/workflows/build.yml: Primary CI/CD pipeline; orchestrates Python tests, TypeScript linting, GraphQL builds, and Docker image publishing
  • [CONTRIBUTING.md and app/CONTRIBUTING.md](https://github.com/voxel51/fiftyone/blob/develop/CONTRIBUTING.md and app/CONTRIBUTING.md): Contributor guidelines for both backend and frontend; required reading before submitting PRs
  • .pre-commit-config.yaml: Git hooks configuration; enforces code style and type safety before commits

🛠️How to make changes

Frontend changes: edit app/packages/@fiftyone/app/src/ (main UI components), use Relay compiler (yarn compile) after GraphQL schema changes. Backend changes: edit fiftyone/server/main.py and related modules, regenerate GraphQL schema with yarn gen:schema. Type safety: run yarn typecheck before committing. Linting: yarn check:strict enforces all packages.

🪤Traps & gotchas

Python-TypeScript sync: Changes to Strawberry GraphQL schema (fiftyone/server/) must be followed by yarn gen:schema and Relay compiler regeneration, or frontend will break silently. Yarn workspaces: The app/ directory uses yarn 3.2.1 exclusively; mixing npm will cause conflicts. Concurrent dev server: yarn dev:wpy spawns two processes (frontend on port ~3000, backend on ~5000 by default); killing one doesn't kill both—use Ctrl+C or specify -k flag. MongoDB assumption: The backend likely expects a MongoDB instance running; check fiftyone/server/main.py for connection details. Monorepo builds: yarn build runs strict checks across all workspaces; any workspace with errors blocks the full build.

💡Concepts to learn

  • Relay (GraphQL client framework) — Frontend uses Relay for type-safe GraphQL queries and automatic caching; understanding Relay's compile step and fragment masking is essential for modifying data-fetching logic in app/packages/@fiftyone/app
  • Recoil (state management) — Recoil atoms and selectors manage global UI state (selected samples, filters, model predictions) across the FiftyOne UI; crucial for understanding data flow between components
  • Strawberry GraphQL (Python framework) — Backend exposes a GraphQL API via Strawberry; schema changes here cascade to Relay on frontend via yarn gen:schema, making this the bridge between Python and TypeScript
  • Yarn workspaces (monorepo management) — The app/ directory is a yarn 3 monorepo with multiple workspace packages (@fiftyone/app, @fiftyone/spotlight, @fiftyone/multimodal); understanding workspace dependencies and hoisting is critical for adding or moving code
  • Document databases (MongoDB-compatible) — FiftyOne stores datasets, samples, and labels in a document database; understanding schema flexibility and indexing is important for performance optimization and feature development
  • Concurrent process management (concurrently) — Development workflow (yarn dev:wpy) runs TypeScript and Python servers in parallel; understanding process lifecycles and port conflicts helps troubleshoot dev environment issues
  • Protocol Buffers (protoc-gen-es) — @bufbuild/protoc-gen-es is listed in devDependencies, suggesting proto messages may be used for efficient data serialization between frontend and backend; relevant for understanding data structure definitions
  • roboflow/supervision — Computer vision dataset annotation and model evaluation library; complements FiftyOne for post-inference analysis pipelines
  • iterative/dvc — Data and ML versioning; often used alongside FiftyOne for reproducible dataset and model lineage tracking
  • labelImg/labelImg — Lightweight image annotation tool; FiftyOne's labeling workflows are designed to integrate with or replace tools like this
  • voxel51/fiftyone-examples — Official example notebooks and scripts (referenced in README Colab link); demonstrates FiftyOne usage patterns
  • huggingface/datasets — Hugging Face Datasets library; FiftyOne can load and export datasets in HF format for model training workflows

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add comprehensive E2E tests for GraphQL schema generation workflow

The repo has a gen:schema script that exports the GraphQL schema but there's no dedicated E2E test validating that schema generation works end-to-end. The .github/workflows/build-graphql.yml exists but lacks validation that the generated schema is valid and backwards compatible. This is critical for a tool managing datasets and AI models where schema stability matters.

  • [ ] Create .github/workflows/validate-graphql-schema.yml that runs on PR to validate schema generation doesn't break
  • [ ] Add schema validation step using graphql-core or similar to check for syntax errors
  • [ ] Generate schema in CI and compare against baseline to detect unintended breaking changes
  • [ ] Document schema versioning strategy in CONTRIBUTING.md

Add TypeScript strict mode enforcement tests for app/packages subdirectories

The repo has typecheck and typecheck:error-count scripts, but no automated CI workflow ensuring TypeScript strict mode compliance per workspace. The monorepo structure (app/packages/aggregations, app/packages/analytics, etc.) needs consistent type safety. Currently, violations could slip through PRs since type checking appears optional.

  • [ ] Create .github/workflows/typecheck-strict.yml that runs yarn typecheck on every PR
  • [ ] Add per-workspace tsconfig.json strict mode validation in app/packages/*/tsconfig.json
  • [ ] Update CONTRIBUTING.md to document strict TypeScript requirements for new code
  • [ ] Add pre-commit hook in .pre-commit-config.yaml for local typecheck enforcement

Add missing unit test suite for app/packages/aggregations and app/packages/analytics

The repo has test and test-ui scripts with Vitest, but the app/packages/aggregations and app/packages/analytics directories lack visible test files. These are core packages used by the FiftyOne app. Without tests, refactoring and bug fixes in these packages are risky and regressions can reach users.

  • [ ] Create app/packages/aggregations/__tests__/Aggregation.test.ts with unit tests for src/Aggregation.ts class methods
  • [ ] Create app/packages/analytics/__tests__/analytics.test.ts covering core analytics functions
  • [ ] Update .github/workflows/test.yml to run tests from nested workspaces using yarn workspaces foreach
  • [ ] Add test coverage thresholds to package.json scripts to prevent coverage regression

🌿Good first issues

  • Add missing unit tests for app/packages/@fiftyone/spotlight or app/packages/@fiftyone/multimodal (visible in check:strict script but likely sparse test coverage); look for .test.ts/.test.tsx files and mirror that pattern for untested modules.
  • Document the GraphQL schema by adding docstrings to Strawberry decorators in fiftyone/server/; the gen:schema script will pull these into schema.graphql, improving IDE autocomplete and developer experience.
  • Improve Windows compatibility: windows-test.yml exists but may have gaps; audit Makefile and shell scripts (.sh files) for Unix-only commands and add equivalent batch scripts or cross-platform Node alternatives.

Top contributors

Click to expand

📝Recent commits

Click to expand
  • d57cd3f — [FOEPD-3498] Per-deployment model weights source configurability (#7471) (yrahal)
  • 63207c9 — Merge pull request #7457 from voxel51/ontology-when-classes (erik-nieh)
  • 4183b5e — [FOEPD-3738] Read MCAP to SceneInventory (#7418) (exupero)
  • 1182f21 — Merge pull request #7177 from voxel51/FOEPD-3003-When-the-user-clicks-on-the-Annotate-tab-the-Patch-sample-is-zoomed-in- (nverjinski)
  • c66b0e0 — test: updated snapshots for viewport-bridge (nverjinski)
  • dcdc519 — test: reworked wheel function in sample-canvas to behave as expected (nverjinski)
  • 66bee67 — Merge pull request #7342 from voxel51/mikeobrien/annotationHookTEsts (mozabes)
  • 2e05cb7 — chore(deps): bump axios from 1.15.0 to 1.16.0 in /e2e-pw (#7468) (dependabot[bot])
  • 019c20c — Merge pull request #7459 from voxel51/feat/media-optimizations (sashankaryal)
  • 830637d — import errno (sashankaryal)

🔒Security observations

The FiftyOne codebase demonstrates a reasonable security posture with established practices (security reporting policy, dependabot configuration, SECURITY.md). However, several areas require attention: incomplete visibility into dependency management, missing security header configurations, potential GraphQL injection vectors, and unclear production deployment security controls. The application handles sensitive computer vision datasets and models, making input validation and authentication critical. The Docker configuration appears development-focused and should be hardened for production. Strengths include a responsible vulnerability disclosure policy and existing CI/CD security workflows.

  • Medium · Incomplete Dependency Management in package.json — app/package.json. The package.json file appears truncated at the devDependencies section, making it impossible to verify all dependencies for known vulnerabilities. This could mask security issues in production or development dependencies. Fix: Ensure the complete package.json is reviewed. Implement automated dependency scanning using tools like 'yarn audit', 'npm audit', or 'dependabot' (already configured in .github/dependabot.yml). Keep all dependencies updated and regularly scan for CVEs.
  • Medium · Missing Security Headers Configuration — fiftyone/server/main.py (inferred), Dockerfile. No visible security headers configuration (Content-Security-Policy, X-Frame-Options, etc.) in the provided codebase snippets. The application handles visual AI models and datasets, which could be sensitive. Fix: Implement security headers in the Python Flask/FastAPI application. Add middleware to set appropriate CSP, X-Frame-Options, X-Content-Type-Options, and Strict-Transport-Security headers.
  • Medium · Exposed Development Port in Docker Configuration — Dockerfile. The Dockerfile mentions port 5151 for the server target, which may be exposed without proper authentication or rate limiting controls. Fix: Ensure the application enforces authentication before accessing port 5151. Implement rate limiting, API key validation, and consider running the server behind a reverse proxy with security controls.
  • Low · Development Script in Production Path — app/package.json. The package.json includes 'dev:py' and 'dev:wpy' scripts that run the Python development server directly, which should not be used in production. Fix: Ensure development scripts are clearly documented as development-only. Use environment-based checks to prevent accidental production execution. Document that production deployments should use a WSGI server (gunicorn, uWSGI, etc.).
  • Low · No Visible Input Validation Framework — app/packages/annotation/src/. The codebase includes TypeScript/React components for annotation and AI model interaction, but no visible input validation or sanitization patterns are evident in the file structure. Fix: Implement input validation for all user-supplied data. Use libraries like 'zod', 'yup', or 'joi' for schema validation. Sanitize all GraphQL inputs and API parameters. Ensure XSS protection through React's default escaping and Content Security Policy.
  • Low · Missing CORS Configuration Visibility — fiftyone/server/ (not fully provided). No visible CORS configuration or security policy for cross-origin requests in the provided codebase sections. Fix: Implement explicit CORS configuration allowing only trusted origins. Avoid using '*' as allowed origin. Configure credentials handling appropriately. Document CORS policy in security documentation.
  • Low · Relay Compiler Usage Without Visible Security Context — app/packages/, app/__mocks__/recoil-relay.ts. The codebase uses relay-compiler and GraphQL with @recoil/relay mocks, but no visible query validation or injection protection patterns are evident. Fix: Implement GraphQL query validation to prevent injection attacks. Use persisted queries when possible. Implement rate limiting on GraphQL endpoint. Validate all query complexity to prevent DoS attacks.

LLM-derived; treat as a starting point, not a security audit.


Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.

Healthy signals · voxel51/fiftyone — RepoPilot