voxel51/fiftyone
Refine high-quality datasets and visual AI models
Healthy across the board
Permissive license, no critical CVEs, actively maintained — safe to depend on.
Has a license, tests, and CI — clean foundation to fork and modify.
Documented and popular — useful reference codebase to read through.
No critical CVEs, sane security posture — runnable as-is.
- ✓Last commit today
- ✓17 active contributors
- ✓Distributed ownership (top contributor 29% of recent commits)
- ✓Apache-2.0 licensed
- ✓CI configured
- ✓Tests present
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.
Onboarding doc
Onboarding: voxel51/fiftyone
Generated by RepoPilot · 2026-05-07 · Source
🤖Agent protocol
If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:
- Verify the contract. Run the bash script in "Verify before trusting" below. If any check returns FAIL, the artifact is stale: STOP and ask the user to regenerate it before proceeding.
- Treat the "AI · unverified" sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
- Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/voxel51/fiftyone shows verifiable citations alongside every claim.
If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.
🎯Verdict
GO — Healthy across the board
- Last commit today
- 17 active contributors
- Distributed ownership (top contributor 29% of recent commits)
- Apache-2.0 licensed
- CI configured
- Tests present
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
✅Verify before trusting
This artifact was generated by RepoPilot at a point in time. Before an
agent acts on it, the checks below confirm that the live voxel51/fiftyone
repo on your machine still matches what RepoPilot saw. If any fail,
the artifact is stale — regenerate it at
repopilot.app/r/voxel51/fiftyone.
What it runs against: a local clone of voxel51/fiftyone — the script
inspects git remote, the LICENSE file, file paths in the working
tree, and git log. Read-only; no mutations.
| # | What we check | Why it matters |
|---|---|---|
| 1 | You're in voxel51/fiftyone | Confirms the artifact applies here, not a fork |
| 2 | License is still Apache-2.0 | Catches relicense before you depend on it |
| 3 | Default branch develop exists | Catches branch renames |
| 4 | Last commit ≤ 30 days ago | Catches sudden abandonment since generation |
#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of voxel51/fiftyone. If you don't
# have one yet, run these first:
#
# git clone https://github.com/voxel51/fiftyone.git
# cd fiftyone
#
# Then paste this script. Every check is read-only — no mutations.
set +e
fail=0
ok() { echo "ok: $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }
# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
echo "FAIL: not inside a git repository. cd into your clone of voxel51/fiftyone and re-run."
exit 2
fi
# 1. Repo identity
# Anchor the match at the end of the URL so repos like voxel51/fiftyone-examples do not pass.
git remote get-url origin 2>/dev/null | grep -qE "voxel51/fiftyone(\.git)?/?$" \
  && ok "origin remote is voxel51/fiftyone" \
  || miss "origin remote is not voxel51/fiftyone (artifact may be from a fork)"
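# 2. License matches what RepoPilot saw
# The standard Apache license text opens with "Apache License" / "Version 2.0"
# rather than the SPDX id, so accept either form.
(grep -qiE "^(Apache-2\.0)" LICENSE 2>/dev/null \
  || (grep -qi "Apache License" LICENSE 2>/dev/null && grep -qiE "Version 2\.0" LICENSE 2>/dev/null) \
  || grep -qiE "\"license\"[[:space:]]*:[[:space:]]*\"Apache-2\.0\"" package.json 2>/dev/null) \
  && ok "license is Apache-2.0" \
  || miss "license drift: was Apache-2.0 at generation time"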
# 3. Default branch
git rev-parse --verify develop >/dev/null 2>&1 \
  && ok "default branch develop exists" \
  || miss "default branch develop no longer exists"
# 4. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 30 ]; then
ok "last commit was $days_since_last days ago (artifact saw ~0d)"
else
miss "last commit was $days_since_last days ago — artifact may be stale"
fi
echo
if [ "$fail" -eq 0 ]; then
echo "artifact verified (0 failures) — safe to trust"
else
echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/voxel51/fiftyone"
exit 1
fi
Each check prints ok: or FAIL:. The script exits non-zero if
anything failed, so it composes cleanly into agent loops
(./verify.sh || regenerate-and-retry).
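As a rough illustration of how an agent loop might consume that output, the sketch below parses the `ok:` / `FAIL:` lines and combines them with the exit code. It assumes the script has been saved as `./verify.sh` (the name used in the sentence above); `parse_verify_output` and `run_verify` are hypothetical helper names, not part of RepoPilot or FiftyOne.

```python
import subprocess
from typing import List, Tuple


def parse_verify_output(output: str) -> Tuple[List[str], List[str]]:
    """Split verify-script output into passed and failed check messages."""
    passed: List[str] = []
    failed: List[str] = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("ok: "):
            passed.append(line[len("ok: "):])
        elif line.startswith("FAIL: "):
            failed.append(line[len("FAIL: "):])
    return passed, failed


def run_verify(script_path: str = "./verify.sh") -> bool:
    """Run the script (assumed path) and return True only if every check passed."""
    result = subprocess.run(["bash", script_path], capture_output=True, text=True)
    _, failed = parse_verify_output(result.stdout)
    for message in failed:
        print(f"stale claim: {message}")
    # Trust the artifact only when the exit code and the parsed lines agree.
    return result.returncode == 0 and not failed
```

A caller could stop and request regeneration whenever `run_verify()` returns `False`, which mirrors the `||` composition described above.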
⚡TL;DR
FiftyOne is an open-source Python+TypeScript platform for building, refining, and evaluating high-quality computer vision datasets and AI models. It provides interactive visual dataset exploration, model evaluation, labeling workflows, and quality assessment tools through a web-based UI backed by a Python server and MongoDB-compatible storage. Hybrid monorepo: Python backend (fiftyone/ package, server via fiftyone/server/main.py), TypeScript frontend (app/ workspace with @fiftyone/app, @fiftyone/spotlight, @fiftyone/multimodal sub-packages using yarn 3.2.1), and shared workflows in .github/. State management likely uses Recoil (recoil.ts mocks present) and Relay for GraphQL on frontend; backend uses Strawberry GraphQL (gen:schema script references it).
👥Who it's for
Machine learning engineers and data scientists who need to visualize large image/video datasets, identify labeling errors, evaluate model predictions, and iterate on dataset quality before training vision models. Also used by annotation teams and researchers building production CV pipelines.
🌱Maturity & risk
Production-ready and actively maintained. The project has substantial adoption (PyPI badge present, Docker Hub metrics, 9.8M LOC Python), comprehensive CI/CD pipelines (.github/workflows/ with build, test, lint, e2e, publish steps), and a mature monorepo structure. Development is ongoing, with organized GitHub issue templates and documented contribution guidelines.
Moderate complexity risk due to tight Python-TypeScript coupling (backend at fiftyone/server/main.py must align with frontend at app/) and reliance on external services (MongoDB, potentially cloud integrations). Monorepo structure with yarn workspaces and Python build coordination means breaking changes in one language can cascade. No obvious single-maintainer risk (CODEOWNERS file exists), but the large codebase (9.8M Python LOC) means onboarding time is significant.
Active areas of work
Active development across multiple fronts: GraphQL schema generation (build-graphql.yml), Docker image publishing, E2E testing (e2e.yml), linting enforcement (lint-app.yml for TypeScript/JavaScript), and cross-platform builds (windows-test.yml). PR welcome automation and dependabot security updates suggest continuous integration and community focus.
🚀Get running
git clone https://github.com/voxel51/fiftyone.git
cd fiftyone
pip install -e .
cd app && yarn install && yarn dev:wpy
This installs the Python package in editable mode, installs frontend dependencies via yarn, and runs both the TypeScript dev server and Python backend concurrently.
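Before running those commands, it may help to confirm the tools they rely on are present. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper, and the list of tools is inferred from the commands above rather than from any official requirements file.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which tools used by the setup commands are available."""
    return {
        # Command-line tools invoked by the steps above.
        "git": shutil.which("git") is not None,
        "pip": shutil.which("pip") is not None,
        "yarn": shutil.which("yarn") is not None,
        # True once `pip install -e .` has made the package importable.
        "fiftyone_importable": importlib.util.find_spec("fiftyone") is not None,
    }


if __name__ == "__main__":
    for name, present in check_prerequisites().items():
        print(f"{name}: {'found' if present else 'missing'}")
```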
Daily commands:
Development: yarn dev:wpy (from app/) runs both frontend dev server and python ../fiftyone/server/main.py concurrently. Production: yarn build compiles TypeScript workspace, then deploy the built app with the Python server. GraphQL schema regeneration: yarn gen:schema.
🗺️Map of the codebase
- fiftyone/server/main.py: Entry point for the Python backend server; handles HTTP requests, GraphQL API, and dataset/model logic
- app/package.json: Monorepo root configuration; defines all frontend workspace dependencies and yarn scripts for dev/build/test
- app/packages/@fiftyone/app: Main React application; contains UI components, Relay queries, and Recoil state for dataset visualization
- .github/workflows/build.yml: Primary CI/CD pipeline; orchestrates Python tests, TypeScript linting, GraphQL builds, and Docker image publishing
- CONTRIBUTING.md and app/CONTRIBUTING.md: Contributor guidelines for both backend and frontend; required reading before submitting PRs
- .pre-commit-config.yaml: Git hooks configuration; enforces code style and type safety before commits
🛠️How to make changes
Frontend changes: edit app/packages/@fiftyone/app/src/ (main UI components), use Relay compiler (yarn compile) after GraphQL schema changes. Backend changes: edit fiftyone/server/main.py and related modules, regenerate GraphQL schema with yarn gen:schema. Type safety: run yarn typecheck before committing. Linting: yarn check:strict enforces all packages.
🪤Traps & gotchas
- Python-TypeScript sync: Changes to the Strawberry GraphQL schema (fiftyone/server/) must be followed by yarn gen:schema and a Relay compiler run, or the frontend will break silently.
- Yarn workspaces: The app/ directory uses yarn 3.2.1 exclusively; mixing in npm will cause conflicts.
- Concurrent dev server: yarn dev:wpy spawns two processes (frontend on port ~3000, backend on 5151 by default, the port the Dockerfile also references); killing one doesn't kill the other. Use Ctrl+C to stop both, or specify the -k flag.
- MongoDB assumption: The backend likely expects a MongoDB instance running; check fiftyone/server/main.py for connection details.
- Monorepo builds: yarn build runs strict checks across all workspaces; any workspace with errors blocks the full build.
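Port conflicts from a half-stopped dev session are easy to check for before restarting. This is a small sketch with the standard library; `port_in_use` is a hypothetical helper, and the port numbers in the usage note come from this document's own text and have not been verified against the repo.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports) -> list:
    """Return the subset of ports that already have a listener."""
    return [p for p in ports if port_in_use(p)]
```

For example, `busy_ports([3000, 5151])` would list whichever of the assumed frontend and backend ports is still held by a leftover process.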
💡Concepts to learn
- Relay (GraphQL client framework) — Frontend uses Relay for type-safe GraphQL queries and automatic caching; understanding Relay's compile step and fragment masking is essential for modifying data-fetching logic in app/packages/@fiftyone/app
- Recoil (state management) — Recoil atoms and selectors manage global UI state (selected samples, filters, model predictions) across the FiftyOne UI; crucial for understanding data flow between components
- Strawberry GraphQL (Python framework) — Backend exposes a GraphQL API via Strawberry; schema changes here cascade to Relay on frontend via yarn gen:schema, making this the bridge between Python and TypeScript
- Yarn workspaces (monorepo management) — The app/ directory is a yarn 3 monorepo with multiple workspace packages (@fiftyone/app, @fiftyone/spotlight, @fiftyone/multimodal); understanding workspace dependencies and hoisting is critical for adding or moving code
- Document databases (MongoDB-compatible) — FiftyOne stores datasets, samples, and labels in a document database; understanding schema flexibility and indexing is important for performance optimization and feature development
- Concurrent process management (concurrently) — Development workflow (yarn dev:wpy) runs TypeScript and Python servers in parallel; understanding process lifecycles and port conflicts helps troubleshoot dev environment issues
- Protocol Buffers (protoc-gen-es) — @bufbuild/protoc-gen-es is listed in devDependencies, suggesting proto messages may be used for efficient data serialization between frontend and backend; relevant for understanding data structure definitions
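To make the process-lifecycle concept concrete, here is a minimal sketch of the "when one process exits, stop the others" behaviour that the `concurrently` tool provides through its `--kill-others` (`-k`) flag. It is a standard-library approximation for illustration, not how the repo's scripts are implemented; `run_together` is a hypothetical name.

```python
import subprocess
import time
from typing import List, Sequence


def run_together(commands: Sequence[List[str]], poll_seconds: float = 0.1) -> List[int]:
    """Start every command; as soon as any one exits, terminate the rest."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        # Keep waiting while all children are still running.
        while all(p.poll() is None for p in procs):
            time.sleep(poll_seconds)
    finally:
        # Also reached on Ctrl+C, so no child is left holding a port.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            try:
                p.wait(timeout=5)
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return [p.returncode for p in procs]
```

The returned exit codes show which process ended on its own (usually 0) and which were terminated by the supervisor (non-zero).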
🔗Related repos
- roboflow/supervision — Computer vision dataset annotation and model evaluation library; complements FiftyOne for post-inference analysis pipelines
- iterative/dvc — Data and ML versioning; often used alongside FiftyOne for reproducible dataset and model lineage tracking
- labelImg/labelImg — Lightweight image annotation tool; FiftyOne's labeling workflows are designed to integrate with or replace tools like this
- voxel51/fiftyone-examples — Official example notebooks and scripts (referenced in README Colab link); demonstrates FiftyOne usage patterns
- huggingface/datasets — Hugging Face Datasets library; FiftyOne can load and export datasets in HF format for model training workflows
🪄PR ideas
To work on one of these in Claude Code or Cursor, paste:
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
Add comprehensive E2E tests for GraphQL schema generation workflow
The repo has a gen:schema script that exports the GraphQL schema but there's no dedicated E2E test validating that schema generation works end-to-end. The .github/workflows/build-graphql.yml exists but lacks validation that the generated schema is valid and backwards compatible. This is critical for a tool managing datasets and AI models where schema stability matters.
- [ ] Create .github/workflows/validate-graphql-schema.yml that runs on PR to validate schema generation doesn't break
- [ ] Add schema validation step using graphql-core or similar to check for syntax errors
- [ ] Generate schema in CI and compare against baseline to detect unintended breaking changes
- [ ] Document schema versioning strategy in CONTRIBUTING.md
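The baseline comparison in the checklist could start as a plain text diff. The sketch below flags lines that exist in a baseline schema but not in a freshly generated one. It is line-based and therefore only an approximation: a real check would parse the SDL (for instance with graphql-core) to reason about types and fields. `removed_schema_lines` is a hypothetical helper.

```python
import difflib
from typing import List


def removed_schema_lines(baseline: str, generated: str) -> List[str]:
    """Return non-blank baseline lines that no longer appear in the generated schema."""
    base_lines = [line.strip() for line in baseline.splitlines() if line.strip()]
    new_lines = [line.strip() for line in generated.splitlines() if line.strip()]
    # ndiff prefixes lines unique to the first sequence with "- ".
    return [
        entry[2:]
        for entry in difflib.ndiff(base_lines, new_lines)
        if entry.startswith("- ")
    ]
```

A CI step could fail the build when the returned list is non-empty, leaving additions (which are usually backwards compatible) unflagged.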
Add TypeScript strict mode enforcement tests for app/packages subdirectories
The repo has typecheck and typecheck:error-count scripts, but no automated CI workflow ensuring TypeScript strict mode compliance per workspace. The monorepo structure (app/packages/aggregations, app/packages/analytics, etc.) needs consistent type safety. Currently, violations could slip through PRs since type checking appears optional.
- [ ] Create .github/workflows/typecheck-strict.yml that runs yarn typecheck on every PR
- [ ] Add per-workspace tsconfig.json strict mode validation in app/packages/*/tsconfig.json
- [ ] Update CONTRIBUTING.md to document strict TypeScript requirements for new code
- [ ] Add pre-commit hook in .pre-commit-config.yaml for local typecheck enforcement
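The per-workspace validation step might begin with a script like the following sketch, which reads each `tsconfig.json` under a packages directory and reports whether `compilerOptions.strict` is enabled. Two assumptions are worth stating: it does not follow `extends` chains, and it treats files with comments or trailing commas (which TypeScript permits) as unparseable rather than guessing. `strict_flags` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def strict_flags(packages_dir: Path) -> Dict[str, Optional[bool]]:
    """Map each package name to True/False for strict mode, or None if unreadable."""
    results: Dict[str, Optional[bool]] = {}
    for tsconfig in sorted(packages_dir.glob("*/tsconfig.json")):
        name = tsconfig.parent.name
        try:
            config = json.loads(tsconfig.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Plain JSON parsing cannot handle comments or trailing commas.
            results[name] = None
            continue
        options = config.get("compilerOptions", {}) if isinstance(config, dict) else {}
        results[name] = options.get("strict") is True
    return results
```

Calling `strict_flags(Path("app/packages"))` from the repo root would give a quick inventory; packages reported as `None` need manual inspection.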
Add missing unit test suite for app/packages/aggregations and app/packages/analytics
The repo has test and test-ui scripts with Vitest, but the app/packages/aggregations and app/packages/analytics directories lack visible test files. These are core packages used by the FiftyOne app. Without tests, refactoring and bug fixes in these packages are risky and regressions can reach users.
- [ ] Create app/packages/aggregations/__tests__/Aggregation.test.ts with unit tests for src/Aggregation.ts class methods
- [ ] Create app/packages/analytics/__tests__/analytics.test.ts covering core analytics functions
- [ ] Update .github/workflows/test.yml to run tests from nested workspaces using yarn workspaces foreach
- [ ] Add test coverage thresholds to package.json scripts to prevent coverage regression
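Since the claim that these packages lack tests is unverified, a quick inventory is a sensible first step. This sketch lists package directories that contain no `.test.ts` or `.test.tsx` file; the suffixes and the `node_modules` exclusion are assumptions about the project's conventions, and `packages_without_tests` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed test-file naming convention; adjust if the repo uses another one.
TEST_SUFFIXES = (".test.ts", ".test.tsx")


def packages_without_tests(packages_dir: Path) -> List[str]:
    """Return names of package directories that contain no test files."""
    missing: List[str] = []
    for package in sorted(p for p in packages_dir.iterdir() if p.is_dir()):
        has_test = any(
            f.name.endswith(TEST_SUFFIXES)
            for f in package.rglob("*")
            if f.is_file() and "node_modules" not in f.relative_to(package).parts
        )
        if not has_test:
            missing.append(package.name)
    return missing
```

Running `packages_without_tests(Path("app/packages"))` from a clone would confirm or refute the premise of this PR idea before any work starts.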
🌿Good first issues
- Add missing unit tests for app/packages/@fiftyone/spotlight or app/packages/@fiftyone/multimodal (visible in check:strict script but likely sparse test coverage); look for .test.ts/.test.tsx files and mirror that pattern for untested modules.
- Document the GraphQL schema by adding docstrings to Strawberry decorators in fiftyone/server/; the gen:schema script will pull these into schema.graphql, improving IDE autocomplete and developer experience.
- Improve Windows compatibility: windows-test.yml exists but may have gaps; audit Makefile and shell scripts (.sh files) for Unix-only commands and add equivalent batch scripts or cross-platform Node alternatives.
⭐Top contributors
- @nverjinski — 29 commits
- @sashankaryal — 28 commits
- @erik-nieh — 13 commits
- @voxel51-bot — 7 commits
- @kevin-dimichel — 4 commits
📝Recent commits
- d57cd3f — [FOEPD-3498] Per-deployment model weights source configurability (#7471) (yrahal)
- 63207c9 — Merge pull request #7457 from voxel51/ontology-when-classes (erik-nieh)
- 4183b5e — [FOEPD-3738] Read MCAP to SceneInventory (#7418) (exupero)
- 1182f21 — Merge pull request #7177 from voxel51/FOEPD-3003-When-the-user-clicks-on-the-Annotate-tab-the-Patch-sample-is-zoomed-in- (nverjinski)
- c66b0e0 — test: updated snapshots for viewport-bridge (nverjinski)
- dcdc519 — test: reworked wheel function in sample-canvas to behave as expected (nverjinski)
- 66bee67 — Merge pull request #7342 from voxel51/mikeobrien/annotationHookTEsts (mozabes)
- 2e05cb7 — chore(deps): bump axios from 1.15.0 to 1.16.0 in /e2e-pw (#7468) (dependabot[bot])
- 019c20c — Merge pull request #7459 from voxel51/feat/media-optimizations (sashankaryal)
- 830637d — import errno (sashankaryal)
🔒Security observations
The FiftyOne codebase demonstrates a reasonable security posture with established practices (security reporting policy, dependabot configuration, SECURITY.md). However, several areas require attention: incomplete visibility into dependency management, missing security header configurations, potential GraphQL injection vectors, and unclear production deployment security controls. The application handles sensitive computer vision datasets and models, making input validation and authentication critical. The Docker configuration appears development-focused and should be hardened for production. Strengths include a responsible vulnerability disclosure policy and existing CI/CD security workflows.
- Medium · Incomplete Dependency Management in package.json — app/package.json. The package.json file appears truncated at the devDependencies section, making it impossible to verify all dependencies for known vulnerabilities. This could mask security issues in production or development dependencies. Fix: Ensure the complete package.json is reviewed. Implement automated dependency scanning using tools like 'yarn audit', 'npm audit', or 'dependabot' (already configured in .github/dependabot.yml). Keep all dependencies updated and regularly scan for CVEs.
- Medium · Missing Security Headers Configuration — fiftyone/server/main.py (inferred), Dockerfile. No visible security headers configuration (Content-Security-Policy, X-Frame-Options, etc.) in the provided codebase snippets. The application handles visual AI models and datasets, which could be sensitive. Fix: Implement security headers in the Python Flask/FastAPI application. Add middleware to set appropriate CSP, X-Frame-Options, X-Content-Type-Options, and Strict-Transport-Security headers.
- Medium · Exposed Development Port in Docker Configuration — Dockerfile. The Dockerfile mentions port 5151 for the server target, which may be exposed without proper authentication or rate limiting controls. Fix: Ensure the application enforces authentication before accessing port 5151. Implement rate limiting, API key validation, and consider running the server behind a reverse proxy with security controls.
- Low · Development Script in Production Path — app/package.json. The package.json includes 'dev:py' and 'dev:wpy' scripts that run the Python development server directly, which should not be used in production. Fix: Ensure development scripts are clearly documented as development-only. Use environment-based checks to prevent accidental production execution. Document that production deployments should use a WSGI server (gunicorn, uWSGI, etc.).
- Low · No Visible Input Validation Framework — app/packages/annotation/src/. The codebase includes TypeScript/React components for annotation and AI model interaction, but no visible input validation or sanitization patterns are evident in the file structure. Fix: Implement input validation for all user-supplied data. Use libraries like 'zod', 'yup', or 'joi' for schema validation. Sanitize all GraphQL inputs and API parameters. Ensure XSS protection through React's default escaping and Content Security Policy.
- Low · Missing CORS Configuration Visibility — fiftyone/server/ (not fully provided). No visible CORS configuration or security policy for cross-origin requests in the provided codebase sections. Fix: Implement explicit CORS configuration allowing only trusted origins. Avoid using '*' as allowed origin. Configure credentials handling appropriately. Document CORS policy in security documentation.
- Low · Relay Compiler Usage Without Visible Security Context — app/packages/, app/__mocks__/recoil-relay.ts. The codebase uses relay-compiler and GraphQL with @recoil/relay mocks, but no visible query validation or injection protection patterns are evident. Fix: Implement GraphQL query validation to prevent injection attacks. Use persisted queries when possible. Implement rate limiting on GraphQL endpoint. Validate all query complexity to prevent DoS attacks.
LLM-derived; treat as a starting point, not a security audit.
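For the security-headers observation, the following is a minimal, dependency-free sketch of a middleware that appends headers to HTTP responses. It assumes the server speaks ASGI, which this document does not confirm; `SecurityHeadersMiddleware` and the header values are illustrative placeholders, not FiftyOne code or recommended settings.

```python
import asyncio

# Placeholder values; a real policy must be tuned to the application.
SECURITY_HEADERS = [
    (b"x-content-type-options", b"nosniff"),
    (b"x-frame-options", b"DENY"),
    (b"content-security-policy", b"default-src 'self'"),
]


class SecurityHeadersMiddleware:
    """Wrap an ASGI app and add security headers to each HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Leave websocket and lifespan traffic untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                existing = {name.lower() for name, _ in headers}
                # Do not override headers the app already set itself.
                headers += [(n, v) for n, v in SECURITY_HEADERS if n not in existing]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Note that `X-Frame-Options: DENY` would block any deployment that embeds the UI in an iframe, so that value in particular should be treated as an example only.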
👉Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.