abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Solo project — review before adopting
- ✓Last commit 1d ago
- ✓MIT licensed
- ✓Tests present
- ⚠Solo or near-solo (2 contributors visible)
- ⚠Concentrated ownership — top contributor handles 77% of commits
- ⚠No CI workflows detected
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Embed this verdict
[repopilot.app/r/abi/screenshot-to-code](https://repopilot.app/r/abi/screenshot-to-code) — paste into your README; the badge live-updates from the latest cached analysis.
Onboarding doc
Onboarding: abi/screenshot-to-code
Generated by RepoPilot · 2026-05-05 · Source
Verdict
WAIT — Solo project — review before adopting
- Last commit 1d ago
- MIT licensed
- Tests present
- ⚠ Solo or near-solo (2 contributors visible)
- ⚠ Concentrated ownership — top contributor handles 77% of commits
- ⚠ No CI workflows detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
TL;DR
screenshot-to-code is a web app that accepts a screenshot, mockup, or Figma design and uses multimodal AI (GPT-5, Claude Opus 4.5, Gemini 3) to generate clean, functional frontend code in stacks such as HTML+Tailwind, React+Tailwind, Vue+Tailwind, Bootstrap, Ionic, and SVG. It also supports video/screen-recording input to produce interactive prototypes, and can optionally call DALL-E 3 or Flux Schnell (via Replicate) to replace placeholder images with AI-generated ones.
The monorepo is split into backend/ (a Python FastAPI service with a structured agent system under backend/agent/, containing providers, tools, codegen, and evals subdirectories) and frontend/ (a React+Vite+TypeScript SPA using Zustand for state and CodeMirror for code display). The agent layer (backend/agent/engine.py, runner.py, state.py) orchestrates multi-step LLM calls through provider abstractions in backend/agent/providers/ (openai.py, gemini.py, anthropic/provider.py).
Who it's for
Frontend developers and UI/UX designers who want to rapidly scaffold working code from visual designs without manual translation, and AI/ML engineers exploring multimodal LLM capabilities in a real product context.
Maturity & risk
The repo has substantial code mass (750k+ lines across TypeScript and Python), a structured backend with evals (backend/evals/), pre-commit hooks (.pre-commit-config.yaml), and Docker support — indicating a serious project rather than a toy. It supports multiple production-grade AI providers and has a hosted paid version at screenshottocode.com, suggesting active maintenance. Verdict: actively developed and production-used, though it carries the volatility of depending on rapidly changing LLM APIs.
Single-maintainer risk is real (@abi), and the project's correctness depends entirely on external LLM APIs (OpenAI, Anthropic, Google Gemini) whose pricing, rate limits, and model names change frequently — the README already mentions model version names like 'GPT-5.3' and 'Gemini 3' that may drift. The frontend pulls in roughly twenty @radix-ui packages alongside Tailwind, which adds upgrade surface area. No CI configuration is visible in the listed files beyond .pre-commit-config.yaml — a gap for automated regression detection.
Active areas of work
Recent activity visible in the README includes support for Gemini 3 Flash/Pro and Claude Opus 4.5, suggesting active model version tracking. The agent tools directory (backend/agent/tools/) with definitions.py, parsing.py, runtime.py, and summaries.py indicates ongoing work on a tool-use/agentic code-generation loop beyond simple single-shot prompting.
Get running
```
git clone https://github.com/abi/screenshot-to-code.git
cd screenshot-to-code
```
Backend
```
cd backend
echo 'OPENAI_API_KEY=sk-your-key' > .env
echo 'ANTHROPIC_API_KEY=your-key' >> .env
echo 'GEMINI_API_KEY=your-key' >> .env
pip install --upgrade poetry
poetry install
poetry env activate  # run the printed source command
poetry run uvicorn main:app --reload --port 7001
```
Frontend (new terminal)
```
cd frontend
yarn
yarn dev
```
Open http://localhost:5173
Daily commands:
- Backend: `cd backend && poetry run uvicorn main:app --reload --port 7001`
- Frontend: `cd frontend && yarn dev`
- Docker (full stack): `echo 'OPENAI_API_KEY=sk-your-key' > .env && docker-compose up -d --build`
Map of the codebase
- backend/agent/engine.py: Core orchestration logic that drives the multi-step LLM code generation loop.
- backend/agent/providers/factory.py: Provider factory that selects the correct AI provider (OpenAI/Anthropic/Gemini) at runtime.
- backend/agent/providers/base.py: Abstract base class that all AI provider implementations must conform to.
- backend/agent/runner.py: Entry point that connects the FastAPI WebSocket endpoint to the agent engine.
- backend/codegen/utils.py: Code generation utilities shared across stacks — where output formatting and post-processing happen.
- backend/custom_types.py: Defines the supported output stacks and shared type contracts between frontend and backend.
- backend/agent/tools/definitions.py: Defines the LLM tool-use schemas (function calling specs) used in the agentic loop.
- backend/evals/config.py: Configuration for the evaluation harness used to measure code generation quality.
- backend/config.py: Central config loading (API keys, feature flags) from environment variables.
- frontend/src: Root of the React frontend — all UI components, Zustand stores, and WebSocket client logic live here.
How to make changes
- To add a new AI provider: create a file in backend/agent/providers/ following the pattern of openai.py or gemini.py, then register it in backend/agent/providers/factory.py and __init__.py.
- To add a new output stack (e.g., Angular): look at backend/codegen/utils.py and backend/custom_types.py, where stacks are defined (see the sketch after this list).
- To change the UI: start in frontend/src/ — the main app flow is driven by the component tree wired to Zustand stores.
- To add evals: see backend/evals/ and backend/codegen/test_utils.py.
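For the output-stack item above, here is a hedged sketch of how a new stack might be declared, assuming stacks are modeled as a string enum with per-stack system prompts; the real definitions (and their actual names) live in backend/custom_types.py and the codegen utilities, so treat every identifier here as illustrative.

```python
# Hypothetical sketch: declaring a new output stack. Assumes stacks are a
# string enum and each stack carries a system prompt; the repo's real
# definitions live in backend/custom_types.py, so names are illustrative.
from enum import Enum


class Stack(str, Enum):
    HTML_TAILWIND = "html_tailwind"
    REACT_TAILWIND = "react_tailwind"
    VUE_TAILWIND = "vue_tailwind"
    ANGULAR_TAILWIND = "angular_tailwind"  # the hypothetical new stack


# Each stack needs a generation prompt, plus any stack-specific
# post-processing wired into backend/codegen/utils.py.
STACK_SYSTEM_PROMPTS: dict[Stack, str] = {
    Stack.ANGULAR_TAILWIND: (
        "You are an expert Angular developer. Return a single complete "
        "Angular component styled with Tailwind classes."
    ),
}
```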
Traps & gotchas
1. The backend WebSocket port defaults to 7001; if you change it, you must update `VITE_WS_BACKEND_URL` in `frontend/.env.local` — easy to miss.
2. You need at least one valid API key (OpenAI, Anthropic, or Gemini) in `backend/.env` before the backend starts accepting requests — missing keys cause silent failures in generation, not startup errors (see the sketch after this list).
3. Poetry environment activation requires running the printed `source ...` command manually; forgetting this and running `uvicorn` outside the venv will fail with missing imports.
4. Image generation via DALL-E 3 or Flux requires a separate Replicate API key, which is not mentioned in the primary setup flow.
5. Pre-commit hooks (`.pre-commit-config.yaml`) must be installed separately with `pre-commit install` — they are not run automatically on clone.
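Gotcha 2 has an easy local mitigation. The following is a suggested fail-fast check, not existing repo code; it refuses to start the backend unless at least one provider key is set:

```python
# Suggested fail-fast startup check (not present in the repo): exit with a
# clear message instead of failing silently at generation time.
import os
import sys

REQUIRED_ANY = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

if not any(os.environ.get(key) for key in REQUIRED_ANY):
    sys.exit(
        "No LLM API key found; set at least one of "
        + ", ".join(REQUIRED_ANY)
        + " in backend/.env before starting the backend."
    )
```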
Concepts to learn
- Multimodal LLM prompting — The entire product depends on sending image+text to LLMs via vision APIs — understanding how image tokens are encoded and priced (see backend/agent/providers/pricing.py) is essential for cost and quality tuning. (Hedged sketches of the first five concepts in this list follow below.)
- LLM Tool Use / Function Calling — The agent loop in backend/agent/tools/ uses structured tool definitions (definitions.py) so the LLM can invoke specific actions during code generation rather than producing a single monolithic response.
- WebSocket streaming — Generated code is streamed token-by-token from the FastAPI backend to the React frontend over WebSockets (not HTTP), enabling the live-update preview UX — understanding the WebSocket lifecycle is required to debug connection issues.
- Provider abstraction pattern — backend/agent/providers/base.py + factory.py implement a classic strategy pattern so OpenAI, Anthropic, and Gemini can be swapped at runtime — new contributors must follow this pattern to add models.
- Token usage accounting — backend/agent/providers/token_usage.py and pricing.py track per-request token consumption across providers with different pricing models — critical for the hosted paid version's cost management.
- CodeMirror 6 editor integration — The frontend uses CodeMirror 6 (not the older CM5) with the @codemirror/lang-html extension for syntax-highlighted, editable code display — the API is significantly different from CM5 and most tutorials target the older version.
- Zustand state management — Global frontend state (selected model, generated code history, settings) is managed via Zustand stores rather than Redux or React Context — understanding Zustand's slice pattern is needed to trace data flow in the frontend.
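First, multimodal prompting. This is roughly what an image+text request looks like with the OpenAI Python SDK; the model name, prompt, and file path are placeholders, and the repo builds its own requests inside the provider layer, so treat this as orientation rather than project code.

```python
# Hedged sketch of an image+text request via the OpenAI Python SDK.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the app routes models dynamically
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Generate HTML+Tailwind for this mockup."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```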
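Second, tool use. Below is a tool definition in the OpenAI function-calling format; the `update_file` tool and its parameters are invented for illustration, and the repo's actual schemas live in backend/agent/tools/definitions.py.

```python
# Invented example of an LLM tool schema in OpenAI function-calling format.
UPDATE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "update_file",  # hypothetical tool name
        "description": "Replace the contents of a generated file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File to write."},
                "content": {"type": "string", "description": "New contents."},
            },
            "required": ["path", "content"],
        },
    },
}
```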
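Third, WebSocket streaming. A minimal, self-contained FastAPI sketch of token-by-token streaming; the endpoint path and message shapes are assumptions, not the repo's actual protocol.

```python
# Self-contained sketch of streaming generated code over a WebSocket.
import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()


async def fake_llm_stream(params: dict):
    # Stand-in for the real provider call so the sketch runs on its own.
    for token in ["<html>", "<body>", "Hello", "</body>", "</html>"]:
        await asyncio.sleep(0.05)
        yield token


@app.websocket("/generate-code")  # path is an assumption
async def generate_code(ws: WebSocket):
    await ws.accept()
    params = await ws.receive_json()  # screenshot + settings from the UI
    async for chunk in fake_llm_stream(params):
        # Each chunk lets the frontend update the live preview immediately.
        await ws.send_json({"type": "chunk", "value": chunk})
    await ws.send_json({"type": "status", "value": "done"})
    await ws.close()
```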
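Fourth, the provider abstraction. A self-contained illustration of the strategy-plus-factory shape; every class and method name here is invented, and the real contract is whatever backend/agent/providers/base.py and factory.py define.

```python
# Illustration of the strategy + factory pattern; all names are invented.
from abc import ABC, abstractmethod


class Provider(ABC):
    """One strategy per vendor; the engine only sees this interface."""

    @abstractmethod
    async def complete(self, messages: list[dict]) -> str: ...


class OpenAIProvider(Provider):
    async def complete(self, messages: list[dict]) -> str:
        return "stub: would call the OpenAI SDK here"


class AnthropicProvider(Provider):
    async def complete(self, messages: list[dict]) -> str:
        return "stub: would call the Anthropic SDK here"


_REGISTRY: dict[str, type[Provider]] = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
}


def get_provider(name: str) -> Provider:
    # Adding a vendor means one new subclass plus one registry entry;
    # nothing in the engine changes.
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```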
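Fifth, token accounting. A toy cost calculation showing why per-provider pricing tables exist; the prices below are placeholders, not the values in pricing.py.

```python
# Toy per-request cost accounting. Prices are placeholders in USD per
# million tokens; real tables live in backend/agent/providers/pricing.py.
PRICE_PER_MTOK: dict[str, tuple[float, float]] = {
    "model-a": (2.50, 10.00),  # (input, output)
    "model-b": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICE_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# A 3,000-token prompt with a 1,000-token completion on model-a:
assert round(request_cost("model-a", 3_000, 1_000), 6) == 0.0175
```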
Related repos
- `emilkowalski/v0` — Vercel's v0 is the closest hosted alternative — AI-driven UI generation from prompts/screenshots targeting React/Tailwind.
- `tldraw/make-real` — similar screenshot/sketch-to-code concept using the tldraw canvas as input and GPT-4V as the backend.
- `openai/openai-python` — direct dependency — the Python SDK used in `backend/agent/providers/openai.py` for GPT-4V and GPT-5 API calls.
- `anthropics/anthropic-sdk-python` — used in `backend/agent/providers/anthropic/` for Claude Opus API calls and multimodal image handling.
- `google-gemini/generative-ai-python` — used in `backend/agent/providers/gemini.py` for Gemini 3 Flash/Pro multimodal API calls.
PR ideas
To work on one of these in Claude Code or Cursor, paste:
```
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
```
Add unit tests for backend/codegen/utils.py and backend/agent/tools/parsing.py
The repo has a backend/codegen/test_utils.py file (suggesting a test harness exists) and a TESTING.md doc, but there are no visible test files for the core code generation utilities or tool-call parsing logic. These are high-risk, high-churn modules: codegen/utils.py likely does HTML extraction/cleanup and tools/parsing.py handles LLM tool-call responses, and bugs here silently produce bad output. Adding pytest-based unit tests with representative fixture inputs (malformed HTML, partial tool calls, multi-stack outputs) would catch regressions immediately; a sketch of one such test follows the checklist below.
- [ ] Read `backend/codegen/utils.py` and enumerate all exported functions (e.g. `extract_html_content`, `cleanup_code`, etc.)
- [ ] Create `backend/codegen/test_codegen_utils.py` with pytest parametrize cases covering valid HTML, code wrapped in markdown fences, and edge-case empty strings
- [ ] Read `backend/agent/tools/parsing.py` and identify the tool-call parsing entry points
- [ ] Create `backend/agent/tools/test_parsing.py` with fixtures mimicking raw Anthropic/OpenAI tool-call JSON blobs, including malformed/partial responses
- [ ] Wire the new tests into the existing test runner documented in `TESTING.md` and confirm they pass with `pytest backend/`
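A hedged sketch of one such parametrized test; the import path, the `extract_html_content` name, and the expected outputs are assumptions taken from the checklist rather than verified behavior, so adjust against the actual signatures before committing.

```python
# Template for a parametrized codegen test. Import path and expected
# values are assumptions from the checklist above, not verified behavior.
import pytest

from codegen.utils import extract_html_content  # assumed import path


@pytest.mark.parametrize(
    "raw,expected",
    [
        # Plain, valid HTML should pass through unchanged.
        ("<html><body>hi</body></html>", "<html><body>hi</body></html>"),
        # Output wrapped in a markdown fence should be unwrapped.
        ("```html\n<html></html>\n```", "<html></html>"),
        # Edge case: an empty model response.
        ("", ""),
    ],
)
def test_extract_html_content(raw: str, expected: str) -> None:
    assert extract_html_content(raw) == expected
```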
Add a GitHub Actions CI workflow for the Python backend (lint + test)
The repo has .github/ with only funding and issue templates — there is no CI workflow file at all. The backend uses poetry (evidenced by backend/poetry.lock) and pre-commit (.pre-commit-config.yaml), so the toolchain is already defined. Without CI, every PR to the backend can break imports, introduce type errors, or fail the pre-commit hooks silently. A focused workflow running pre-commit, mypy/pyright, and pytest on every push/PR to backend/** would provide an immediate safety net for contributors.
- [ ] Create `.github/workflows/backend-ci.yml` triggered on `push` and `pull_request` for paths `backend/**`
- [ ] Set up Python with the version pinned in `backend/pyproject.toml` and install dependencies via `poetry install`
- [ ] Add a step running `pre-commit run --all-files` using the hooks already defined in `backend/.pre-commit-config.yaml`
- [ ] Add a step running `pytest backend/` (initially this will be thin but will grow as tests are added per PR #1 above)
- [ ] Add a step running the type checker (mypy or pyright) on `backend/` to catch type regressions in typed modules like `backend/custom_types.py` and `backend/agent/providers/types.py`
- [ ] Document the new CI badge in `README.md`
Split backend/agent/providers/ — extract Gemini and OpenAI providers into consistent subpackage structure matching the Anthropic provider
The backend/agent/providers/ directory shows an inconsistency: Anthropic has its own subpackage (anthropic/__init__.py, anthropic/image.py, anthropic/provider.py) with separated concerns, but gemini.py and openai.py are single flat files. As more models are added (the README already lists GPT-5.x, Gemini 3, etc.), these flat files will grow unwieldy and make provider-specific logic (image handling, token counting) hard to locate or test. Refactoring Gemini and OpenAI into the same subpackage pattern as Anthropic makes the codebase consistent and maintainable.
- [ ] Create a `backend/agent/providers/openai/` directory with `__init__.py`, `provider.py`, and `image.py`, mirroring the structure of the existing `anthropic/` subpackage
Good first issues
1. Add unit tests for `backend/agent/providers/gemini.py` — the evals framework exists in `backend/evals/` but provider-level unit tests appear absent from the listed files.
2. Add a `backend/agent/providers/anthropic/provider.py` docstring and usage example — the Anthropic provider has a sub-package structure suggesting complexity that is undocumented compared to the flat `openai.py` and `gemini.py`.
3. Create a `CONTRIBUTING.md` — `AGENTS.md`, `CLAUDE.md`, and `TESTING.md` exist but there is no unified contribution guide explaining the provider pattern, how to add a new stack, or how to run evals locally.
Recent commits
- `698ddfb` — Add Lilo sponsor logo to README (#597) (abi)
- `aaaa838` — Support Gemini API keys from request settings (abi)
- `5366927` — update model for edits and format for prompt (abi)
- `1a6f88b` — add caching-related tools and remove prompt_cache_key (abi)
- `5fb885f` — set prompt cache retention to 24h for GPT 5.4 (abi)
- `9e8e245` — Merge branch 'claude/document-image-processing-eqFtJ' (abi)
- `d227265` — add gpt-5.4 reasoning model support (abi)
- `b9bdca7` — add openai prompt cache keys (abi)
- `e2413c4` — Merge branch 'openai-caching' (abi)
- `2809e84` — remove leftover openai test override logic (abi)
Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.