abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Solo project — review before adopting
- ✓Last commit 1d ago
- ✓MIT licensed
- ✓Tests present
- ⚠Solo or near-solo (2 contributors visible)
- ⚠Concentrated ownership — top contributor handles 77% of commits
- ⚠No CI workflows detected
Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests
Embed this verdict
[repopilot.app/r/abi/screenshot-to-code](https://repopilot.app/r/abi/screenshot-to-code) — paste into your README; the badge live-updates from the latest cached analysis.
Onboarding doc
Onboarding: abi/screenshot-to-code
Generated by RepoPilot · 2026-05-05 · Source
Verdict
WAIT — Solo project — review before adopting
- Last commit 1d ago
- MIT licensed
- Tests present
- ⚠ Solo or near-solo (2 contributors visible)
- ⚠ Concentrated ownership — top contributor handles 77% of commits
- ⚠ No CI workflows detected
<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>
TL;DR
screenshot-to-code is a web app that accepts a screenshot, mockup, or Figma design and uses multimodal AI (GPT-5, Claude Opus 4.5, Gemini 3) to generate clean, functional frontend code in stacks such as HTML+Tailwind, React+Tailwind, Vue+Tailwind, Bootstrap, Ionic, and SVG. It also supports video/screen-recording input to produce interactive prototypes, and can optionally call DALL-E 3 or Flux Schnell (via Replicate) to replace placeholder images with AI-generated ones.
The monorepo is split into backend/ (a Python FastAPI service with a structured agent system under backend/agent/, containing providers, tools, codegen, and evals subdirectories) and frontend/ (a React+Vite+TypeScript SPA using Zustand for state and CodeMirror for code display). The agent layer (backend/agent/engine.py, runner.py, state.py) orchestrates multi-step LLM calls through provider abstractions in backend/agent/providers/ (openai.py, gemini.py, anthropic/provider.py).
Who it's for
Frontend developers and UI/UX designers who want to rapidly scaffold working code from visual designs without manual translation, and AI/ML engineers exploring multimodal LLM capabilities in a real product context.
Maturity & risk
The repo has substantial code mass (750k+ lines across TypeScript and Python), a structured backend with evals (backend/evals/), pre-commit hooks (.pre-commit-config.yaml), and Docker support — indicating a serious project rather than a toy. It supports multiple production-grade AI providers and has a hosted paid version at screenshottocode.com, suggesting active maintenance. Verdict: actively developed and production-used, though it carries the volatility of depending on rapidly changing LLM APIs.
Single-maintainer risk is real (@abi), and the project's correctness depends entirely on external LLM APIs (OpenAI, Anthropic, Google Gemini) whose pricing, rate limits, and model names change frequently — the README already mentions model version names like 'GPT-5.3' and 'Gemini 3' that may drift. The frontend pulls in roughly twenty @radix-ui packages alongside Tailwind, which adds upgrade surface area. No CI configuration is visible in the listed files beyond .pre-commit-config.yaml — a gap for automated regression detection.
Active areas of work
Recent activity visible in the README includes support for Gemini 3 Flash/Pro and Claude Opus 4.5, suggesting active model version tracking. The agent tools directory (backend/agent/tools/) with definitions.py, parsing.py, runtime.py, and summaries.py indicates ongoing work on a tool-use/agentic code-generation loop beyond simple single-shot prompting.
Get running
```
git clone https://github.com/abi/screenshot-to-code.git
cd screenshot-to-code
```
Backend
```
cd backend
echo 'OPENAI_API_KEY=sk-your-key' > .env
echo 'ANTHROPIC_API_KEY=your-key' >> .env
echo 'GEMINI_API_KEY=your-key' >> .env
pip install --upgrade poetry
poetry install
poetry env activate  # run the printed source command
poetry run uvicorn main:app --reload --port 7001
```
Frontend (new terminal)
```
cd frontend
yarn
yarn dev
```
Open http://localhost:5173
Daily commands:
- Backend: `cd backend && poetry run uvicorn main:app --reload --port 7001`
- Frontend: `cd frontend && yarn dev`
- Docker (full stack): `echo 'OPENAI_API_KEY=sk-your-key' > .env && docker-compose up -d --build`
Map of the codebase
- backend/agent/engine.py: Core orchestration logic that drives the multi-step LLM code generation loop.
- backend/agent/providers/factory.py: Provider factory that selects the correct AI provider (OpenAI/Anthropic/Gemini) at runtime.
- backend/agent/providers/base.py: Abstract base class that all AI provider implementations must conform to.
- backend/agent/runner.py: Entry point that connects the FastAPI WebSocket endpoint to the agent engine.
- backend/codegen/utils.py: Code generation utilities shared across stacks — where output formatting and post-processing happen.
- backend/custom_types.py: Defines the supported output stacks and shared type contracts between frontend and backend.
- backend/agent/tools/definitions.py: Defines the LLM tool-use schemas (function calling specs) used in the agentic loop.
- backend/evals/config.py: Configuration for the evaluation harness used to measure code generation quality.
- backend/config.py: Central config loading (API keys, feature flags) from environment variables.
- frontend/src: Root of the React frontend — all UI components, Zustand stores, and WebSocket client logic live here.
How to make changes
- To add a new AI provider: create a file in backend/agent/providers/ following the pattern of openai.py or gemini.py, then register it in backend/agent/providers/factory.py and __init__.py.
- To add a new output stack (e.g., Angular): look at backend/codegen/utils.py and backend/custom_types.py, where stacks are defined (see the sketch after this list).
- To change the UI: start in frontend/src/ — the main app flow is driven by the component tree wired to Zustand stores.
- To add evals: see backend/evals/ and backend/codegen/test_utils.py.
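For the output-stack item above, here is a hedged sketch of how a new stack might be declared, assuming stacks are modeled as a string enum with per-stack system prompts; the real definitions (and their actual names) live in backend/custom_types.py and the codegen utilities, so treat every identifier here as illustrative.

```python
# Hypothetical sketch: declaring a new output stack. Assumes stacks are a
# string enum and each stack carries a system prompt; the repo's real
# definitions live in backend/custom_types.py, so names are illustrative.
from enum import Enum


class Stack(str, Enum):
    HTML_TAILWIND = "html_tailwind"
    REACT_TAILWIND = "react_tailwind"
    VUE_TAILWIND = "vue_tailwind"
    ANGULAR_TAILWIND = "angular_tailwind"  # the hypothetical new stack


# Each stack needs a generation prompt, plus any stack-specific
# post-processing wired into backend/codegen/utils.py.
STACK_SYSTEM_PROMPTS: dict[Stack, str] = {
    Stack.ANGULAR_TAILWIND: (
        "You are an expert Angular developer. Return a single complete "
        "Angular component styled with Tailwind classes."
    ),
}
```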
Traps & gotchas
1. The backend WebSocket port defaults to 7001; if you change it, you must update `VITE_WS_BACKEND_URL` in `frontend/.env.local` — easy to miss.
2. You need at least one valid API key (OpenAI, Anthropic, or Gemini) in `backend/.env` before the backend starts accepting requests — missing keys cause silent failures in generation, not startup errors (see the sketch after this list).
3. Poetry environment activation requires running the printed `source ...` command manually; forgetting this and running `uvicorn` outside the venv will fail with missing imports.
4. Image generation via DALL-E 3 or Flux requires a separate Replicate API key, which is not mentioned in the primary setup flow.
5. Pre-commit hooks (`.pre-commit-config.yaml`) must be installed separately with `pre-commit install` — they are not run automatically on clone.
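Gotcha 2 has an easy local mitigation. The following is a suggested fail-fast check, not existing repo code; it refuses to start the backend unless at least one provider key is set:

```python
# Suggested fail-fast startup check (not present in the repo): exit with a
# clear message instead of failing silently at generation time.
import os
import sys

REQUIRED_ANY = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")

if not any(os.environ.get(key) for key in REQUIRED_ANY):
    sys.exit(
        "No LLM API key found; set at least one of "
        + ", ".join(REQUIRED_ANY)
        + " in backend/.env before starting the backend."
    )
```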
Concepts to learn
- Multimodal LLM prompting — The entire product depends on sending image+text to LLMs via vision APIs — understanding how image tokens are encoded and priced (see backend/agent/providers/pricing.py) is essential for cost and quality tuning. (Hedged sketches of the first five concepts in this list follow below.)
- LLM Tool Use / Function Calling — The agent loop in backend/agent/tools/ uses structured tool definitions (definitions.py) so the LLM can invoke specific actions during code generation rather than producing a single monolithic response.
- WebSocket streaming — Generated code is streamed token-by-token from the FastAPI backend to the React frontend over WebSockets (not HTTP), enabling the live-update preview UX — understanding the WebSocket lifecycle is required to debug connection issues.
- Provider abstraction pattern — backend/agent/providers/base.py + factory.py implement a classic strategy pattern so OpenAI, Anthropic, and Gemini can be swapped at runtime — new contributors must follow this pattern to add models.
- Token usage accounting — backend/agent/providers/token_usage.py and pricing.py track per-request token consumption across providers with different pricing models — critical for the hosted paid version's cost management.
- CodeMirror 6 editor integration — The frontend uses CodeMirror 6 (not the older CM5) with the @codemirror/lang-html extension for syntax-highlighted, editable code display — the API is significantly different from CM5 and most tutorials target the older version.
- Zustand state management — Global frontend state (selected model, generated code history, settings) is managed via Zustand stores rather than Redux or React Context — understanding Zustand's slice pattern is needed to trace data flow in the frontend.
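First, multimodal prompting. This is roughly what an image+text request looks like with the OpenAI Python SDK; the model name, prompt, and file path are placeholders, and the repo builds its own requests inside the provider layer, so treat this as orientation rather than project code.

```python
# Hedged sketch of an image+text request via the OpenAI Python SDK.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the app routes models dynamically
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Generate HTML+Tailwind for this mockup."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```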
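Second, tool use. Below is a tool definition in the OpenAI function-calling format; the `update_file` tool and its parameters are invented for illustration, and the repo's actual schemas live in backend/agent/tools/definitions.py.

```python
# Invented example of an LLM tool schema in OpenAI function-calling format.
UPDATE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "update_file",  # hypothetical tool name
        "description": "Replace the contents of a generated file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File to write."},
                "content": {"type": "string", "description": "New contents."},
            },
            "required": ["path", "content"],
        },
    },
}
```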
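Third, WebSocket streaming. A minimal, self-contained FastAPI sketch of token-by-token streaming; the endpoint path and message shapes are assumptions, not the repo's actual protocol.

```python
# Self-contained sketch of streaming generated code over a WebSocket.
import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()


async def fake_llm_stream(params: dict):
    # Stand-in for the real provider call so the sketch runs on its own.
    for token in ["<html>", "<body>", "Hello", "</body>", "</html>"]:
        await asyncio.sleep(0.05)
        yield token


@app.websocket("/generate-code")  # path is an assumption
async def generate_code(ws: WebSocket):
    await ws.accept()
    params = await ws.receive_json()  # screenshot + settings from the UI
    async for chunk in fake_llm_stream(params):
        # Each chunk lets the frontend update the live preview immediately.
        await ws.send_json({"type": "chunk", "value": chunk})
    await ws.send_json({"type": "status", "value": "done"})
    await ws.close()
```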
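Fourth, the provider abstraction. A self-contained illustration of the strategy-plus-factory shape; every class and method name here is invented, and the real contract is whatever backend/agent/providers/base.py and factory.py define.

```python
# Illustration of the strategy + factory pattern; all names are invented.
from abc import ABC, abstractmethod


class Provider(ABC):
    """One strategy per vendor; the engine only sees this interface."""

    @abstractmethod
    async def complete(self, messages: list[dict]) -> str: ...


class OpenAIProvider(Provider):
    async def complete(self, messages: list[dict]) -> str:
        return "stub: would call the OpenAI SDK here"


class AnthropicProvider(Provider):
    async def complete(self, messages: list[dict]) -> str:
        return "stub: would call the Anthropic SDK here"


_REGISTRY: dict[str, type[Provider]] = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
}


def get_provider(name: str) -> Provider:
    # Adding a vendor means one new subclass plus one registry entry;
    # nothing in the engine changes.
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```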
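Fifth, token accounting. A toy cost calculation showing why per-provider pricing tables exist; the prices below are placeholders, not the values in pricing.py.

```python
# Toy per-request cost accounting. Prices are placeholders in USD per
# million tokens; real tables live in backend/agent/providers/pricing.py.
PRICE_PER_MTOK: dict[str, tuple[float, float]] = {
    "model-a": (2.50, 10.00),  # (input, output)
    "model-b": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICE_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# A 3,000-token prompt with a 1,000-token completion on model-a:
assert round(request_cost("model-a", 3_000, 1_000), 6) == 0.0175
```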
Related repos
- `emilkowalski/v0` — Vercel's v0 is the closest hosted alternative — AI-driven UI generation from prompts/screenshots targeting React/Tailwind.
- `tldraw/make-real` — similar screenshot/sketch-to-code concept using the tldraw canvas as input and GPT-4V as the backend.
- `openai/openai-python` — direct dependency — the Python SDK used in `backend/agent/providers/openai.py` for GPT-4V and GPT-5 API calls.
- `anthropics/anthropic-sdk-python` — used in `backend/agent/providers/anthropic/` for Claude Opus API calls and multimodal image handling.
- `google-gemini/generative-ai-python` — used in `backend/agent/providers/gemini.py` for Gemini 3 Flash/Pro multimodal API calls.
PR ideas
To work on one of these in Claude Code or Cursor, paste:
```
Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.
```
Add unit tests for backend/codegen/utils.py and backend/agent/tools/parsing.py
The repo has a backend/codegen/test_utils.py file (suggesting a test harness exists) and a TESTING.md doc, but there are no visible test files for the core code generation utilities or tool-call parsing logic. These are high-risk, high-churn modules: codegen/utils.py likely does HTML extraction/cleanup and tools/parsing.py handles LLM tool-call responses, and bugs here silently produce bad output. Adding pytest-based unit tests with representative fixture inputs (malformed HTML, partial tool calls, multi-stack outputs) would catch regressions immediately; a sketch of one such test follows the checklist below.
- [ ] Read `backend/codegen/utils.py` and enumerate all exported functions (e.g. `extract_html_content`, `cleanup_code`, etc.)
- [ ] Create `backend/codegen/test_codegen_utils.py` with pytest parametrize cases covering valid HTML, code wrapped in markdown fences, and edge-case empty strings
- [ ] Read `backend/agent/tools/parsing.py` and identify the tool-call parsing entry points
- [ ] Create `backend/agent/tools/test_parsing.py` with fixtures mimicking raw Anthropic/OpenAI tool-call JSON blobs, including malformed/partial responses
- [ ] Wire the new tests into the existing test runner documented in `TESTING.md` and confirm they pass with `pytest backend/`
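A hedged sketch of one such parametrized test; the import path, the `extract_html_content` name, and the expected outputs are assumptions taken from the checklist rather than verified behavior, so adjust against the actual signatures before committing.

```python
# Template for a parametrized codegen test. Import path and expected
# values are assumptions from the checklist above, not verified behavior.
import pytest

from codegen.utils import extract_html_content  # assumed import path


@pytest.mark.parametrize(
    "raw,expected",
    [
        # Plain, valid HTML should pass through unchanged.
        ("<html><body>hi</body></html>", "<html><body>hi</body></html>"),
        # Output wrapped in a markdown fence should be unwrapped.
        ("```html\n<html></html>\n```", "<html></html>"),
        # Edge case: an empty model response.
        ("", ""),
    ],
)
def test_extract_html_content(raw: str, expected: str) -> None:
    assert extract_html_content(raw) == expected
```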
Add a GitHub Actions CI workflow for the Python backend (lint + test)
The repo has .github/ with only funding and issue templates — there is no CI workflow file at all. The backend uses poetry (evidenced by backend/poetry.lock) and pre-commit (.pre-commit-config.yaml), so the toolchain is already defined. Without CI, every PR to the backend can break imports, introduce type errors, or fail the pre-commit hooks silently. A focused workflow running pre-commit, mypy/pyright, and pytest on every push/PR to backend/** would provide an immediate safety net for contributors.
- [ ] Create `.github/workflows/backend-ci.yml` triggered on `push` and `pull_request` for paths `backend/**`
- [ ] Set up Python with the version pinned in `backend/pyproject.toml` and install dependencies via `poetry install`
- [ ] Add a step running `pre-commit run --all-files` using the hooks already defined in `backend/.pre-commit-config.yaml`
- [ ] Add a step running `pytest backend/` (initially this will be thin but will grow as tests are added per PR #1 above)
- [ ] Add a step running the type checker (mypy or pyright) on `backend/` to catch type regressions in typed modules like `backend/custom_types.py` and `backend/agent/providers/types.py`
- [ ] Document the new CI badge in `README.md`
Split backend/agent/providers/ — extract Gemini and OpenAI providers into consistent subpackage structure matching the Anthropic provider
The backend/agent/providers/ directory shows an inconsistency: Anthropic has its own subpackage (anthropic/__init__.py, anthropic/image.py, anthropic/provider.py) with separated concerns, but gemini.py and openai.py are single flat files. As more models are added (the README already lists GPT-5.x, Gemini 3, etc.), these flat files will grow unwieldy and make provider-specific logic (image handling, token counting) hard to locate or test. Refactoring Gemini and OpenAI into the same subpackage pattern as Anthropic makes the codebase consistent and maintainable.
- [ ] Create a `backend/agent/providers/openai/` directory with `__init__.py`, `provider.py`, and `image.py`, mirroring the structure of the existing `anthropic/` subpackage
Good first issues
1. Add unit tests for `backend/agent/providers/gemini.py` — the evals framework exists in `backend/evals/` but provider-level unit tests appear absent from the listed files.
2. Add a `backend/agent/providers/anthropic/provider.py` docstring and usage example — the Anthropic provider has a sub-package structure suggesting complexity that is undocumented compared to the flat `openai.py` and `gemini.py`.
3. Create a `CONTRIBUTING.md` — `AGENTS.md`, `CLAUDE.md`, and `TESTING.md` exist but there is no unified contribution guide explaining the provider pattern, how to add a new stack, or how to run evals locally.
Recent commits
- `698ddfb` — Add Lilo sponsor logo to README (#597) (abi)
- `aaaa838` — Support Gemini API keys from request settings (abi)
- `5366927` — update model for edits and format for prompt (abi)
- `1a6f88b` — add caching-related tools and remove prompt_cache_key (abi)
- `5fb885f` — set prompt cache retention to 24h for GPT 5.4 (abi)
- `9e8e245` — Merge branch 'claude/document-image-processing-eqFtJ' (abi)
- `d227265` — add gpt-5.4 reasoning model support (abi)
- `b9bdca7` — add openai prompt cache keys (abi)
- `e2413c4` — Merge branch 'openai-caching' (abi)
- `2809e84` — remove leftover openai test override logic (abi)
Where to read next
- Open issues — current backlog
- Recent PRs — what's actively shipping
- Source on GitHub
Generated by RepoPilot. Verdict based on maintenance signals — see the live page for receipts. Re-run on a new commit to refresh.