simular-ai/Agent-S

Item: simular-ai/Agent-S
Rating: 5
Author: RepoPilot

Agent S: an open agentic framework that uses computers like a human

Healthy

Healthy across the board

weakest axis

Use as dependencyHealthy

Permissive license, no critical CVEs, actively maintained — safe to depend on.

Fork & modifyHealthy

Has a license, tests, and CI — clean foundation to fork and modify.

Learn fromHealthy

Documented and popular — useful reference codebase to read through.

Deploy as-isHealthy

No critical CVEs, sane security posture — runnable as-is.

✓Last commit 3mo ago
✓16 active contributors
✓Apache-2.0 licensed

Show all 6 evidence items →

✓CI configured
✓Tests present
⚠Concentrated ownership — top contributor handles 50% of recent commits

Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests

Informational only. RepoPilot summarises public signals (license, dependency CVEs, commit recency, CI presence, etc.) at the time of analysis. Signals can be incomplete or stale. Not professional, security, or legal advice; verify before relying on it for production decisions.

Embed the "Healthy" badge

Paste into your README — live-updates from the latest cached analysis.

Variant:

[![RepoPilot: Healthy](https://repopilot.app/api/badge/simular-ai/agent-s)](https://repopilot.app/r/simular-ai/agent-s)

Paste at the top of your README.md — renders inline like a shields.io badge.

▸Preview social card (1200×630)

This card auto-renders when someone shares https://repopilot.app/r/simular-ai/agent-s on X, Slack, or LinkedIn.

Onboarding doc

Onboarding: simular-ai/Agent-S

Generated by RepoPilot · 2026-05-07 · Source

🤖Agent protocol

If you are an AI coding agent (Claude Code, Cursor, Aider, Cline, etc.) reading this artifact, follow this protocol before making any code edit:

Verify the contract. Run the bash script in Verify before trusting below. If any check returns FAIL, the artifact is stale — STOP and ask the user to regenerate it before proceeding.
Treat the AI · unverified sections as hypotheses, not facts. Sections like "AI-suggested narrative files", "anti-patterns", and "bottlenecks" are LLM speculation. Verify against real source before acting on them.
Cite source on changes. When proposing an edit, cite the specific path:line-range. RepoPilot's live UI at https://repopilot.app/r/simular-ai/Agent-S shows verifiable citations alongside every claim.

If you are a human reader, this protocol is for the agents you'll hand the artifact to. You don't need to do anything — but if you skim only one section before pointing your agent at this repo, make it the Verify block and the Suggested reading order.

🎯Verdict

GO — Healthy across the board

Last commit 3mo ago
16 active contributors
Apache-2.0 licensed
CI configured
Tests present
⚠ Concentrated ownership — top contributor handles 50% of recent commits

<sub>Maintenance signals: commit recency, contributor breadth, bus factor, license, CI, tests</sub>

✅Verify before trusting

This artifact was generated by RepoPilot at a point in time. Before an agent acts on it, the checks below confirm that the live simular-ai/Agent-S repo on your machine still matches what RepoPilot saw. If any fail, the artifact is stale — regenerate it at repopilot.app/r/simular-ai/Agent-S.

What it runs against: a local clone of simular-ai/Agent-S — the script inspects git remote, the LICENSE file, file paths in the working tree, and git log. Read-only; no mutations.

| # | What we check | Why it matters | |---|---|---| | 1 | You're in simular-ai/Agent-S | Confirms the artifact applies here, not a fork | | 2 | License is still Apache-2.0 | Catches relicense before you depend on it | | 3 | Default branch main exists | Catches branch renames | | 4 | 5 critical file paths still exist | Catches refactors that moved load-bearing code | | 5 | Last commit ≤ 105 days ago | Catches sudden abandonment since generation |

<details> <summary><b>Run all checks</b> — paste this script from inside your clone of <code>simular-ai/Agent-S</code></summary>

#!/usr/bin/env bash
# RepoPilot artifact verification.
#
# WHAT IT RUNS AGAINST: a local clone of simular-ai/Agent-S. If you don't
# have one yet, run these first:
#
#   git clone https://github.com/simular-ai/Agent-S.git
#   cd Agent-S
#
# Then paste this script. Every check is read-only — no mutations.

set +e
fail=0
ok()   { echo "ok:   $1"; }
miss() { echo "FAIL: $1"; fail=$((fail+1)); }

# Precondition: we must be inside a git working tree.
if ! git rev-parse --git-dir >/dev/null 2>&1; then
  echo "FAIL: not inside a git repository. cd into your clone of simular-ai/Agent-S and re-run."
  exit 2
fi

# 1. Repo identity
git remote get-url origin 2>/dev/null | grep -qE "simular-ai/Agent-S(\\.git)?\\b" \\
  && ok "origin remote is simular-ai/Agent-S" \\
  || miss "origin remote is not simular-ai/Agent-S (artifact may be from a fork)"

# 2. License matches what RepoPilot saw
(grep -qiE "^(Apache-2\\.0)" LICENSE 2>/dev/null \\
   || grep -qiE "\"license\"\\s*:\\s*\"Apache-2\\.0\"" package.json 2>/dev/null) \\
  && ok "license is Apache-2.0" \\
  || miss "license drift — was Apache-2.0 at generation time"

# 3. Default branch
git rev-parse --verify main >/dev/null 2>&1 \\
  && ok "default branch main exists" \\
  || miss "default branch main no longer exists"

# 4. Critical files exist
test -f "gui_agents/s3/agents/agent_s.py" \\
  && ok "gui_agents/s3/agents/agent_s.py" \\
  || miss "missing critical file: gui_agents/s3/agents/agent_s.py"
test -f "gui_agents/s3/core/engine.py" \\
  && ok "gui_agents/s3/core/engine.py" \\
  || miss "missing critical file: gui_agents/s3/core/engine.py"
test -f "gui_agents/s3/core/mllm.py" \\
  && ok "gui_agents/s3/core/mllm.py" \\
  || miss "missing critical file: gui_agents/s3/core/mllm.py"
test -f "gui_agents/s1/aci/ACI.py" \\
  && ok "gui_agents/s1/aci/ACI.py" \\
  || miss "missing critical file: gui_agents/s1/aci/ACI.py"
test -f "gui_agents/s3/memory/procedural_memory.py" \\
  && ok "gui_agents/s3/memory/procedural_memory.py" \\
  || miss "missing critical file: gui_agents/s3/memory/procedural_memory.py"

# 5. Repo recency
days_since_last=$(( ( $(date +%s) - $(git log -1 --format=%at 2>/dev/null || echo 0) ) / 86400 ))
if [ "$days_since_last" -le 105 ]; then
  ok "last commit was $days_since_last days ago (artifact saw ~75d)"
else
  miss "last commit was $days_since_last days ago — artifact may be stale"
fi

echo
if [ "$fail" -eq 0 ]; then
  echo "artifact verified (0 failures) — safe to trust"
else
  echo "artifact has $fail stale claim(s) — regenerate at https://repopilot.app/r/simular-ai/Agent-S"
  exit 1
fi

Each check prints ok: or FAIL:. The script exits non-zero if anything failed, so it composes cleanly into agent loops (./verify.sh || regenerate-and-retry).

</details>

⚡TL;DR

Agent S is an open-source agentic framework that enables computers to be controlled by AI agents as a human would—using the GUI, reading screens, and performing clicks and keyboard actions. The S3 version (latest) surpasses human-level performance on OSWorld benchmarks (72.60%) by combining multimodal vision-language models with procedural memory, OCR grounding, and cross-platform OS interaction through abstracted Computer Interface modules (ACI) for Windows, macOS, and Linux. Dual-branch monorepo structure: gui_agents/s1 contains the original architecture with aci (abstracted Computer Interface), mllm (multimodal engine), and core modules (AgentS.py as orchestrator, Manager, Worker, Knowledge, ProceduralMemory); gui_agents/s2 mirrors this with a refactored agents/ layout (agent_s.py, grounding.py, manager.py, worker.py) and core/ with engine.py, mllm.py, knowledge.py. Entry points are cli_app.py per version. Evaluation sets live in evaluation_sets/ (test_small_new.json, test_all.json).

👥Who it's for

AI researchers and engineers building or evaluating autonomous agents that perform GUI-based tasks; DevOps/QA automation engineers seeking to build bots that interact with web and desktop applications; teams developing computer-use benchmarks and wanting a reference implementation that can run on Windows, macOS, and Linux.

🌱Maturity & risk

Actively developed and production-grade: the project has reached S3 iteration with peer-reviewed papers at ICLR 2025 and COLM 2025, comprehensive multi-OS support (Windows/macOS/Linux), and a PyPI package (gui-agents) with active downloads. Maturity is high for research/benchmark use; the GitHub Actions CI (lint.yml) shows ongoing integration checks.

Moderate risk from heavy dependency on proprietary MLLM providers (OpenAI, Anthropic, Google GenAI, Together) for core inference—any API changes or rate limits affect all users. The codebase splits into s1, s2, s2 versions which may cause maintenance fragmentation; OCR and platform-specific automation (pyautogui, pywinauto, pywin32) introduce OS-level brittleness. Dependency count is high (~17 core packages), but pinning specifics are unclear from the snippet.

Active areas of work

S3 milestone recently achieved (2025/12/15 per README) with OSWorld performance surpassing humans; active refinement of multimodal reasoning and grounding. s2 branch likely represents a cleaned architectural iteration, suggesting focus on code maintainability and extensibility. The WindowsAgentArena.md and GroundingAgent.py additions indicate work on improving agent grounding and Windows-specific scenarios.

🚀Get running

git clone https://github.com/simular-ai/Agent-S.git
cd Agent-S
pip install -e .

Then configure your MLLM provider (OpenAI/Anthropic/Google) via environment variables and run: python gui_agents/s2/cli_app.py or python gui_agents/s1/cli_app.py to launch the agent.

Daily commands: For s2 (recommended): python gui_agents/s2/cli_app.py (expects FastAPI on uvicorn or direct CLI invocation). For s1: python gui_agents/s1/cli_app.py. Both require environment variables for your chosen MLLM (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, or TOGETHER_API_KEY). No Makefile present; execution is direct Python.

🗺️Map of the codebase

gui_agents/s3/agents/agent_s.py — Main S3 agent orchestrator that coordinates multi-modal reasoning, procedural memory, and computer interaction—foundational entry point for the latest framework version
gui_agents/s3/core/engine.py — Core execution engine that manages LLM interactions, state transitions, and agent lifecycle—load-bearing abstraction for all agent versions
gui_agents/s3/core/mllm.py — Multi-modal LLM interface abstracting Claude, GPT, and other models—critical dependency for vision and reasoning capabilities
gui_agents/s1/aci/ACI.py — Abstract Computer Interaction interface defining platform-agnostic screen/action APIs that all OS implementations must provide
gui_agents/s3/memory/procedural_memory.py — Procedural memory system storing task trajectories and learned behaviors—enables few-shot learning and task reuse across episodes
gui_agents/s3/agents/grounding.py — Grounding agent responsible for translating high-level plans into concrete GUI actions—bridges symbolic reasoning and pixel-level execution
gui_agents/s1/aci/WindowsOSACI.py — Windows-specific ACI implementation using pywinauto and Win32 APIs—reference implementation for platform abstraction pattern

🛠️How to make changes

Add a new LLM provider

Create a new provider class inheriting from the base MLLM interface in gui_agents/s3/core/mllm.py—implement query() and query_with_vision() methods (gui_agents/s3/core/mllm.py)
Add provider initialization logic in mllm.py's factory/registry section to detect API keys and instantiate your provider (gui_agents/s3/core/mllm.py)
Update engine.py to route model selection to your new provider based on env config or model name prefix (gui_agents/s3/core/engine.py)
Add credentials/API key documentation to README.md with env var examples (README.md)

Add support for a new operating system

Create NewOSACI.py in gui_agents/s1/aci/ inheriting from ACI.py—implement screenshot(), mouse_move(), click(), type_text(), and key_press() methods (gui_agents/s1/aci/ACI.py)
Use OS-specific libraries (e.g., Xlib for Linux, Cocoa for future Darwin enhancements) to interface with window managers and input systems (gui_agents/s1/aci/NewOSACI.py)
Register your OS implementation in cli_app.py or engine.py's platform detection logic (gui_agents/s3/cli_app.py)
Add platform-specific dependencies to setup.py/pyproject.toml with conditional markers (e.g., platform_system == 'NewOS') (README.md)

Implement a custom memory storage backend

Create a new class in gui_agents/s3/memory/procedural_memory.py or extend it, implementing store_trajectory(), retrieve_similar(), and clear() methods (gui_agents/s3/memory/procedural_memory.py)
Override the default in-memory storage with your backend (e.g., vector DB, JSON files, database) using the same interface (gui_agents/s3/memory/procedural_memory.py)
Update agent_s.py to instantiate your memory backend via env config or factory pattern (gui_agents/s3/agents/agent_s.py)

Add a new agent behavior or action type

Define new action primitives in gui_agents/s3/agents/worker.py's action execution logic or grounding.py's action translation (gui_agents/s3/agents/worker.py)
Create a specialized prompt or reasoning path in engine.py for your new action type (gui_agents/s3/core/engine.py)
Register the new action in the ACI interface (gui_agents/s1/aci/ACI.py) if it requires OS-level interaction (gui_agents/s1/aci/ACI.py)
Add test cases in evaluation_sets/test_all.json or test_small_new.json with examples triggering your new action (evaluation_sets/test_all.json)

🔧Why these technologies

Multi-modal LLM (Claude, GPT-4V, Gemini) — Vision-language models enable joint reasoning over GUI screenshots and text, essential for grounding high-level tasks in pixel coordinates and understanding UI semant

🪤Traps & gotchas

MLLM provider API keys must be set as environment variables (OPENAI_API_KEY, etc.); missing any will silently fail or fall back to a broken provider. 2. OCR server (gui_agents/s1/utils/ocr_server.py) may require separate setup or daemon for PaddleOCR; not clear if auto-started. 3. Windows-specific dependencies (pywin32, pywinauto) require compilation on some systems; macOS requires Xcode for pyobjc. 4. Agent S1 and S2 have overlapping but non-identical APIs; mixing them causes import conflicts. 5. Evaluation sets reference task structures not fully documented in the repo—check osworld-bench external repo for task schema. 6. FastAPI uvicorn server runs on default localhost:8000 but is optional; unclear if all features work without it.

🏗️Architecture

💡Concepts to learn

Agentic Loop (Perception → Reasoning → Action → Observation) — Agent S's core execution pattern; understanding this loop in AgentS.py and agent_s.py is essential to extending or debugging the agent's behavior.
Vision-Language Models (VLMs) — Agent S uses VLMs (via OpenAI GPT-4V, Claude, Gemini) to interpret screenshots and decide actions; the multimodal reasoning in mllm.py depends entirely on understanding VLM capabilities and limitations.
Procedural Memory (Episodic + Semantic) — ProceduralMemory.py stores action sequences and task patterns; this allows Agent S to reuse learned behaviors across similar tasks, a key differentiator from stateless agents.
OS-Level Automation & Input Simulation — The ACI abstraction (WindowsOSACI, MacOSACI, LinuxOSACI) uses OS-native APIs (pyautogui, pywinauto, X11) to programmatically control mouse and keyboard; understanding which APIs are fragile on each OS is critical for debugging.
Optical Character Recognition (OCR) for UI Grounding — PaddleOCR in utils/ocr_server.py extracts text from screenshots to ground the agent's vision understanding to clickable elements; OCR accuracy directly impacts action precision.
Manager-Worker Async Pattern — Core agent architecture in Manager.py and Worker.py; Manager coordinates task decomposition and Worker threads execute actions in parallel, enabling scalable multi-task execution.
Token-Level Reasoning & Prompt Engineering — Agent S's performance (72.60% on OSWorld) hinges on precise prompt design for VLMs; understanding how reasoning is encoded in mllm.py calls is essential for model tuning and ablation studies.

OpenInterpreter/open-interpreter — Comparable framework for code-executing agents on local machines; shares GUI automation and OS-level task execution patterns.
xlwings/xlwings — Python library for Excel automation similar to Agent S's OS-specific interaction layer, though narrower scope (Excel only).
microsoft/TaskWeaver — Microsoft's agent framework for planning and execution; complementary approach to multi-step task decomposition.
anthropics/anthropic-sdk-python — Official Anthropic Python SDK used directly in gui_agents/s2/core/mllm.py for vision API calls; core dependency.
osworld-benchmark/osworld — The OSWorld benchmark suite that Agent S targets and achieves 72.60% on; defines task ground truth and evaluation metrics.

🪄PR ideas

To work on one of these in Claude Code or Cursor, paste: Implement the "<title>" PR idea from CLAUDE.md, working through the checklist as the task list.

Add platform-specific integration tests for ACI implementations

The repo has three OS-specific ACI implementations (WindowsOSACI.py, MacOSACI.py, LinuxOSACI.py) in gui_agents/s1/aci/ but no corresponding test files. These are critical for ensuring cross-platform GUI automation works correctly. Adding tests would catch regressions in pyautogui, pywinauto, and pyobjc integrations.

[ ] Create gui_agents/s1/aci/tests/ directory with init.py
[ ] Add test_WindowsOSACI.py with mocked pywinauto/pywin32 calls for screenshot, click, and keyboard operations
[ ] Add test_MacOSACI.py with mocked pyobjc calls for screenshot, click, and keyboard operations
[ ] Add test_LinuxOSACI.py with mocked X11/display server calls
[ ] Create conftest.py with platform detection fixtures to skip irrelevant tests
[ ] Add CI workflow in .github/workflows/ to run platform-specific tests in containers

Consolidate duplicate code between s1, s2, and s2_5 agent versions

The repo maintains three parallel agent implementations (gui_agents/s1/, gui_agents/s2/, gui_agents/s2_5/) with significant code duplication in core modules (MultimodalAgent/mllm, ProceduralMemory, Worker logic). Extracting shared functionality into a common base package would reduce maintenance burden and make the codebase more scalable for future versions.

[ ] Create gui_agents/common/ directory for shared modules
[ ] Extract common MLLM interface from gui_agents/s1/mllm/MultimodalEngine.py and gui_agents/s2/core/mllm.py into gui_agents/common/mllm_base.py
[ ] Extract procedural memory interface from s1/core/ProceduralMemory.py and s2/memory/procedural_memory.py into gui_agents/common/memory_base.py
[ ] Update s2 and s2_5 to inherit from common base classes while preserving version-specific overrides
[ ] Add integration tests in tests/ to verify all three versions work with shared base classes
[ ] Update README.md to document the architecture and relationship between s1, s2, s2_5

Add comprehensive logging and error handling for MultimodalAgent across all platforms

The MultimodalAgent (s1/mllm/MultimodalAgent.py) and MultimodalEngine handle critical inference logic but lack structured logging and error recovery. Without this, debugging deployment issues in production is difficult. Adding decorators for retry logic (backoff is already a dependency), logging, and graceful degradation would improve reliability.

[ ] Add logging configuration to gui_agents/s1/utils/common_utils.py with structured logging (handlers for file + stderr)
[ ] Add @retry decorator from backoff library to MultimodalAgent.py methods calling external LLMs (openai, anthropic, google-genai)
[ ] Add detailed logging at MLLM call boundaries (input tokens, model used, latency, error details)
[ ] Create gui_agents/s1/core/error_handler.py with custom exception classes for MLLM failures, OCR failures, and GUI automation failures
[ ] Add fallback logic in MultimodalEngine.py to retry with different models if primary model fails
[ ] Add logging to WindowsAgentArena.py GroundingAgent for arena-specific errors
[ ] Add unit tests in tests/test_multimodal_logging.py validating retry behavior and error messages

🌿Good first issues

Add missing type hints to gui_agents/s1/core/Knowledge.py and gui_agents/s1/core/ProceduralMemory.py to improve IDE support and reduce runtime errors; these are foundational but lack full annotation.
Create a comprehensive test suite for gui_agents/s1/aci/MacOSACI.py (currently no test files visible in the structure); add unit tests for screen coordinate mapping and click/type actions specific to macOS.
Document the evaluation_sets/ task schema with a YAML or Markdown spec detailing how OSWorld tasks are structured (goal, success criteria, expected actions); currently only JSON examples exist with no schema docs.

⭐Top contributors

Click to expand

@alckasoc — 50 commits
@eric-xw — 17 commits
@rossamurphy — 9 commits
@Richard-Simular — 7 commits
@simularlyon — 2 commits

📝Recent commits