Skip to main content

Case Study: Open SWE

"The best internal tools are built by teams who use them daily." — LangChain Team

Open SWE (Open Source Software Engineer) is LangChain's open-source framework for building internal coding agents. Built on LangGraph and Deep Agents, it provides a production-ready architecture that mirrors the internal coding agents used by elite engineering organizations like Stripe, Ramp, and Coinbase.

Project: langchain-ai/open-swe Tech Stack: Python, LangGraph, Deep Agents, Modal/Daytona Sandboxes License: MIT


1. Project Overview & Significance

What is Open SWE?

Open SWE is an asynchronous, cloud-native coding agent framework that:

  • Receives tasks from Slack, Linear, or GitHub mentions
  • Spawns isolated sandboxes for each task
  • Orchestrates multi-agent workflows to plan, implement, and review code changes
  • Automatically creates pull requests linked back to the original ticket

It represents the democratization of internal engineering tools — making the same architectural patterns used by trillion-dollar companies available to any development team.

Why This Matters

The Problem: Elite engineering orgs (Stripe, Ramp, Coinbase) have built powerful internal coding agents, but their implementations are proprietary. The industry lacked an open-source reference architecture.

Open SWE's Impact:

  • Reference Architecture: Provides a production blueprint for internal coding agents
  • Composability over Forking: Builds on Deep Agents framework, enabling upgrades and customization
  • Pluggable Design: Swap sandboxes, models, tools, and triggers
  • Community Innovation: Open foundation for org-specific extensions

Relationship to Proprietary Systems

CompanyInternal ToolOpen SWE Equivalent
StripeMinionsMulti-agent orchestration, rule files → AGENTS.md
RampInspectComposed on OpenCode → Deep Agents
CoinbaseCloudbotSlack-native, Linear-first integration

2. Agent Architecture (Core Focus)

This is the heart of Open SWE — a sophisticated multi-agent system built on LangGraph's state machine architecture.

High-Level Architecture

The Four-Agent System

Open SWE implements a specialized multi-agent architecture, where each agent has a distinct responsibility:

1. Manager Agent

Role: Entry point and orchestrator

# High-level concept (not actual code)
manager_agent = Agent(
name="manager",
system_prompt="Route tasks and coordinate workflow",
decisions=[
"route_to_planner", # For new tasks needing research
"route_to_programmer", # For straightforward fixes
"respond_complete", # Task finished successfully
"respond_error" # Task failed
]
)

Responsibilities:

  • Receives initial task from Slack/Linear/GitHub
  • Parses context (issue description, thread history)
  • Decides whether to route to Planner or Programmer
  • Coordinates hand-offs between agents
  • Sends final responses back to the trigger source

2. Planner Agent

Role: Research and plan before coding

# Concept: Planner's workflow
planner_agent = Agent(
name="planner",
tools=[ls, grep, read_file, glob, fetch_url],
system_prompt="Analyze codebase and create step-by-step plan",
output="structured_todos"
)

Responsibilities:

  • Researches the codebase using file operations
  • Analyzes the GitHub issue or Linear ticket
  • Creates a structured plan using write_todos tool
  • Identifies files that need modification
  • Proposes step-by-step execution strategy

Key Tools Used:

ToolPurpose
lsList directory contents
grepSearch for code patterns
read_fileRead file contents
globFind files by pattern
fetch_urlFetch documentation

3. Programmer Agent

Role: Implements the planned changes

# Concept: Programmer's workflow
programmer_agent = Agent(
name="programmer",
tools=[write_file, edit_file, execute, http_request],
system_prompt="Implement changes according to plan",
constraints=["run_linters", "run_tests", "ensure_tests_pass"]
)

Responsibilities:

  • Executes the plan created by Planner
  • Writes/edits files in the sandbox
  • Runs linters and formatters
  • Executes tests to verify changes
  • Iterates until tests pass

Key Tools Used:

ToolPurpose
write_fileCreate new files
edit_fileModify existing files
executeRun shell commands (npm test, black, etc.)
http_requestMake API calls

4. Reviewer Agent

Role: Validate before shipping

# Concept: Reviewer's workflow
reviewer_agent = Agent(
name="reviewer",
tools=[execute, read_file, git_diff],
system_prompt="Review changes and ensure quality",
checks=["code_quality", "test_coverage", "documentation"]
)

Responsibilities:

  • Reviews all code changes
  • Runs final validation tests
  • Checks for edge cases
  • Ensures linters and formatters pass
  • Reduces broken builds

LangGraph State Machine

The agents are orchestrated using LangGraph, which provides:

Why LangGraph?

  • Deterministic Transitions: Clear hand-offs between agents
  • State Persistence: Memory survives long-running tasks
  • Loop Prevention: Guards against infinite agent loops
  • Human Oversight: Can inject checkpoints for human review
  • Expressive Power: Not limited to single cognitive architecture

Tools System

Open SWE follows Stripe's insight: tool curation matters more than quantity.

Core Tools

ToolCategoryPurpose
executeShellRun commands in sandbox (bash, python, npm)
fetch_urlWebFetch web pages as markdown
http_requestAPIMake HTTP requests (GET, POST, etc.)
commit_and_open_prGitCommit changes + create GitHub draft PR
linear_commentCommunicationPost updates to Linear tickets
slack_thread_replyCommunicationReply in Slack threads

Built-in Deep Agents Tools

ToolCategoryPurpose
read_fileFilesystemRead file contents
write_fileFilesystemCreate new files
edit_fileFilesystemModify existing files (diff-based)
lsFilesystemList directory contents
globFilesystemFind files by pattern
grepFilesystemSearch file contents
write_todosPlanningCreate structured task lists
taskOrchestrationSpawn subagents

Tool Design Principles

Memory & Context Management

AGENTS.md

Open SWE reads an AGENTS.md file (if present) at the repository root and injects it into the system prompt. This is the repo-level rulebook:

# AGENTS.md Example

## Code Style
- Use 4 spaces for indentation
- Follow PEP 8 for Python
- Run `black` and `isort` before committing

## Testing Requirements
- All new features need unit tests
- Test coverage must not decrease
- Run `pytest` before committing

## Architecture Rules
- Use dependency injection
- No circular imports
- Database queries go through repository layer

Analogy: This is Stripe's "rule files" concept — encoding conventions that every agent run must follow.

Source Context

Open SWE assembles rich context before the agent starts:

  • Linear Issues: Title, description, all comments
  • Slack Threads: Full conversation history
  • GitHub PRs: Description + review comments

This prevents the agent from "discovering everything through tool calls" and enables faster, more accurate responses.

File-Based Memory

For larger codebases, Open SWE uses file-based memory:

Benefits:

  • Prevents Context Overflow: Large codebases don't exhaust token limits
  • Persistent: Survives agent restarts and crashes
  • Queryable: Agent can search past interactions

Middleware System

Middleware provides deterministic hooks around the agent loop:

Key Middleware

# Concept: Middleware chain
middleware_chain = [
ToolErrorMiddleware(), # Catch tool errors gracefully
check_message_queue_before_model, # Inject follow-up messages
open_pr_if_needed, # Auto-commit safety net
]

1. ToolErrorMiddleware

Purpose: Prevent cascading failures

# Concept: Error handling
try:
result = tool.execute(**args)
except ToolError as e:
# Gracefully handle error
return f"Tool {tool.name} failed: {e}. Please try an alternative approach."

2. check_message_queue_before_model

Purpose: Enable real-time human steering

Innovation: You can message the agent while it's working, and it'll pick up your input at its next step.

3. open_pr_if_needed

Purpose: Safety net for PR creation

# Concept: Post-agent hook
if agent_finished and no_pr_created:
# Agent forgot to open PR — do it automatically
commit_changes()
open_draft_pr()
notify_user("Opened PR as safety net")

Design Philosophy: Lightweight version of Stripe's deterministic nodes — ensuring critical steps happen regardless of LLM behavior.

Subagent Spawning

The task tool enables parallel subagent delegation:

Each subagent has:

  • Its own middleware stack
  • Independent todo list
  • Isolated file operations
  • Separate sandbox (if needed)

Use Case: Parallel work across independent subtasks (e.g., frontend + backend + tests).


3. End-to-End Design Flow

From Trigger to PR

Sandbox Lifecycle

Sandbox Providers:

  • Modal: Serverless containers
  • Daytona: Long-lived dev environments
  • Runloop: AI-optimized sandboxes
  • LangSmith: Built-in cloud sandbox
  • Custom: Plug in your own

Thread-to-Sandbox Mapping

Each invocation creates a deterministic thread ID:

Slack Thread: slack.com/archives/C1234/p5678

Thread ID: slack_C1234_p5678

Sandbox: sandbox-slack_C1234_p5678

Benefits:

  • Follow-up messages route to the same sandbox
  • State persists across conversation
  • Multiple tasks run in parallel (different sandboxes)

4. Design Philosophy

Open SWE's architecture reflects deliberate design choices:

1. Compose, Don't Fork

Decision: Compose on Deep Agents instead of forking an existing agent.

Benefits:

  • Upgrade Path: Pull in upstream improvements
  • Maintainability: Framework handles complex orchestration
  • Customizability: Override tools, middleware, prompts

2. Sandbox-First Isolation

Principle: "Isolate first, then give full permissions inside the boundary."

Rationale:

  • Blast Radius Containment: Mistakes don't affect production
  • No Confirmation Prompts: Agent can work autonomously
  • Parallel Execution: Multiple tasks run simultaneously

3. Tool Curation Over Accumulation

Stripe's Insight: 500 curated tools > 5000 accumulated tools.

Open SWE's Approach:

  • ~15 core tools
  • Each tool has single, clear purpose
  • Tools are composable primitives

4. Plan-First, Review-Before-Ship

Rationale:

  • Reduces Broken Builds: Planning catches edge cases early
  • Better Code: Reviewer catches issues before PR
  • Faster Iteration: Fail fast, fix fast

5. Comparison & Innovation

Feature Comparison

FeatureOpen SWEDevinSWE-agentAutoCodeRover
Open Source✅ MIT❌ Proprietary✅ MIT✅ MIT
FrameworkLangGraph + Deep AgentsProprietaryCustomCustom
SandboxPluggable (4+ providers)Built-inE2BCustom
Multi-Agent✅ (4 specialized agents)❓ Unknown❌ Single agent❌ Single agent
Slack Integration✅ Native
Linear Integration✅ Native
Subagent Spawning✅ Native
Mid-Task Messaging
AGENTS.md Support

Key Innovations

  1. Pluggable Sandboxes: Not locked into one provider
  2. Mid-Task Messaging: Humans can steer agents in real-time
  3. AGENTS.md Pattern: Repo-level convention encoding
  4. Composable Architecture: Builds on reusable frameworks
  5. Multi-Platform Triggers: Slack + Linear + GitHub

Trade-offs

DecisionBenefitTrade-off
LangGraph dependencyExpressive orchestrationFramework lock-in
Multi-agent systemSpecialized rolesHigher latency
Sandbox isolationSafe parallel executionSlower startup
AGENTS.mdRepo conventionsRequires file maintenance

6. Future Extensions & Use Cases

Potential Enhancements

User Personas

PersonaGoalOpen SWE Value
Startup CTOBuild internal toolingProduction-ready starting point
Enterprise DevCustomize for org stackPluggable architecture
Open Source MaintainerAutomate PR triageGitHub integration
Engineering ManagerIncrease team velocityAutomate routine tasks
ResearcherStudy agent architecturesTransparent implementation

User Journey Map

Use Cases

  1. Bug Fixes: "Fix the null pointer in user service"
  2. Feature Implementation: "Add dark mode to dashboard"
  3. Test Writing: "Write unit tests for auth module"
  4. Documentation: "Update README with new examples"
  5. Refactoring: "Extract pagination into shared utility"
  6. Dependency Updates: "Upgrade React to v19 and fix breaking changes"

7. Technical Implementation Highlights

Agent Creation (High-Level)

# Concept: How Open SWE creates an agent
from deep_agents import create_deep_agent

agent = create_deep_agent(
model="anthropic:claude-opus-4-6",
system_prompt=construct_system_prompt(repo_dir),
tools=[
http_request,
fetch_url,
commit_and_open_pr,
linear_comment,
slack_thread_reply,
],
backend=sandbox_backend,
middleware=[
ToolErrorMiddleware(),
check_message_queue_before_model,
],
)

Sandbox Backend (Concept)

# Concept: Pluggable sandbox backends
sandbox_backends = {
"modal": ModalSandbox(),
"daytona": DaytonaSandbox(),
"runloop": RunloopSandbox(),
"langsmith": LangSmithSandbox(),
}

# Select based on environment variable
backend = sandbox_backends[os.getenv("SANDBOX_PROVIDER", "langsmith")]

AGENTS.md Example

# Repository Conventions for AI Agents

## Code Style
- Python: PEP 8, 4-space indentation
- JavaScript: 2-space indentation, Prettier
- Run `black .` and `isort .` before committing Python files

## Testing
- All functions need docstrings
- Test coverage must exceed 80%
- Run `pytest` before committing

## Architecture
- Use dependency injection for services
- Database queries go through `repository/` layer
- No circular imports

## Git Commit Format
- Format: `[type] scope: description`
- Types: feat, fix, docs, refactor, test

Middleware Chain (Concept)

# Concept: Middleware execution
async def run_agent_with_middleware(agent, input_data):
for middleware in middleware_chain:
input_data = await middleware.before_agent(input_data)

result = await agent.run(input_data)

for middleware in reversed(middleware_chain):
result = await middleware.after_agent(result)

return result

8. Lessons Learned

What Works Well

  1. LangGraph for Orchestration: State machine approach prevents infinite loops and provides clear hand-offs
  2. Sandbox Isolation: Parallel execution works flawlessly with isolated environments
  3. AGENTS.md Pattern: Simple convention encoding is powerful and maintainable
  4. Tool Curation: 15 focused tools > 100 accumulated tools
  5. Mid-Task Messaging: Enables real-time human steering

Challenges

  1. Framework Dependency: Locked into LangGraph and Deep Agents
  2. Sandbox Startup Time: Cold starts add latency
  3. Cost: Running sandboxes for every task adds up
  4. Context Limits: Large codebases still require careful memory management

Recommendations

  1. Start Simple: Use default configuration, customize gradually
  2. Write AGENTS.md: Invest time in convention encoding
  3. Monitor Sandbox Costs: Set budgets and alerts
  4. Extend Middleware: Add org-specific validation in middleware chain
  5. Contribute Back: Share useful tools and middleware with community

9. Key Takeaways

Core Insight

Open SWE demonstrates that internal coding agents don't need to be proprietary. By composing on LangGraph and Deep Agents, teams can build production-grade agents that match the capabilities of elite engineering orgs.

Design Pattern

The AGENTS.md pattern is a simple yet powerful way to encode repository conventions. It's the "rule files" concept from Stripe, adapted for open source.

Trade-off Awareness

Multi-agent systems add latency. For simple tasks, a single agent might be faster. Use multi-agent architecture for complex tasks that benefit from specialization.

Further Learning


Sources