10 Architecture patterns for AI agents in production
Stripe recently published two articles describing Minions, their internal autonomous coding agents.
- Minions: Stripe’s one-shot, end-to-end coding agents
- Minions: Stripe’s one-shot, end-to-end coding agents—Part 2
Beyond the experience report, these articles reveal a series of architecture patterns for building reliable AI agents in production.
Here are the key patterns that emerge from Stripe’s design.
1. Treat developer attention as a scarce resource
The goal of coding agents is not to replace engineers, but to free up their attention.
Agents handle repetitive but simple tasks:
- fixing flaky tests
- small refactors
- dependency updates
- minor bug fixes
Engineers can then focus on higher-value problems.
Key idea: engineer attention > compute > tokens
Agents convert cheap compute into human attention time.
2. Build agents on top of existing developer infrastructure
Stripe didn’t build a separate AI platform.
Minions reuse existing tools:
- devboxes
- CI pipelines
- linters
- Sourcegraph
- internal documentation
- issue trackers
Principle: if the tools work for engineers, they should work for agents.
3. Isolate execution environments (agent sandboxes)
Autonomous agents must run in fully isolated environments.
Stripe uses devboxes with:
- no access to production
- no user data
- restricted network access
Key idea: autonomy requires sandboxing
Isolation enables high autonomy without significant risk.
4. Pre-warm execution environments
Agents need to start fast.
Stripe maintains pre-warmed devboxes where:
- the repository is already cloned
- caches are ready
- build tools are loaded
Startup time: ~10 seconds.
Pattern: pre-warmed environments
This model is similar to serverless or CI infrastructure.
5. Combine LLM reasoning with deterministic steps
LLMs are powerful but unpredictable.
Stripe wraps LLM reasoning inside deterministic system steps.
Example:
LLM reasoning
↓
deterministic step (git commit)
↓
LLM reasoning
↓
deterministic step (run CI)
Benefits:
- better reliability
- fewer tokens
- fewer agent errors
The LLM decides what to do, the system controls how to execute it.
6. Use state machines for agents (Blueprints)
Stripe orchestrates tasks with Blueprints.
Blueprints are state machines for agents.
Example:
Implement task
↓
Run linters
↓
Push changes
↓
Run CI
↓
Fix failures
Some steps use LLM reasoning, others are deterministic.
This constrains agent unpredictability.
7. Hydrate context before launching the agent
Agents perform better when context is prepared in advance.
Stripe builds this context deterministically.
Example:
Slack message
↓
extract links
↓
fetch tickets
↓
load documentation
↓
build structured context
↓
start agent
This significantly improves the one-shot success rate.
8. Scope context locally
Large codebases can’t rely on a single massive prompt.
Stripe uses file-scoped rules activated based on the relevant folder.
Example:
/payments/* → rules: payments conventions
/infrastructure/* → rules: infra conventions
This keeps prompts focused and relevant.
9. Centralize and curate agent capabilities (Toolshed)
Giving agents too many tools reduces reliability. Stripe’s solution has two parts.
First, they built Toolshed: a single, centralized MCP server that acts as a unified interface to all internal capabilities—documentation, Sourcegraph code intelligence, build statuses, Jira tickets, and more—hosting nearly 500 tools in one place.
Second, rather than exposing all of Toolshed to every agent, each minion receives a deliberately restricted subset of those tools.
Principle: unified interface + smaller action space → better decisions
Centralizing capabilities makes them maintainable; curating them per agent keeps performance high.
10. Design fast feedback loops — and shift them left
Agents need fast, automated feedback, and that feedback should arrive as early as possible.
Stripe uses multiple levels, ordered by speed:
- Pre-push local linting (<5s): before any CI run, a local hook executes selected linters instantly, returning errors directly to the agent and saving both tokens and CI compute
- CI test suite: millions of tests run after a push, with automatic autofixes applied where available
- One retry max: if unfixable failures remain, the agent gets exactly one more local attempt before the branch is handed back to a human
Principle: catch errors at the cheapest stage possible
Iterations are deliberately capped at two CI runs to avoid excessive compute costs, infinite loops, and token waste.
Conclusion
The main innovation lies in engineering discipline around AI agents, not in the model itself.