10 Architecture patterns for AI agents in production

Stripe recently published two articles describing Minions, their internal autonomous coding agents.

Beyond the experience report, these articles reveal a series of architecture patterns for building reliable AI agents in production.

Here are the key patterns that emerge from Stripe’s design.

1. Treat developer attention as a scarce resource

The goal of coding agents is not to replace engineers, but to free up their attention.

Agents handle repetitive but simple tasks:

fixing flaky tests
small refactors
dependency updates
minor bug fixes

Engineers can then focus on higher-value problems.

Key idea: engineer attention > compute > tokens

Agents convert cheap compute into human attention time.

2. Build agents on top of existing developer infrastructure

Stripe didn’t build a separate AI platform.

Minions reuse existing tools:

devboxes
CI pipelines
linters
Sourcegraph
internal documentation
issue trackers

Principle: if the tools work for engineers, they should work for agents.

3. Isolate execution environments (agent sandboxes)

Autonomous agents must run in fully isolated environments.

Stripe uses devboxes with:

no access to production
no user data
restricted network access

Key idea: autonomy requires sandboxing

Isolation enables high autonomy without significant risk.

4. Pre-warm execution environments

Agents need to start fast.

Stripe maintains pre-warmed devboxes where:

the repository is already cloned
caches are ready
build tools are loaded

Startup time: ~10 seconds.

Pattern: pre-warmed environments

This model is similar to serverless or CI infrastructure.

5. Combine LLM reasoning with deterministic steps

LLMs are powerful but unpredictable.

Stripe wraps LLM reasoning inside deterministic system steps.

Example:

LLM reasoning
↓
deterministic step (git commit)
↓
LLM reasoning
↓
deterministic step (run CI)

Benefits:

better reliability
fewer tokens
fewer agent errors

The LLM decides what to do, the system controls how to execute it.

6. Use state machines for agents (Blueprints)

Stripe orchestrates tasks with Blueprints.

Blueprints are state machines for agents.

Example:

Implement task
↓
Run linters
↓
Push changes
↓
Run CI
↓
Fix failures

Some steps use LLM reasoning, others are deterministic.

This constrains agent unpredictability.

7. Hydrate context before launching the agent

Agents perform better when context is prepared in advance.

Stripe builds this context deterministically.

Example:

Slack message
↓
extract links
↓
fetch tickets
↓
load documentation
↓
build structured context
↓
start agent

This significantly improves the one-shot success rate.

8. Scope context locally

Large codebases can’t rely on a single massive prompt.

Stripe uses file-scoped rules activated based on the relevant folder.

Example:

/payments/*       → rules: payments conventions
/infrastructure/* → rules: infra conventions

This keeps prompts focused and relevant.

9. Centralize and curate agent capabilities (Toolshed)

Giving agents too many tools reduces reliability. Stripe’s solution has two parts.

First, they built Toolshed: a single, centralized MCP server that acts as a unified interface to all internal capabilities—documentation, Sourcegraph code intelligence, build statuses, Jira tickets, and more—hosting nearly 500 tools in one place.

Second, rather than exposing all of Toolshed to every agent, each minion receives a deliberately restricted subset of those tools.

Principle: unified interface + smaller action space → better decisions

Centralizing capabilities makes them maintainable; curating them per agent keeps performance high.

10. Design fast feedback loops — and shift them left

Agents need fast, automated feedback, and that feedback should arrive as early as possible.

Stripe uses multiple levels, ordered by speed:

Pre-push local linting (<5s): before any CI run, a local hook executes selected linters instantly, returning errors directly to the agent and saving both tokens and CI compute
CI test suite: millions of tests run after a push, with automatic autofixes applied where available
One retry max: if unfixable failures remain, the agent gets exactly one more local attempt before the branch is handed back to a human

Principle: catch errors at the cheapest stage possible

Iterations are deliberately capped at two CI runs to avoid excessive compute costs, infinite loops, and token waste.

Conclusion

The main innovation lies in engineering discipline around AI agents, not in the model itself.