Engineering & AI · 15 min read

AI Agent Architecture: Production Patterns from the Field

Building an agent that works in a demo takes a day. Building one that runs reliably in production for six months takes a team. These are the architectural patterns that separate the two.

Author: Abhishek Sharma · Fordel Studios

Agent demos are easy. You chain a few LLM calls, wire up some tools, and show the system completing a task. The audience is impressed. Then you try to deploy it. The tool returns a malformed response. The LLM loops. The context window fills. Costs spike. A user's edge case crashes the whole flow.

Production agent architecture is distributed systems engineering applied to non-deterministic components. The patterns that make backend services reliable — idempotency, circuit breakers, observability, graceful degradation — all apply. The difference is that your components now include an LLM that can surprise you.

···

Pattern 1: The Orchestrator-Worker Split

The most reliable agent architectures separate orchestration from execution. An orchestrator agent plans and routes — it decides what needs to happen, in what order, with what tools. Worker agents execute discrete, bounded tasks and return structured results. The orchestrator never executes directly; workers never plan.

This separation has concrete benefits. Workers can be tested in isolation with fixed inputs. Their failure modes are bounded and predictable. The orchestrator's behavior is visible as a sequence of routing decisions. When something goes wrong, you know immediately whether the failure was in planning or execution.

The anti-pattern is a single monolithic agent that both plans and executes. It works for demos. It becomes unmaintainable when the task space grows, because the system prompt balloons, context fills with execution history, and reasoning quality degrades.

Orchestrator-Worker Rules
  • Orchestrator emits structured task objects, never raw text instructions to workers.
  • Workers return structured results with a status field: success, failure, needs_human.
  • Orchestrator never retries workers inline — failures go to a retry queue.
  • Workers are stateless — all state lives in the orchestrator's state object.
  • Each worker has a documented input schema and output schema. Treat them like microservices.
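The rules above can be sketched as minimal Python schemas. The names here (`TaskObject`, `WorkerResult`, `route_result`) are illustrative, not from any specific framework — the point is that everything crossing the orchestrator-worker boundary is typed and structured:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class WorkerStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class TaskObject:
    """Structured task the orchestrator emits -- never raw text instructions."""
    task_id: str
    worker: str              # which worker should handle this task
    payload: dict            # must match the worker's documented input schema


@dataclass(frozen=True)
class WorkerResult:
    """Structured result every worker returns to the orchestrator."""
    task_id: str
    status: WorkerStatus
    output: dict = field(default_factory=dict)
    error: str = ""


def route_result(result: WorkerResult, task: TaskObject,
                 retry_queue: list) -> str:
    """Orchestrator-side routing: failures go to a retry queue, never
    inline retries; needs_human routes to escalation."""
    if result.status is WorkerStatus.FAILURE:
        retry_queue.append(task)
        return "queued_for_retry"
    if result.status is WorkerStatus.NEEDS_HUMAN:
        return "escalated"
    return "completed"
```

Because workers are stateless and their contracts are explicit, each one can be unit-tested with fixed `TaskObject` inputs, exactly as you would test a microservice endpoint.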

Pattern 2: Explicit State Machines

Every agent workflow has implicit states: gathering information, validating inputs, executing actions, waiting for approval, completing. Make them explicit. A well-designed agent state machine defines what transitions are valid from each state, what triggers them, and what happens when a transition fails.

The payoff is observability and recovery. When an agent run fails, you know exactly which state it was in. Recovery means reinserting the job at the failed state, not re-running from scratch. For long-running workflows that involve expensive tool calls, this is the difference between a 5-second retry and a 5-minute retry.

State | Entry Condition | Valid Transitions | Failure Handling
INTAKE | New job received | VALIDATING, REJECTED | Return error to caller
VALIDATING | Intake complete | PLANNING, REJECTED | Dead-letter queue
PLANNING | Validation passed | EXECUTING, FAILED | Retry up to 3x then escalate
EXECUTING | Plan approved | REVIEWING, FAILED | Checkpoint and retry from last step
REVIEWING | Execution complete | COMPLETE, REVISING | Human escalation
COMPLETE | Review passed | N/A | N/A
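A transition table like this is a few lines of code to enforce. A minimal sketch, mirroring the table above (states that appear only as transition targets, like REJECTED and FAILED, are treated as terminal here):

```python
from enum import Enum


class State(Enum):
    INTAKE = "intake"
    VALIDATING = "validating"
    PLANNING = "planning"
    EXECUTING = "executing"
    REVIEWING = "reviewing"
    REVISING = "revising"
    COMPLETE = "complete"
    REJECTED = "rejected"
    FAILED = "failed"


# Valid transitions, taken directly from the table above.
TRANSITIONS = {
    State.INTAKE: {State.VALIDATING, State.REJECTED},
    State.VALIDATING: {State.PLANNING, State.REJECTED},
    State.PLANNING: {State.EXECUTING, State.FAILED},
    State.EXECUTING: {State.REVIEWING, State.FAILED},
    State.REVIEWING: {State.COMPLETE, State.REVISING},
    State.COMPLETE: set(),  # terminal
}


class InvalidTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    """Reject any transition the table does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

Persist the current `State` with each job record and recovery becomes exactly what the text describes: reinsert the job at the failed state instead of re-running from INTAKE.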
···

Pattern 3: Tool Call Hardening

Tools are the most common failure point in production agent systems. External APIs return unexpected formats. Rate limits hit. Services go down. The agent tries to parse a 500 error response as a valid tool result and hallucinates from there.

Every tool wrapper in production should:
  • Validate its input before calling the external API — the LLM sometimes generates invalid arguments.
  • Validate the output schema before returning to the agent.
  • Implement retry logic with exponential backoff.
  • Respect rate limits with a token bucket or leaky bucket implementation.
  • Return a structured error object rather than throwing exceptions. A structured error gives the LLM something to reason about; an exception gives it nothing.
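A minimal sketch of such a wrapper, assuming the caller supplies the tool function and its validators (the function and parameter names here are illustrative; rate limiting is omitted for brevity):

```python
import random
import time


def hardened_tool_call(fn, args: dict, validate_input, validate_output,
                       max_retries: int = 3, base_delay: float = 0.5) -> dict:
    """Wrap an external tool call: validate input, retry with exponential
    backoff and jitter, validate output, and return a structured error
    object instead of raising into the agent loop."""
    if not validate_input(args):
        return {"status": "error", "code": "invalid_input",
                "detail": "arguments failed schema validation"}

    for attempt in range(max_retries):
        try:
            result = fn(**args)
        except Exception as exc:
            if attempt == max_retries - 1:
                return {"status": "error", "code": "tool_failure",
                        "detail": str(exc)}
            # Exponential backoff with jitter before retrying.
            time.sleep(base_delay * (2 ** attempt) * (1 + 0.1 * random.random()))
            continue

        if not validate_output(result):
            # Never hand an unvalidated payload back to the LLM.
            return {"status": "error", "code": "invalid_output",
                    "detail": "tool returned an unexpected shape"}
        return {"status": "success", "result": result}

    return {"status": "error", "code": "exhausted", "detail": "retries exhausted"}
```

The structured `{"status": ..., "code": ...}` envelope is what prevents the failure mode described above: a 500 error never reaches the agent disguised as a valid tool result.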

···

Pattern 4: Cost and Token Budget Management

Unmanaged agent cost is the reason many internal agent projects get cancelled. A workflow that costs $0.02 in development can cost $2.00 in production when real users hit edge cases, the agent loops, or context grows beyond what was tested. Multiply by thousands of daily runs and you have a cost center that kills ROI.

Implementing Token Budget Management

01
Define a per-run token budget

Set a hard limit for total tokens (input + output) per agent run. This is a business decision as much as a technical one — what is the maximum acceptable cost for one execution? Start conservative and widen as you validate.

02
Instrument every LLM call

Every call to an LLM must log prompt tokens, completion tokens, model, and timestamp. Aggregate per-run. This is non-negotiable for production. You cannot optimize what you cannot measure.

03
Add context summarization nodes

For workflows that accumulate history, add periodic summarization: compress the last N messages into a summary and drop the originals from context. Implement this as a node in your state machine, triggered when context exceeds a threshold.

04
Use cheaper models for cheaper subtasks

Not every step needs the most capable model. Tool call parsing, format validation, and simple classification can run on smaller models at 10x lower cost. Reserve the large model for complex reasoning steps.

05
Set circuit breakers

If a run has consumed 80% of its token budget, emit an alert and route to a fallback path. Do not let runaway loops exhaust budgets silently. The fallback can be a human escalation, a simplified answer, or a graceful decline.
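Steps 01, 02, and 05 compose naturally into one small object. A sketch under the assumption that every LLM call reports its token usage (as most provider APIs do); the class and method names are illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Per-run token budget with a circuit breaker at 80% utilization."""
    limit: int                   # hard cap on input + output tokens per run
    breaker_ratio: float = 0.8
    used: int = 0
    calls: list = field(default_factory=list)

    def record(self, model: str, prompt_tokens: int,
               completion_tokens: int) -> str:
        """Log one LLM call, aggregate per-run, and return the budget state."""
        self.used += prompt_tokens + completion_tokens
        self.calls.append({"model": model, "prompt": prompt_tokens,
                           "completion": completion_tokens})
        if self.used >= self.limit:
            return "exhausted"        # hard stop: take the fallback path
        if self.used >= self.limit * self.breaker_ratio:
            return "breaker_open"     # emit an alert, prefer the fallback
        return "ok"
```

Every LLM call in the run goes through `record`; the orchestrator checks the returned state before planning the next step, so a runaway loop trips the breaker instead of silently burning budget.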


Pattern 5: Human-in-the-Loop Escalation

Fully autonomous agents are appropriate for bounded, low-stakes tasks. For anything involving financial decisions, customer-facing communication, compliance actions, or irreversible operations, you need human escalation paths. Building these in from the start is far easier than retrofitting them after an incident.

The key design question is not "should the agent escalate" but "what information does the human need to make a good decision quickly." A human escalation that dumps raw agent context is nearly unusable. A good escalation presents: what the agent was trying to do, what decision it needs made, the relevant options, and the expected impact of each. Design that interface with the same care as any user-facing product.
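The four elements of a good escalation can be captured in a small structure. A hypothetical sketch — the field names are illustrative, not from any particular escalation product:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRequest:
    """What a human reviewer sees -- not a dump of raw agent context."""
    goal: str                   # what the agent was trying to do
    decision_needed: str        # the single question for the reviewer
    options: tuple              # the choices available
    impact: tuple               # expected impact of each option, same order

    def render(self) -> str:
        """Render a compact, decision-ready summary for the reviewer."""
        lines = [f"Goal: {self.goal}", f"Decision: {self.decision_needed}"]
        for opt, imp in zip(self.options, self.impact):
            lines.append(f"- {opt}: {imp}")
        return "\n".join(lines)
```

Forcing the agent to populate these four fields before escalating is itself a useful discipline: if it cannot state the decision it needs made, it is not ready to escalate.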

43% of enterprise agent deployments added human escalation only after a production incident (estimated from public case studies and engineering post-mortems).
···

Observability as a First-Class Concern

Traces, not just logs. Every agent run should produce a trace: the sequence of LLM calls, tool calls, state transitions, and decisions made. Structured traces enable both real-time monitoring and post-hoc debugging. LangSmith, Langfuse, and Arize Phoenix all provide agent-specific tracing. Pick one before you deploy.

The metrics that matter in production: run success rate, average cost per run, p95 latency, tool error rate by tool, escalation rate, and context utilization percentage. Alert on run success rate dropping below baseline and cost-per-run exceeding 2x the rolling average. Everything else is nice to have.
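The two alert conditions above reduce to a few lines; a trivial sketch with illustrative names:

```python
def should_alert(success_rate: float, baseline_success: float,
                 cost_per_run: float, rolling_avg_cost: float) -> list:
    """Fire the two alerts that matter: success rate below baseline,
    cost per run above 2x the rolling average."""
    alerts = []
    if success_rate < baseline_success:
        alerts.append("success_rate_below_baseline")
    if cost_per_run > 2 * rolling_avg_cost:
        alerts.append("cost_per_run_above_2x")
    return alerts
```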

You cannot debug a system you cannot observe. Agent systems are not special — they follow the same rule as every other distributed component.