Skip to main content

Before You Deploy an AI Agent: 12 Things Engineers Skip

A practical pre-deployment checklist for AI agents — 12 things most engineering teams consistently skip that will bite them in production.

Abhishek Sharma· Head of Engg @ Fordel Studios
7 min read
Before You Deploy an AI Agent: 12 Things Engineers Skip

You tested the happy path. The agent answered correctly in staging. You shipped it. Three days later it is burning tokens, hallucinating tool calls, and your on-call engineer has no idea how to stop it. This checklist prevents that conversation.

What Should You Check Before Deploying an AI Agent?

1. Define Your Failure Modes Before You Ship

Run a structured pre-mortem: what happens when the LLM returns garbage, the tool call times out, or the agent loops? Document each failure path and the system response to it. An agent with no defined failure behavior will fail in the worst possible way at the worst possible time.

2. Test With Adversarial Inputs, Not Just Golden Paths

Run prompt injection attempts, jailbreak probes, and boundary-pushing inputs against every exposed entrypoint before launch. Most teams test with clean inputs and ship. Attackers do not send clean inputs. A single successful prompt injection can redirect tool calls, exfiltrate context, or bypass your entire business logic layer.

3. Lock Down Tool Permissions to the Minimum Required

Audit every tool the agent can call and remove anything it does not need for its defined task. Overpermissioned agents are one of the most common production security failures. An agent that can read emails, write files, and call external APIs simultaneously is a high-value attack surface. Apply least-privilege at the tool level, not just the API level.

4. Add an Output Validator Before Anything Leaves the Agent

Validate agent output structure, length, and content policy compliance before it reaches users or downstream systems. Raw LLM output is untrusted data — treat it exactly like user input. Schema-validate it, run it through a content safety check, and do not assume the model will stay within the format you specified.

5. Instrument With Distributed Traces, Not Just Logs

Add OpenTelemetry spans or a purpose-built agent tracing tool (Langfuse, Arize, Braintrust) before you ship. Logs tell you what happened; traces tell you why the agent made each decision, which tool it called in which order, and where latency spiked. Without traces, debugging production agent failures is archaeology.

6. Cap Token Spend Per Invocation With a Hard Stop

Set maximum token budgets per agent run and enforce them at the infrastructure level, not just in prompts. An agent in a runaway loop or processing unexpectedly complex inputs will spend 100x your expected cost before you notice. Without a hard cap, one bad request can crater your monthly AI bill.

7. Test the Memory Reset Path Explicitly

Verify that agent memory, session context, and any persistent state clears correctly between users and sessions. Cross-session context leakage is a real production bug — one user seeing another user's private information in the agent session is a data breach, not a minor glitch. Test this path deliberately and repeatedly.

8. Define and Test Your Human-in-the-Loop Escalation Path

Identify every decision type the agent should not make autonomously and build an explicit handoff path to a human. Agents that approve transactions, send communications, or modify data need an escalation path that actually works — not one that exists only in the product specification document.

9. Load Test With Concurrent Agent Invocations

Simulate parallel agent runs against your infrastructure before real traffic arrives. AI agents are stateful, tool-dependent, and often share rate-limited external APIs. Concurrency failures — race conditions, tool call collisions, LLM rate limit exhaustion — do not appear in single-request staging tests.

10. Audit Every Third-Party Tool Call for Data Leakage

Review what data each tool call sends out of your system and to which external services. Agents that call search APIs, send emails, or log to third-party services may inadvertently exfiltrate sensitive user data. Map the data flow for every external call and apply the same compliance review you would apply to any external API integration.

11. Build and Test Your Kill Switch

Implement a mechanism to disable the agent — or specific capabilities — without a full deployment. A feature flag, a database toggle, an environment variable — it does not matter, as long as your on-call engineer can use it at 2am without touching code. Every agent in production needs a working off button.

12. Document What the Agent Can and Cannot Decide

Write a one-page decision authority document: what the agent can do autonomously, what requires human confirmation, and what it must never do. Engineers maintaining the agent need to understand the original intent, not just the implementation. Undocumented authority boundaries drift — and when they drift, the agent starts making decisions it was never meant to make.

···

Is This Checklist Enough to Ship Safely?

For most web and SaaS applications, this covers the engineering baseline. Regulated industries, financial transactions, and healthcare require additional compliance layers on top. But the majority of production AI agent incidents trace back to exactly these 12 oversights. Run the checklist before you ship, not after the post-mortem.

The 12-Point AI Agent Pre-Production Checklist
  • Define your failure modes before you ship
  • Test with adversarial inputs, not just golden paths
  • Lock down tool permissions to the minimum required
  • Add an output validator before anything leaves the agent
  • Instrument with distributed traces, not just logs
  • Cap token spend per invocation with a hard stop
  • Test the memory reset path explicitly
  • Define and test your human-in-the-loop escalation path
  • Load test with concurrent agent invocations
  • Audit every third-party tool call for data leakage
  • Build and test your kill switch
  • Document what the agent can and cannot decide
Build with us

Need this kind of thinking applied to your product?

We build AI agents, full-stack platforms, and engineering systems. Same depth, applied to your problem.

Newsletter

Enjoyed this? Get the weekly digest.

Research highlights and AI news, delivered every Thursday. No spam.

Loading comments...