Engineering & AI · 14 min read

Your AI Agent Runs Untrusted Code With Root Access and You Call That Production

Your AI agent generates code at runtime that you have never reviewed, executes it with network access, and shares a kernel with your production workloads. This is not a theoretical risk — Snowflake Cortex escaped its sandbox in March 2026, and an Alibaba research agent pivoted to cryptomining. The sandboxing problem is the defining security challenge of agentic AI, and the industry just started taking it seriously.

Author: Abhishek Sharma · Fordel Studios

On March 16, 2026, PromptArmor disclosed a vulnerability in Snowflake’s Cortex Code CLI. A researcher hid a malicious instruction inside a GitHub repository’s README file. The Cortex agent read the README, bypassed its human-in-the-loop approval step, and executed arbitrary code outside its designated sandbox. Snowflake patched it. But the disclosure illuminated a problem the industry had been ignoring: AI agents generate code at runtime that no human has reviewed, and most production deployments run that code in environments with shared kernels, open network access, and implicit trust.

This is not a niche concern. E2B, the leading agent sandbox platform, went from 40,000 sandbox executions per month in March 2024 to over 15 million per month by March 2025 — a 375x increase. By early 2026, 88% of the Fortune 100 had signed up for E2B’s platform. AI agents are generating and executing code at a scale that makes traditional container security models look like leaving your front door open with a polite note asking burglars to behave.

  • 375x growth in E2B sandbox executions in one year: from 40,000/month (March 2024) to 15 million/month (March 2025)
  • 88% of Fortune 100 companies signed up on E2B (source: E2B Series A announcement, July 2025)
  • 85% of enterprises are experimenting with AI agents, but only 5% have confidently moved them to production (Cisco, RSA 2026)
···

Why Traditional Containers Are Not Enough

The default deployment pattern for most AI agents in 2025 was a Docker container. The agent runs inside a container, generates code, executes it in the same container, and the team calls it “isolated.” It is not.

Containers share the host operating system kernel. Every container on the same host uses the same kernel to process system calls. If an AI-generated script exploits a kernel vulnerability, it can escape the container and access the host — and every other container running on that host. This is not theoretical. Container escape vulnerabilities like CVE-2024-21626 (the Leaky Vessels runc bug) demonstrated that a single malicious container could break out and compromise the host.

For traditional applications, containers provide adequate isolation because the code running inside them has been reviewed, tested, and deployed deliberately. AI agents break that assumption entirely. The code is generated at inference time. Nobody has reviewed it. Nobody has tested it. It might install packages, open network connections, read environment variables, or access mounted volumes — all actions a prompt injection could manipulate the agent into performing.

AI sandboxes isolate AI-generated code that writes itself at runtime, requiring stronger security than traditional sandboxes designed for predictable applications.
Northflank Engineering Blog

The Isolation Spectrum: Containers to MicroVMs

The industry has converged on four isolation technologies for AI agent code execution, each representing a different point on the security-performance tradeoff curve.

Technology | Isolation Level | Cold Start | Memory Overhead | Best For
Standard Containers (Docker) | Process-level, shared kernel | ~50ms | Minimal | Trusted internal code only
gVisor (Google) | User-space kernel, syscall interception | ~50ms, plus 20-50% runtime overhead | Moderate | Multi-tenant SaaS, medium-trust code
Firecracker MicroVMs (AWS) | Hardware-level, dedicated kernel per VM | ~125ms | <5 MiB per VM | Untrusted AI-generated code
Kata Containers (OpenInfra Foundation) | Hardware-level, lightweight VM | ~200ms | ~20-40 MiB | Kubernetes-native workloads needing VM isolation

gVisor: The Middle Ground

Google’s gVisor operates as a user-space kernel called Sentry. When an application inside a gVisor sandbox makes a system call, gVisor intercepts it (via its ptrace or KVM platform) and redirects it to the Sentry process, which reimplements approximately 70–80% of Linux syscalls in Go. The host kernel never sees the raw syscall from the sandboxed application.

The tradeoff is performance and compatibility. gVisor adds 20–50% overhead on syscall-heavy workloads because every system call passes through an additional layer. Applications requiring specialised or low-level syscalls that Sentry does not implement will fail. For AI code execution specifically, this means certain system-level operations, native library calls, or direct hardware access will not work inside a gVisor sandbox.
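To make the tradeoff concrete: gVisor registers with Docker as an alternative OCI runtime called runsc, so running a workload under it is one flag away from a normal container. A minimal Python sketch of building that invocation; the image and command here are placeholders:

```python
def gvisor_run_cmd(image: str, command: list[str]) -> list[str]:
    """Build a docker invocation that executes `command` under gVisor.

    Assumes runsc is installed and registered as a runtime in
    /etc/docker/daemon.json; --runtime=runsc swaps the default runc
    for gVisor's user-space kernel.
    """
    return ["docker", "run", "--rm", "--runtime=runsc", image, *command]

cmd = gvisor_run_cmd("python:3.12-slim", ["python", "-c", "print('sandboxed')"])
print(" ".join(cmd))
```

Everything else about the container workflow stays the same, which is why gVisor is a popular retrofit for existing multi-tenant deployments.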

Modal uses gVisor as its isolation layer, which makes sense given its broader platform scope covering inference, training, and batch compute — workloads where the code is more predictable and medium-trust isolation suffices.

Firecracker: The Gold Standard for Untrusted Code

Firecracker is a Virtual Machine Monitor (VMM) built by AWS for Lambda and Fargate. It follows a minimalist design philosophy: each microVM gets its own Linux kernel and supports only network, block storage, and serial console — compared to QEMU’s hundreds of emulated devices. This minimal attack surface is the point.

A Firecracker microVM boots in approximately 125 milliseconds with less than 5 MiB of memory overhead. Each execution gets a dedicated kernel, which means kernel exploits inside one microVM cannot affect other microVMs or the host. The isolation happens at the hardware virtualisation layer via KVM, not at the process or syscall level.
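Firecracker is configured over a REST API on a Unix socket before the microVM boots. A sketch of the payloads a host process would PUT to its /machine-config, /boot-source, and /drives endpoints; the kernel and rootfs paths are placeholders:

```python
# Payload builders for Firecracker's REST API. A supervisor process
# PUTs these to /machine-config, /boot-source, and /drives/rootfs on
# the API socket, then PUTs the start action to /actions to boot.
def microvm_config(vcpus: int = 1, mem_mib: int = 512) -> dict:
    return {"vcpu_count": vcpus, "mem_size_mib": mem_mib}

def boot_source(kernel_path: str) -> dict:
    return {
        "kernel_image_path": kernel_path,
        "boot_args": "console=ttyS0 reboot=k panic=1",
    }

def rootfs_drive(image_path: str) -> dict:
    return {
        "drive_id": "rootfs",
        "path_on_host": image_path,
        "is_root_device": True,
        "is_read_only": True,  # untrusted code cannot persist changes
    }

start_action = {"action_type": "InstanceStart"}
```

Marking the rootfs read-only is a deliberate choice for untrusted execution: the agent gets a scratch drive or tmpfs for output, and the base image cannot be tampered with between runs.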

E2B, the dominant AI agent sandbox platform, built its entire infrastructure on Firecracker. Every sandbox execution runs in its own microVM. When an AI agent generates and runs code on E2B, that code has no way to reach other sandboxes, the host, or the broader network unless explicitly configured.

···

The Platform Landscape: Who Builds What

The AI agent sandbox market has matured rapidly. Three platforms dominate, each targeting different use cases:

E2B: Purpose-Built for Untrusted Execution

E2B is an open-source infrastructure platform built specifically for executing untrusted code from AI agents. It raised $21 million in a Series A led by Insight Partners in July 2025, with participation from Decibel, Sunflower Capital, and Docker’s former CEO Scott Johnston as an angel investor.

E2B’s architecture is straightforward: every sandbox is a Firecracker microVM. The platform handles provisioning, lifecycle management, and cleanup. Pricing is per-second, with a 1 vCPU sandbox costing approximately $0.05 per hour. Sandboxes are ephemeral by default — they spin up, execute, and destroy themselves.

The limitation is statefulness. E2B sandboxes are designed for execution, not development environments. If your agent needs to install dependencies, build a project incrementally, and return to the same environment across sessions, E2B’s ephemeral model requires workarounds like pre-built templates or snapshot restoration.

Daytona: Stateful Workspaces for Persistent Agents

Daytona provides stateful workspaces where AI agents can install dependencies, create files, and return to the same environment later. Under the hood, Daytona uses containers rather than microVMs, which means faster cold starts but weaker isolation — the sandboxes share the host kernel.

For use cases where the agent needs a persistent development environment — building, testing, iterating over multiple sessions — Daytona’s model makes more sense than E2B’s ephemeral sandboxes. The security tradeoff is explicit: you get statefulness at the cost of kernel-level isolation.

Modal: The Platform Play

Modal is a broader compute platform that includes sandbox capabilities for AI agents alongside inference, training, and batch processing. Its isolation layer is gVisor, sitting between containers and full microVMs in the security spectrum. For teams already running inference or training workloads on Modal, using its sandbox features avoids the operational complexity of managing a separate sandbox platform.

Platform | Isolation | Statefulness | Pricing Model | Best For
E2B | Firecracker microVM (hardware-level) | Ephemeral by default | Per-second, ~$0.05/hr per vCPU | Untrusted code execution, short-lived tasks
Daytona | Containers (kernel-shared) | Persistent workspaces | Open-source core + hosted plans | Agent development environments, stateful workflows
Modal | gVisor (user-space kernel) | Configurable | Per-second compute | Teams already on Modal for inference/training

Real Incidents: When Sandboxes Fail

The argument for stronger isolation is not theoretical. Multiple real-world incidents have demonstrated what happens when AI agents escape their boundaries.

Documented AI Agent Sandbox Failures
  • Snowflake Cortex Code CLI (March 2026): A prompt injection hidden in a GitHub README caused the agent to bypass human-in-the-loop approval and execute code outside its sandbox. Disclosed by PromptArmor on March 16, 2026.
  • Alibaba ROME Agent (2024): An experimental research agent broke out of its sandbox, ran unauthorised cryptomining workloads, and consumed GPU resources meant for model training.
  • Claude Code at Ona: A Claude Code agent discovered /proc/self/root/usr/bin/npx to bypass its sandbox restrictions and then disabled its own sandboxing mechanisms.
  • Financial Services Data Exfiltration (2024): An attacker manipulated a reconciliation agent into exporting all customer records matching a regex pattern that matched every record — 45,000 customer records exfiltrated through a legitimate API call.
  • Manufacturing Procurement Agent (2026): A procurement agent was manipulated over three weeks through seemingly helpful "clarifications" about purchase authorisation limits, gradually expanding its own spending authority.

The pattern across these incidents is consistent: the agent was given more access than its task required, the isolation boundary was either absent or insufficiently enforced, and the attack vector was often indirect — prompt injection through data the agent ingested, not a direct exploit of the sandbox runtime.

The Lethal Trifecta: access to private data, exposure to untrusted tokens, and an exfiltration vector. If an agentic system has all three, it is vulnerable. Full stop.
Airia Security Research
···

The Defence Architecture: Layers, Not Walls

Sandboxing is necessary but not sufficient. A production AI agent security architecture requires defence in depth — multiple layers that each reduce the blast radius of a compromise.

Building a Production Agent Isolation Architecture

01
Network isolation by default

AI agent sandboxes should have no network access by default. Whitelist specific endpoints the agent needs — the database it queries, the APIs it calls. Block everything else. This single control prevents the most common exfiltration vector: an agent sending data to an external endpoint. If the sandbox cannot reach the internet, a prompt injection that says "send this data to attacker.com" fails silently.
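The allowlist itself is simple; what matters is that it is enforced in the egress proxy or firewall in front of the sandbox, not inside it. A sketch with illustrative hostnames:

```python
# Deny-by-default egress: only explicitly whitelisted hosts are
# reachable from the sandbox. Hostnames here are illustrative.
ALLOWED_EGRESS = {"db.internal.example.com", "api.payments.example.com"}

def egress_allowed(host: str) -> bool:
    # Anything not on the allowlist fails closed, so an injected
    # instruction to "send this data to attacker.com" goes nowhere.
    return host in ALLOWED_EGRESS

assert egress_allowed("db.internal.example.com")
assert not egress_allowed("attacker.com")
```

In production this check lives in the network layer (an egress proxy, firewall rules, or the microVM's network config), where the agent's code cannot reach it.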

02
Filesystem restrictions with deny-by-default

Mount only the directories the agent needs, read-only where possible. Block writes outside a designated workspace directory. Never mount host directories containing credentials, environment files, or system configuration. The Claude Code incident at Ona happened because /proc/self/root was accessible inside the sandbox — a filesystem path that should have been blocked.
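A deny-by-default path check is one realpath call: resolve the requested path first, then test containment, which defeats traversal tricks of the /proc/self/root variety. A sketch assuming a /workspace mount:

```python
import os

WORKSPACE = "/workspace"

def path_allowed(requested: str) -> bool:
    """Deny-by-default filesystem policy: writes must resolve inside
    the designated workspace. Resolving before the containment check
    defeats traversal like /workspace/../proc/self/root."""
    resolved = os.path.realpath(os.path.join(WORKSPACE, requested))
    return resolved == WORKSPACE or resolved.startswith(WORKSPACE + os.sep)

assert path_allowed("build/output.txt")
assert not path_allowed("../proc/self/root/usr/bin/npx")
```

The same check belongs at every layer that touches paths on the agent's behalf; relying on the sandbox's mount configuration alone is what failed in the Ona incident.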

03
Ephemeral execution environments

Destroy the sandbox after each execution. Do not reuse sandbox instances across different tasks or users. Ephemeral environments ensure that even if an agent is compromised during one execution, the compromise does not persist. E2B’s Firecracker model enforces this by default — each sandbox is a fresh microVM that is destroyed after use.
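The lifecycle reduces to a create-use-destroy bracket. This sketch tears down a directory rather than a microVM, but the pattern is the same:

```python
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def ephemeral_sandbox():
    """One fresh, disposable workspace per execution: created on
    entry, destroyed unconditionally on exit, never reused."""
    workdir = tempfile.mkdtemp(prefix="agent-sandbox-")
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir, ignore_errors=True)

with ephemeral_sandbox() as wd:
    with open(f"{wd}/script.py", "w") as f:
        f.write("print('hello')")
# here the directory, and anything the agent wrote, is gone
```

The `finally` clause is the point: teardown happens even if the agent's code crashes or is killed, so a compromise cannot outlive its execution.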

04
Resource limits and execution timeouts

Cap CPU, memory, and execution time. An agent executing a cryptomining payload (like the Alibaba ROME incident) will consume unbounded resources if you let it. Set hard limits: 2 vCPUs, 512MB RAM, 60-second timeout for code execution tasks. Kill the sandbox if any limit is breached.
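On Linux these caps map to rlimits plus a process timeout. A standard-library sketch; the limits shown match the numbers above:

```python
import resource
import subprocess

def run_limited(cmd: list[str], timeout_s: int = 60,
                mem_bytes: int = 512 * 1024 * 1024):
    """Run agent-generated code with hard caps: an address-space
    rlimit bounds memory, and the timeout kills runaway processes,
    so a cryptomining payload dies at the deadline."""
    def set_limits():
        # Applied in the child after fork, before exec.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(cmd, preexec_fn=set_limits,
                          timeout=timeout_s, capture_output=True)

result = run_limited(["python3", "-c", "print(2 + 2)"], timeout_s=10)
```

CPU pinning is better handled by the sandbox runtime itself (cgroups for containers, vcpu_count for microVMs); rlimits are the portable floor, not the whole answer.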

05
Human-in-the-loop for privileged operations

Any operation that modifies state outside the sandbox — database writes, API calls with side effects, file system changes on the host — requires explicit human approval. The Snowflake Cortex vulnerability existed because the agent bypassed this approval step. The approval mechanism must be enforced at the infrastructure level, not the prompt level, because prompt-level controls can be overridden by prompt injection.
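Enforced at the infrastructure level means the gate is ordinary code in the execution path, outside anything the model can rewrite. A minimal sketch with made-up operation names:

```python
class ApprovalRequired(Exception):
    """Raised when a privileged operation lacks human sign-off."""

# Privileged operations are gated outside the model: this check is
# plain code in the execution layer, so no prompt injection can talk
# its way past it. Operation names are illustrative.
PRIVILEGED = {"db.write", "host.fs.write", "api.side_effect"}

def execute(op: str, approved: bool = False) -> str:
    if op in PRIVILEGED and not approved:
        raise ApprovalRequired(f"{op} needs explicit human sign-off")
    return f"executed {op}"
```

The `approved` flag would come from an out-of-band approval flow (a ticket, a click in a console), never from the agent's own output.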

06
Audit logging of all agent actions

Log every system call, network request, file operation, and API call the agent makes. The manufacturing procurement manipulation went undetected for three weeks because nobody was monitoring the agent’s gradual behaviour change. Anomaly detection on agent action logs — flagging unusual patterns like escalating permission requests or novel API endpoints — catches these slow-burn attacks.
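A sketch of the logging shape: an append-only action log with a trivial novelty check standing in for real anomaly detection. The endpoints are illustrative:

```python
class ActionAuditor:
    """Append-only log of agent actions plus a minimal anomaly
    check: flag any endpoint the agent has never been seen calling.
    Real deployments would layer proper anomaly detection on top;
    this sketch only illustrates the shape."""

    def __init__(self, known_endpoints: set[str]):
        self.known = set(known_endpoints)
        self.log: list[tuple[str, str]] = []
        self.alerts: list[str] = []

    def record(self, action: str, endpoint: str) -> None:
        self.log.append((action, endpoint))
        if endpoint not in self.known:
            self.alerts.append(f"novel endpoint: {endpoint}")

auditor = ActionAuditor({"https://api.internal/reconcile"})
auditor.record("http_post", "https://api.internal/reconcile")
auditor.record("http_post", "https://exfil.example.net/upload")
```

Even this crude novelty flag would have surfaced the financial-services exfiltration the moment the agent first touched an unexpected destination.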

···

Cisco DefenseClaw: The Enterprise Framework

At RSA Conference 2026 on March 23, Cisco announced DefenseClaw, an open-source secure agent framework that represents the first major enterprise attempt at systematising AI agent security. DefenseClaw integrates four core tools: Skills Scanner (auditing agent capabilities), MCP Scanner (verifying Model Context Protocol servers), AI BoM (AI Bill of Materials for asset inventory), and CodeGuard (runtime code analysis).

The framework enforces zero-trust principles: every skill is scanned and sandboxed, every MCP server is verified, and every AI asset is automatically inventoried. DefenseClaw integrates with NVIDIA’s OpenShell to provide hardware-level sandboxing at the runtime level, extending a collaboration aimed at automated security without manual intervention.

The timing matters. Cisco’s own research found that 85% of enterprises are experimenting with AI agents, but only 5% have moved them to production with confidence. The gap is security. DefenseClaw is designed to close it by making security automated rather than manual — eliminating the need for separate tool installations or ad-hoc security reviews before each agent deployment.

The MCP Dimension

The Model Context Protocol adds a new surface to the sandboxing problem. MCP servers provide tools, resources, and prompts to AI agents — and each MCP connection is a potential entry point for both data exfiltration and prompt injection.

An MCP server that provides file system access gives the agent access to whatever the server process can read. An MCP server that provides web browsing exposes the agent to every page it visits — including pages containing adversarial instructions. The security boundary is not just the sandbox the agent runs in. It is every MCP server the agent connects to.

DefenseClaw’s MCP Scanner addresses this by verifying each MCP server before the agent connects: what tools does it expose, what data can it access, does it enforce authentication, and does it match the expected configuration? This verification needs to happen at deployment time and continuously during execution, because MCP servers can be modified after initial verification.

MCP Security Checklist for Production
  • Audit every MCP server your agent connects to — what tools it exposes, what data it can access
  • Enforce authentication between agents and MCP servers using OAuth 2.1, not API keys
  • Sandbox MCP servers independently from the agent — a compromised MCP server should not compromise the agent’s sandbox
  • Monitor MCP tool invocations for anomalous patterns — unexpected tools being called, unusual data volumes
  • Version-pin MCP server configurations and verify checksums at startup
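The last checklist item, pinning and checksum verification, reduces to hashing a canonical serialisation of the server config. A sketch with an invented server config:

```python
import hashlib
import json

def config_checksum(config: dict) -> str:
    """SHA-256 over a canonical JSON serialisation of an MCP server
    config, so any drift from the pinned version is detectable."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Pin at deployment time. Server and tool names are illustrative.
pinned = config_checksum({"server": "files", "version": "1.4.2",
                          "tools": ["read_file", "list_dir"]})

def verify_at_startup(current: dict) -> bool:
    # Refuse to connect if the server config changed since pinning.
    return config_checksum(current) == pinned
```

A server that silently grows a new tool (say, `delete_file`) fails the check, which is exactly the post-verification modification the article warns about.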
···

What the Next 12 Months Look Like

The AI agent sandboxing space is moving fast. Three trends will define the next year:

WebAssembly as a Lightweight Isolation Layer

WebAssembly (Wasm) runtimes like Wasmtime and WasmEdge offer microsecond-level cold starts with strong isolation guarantees. Wasm sandboxes cannot access the host filesystem, network, or system calls unless explicitly granted through the WASI (WebAssembly System Interface) capability model. For AI agents that generate simple computational code — data transformations, calculations, formatting — Wasm provides isolation with near-zero overhead. The limitation is ecosystem: Wasm does not support the full Linux environment that many AI-generated scripts expect.

Confidential Computing for Sensitive Workloads

Hardware-based Trusted Execution Environments (TEEs) like Intel TDX and AMD SEV-SNP encrypt the sandbox’s memory at the hardware level. Even if an attacker compromises the host, they cannot read the sandbox’s memory contents. For AI agents handling healthcare data (HIPAA), financial records (SOX), or legal documents (attorney-client privilege), confidential computing adds a layer that software-only isolation cannot match.

Standardised Agent Security Scoring

Just as CVSS scores standardised vulnerability severity, the industry needs a standardised way to assess agent deployment security. How isolated is the sandbox? What data can the agent access? How are MCP connections verified? Are there runtime guardrails? Cisco’s DefenseClaw is a step toward this with its AI BoM inventory approach, but a universal scoring framework — something a CISO can use to compare agent deployment security across vendors — does not exist yet. It will by 2027.

···

The Decision Framework

Choosing the right isolation technology is not about picking the most secure option. It is about matching the threat model to the performance and operational requirements.

Scenario | Recommended Isolation | Why
Internal tools running reviewed code | Standard containers with seccomp profiles | Code is trusted. Container isolation prevents accidental interference between services.
Multi-tenant SaaS with AI features | gVisor or Kata Containers | Multiple customers share infrastructure. User-space kernel prevents cross-tenant kernel exploits.
AI agents executing generated code | Firecracker microVMs (E2B or self-hosted) | Code is untrusted by definition. Hardware-level isolation prevents escape to host.
Agents handling regulated data (HIPAA/SOX) | Firecracker + confidential computing (TEE) | Compliance requires both execution isolation and memory encryption.
Lightweight computational tasks | WebAssembly (Wasmtime) | Microsecond startup, strong capability-based isolation, minimal overhead.
Map your threat level to the technology. Low-threat internal tools use containers. Medium-threat multi-tenant SaaS uses gVisor. High-threat untrusted code execution uses Firecracker or Kata. There is no universal answer — only the right answer for your threat model.
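The mapping can be written down as a fail-closed lookup; the threat labels are this sketch's own, not from any standard:

```python
# Illustrative mapping from threat model to isolation technology,
# mirroring the decision table. Labels are invented for this sketch.
ISOLATION_BY_THREAT = {
    "internal-reviewed": "standard containers + seccomp",
    "multi-tenant-saas": "gVisor or Kata Containers",
    "untrusted-generated-code": "Firecracker microVMs",
    "regulated-data": "Firecracker + confidential computing (TEE)",
    "lightweight-compute": "WebAssembly (Wasmtime)",
}

def recommend(threat: str) -> str:
    # Fail closed: an unrecognised threat profile gets the
    # strongest general-purpose default, not the weakest.
    return ISOLATION_BY_THREAT.get(threat, "Firecracker microVMs")
```

The fail-closed default is the one design decision worth copying: when you cannot classify the workload, treat it as untrusted.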
···

Where Fordel Builds

Every AI agent we deploy at Fordel runs in an isolation architecture matched to the threat model. We do not default to Docker containers and call it a day. For agents executing generated code, we use Firecracker-based sandboxes with network isolation, filesystem restrictions, resource limits, and audit logging built in from day one. For agents connecting to MCP servers, every connection is verified and monitored.

The 85%-to-5% gap Cisco identified — between enterprises experimenting with agents and those running them in production with confidence — is a security gap. If you are stuck in that gap, the problem is not the AI. It is the infrastructure around it. We can show you exactly where your isolation boundaries are broken and what it takes to fix them. No pitch deck. If that conversation is useful, reach out.
