AI & Agents 17 May 2026 · 4 min

Sandboxed Agents Are A Runtime Feature, Not A Policy

By wGrow Project Team · 17 May 2026

OpenAI’s built-in code-execution tool runs code inside a sandboxed environment [S1]. This prevents an agent from deleting the host operating system. It does not prevent an agent from executing a perfectly valid, fully destructive database query. Sandboxes isolate the runtime. They do not enforce business logic.

Most teams do not appreciate that distinction until it is too late.

The Runtime Sandbox Illusion

Control Boundaries

Control                    Runtime Sandbox    Strict IAM & VPC
Stops Arbitrary OS Code    ✓                  ✗
Prevents Host FS Damage    ✓                  ✗
Blocks Destructive SQL     ✗                  ✓
Prevents API Quota Burn    ✗                  ✓

A runtime sandbox mitigates arbitrary code execution risks and contains filesystem damage — genuinely useful constraints. But it doesn’t understand your data model, your retention policy, or your DR tier. An agent operating inside a sandbox can still call any tool its execution role permits. If that role permits a DROP TABLE, the sandbox watches it happen cleanly.

The right architectural stance: treat AI agents exactly like untrusted third-party vendors. You don’t hand a vendor unrestricted access to your production database because they signed an NDA. You give them a scoped credential and log every call. The system prompt is equivalent to the NDA. It is a suggestion. Hard IAM and strict network boundaries are the actual contract.

Infrastructure dictates security. Everything else is commentary.
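The vendor analogy can be sketched in a few lines. This is a minimal illustration, not a production design: `ScopedCredential`, `ToolGateway`, and the tool names `read_rows` and `drop_table` are hypothetical names invented for this example. The gateway runs outside the model, checks the credential's scope, and records every call whether or not it is allowed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ScopeError(PermissionError):
    """Raised when a credential does not cover the requested tool."""


@dataclass
class ScopedCredential:
    # Hypothetical credential: a principal name plus the tools it may call.
    principal: str
    allowed_tools: frozenset[str]


@dataclass
class ToolGateway:
    # Hypothetical gateway that sits between the agent and its tools.
    tools: dict[str, Callable[..., Any]]
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def call(self, cred: ScopedCredential, tool: str, **kwargs: Any) -> Any:
        allowed = tool in cred.allowed_tools and tool in self.tools
        # Log every call, allowed or not, before anything executes.
        self.audit_log.append(
            {"principal": cred.principal, "tool": tool, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise ScopeError(f"{cred.principal} is not scoped for {tool!r}")
        return self.tools[tool](**kwargs)


# Illustrative tools; the lambdas stand in for real database operations.
gateway = ToolGateway(
    tools={
        "read_rows": lambda table: f"rows from {table}",
        "drop_table": lambda table: f"dropped {table}",
    }
)
agent_cred = ScopedCredential("review-agent", frozenset({"read_rows"}))
```

With this setup, `gateway.call(agent_cred, "read_rows", table="orders")` succeeds, while the same credential calling `drop_table` raises `ScopeError`. Both attempts land in `audit_log`. In a real deployment the scope check belongs in the cloud provider's IAM layer; the sketch only shows the shape of the contract.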

The Cost of Open Egress

In 2023 we built an internal code review agent. The brief: pull a diff, run static analysis, flag issues. We gave it full network access to fetch dependencies and run linter checks against external registries — standard setup at the time, and in retrospect, the obvious mistake.

The agent hit a logic loop on a noisy, high-churn diff. It began calling external linters repeatedly, each pass generating new findings that triggered further passes. It burned through the configured monthly API quota in roughly four hours [S2]. No data was exfiltrated. No files were corrupted. A runtime sandbox would not have changed the outcome, because the agent was operating exactly as permitted within its execution environment.

The failure was open egress — no spending cap on outbound API calls, no network boundary preventing recursive external calls.

The pattern you need isn’t complicated. Network-off by default. Agents operate in isolated VPCs. You maintain an allowlist of specific endpoints the agent is permitted to call. All other egress is denied at the infrastructure layer, not in the system prompt. “Do not call external linters more than three times” is not a network policy.
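The decision logic behind that pattern is small enough to sketch. The real control belongs in security groups, firewall rules, or an egress proxy, so treat this as an illustration of what such a layer decides rather than as a substitute for it. `EgressGuard`, the host `lint.example.internal`, and the budget of three calls are assumptions made for the example.

```python
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Raised when an outbound call is not permitted."""


class EgressGuard:
    """Deny-by-default outbound policy with a hard cap on calls."""

    def __init__(self, allowed_hosts: set[str], max_calls: int) -> None:
        self.allowed_hosts = {h.lower() for h in allowed_hosts}
        self.max_calls = max_calls
        self.calls_made = 0

    def authorize(self, url: str) -> None:
        """Raise EgressDenied unless the host is allowlisted and budget remains."""
        host = (urlsplit(url).hostname or "").lower()
        if host not in self.allowed_hosts:
            raise EgressDenied(f"host not on allowlist: {host or url!r}")
        if self.calls_made >= self.max_calls:
            raise EgressDenied(f"call budget of {self.max_calls} exhausted")
        # Only permitted calls consume budget.
        self.calls_made += 1


# Hypothetical policy: one internal linter endpoint, at most three calls.
guard = EgressGuard(allowed_hosts={"lint.example.internal"}, max_calls=3)
```

Three calls to `https://lint.example.internal/...` pass. A fourth is refused because the budget is spent, and any call to a host outside the allowlist is refused regardless of budget. A looping agent hits the cap instead of the monthly quota.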

Hardcoding Boundaries Through IAM Roles

A deep-tech investee in our portfolio builds IoT-based water system monitoring infrastructure. We built a telemetry agent to evaluate maintenance alerts from raw sensor ingestion pipelines: classify anomalies, flag calibration drift, surface priority events for field engineers. Straightforward in scope. Consequential in execution.

LLMs hallucinate. That is not a criticism; it is an operational fact you plan around. During testing, the agent generated a faulty diagnostic conclusion that would have triggered a sensor recalibration command against a correctly functioning baseline. The conclusion was coherent, well-structured, and wrong.

The agent had read-only access to the telemetry database. The ingestion pipeline was isolated behind a separate service boundary. The agent could not issue a write, regardless of what it concluded. No prompt instruction stopped it. The IAM role stopped it.

If the agent had possessed direct write access, a hallucinated recalibration command would have overwritten actual calibration baselines. Field engineers would have been dispatched to fix sensors that were not broken. Relying on a system prompt that says “do not overwrite baselines” to prevent that outcome is a failure of architecture.

Bind the agent’s execution role to the minimum privilege required for its task. When the agent attempts an unauthorized tool call, the cloud provider denies the request at the authorization layer, before it reaches the resource. The model’s confidence in its conclusion is irrelevant.
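The read-only boundary described above is easy to demonstrate on a small scale. The sketch below uses SQLite from the Python standard library as a stand-in; it is not the telemetry stack from the project, and the `baselines` table and sensor values are invented for illustration. The point is that the write is refused by the data layer, whatever SQL the agent produces.

```python
import os
import sqlite3
import tempfile

# Set up a throwaway database with one table, using a privileged connection.
db_path = os.path.join(tempfile.mkdtemp(), "telemetry.db")
admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE baselines (sensor_id TEXT PRIMARY KEY, value REAL)")
admin.execute("INSERT INTO baselines VALUES ('s-1', 7.2)")
admin.commit()
admin.close()

# The agent only ever receives this read-only connection (mode=ro).
agent_conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def agent_execute(sql: str) -> list[tuple]:
    """Run whatever SQL the agent produced against its read-only connection."""
    return agent_conn.execute(sql).fetchall()


# Reads work as expected.
rows = agent_execute("SELECT sensor_id, value FROM baselines")

try:
    # A hallucinated "recalibration" write.
    agent_execute("UPDATE baselines SET value = 0.0 WHERE sensor_id = 's-1'")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite rejects the write because the connection was opened read-only.
    write_blocked = True
```

The `UPDATE` fails with `sqlite3.OperationalError` and the stored baseline is unchanged. A managed database would express the same boundary as a read-only role or IAM policy rather than a connection flag.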

Human Release Gates via State Hydration

Release Gate Workflow

Stages: Execute → Pause & Snapshot → Review → Rehydrate & Resume

Agent: Plan tool execution → Yield state to SDK → Execute tool

H: Receive payload → Approve or Deny

Agent runtimes that can persist execution state before a tool call and resume from that checkpoint [S3] change how long-running workflows get designed. Previously, pausing mid-execution risked context loss or timeout failures — so you either let an agent run to completion or restarted from scratch. Neither option worked for operations involving privileged actions.

State hydration removes that constraint. Snapshot agent state at a defined checkpoint, pause execution, and rehydrate after an out-of-band step completes. That step can be a human.

For any agent touching destructive or irreversible tooling, the pattern is: snapshot immediately before a high-privilege tool call; route a payload summary to an engineering lead through your team’s approval channel; rehydrate and execute only after confirmation. The agent doesn’t time out. Context doesn’t degrade. The human isn’t bypassed because someone decided the model was accurate enough.

This is not theoretical caution. It is the same release gate you apply to infrastructure changes — the same logic that stops a schema migration from shipping without a second pair of eyes.
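The snapshot, review, and rehydrate steps can be sketched without reference to any particular SDK. Everything here is an assumption made for illustration: the `AgentState` shape, the helper names, and the `run_migration` tool are hypothetical, and a real agent runtime defines its own checkpoint format. The sketch shows only that the pending tool call is stored as data and runs after approval, never before.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgentState:
    # Hypothetical state shape; a real runtime defines its own.
    run_id: str
    history: list[str] = field(default_factory=list)
    pending_tool: Optional[str] = None
    pending_args: dict = field(default_factory=dict)


def snapshot(state: AgentState) -> str:
    """Serialize state so the run can pause without holding a process open."""
    return json.dumps(asdict(state))


def rehydrate(blob: str) -> AgentState:
    """Rebuild state from a stored snapshot."""
    return AgentState(**json.loads(blob))


def approval_payload(state: AgentState) -> dict:
    """Summary routed to the human reviewer."""
    return {"run_id": state.run_id, "tool": state.pending_tool, "args": state.pending_args}


def resume(blob: str, approved: bool, tools: dict) -> AgentState:
    """Execute the pending tool only if the reviewer approved it."""
    state = rehydrate(blob)
    if approved:
        result = tools[state.pending_tool](**state.pending_args)
        state.history.append(f"executed {state.pending_tool}: {result}")
    else:
        state.history.append(f"denied {state.pending_tool}")
    # Clear the pending call either way so it cannot run twice.
    state.pending_tool, state.pending_args = None, {}
    return state


# Illustrative high-privilege tool and a run paused just before calling it.
tools = {"run_migration": lambda name: f"applied {name}"}
state = AgentState("run-42", ["planned migration"], "run_migration", {"name": "0007_add_index"})
blob = snapshot(state)  # persisted while the reviewer decides
```

Calling `resume(blob, True, tools)` executes the migration and records it in `history`; `resume(blob, False, tools)` records the denial and executes nothing. Because the snapshot is plain JSON, the review can take minutes or days without a live process waiting on it.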

Architecting for Untrusted Execution

The bottleneck in agentic systems is no longer reasoning capability. It is trust infrastructure. The required stack isn’t complex, but each layer is load-bearing: runtime sandboxes for execution containment, network-off defaults for cost control and exfiltration prevention, hard IAM for tool permissions, human release gates for destructive actions.

Do not wait for models to stop hallucinating. Build infrastructure that expects them to fail securely.
