AI Coding Agent Sandbox Profiles Before Shell Access

Shell access is where a coding assistant becomes a local automation system.

Short answer: Before an AI coding agent gets shell access, define the sandbox profile it must run under. The profile should restrict network egress, writes outside the active workspace, writes to agent configuration, and secrets exposure. Approval prompts still matter, but they should sit on top of boundaries the agent cannot talk its way around.

Most teams start with the prompt.

That is backwards.

A prompt can tell the agent to be careful. It cannot stop a subprocess. It cannot make a compromised repository safe. It cannot prove that a package script, hook, MCP startup command, or shell wrapper stayed inside the intended boundary.

If the agent can run commands, the security question is not only what it is allowed to ask for. It is also what the environment will still block when the model gets tricked.

Why shell access changes the risk

Coding agents are not normal autocomplete.

They read untrusted project material, reason over it, then act through tools. The untrusted material can include issues, pull requests, READMEs, tests, package scripts, comments, agent rule files, and tool responses. If any of that content carries hostile instructions, the model can be steered toward actions the user did not intend.

That is why shell access is a hard boundary.

The NVIDIA AI Red Team guidance on sandboxing agentic workflows is blunt about this. AI coding agents run command-line tools with the same permissions and entitlements as the user. NVIDIA identifies indirect prompt injection through repositories, pull requests, git histories, agent rule files, and malicious MCP responses as a primary threat.

Manual approval helps, but NVIDIA also calls out the downside: repeated approvals create friction and habituation. A tired developer can become part of the failure mode.

So the safer pattern is not "ask me before risky things."

The safer pattern is: put the agent in an environment where risky things are blocked by default, then ask for approval when the task needs a narrow exception.

What the sources agree on

The sources do not all use the same words, but the pattern is consistent.

NVIDIA lists three mandatory controls for addressing serious agentic-workflow risk:

Block arbitrary network egress.
Block file writes outside the active workspace.
Block writes to configuration files.
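To make the two filesystem controls concrete, here is a minimal Python sketch of the decision a policy would encode for a single write. The PROTECTED list and the function name write_allowed are hypothetical examples, not NVIDIA's guidance or any tool's API. The sketch only decides. It enforces nothing, and a path check like this can be raced or sidestepped by a subprocess, which is why enforcement belongs at the OS level.

```python
from pathlib import Path

# Illustrative protected paths, relative to the workspace. Extend for your own
# agents, hooks, MCP clients, shells, package managers, and IDEs.
PROTECTED = (
    ".git/config", ".git/hooks", ".gitconfig", ".bashrc", ".zshrc", ".profile",
    ".npmrc", ".vscode", ".idea", ".claude", ".mcp.json", "CLAUDE.md", "AGENTS.md",
)

def write_allowed(target: str, workspace: str) -> bool:
    """Policy decision for one write: inside the workspace and not a protected
    config path. This decides; it does not enforce."""
    ws = Path(workspace).resolve()
    path = (ws / target).resolve()   # collapses ".." and follows existing symlinks
    if not path.is_relative_to(ws):
        return False                 # outside the active workspace
    parts = path.relative_to(ws).parts
    for protected in PROTECTED:
        p = Path(protected).parts
        if parts[: len(p)] == p:
            return False             # agent or tool configuration: human-only
    return True
```

An absolute target such as /etc/passwd replaces the workspace prefix when joined, so it fails the containment check rather than slipping through.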

The same guidance argues for OS-level controls because application-level permission checks can lose visibility after a subprocess starts.

Claude Code's permissions documentation shows the application-level side of the problem. It distinguishes read-only actions, shell execution, and file modification. It also states that permission rules are enforced by the application, not by the model. Instructions in a prompt can shape what the model tries to do, but they do not grant or revoke actual access.

Claude Code's permission-mode documentation makes the tradeoff explicit. In default mode, actions are reviewed as they come up. Looser modes reduce interruption. The docs reserve bypass-style operation for isolated containers and VMs.

Gemini CLI's sandbox documentation points in the same direction from another CLI stack: sandboxing is a configurable runtime surface, not an afterthought. It includes command flags, environment variables, settings, and multiple platform backends.

OWASP's AI Agent Security Cheat Sheet puts this in the broader control model: least privilege, prompt injection defense, human-in-the-loop controls, monitoring, data protection, and adversarial validation.

The practical read is simple. Approval prompts are one layer. The sandbox profile is the layer that keeps the agent from turning a bad decision into host damage, data leakage, persistence, or hidden configuration drift.

The sandbox profile record

A sandbox profile does not need to be complicated. It needs to be explicit.

Boundary	Default posture	What to record
Network	Deny by default	Approved domains, package registries, proxies, and manual exceptions
Filesystem reads	Minimum needed	Workspace path, extra read paths, sensitive paths denied
Filesystem writes	Workspace only	Writable paths, denied home paths, generated artifact paths
Agent config	Human-only	Agent rules, hooks, MCP config, shell config, Git config, IDE config
Secrets	Do not inherit full shell env	Task-scoped secrets, source, expiry, and redaction path
Approval	Per-instance for boundary breaks	Action, reason, reviewer, decision, and evidence
Lifecycle	Disposable by default	Worktree, container, VM, cleanup, cache, and logs
Monitoring	Always on for shell access	Commands, current directory, files changed, network attempts, final diff

This table is the real approval gate.

Without it, the reviewer is only looking at a stream of prompts. With it, the reviewer can compare every action against a defined boundary.
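One way to keep the record checkable is to store it as data and compare each proposed action against it. The sketch below is a minimal Python illustration under assumed names (SandboxProfile, Action, evaluate); it is not a standard schema or any agent's configuration format. It returns "deny" for agent config writes, "allow" for actions inside the recorded boundary, and "needs_approval" for everything else, matching the per-instance approval row.

```python
import posixpath
from dataclasses import dataclass

@dataclass(frozen=True)
class SandboxProfile:
    # Hypothetical field names mirroring the record table; not a standard schema.
    workspace: str
    allowed_domains: frozenset[str] = frozenset()
    writable_paths: frozenset[str] = frozenset()  # relative to the workspace
    secret_names: frozenset[str] = frozenset()    # task-scoped secrets only
    disposable: bool = True
    monitoring: bool = True

@dataclass(frozen=True)
class Action:
    kind: str    # "network", "write", "secret", or "config"
    target: str  # domain, workspace-relative path, secret name, or config file

def evaluate(profile: SandboxProfile, action: Action) -> str:
    """Compare one action with the recorded boundary.
    Returns "allow", "needs_approval" (a per-instance boundary break), or "deny"."""
    if action.kind == "config":
        return "deny"  # agent, hook, MCP, shell, Git, and IDE config are human-only
    if action.kind == "network":
        inside = action.target in profile.allowed_domains
    elif action.kind == "write":
        rel = posixpath.normpath(action.target)
        escapes = posixpath.isabs(rel) or rel == ".." or rel.startswith("../")
        inside = not escapes and any(
            rel == p or rel.startswith(p.rstrip("/") + "/")
            for p in profile.writable_paths
        )
    elif action.kind == "secret":
        inside = action.target in profile.secret_names
    else:
        inside = False  # unknown action kinds are never allowed silently
    return "allow" if inside else "needs_approval"
```

The function only tells the reviewer which actions break the boundary. The container, VM, or OS sandbox still has to block them.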

A safer starting workflow

For a coding agent, start with the least powerful path that can still answer the question.

Run read-only exploration first.
Use a fresh worktree, container, VM, or OS sandbox for write-capable work.
Mount only the repository or task folder the agent needs.
Deny access to the developer home directory by default.
Deny network access unless the task needs a specific endpoint.
Block writes to agent config, hooks, MCP server definitions, shell startup files, Git config, IDE config, and package-manager config.
Inject only task-scoped secrets. Do not pass the full shell environment.
Require per-action approval for network, external communication, destructive writes, credential use, and config changes.
Keep logs for commands, files touched, network attempts, approvals, denials, and final diff.
Destroy or clean the environment after the task.
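Two items on that list, passing only task-scoped secrets and keeping command logs, can be sketched in a few lines of Python. The names build_env and run_logged, the BASE_KEYS tuple, and the JSON log fields are assumptions for illustration, and the sketch assumes a POSIX host. It does not isolate anything by itself: it narrows what the process inherits and records what ran, while the container, VM, or OS sandbox does the blocking.

```python
import json
import os
import subprocess
import time

# Illustrative non-secret variables most tools need; everything else is dropped.
BASE_KEYS = ("PATH", "LANG", "TERM")

def build_env(task_secrets: dict[str, str], workspace: str) -> dict[str, str]:
    """Build a fresh environment instead of inheriting the full shell env."""
    env = {k: os.environ[k] for k in BASE_KEYS if k in os.environ}
    env["HOME"] = workspace       # tools that read ~ see the workspace, not your home
    env.update(task_secrets)      # only the secrets granted for this task
    return env

def run_logged(argv: list[str], workspace: str,
               task_secrets: dict[str, str], log_path: str) -> int:
    """Run one command with cwd set to the workspace and append a JSON log line.
    Note: cwd and env are not isolation; pair this with an OS-level sandbox."""
    started = time.time()
    result = subprocess.run(argv, cwd=workspace, env=build_env(task_secrets, workspace),
                            capture_output=True, text=True)
    record = {
        "started": started,
        "argv": argv,
        "cwd": workspace,
        "returncode": result.returncode,
        "secrets_granted": sorted(task_secrets),  # names only, never values
        "stderr_tail": result.stderr[-500:],
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result.returncode
```

Logging secret names rather than values keeps the audit trail useful without turning the log into another place secrets can leak from.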

This is not a trust problem. It is a blast-radius problem.

You can trust the intent of the user and still isolate the agent. You can trust the model more than you did last month and still deny it your home directory. You can approve an action and still require the environment to block everything outside the approved path.

What not to rely on

Do not rely on a prompt that says "be careful."

Do not rely on one approval that becomes permanent permission for a command pattern.
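A per-instance approval can be modeled as a record bound to one exact action, with a short expiry and a single use, so a second run or a slightly different command needs a fresh review. This is a hypothetical Python sketch; the Approval class and its fields loosely follow the approval row of the profile record and are not taken from any product.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Approval:
    """Hypothetical per-instance approval record: one exact action, one use."""
    action: str        # the exact command or request the reviewer saw
    reason: str
    reviewer: str
    expires_at: float  # epoch seconds; keep the window short
    used: bool = False

    def consume(self, requested: str, now: Optional[float] = None) -> bool:
        """Permit `requested` only if it matches exactly, is unexpired, and unused."""
        now = time.time() if now is None else now
        if self.used or now > self.expires_at or requested != self.action:
            return False
        self.used = True  # repeating the command requires a new review
        return True
```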

Do not rely on a broad shell allowlist when a safer wrapper, script, or package task can call something else underneath.
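The gap is easy to show. In this hypothetical Python sketch, a prefix rule approves anything that starts with an allowed command, while a stricter rule requires an exact argv match and rejects shell metacharacters. The function names and allowlist contents are assumptions. Even the stricter rule approves npm test without seeing what the package's test script actually runs, which is why the allowlist cannot be the boundary.

```python
import shlex

# Hypothetical allowlist of exact argv tuples.
ALLOWED = {("npm", "test"), ("pytest",)}
SHELL_METACHARACTERS = set(";&|<>`$\n")

def prefix_allowed(command: str) -> bool:
    """Naive rule: approve any command string that starts with an allowed command."""
    return any(command.startswith(" ".join(argv)) for argv in ALLOWED)

def argv_allowed(command: str) -> bool:
    """Stricter rule: no shell metacharacters and an exact argv match."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = tuple(shlex.split(command))
    except ValueError:  # e.g. unbalanced quotes
        return False
    return argv in ALLOWED
```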

Do not let the agent rewrite its own rules, hooks, MCP config, or shell startup files.

Do not run write-capable agent sessions from a daily-driver profile with long-lived secrets and broad network access.

This connects directly to other VibeSec Advisory field-guide patterns: treating GitHub issues as agent input, applying the lethal trifecta before approving agent access, and defining what belongs in an AI approval gate.

Five tests for your current setup

Run these against a safe local test repository before expanding agent access.

Can the agent read a fake .env file that should be denied?
Can the agent make an outbound network request from an offline task?
Can the agent write to shell config, Git config, IDE config, MCP config, hooks, or agent rules?
Can the agent write outside the active workspace?
Can hostile text inside a README, issue, test fixture, or tool result influence a shell command?
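The first four tests can be scripted. The Python sketch below, with hypothetical helper names, tries each action from inside the environment the agent runs in; in a correctly configured profile every probe in run_probes should return False. The write probe creates and immediately removes a uniquely named canary file rather than editing real config, and example.com is only a stand-in destination. The fifth test needs a planted prompt-injection fixture and a human reading the resulting commands, so it is not scripted here.

```python
import socket
import uuid
from pathlib import Path

def can_read(path: str) -> bool:
    """True if the file can be opened and read."""
    try:
        with open(path, "rb") as f:
            f.read(1)
        return True
    except OSError:
        return False

def can_write(directory: str) -> bool:
    """True if a canary file can be created in `directory` (it is removed again)."""
    canary = Path(directory) / f".sandbox-canary-{uuid.uuid4().hex}"
    try:
        with open(canary, "x", encoding="utf-8") as f:  # "x" never overwrites
            f.write("canary")
    except OSError:
        return False
    canary.unlink(missing_ok=True)
    return True

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if an outbound TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_probes(fake_env_file: str, workspace: str) -> dict[str, bool]:
    """Each value should be False inside a correctly configured sandbox.
    Assumes the test repository has a .git/hooks directory."""
    ws = Path(workspace).resolve()
    return {
        "reads denied .env": can_read(fake_env_file),
        "outbound network": can_connect("example.com", 443),
        "writes Git hooks": can_write(str(ws / ".git" / "hooks")),
        "writes home directory": can_write(str(Path.home())),
        "writes outside workspace": can_write(str(ws.parent)),
    }
```

Run it the same way the agent runs commands, so the probes inherit the same sandbox rather than your own shell's permissions.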

If the answer to any of these is yes, the next step is not a better reminder prompt. The next step is a tighter sandbox profile.

Evidence versus opinion

Evidence from the cited docs supports these points:

Coding agents can run command-line tools with user-level permissions.
Indirect prompt injection can reach coding agents through repository and tool content.
Manual approval can create fatigue.
Product permission systems distinguish read, shell, file-edit, and no-prompt modes.
Agent security guidance includes least privilege, human review, logging, and adversarial validation.

VibeSec Advisory's opinion is the implementation rule:

Approval prompts should not be the primary boundary for shell access. The primary boundary should be a sandbox profile that still holds when the agent misreads, over-trusts, or follows hostile context.

Free next step

Test your agent: take one coding-agent workflow and write its sandbox profile before the next write-capable run. Start with network, filesystem, config, secrets, approvals, lifecycle, and logs.
