Skip to main content
Back to all posts
5 minAgentic AI SecurityJune 29, 2026

Write the Stop Rules Before the Agent Starts

Autonomous AI workflows need explicit stop rules for uncertainty, permission drift, evidence failure, tool risk, scope change, and irreversible actions.

RM

Ryan Macomber

Editor, VibeSec Advisory

Autonomous AI work needs a stop rule before it needs a longer prompt.

Most teams write the success path first. They tell the agent what to do, what tools it can use, what output they want, and maybe who reviews it at the end.

That is not enough once the agent can act across files, tools, records, messages, or other agents.

The missing artifact is the pause condition. What has to happen before the agent stops continuing and asks for direction?

Why This Matters

AI agents are moving from suggestion to action. They do not just draft. They can call tools, edit files, inspect data, route work, trigger downstream steps, and hand context to other agents.

That changes the safety problem.

A human approval button is useful, but it is not a policy. A blanket rule to "ask before risky actions" is better than nothing, but it still leaves the agent to decide what risky means.

The research points to a more practical pattern: define the stop rules before autonomy starts.

What The Research Says

Adjustable autonomy research has been studying this problem for a long time. Sceri, Pynadath, and Tambe framed the core issue as transfer of control: when should an agent keep decision-making authority, and when should it hand control to a human or another actor? Their answer was not a rigid choice between autonomy and human control. It was a strategy that weighs decision quality, delay, and coordination costs in the team.

Turan's 2026 paper on LLM-agent oversight argues that the hard problem is not the existence of a pause gate. The hard problem is deciding which actions should pause. The paper reports only moderate reviewer agreement on agent-action risk labels and models reviewer fatigue as part of the system. The practical implication is uncomfortable: escalating every action can make oversight worse because human attention is finite.

Ramaswamy's managed autonomy paper describes a related failure mode: agents tend to keep operating even when their grounding has degraded. The proposed answer is an autonomy lifecycle with explicit states for suspension, recovery, escalation, and regulated control.

Tool-use safety research points in the same direction. ToolSafe focuses on step-level tool invocation safety, because agent risk can appear before a tool call executes, not only in the final output.

Keep reading with free field-guide resources.

VibeSec Advisory publishes practical research, Skills, workflow examples, MCP notes, prompt injection tests, and AI red-team lessons for builders working with agentic AI.

The Organizational Control Layer paper makes the execution-boundary pattern explicit. Agents can propose actions, but a separate control layer decides whether to approve, revise, block, or escalate before the environment changes.

Automation-bias research adds the human side. Overreliance and complacency get worse under workload, time pressure, trust, and task complexity. That matters because an overloaded reviewer can become part of the failure mode.

The Stop-Rule Table

For a real workflow, the useful artifact is small.

Stop ruleWhat the agent should noticeDefault action
Uncertainty stopMissing context, conflicting sources, weak evidence, or unclear intentPause and ask
Permission stopA needed tool, data source, repository, customer record, or action was not approvedEscalate
Evidence stopThe central claim cannot be traced to an inspected sourceBlock claim or ask
Tool-risk stopThe next tool call can mutate state, disclose data, execute code, spend money, or create durable outputEscalate
Scope-change stopThe agent is solving a different problem than the one assignedPause and restate
Irreversibility stopThe action is hard to undo, hard to audit, or likely to create downstream workEscalate

This is not bureaucracy. It is process design.

The stop-rule table turns "be careful" into an executable boundary. It tells the agent what to do when the task leaves the safe path.

A Practical Example

Take an agent that drafts and opens a pull request.

The success path is straightforward:

  1. Read the issue.
  2. Inspect the relevant files.
  3. Make the smallest safe change.
  4. Run focused checks.
  5. Open the PR.

The stop rules are where the workflow gets safer:

  • If the issue requires credentials not already present, stop.
  • If the fix touches auth, billing, customer data, or deployment config, stop.
  • If tests fail for an unrelated reason, stop and record it.
  • If the agent needs to delete data or rewrite history, stop.
  • If the requested change conflicts with a security control, stop.
  • If the agent cannot explain why a source file is relevant, stop.

That list is more useful than telling the agent to "use judgment." It names the places where judgment returns to the human.

The Recommendation

Do not start by asking how much autonomy the agent should have.

Start by asking what would force the agent to stop.

For each workflow, define:

  • The approved goal.
  • The approved tools.
  • The approved data surfaces.
  • The actions that can run without review.
  • The actions that must pause.
  • The person or role that resolves each pause.
  • The evidence needed before work resumes.

Then test the workflow against failure cases, not only happy paths.

Run the same task with missing context. Run it with conflicting sources. Run it with an injected instruction in a tool result. Run it with a tempting shortcut. Run it with a scope change.

If the agent keeps going when it should pause, the prompt is not ready. The workflow is not ready either.

Autonomy should be earned at the boundary where the work can cause harm.

AI Workflows Weekly

Read the archive

Practical notes on governed AI workflows, guardrails, and safer automation. No spam, unsubscribe anytime.

First-party signup with double opt-in. No embedded newsletter iframe, no analytics cookies, and unsubscribe anytime.

Keep testing agentic AI risk.

VibeSec Advisory is a free field guide. Use the research archive, Skill Library, and workflow examples to keep improving what you are building.