Most agent mistakes do not start with a bad tool call. They start with an unstated assumption.
The user asks for something. The model fills in the missing pieces. The agent turns that interpretation into a tool call, file edit, email, deployment, ticket update, or customer-facing answer.
That is where a small control helps: an assumption register.
Before a risky action, the agent should state what it thinks the user asked, what it inferred, what evidence it used, what is still missing, and what it is about to do next.
This is not the same as telling the model to "ask clarifying questions."
Clarifying questions are useful when the answer would materially change the work. An assumption register is different. It creates a checkpoint before the agent acts on an interpretation that may be wrong.
Why Prompt Clarity Is Not Enough
Prompting guidance from OpenAI and Anthropic points in the same direction: be explicit.
OpenAI describes prompt engineering as writing effective instructions so the model output meets requirements. Its API guide also recommends tests and evaluation suites so teams can monitor prompt behavior as they iterate or change model versions.
Anthropic's prompting guidance is even more direct. Be clear. Be specific about output formats and constraints. Use sequential steps when order matters. Do not rely on the model to infer what "above and beyond" means.
That is good prompting hygiene.
But agents create a second problem. The model is not only answering. It may be planning, calling tools, delegating, writing files, storing memory, or producing something another human will treat as done.
The risk is no longer just "bad answer."
The risk is "wrong assumption, successful action."
What The Register Should Contain
Keep it short. If the register is too heavy, people will bypass it.
Use these fields before any state-changing, external, sensitive, or customer-facing action:
- User request: the exact task the human gave.
- Inferred intent: what the agent believes the user wants.
- Planned action: the next action it wants to take.
- Evidence used: approved source, user-provided input, retrieval, tool output, or model-only inference.
- Source trust: trusted, untrusted, mixed, or unknown.
- Assumptions: the facts or choices the agent is filling in.
- Missing context: what would materially change the action.
- Confidence: high, medium, or low, with one plain reason.
- Risk class: low, medium, high, or blocked.
- Stop condition: what would make the agent pause.
- Reviewer question: the one decision a human needs to make.
The point is not paperwork. The point is to expose the invisible leap.
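One way to keep the register compact is to treat it as a small typed record. The sketch below is illustrative only: the class names, field names, and the `render` helper are assumptions made for this example, not part of any SDK, and the sample values are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SourceTrust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    MIXED = "mixed"
    UNKNOWN = "unknown"


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class RiskClass(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    BLOCKED = "blocked"


@dataclass
class AssumptionRegister:
    """Hypothetical record with one attribute per register field."""
    user_request: str
    inferred_intent: str
    planned_action: str
    evidence_used: List[str]
    source_trust: SourceTrust
    assumptions: List[str]
    missing_context: List[str]
    confidence: Confidence
    confidence_reason: str
    risk_class: RiskClass
    stop_condition: str
    reviewer_question: str

    def render(self) -> str:
        """Return the register as one 'Label: value' line per field."""
        rows = [
            ("User request", self.user_request),
            ("Inferred intent", self.inferred_intent),
            ("Planned action", self.planned_action),
            ("Evidence used", "; ".join(self.evidence_used) or "none"),
            ("Source trust", self.source_trust.value),
            ("Assumptions", "; ".join(self.assumptions) or "none"),
            ("Missing context", "; ".join(self.missing_context) or "none"),
            ("Confidence", f"{self.confidence.value} ({self.confidence_reason})"),
            ("Risk class", self.risk_class.value),
            ("Stop condition", self.stop_condition),
            ("Reviewer question", self.reviewer_question),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows)


# Invented example values, for illustration only.
example = AssumptionRegister(
    user_request="Send the Q3 summary to the client.",
    inferred_intent="Email the approved Q3 summary PDF to the client's main contact.",
    planned_action="Draft and send one email with the PDF attached.",
    evidence_used=["user-provided input", "model-only inference"],
    source_trust=SourceTrust.MIXED,
    assumptions=["'the client' means the account named in the last message"],
    missing_context=["which contact should receive it"],
    confidence=Confidence.MEDIUM,
    confidence_reason="recipient was inferred, not stated",
    risk_class=RiskClass.MEDIUM,
    stop_condition="more than one plausible recipient",
    reviewer_question="Should this go to the main contact on the account?",
)
print(example.render())
```

Keeping explicit user instructions and inferred intent in separate fields is the design choice that matters most here. A reviewer can see the leap at a glance instead of reconstructing it from a transcript.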
Where It Belongs In The Workflow
The register belongs before the side effect.
OpenAI's Agents SDK guardrails docs make the timing concrete. Input guardrails can run before an agent starts. Tool guardrails can validate or block function tool calls before and after execution. The docs also note that if a guardrail runs in parallel, the agent may have already consumed tokens or executed tools before cancellation.
For a workflow control, timing matters.
If the assumption register is supposed to prevent a bad email, file write, deployment, data export, or memory update, it has to run before that action. A final review after the agent acts is useful for learning. It is not prevention.
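A minimal sketch of that ordering, in plain Python rather than any particular agent framework. The names `run_gated` and `ActionBlocked`, along with the approval callback, are hypothetical. The only point is that the check completes before the side effect is invoked.

```python
from typing import Callable, List


class ActionBlocked(Exception):
    """Raised when the check stops an action before it runs."""


def run_gated(
    action: Callable[[], None],
    risk_class: str,
    approve: Callable[[str], bool],
    reviewer_question: str,
) -> None:
    """Check the register's risk class first; only then run the side effect."""
    if risk_class == "blocked":
        raise ActionBlocked("risk class is 'blocked'")
    if risk_class in ("medium", "high") and not approve(reviewer_question):
        raise ActionBlocked("reviewer did not approve")
    action()  # the side effect happens only after every check has passed


sent: List[str] = []  # stands in for a real outbox


def send_email() -> None:
    sent.append("Q3 summary -> client")


def reviewer_says_no(question: str) -> bool:
    return False


# Medium risk and no approval: the email is never sent.
try:
    run_gated(send_email, "medium", reviewer_says_no, "Send to the main contact?")
except ActionBlocked as exc:
    print(f"stopped before sending: {exc}")

# Low risk: the action runs without asking the reviewer.
run_gated(send_email, "low", reviewer_says_no, "Send to the main contact?")
print(sent)
```

The gate here is synchronous on purpose. A check that runs alongside the action can only report what already happened.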
Why This Is Also A Security Control
OWASP's prompt injection guidance is a useful reminder: untrusted input can alter model behavior in unintended ways. That input can come directly from a user, or indirectly from a website, file, document, email, ticket, pull request, or tool result.
The impact depends on the agent's business context and agency. If the model only drafts text, the failure may be contained. If it has tool access, the same misunderstanding can lead to unauthorized function access, command execution, or manipulated decisions.
An assumption register does not solve prompt injection.
It gives the workflow one more place to compare:
- What did the user ask?
- What did the agent infer?
- Which source shaped the inference?
- Is the source trusted?
- What action is about to happen?
- Does a human need to approve it?
That is a practical boundary.
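The comparison can be sketched as a small check over the evidence behind an inference. This is one hypothetical policy, not a defense against prompt injection: the `Evidence` type, its fields, and the rule "approve when a state-changing action leaned on any source that is not trusted" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    source: str             # e.g. "user message", "support ticket body"
    trust: str              # "trusted", "untrusted", "mixed", or "unknown"
    shaped_inference: bool  # did this source influence the inferred intent?


def approval_needed(evidence: List[Evidence], changes_state: bool) -> bool:
    """Flag state-changing actions whose inference leaned on a source that is not trusted."""
    leaned_on_unvetted = any(
        e.shaped_inference and e.trust != "trusted" for e in evidence
    )
    return changes_state and leaned_on_unvetted


# Invented scenario: the user asked for a ticket summary, but text inside
# the ticket pushed the agent toward a state-changing action.
evidence = [
    Evidence("user message", "trusted", shaped_inference=True),
    Evidence("support ticket body", "untrusted", shaped_inference=True),
]

print(approval_needed(evidence, changes_state=True))   # True
print(approval_needed(evidence, changes_state=False))  # False
```

A stricter team might require approval for every state change regardless of source. The sketch only shows where source trust enters the decision.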
The Field-Guide Recommendation
Do not make every AI interaction stop for review.
Make the register conditional.
Trigger it when the agent is about to:
- Use a tool that changes state.
- Email, message, publish, deploy, or submit something.
- Write or delete files.
- Store durable memory.
- Use sensitive, regulated, customer, employee, credential, or confidential data.
- Interpret untrusted content from the web, a document, an email, a ticket, a pull request, or a tool result.
- Continue after failed attempts.
- Act with medium or low confidence.
Then give the agent a hard rule:
"Before any state-changing, external, sensitive, or customer-facing action, write a compact assumption register. Separate explicit user instructions from inferred intent. Label source trust. List missing context that would materially change the action. If risk is medium or high, ask one approval question before continuing."
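The trigger conditions can also be enforced in code, so the register does not depend on the prompt alone. The sketch below assumes a hypothetical `PendingAction` record that the surrounding workflow fills in; how each flag is actually detected is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingAction:
    """Hypothetical description of what the agent is about to do."""
    changes_state: bool = False    # tool call that changes state
    external: bool = False         # email, message, publish, deploy, submit
    touches_files: bool = False    # write or delete files
    stores_memory: bool = False    # durable memory update
    sensitive_data: bool = False   # regulated, customer, credential, etc.
    untrusted_input: bool = False  # web, document, email, ticket, PR, tool result
    failed_attempts: int = 0       # retries so far
    confidence: str = "high"       # "high", "medium", or "low"


def register_required(action: PendingAction) -> bool:
    """Return True when any trigger condition from the list applies."""
    return any([
        action.changes_state,
        action.external,
        action.touches_files,
        action.stores_memory,
        action.sensitive_data,
        action.untrusted_input,
        action.failed_attempts > 0,
        action.confidence in ("medium", "low"),
    ])


print(register_required(PendingAction()))                     # False
print(register_required(PendingAction(external=True)))        # True
print(register_required(PendingAction(confidence="medium")))  # True
```

Every flag defaults to the quiet case, so a read-only, high-confidence step passes through without a register and ordinary interactions stay fast.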
That one pattern will not make agents safe.
But it makes the most dangerous part visible: the gap between what the human meant and what the agent is about to do.
Sources
- OpenAI prompt engineering guide: https://developers.openai.com/api/docs/guides/prompt-engineering
- OpenAI prompt engineering best practices: https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api
- Anthropic prompting best practices: https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices
- OpenAI Agents SDK overview: https://developers.openai.com/api/docs/guides/agents
- OpenAI Agents SDK guardrails: https://openai.github.io/openai-agents-python/guardrails/
- OWASP LLM01 Prompt Injection: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
- OWASP Agentic AI Threats and Mitigations: https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework
- NIST Generative AI Profile: https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
- Microsoft HAX Guidelines: https://www.microsoft.com/en-us/haxtoolkit/ai-guidelines/
- Guidelines for Human-AI Interaction paper: https://dl.acm.org/doi/fullHtml/10.1145/3290605.3300233