A tool result is not automatically safe just because it came from a tool.
Short answer
Tool results are agent input. Treat every result as context with a source, trust level, data class, allowed influence, downstream tool path, approval trigger, and trace record. A result can answer a question, but it should not automatically change instructions, choose new tools, write memory, communicate externally, or approve a high-impact action.
This matters because agents do not only read tool results. They reason over them. They use them to decide what to do next.
The boundary most teams miss
A tool inventory tells you what an agent can call. A trace tells you what it did. A tool result review tells you what evidence shaped the next action.
That is the missing boundary.
A browser agent reads a web page. A coding agent reads an issue. An MCP client receives a resource link. A support agent gets a CRM result. A security agent gets scanner output. In each case, the returned content enters the model's context.
The security question is not only "Can the agent call this tool?"
It is also "What is this result allowed to influence?"
MCP makes the shape visible
The Model Context Protocol Tools specification says tools let models interact with external systems such as databases, APIs, and computations. It also says tools are model-controlled, which means the model can discover and invoke tools based on context and the user's prompt.
Tool results can contain structured content or unstructured content. They can include text, images, audio, resource links, and embedded resources. The spec also describes output schemas for structured results and says clients should validate structured results against the schema when one exists.
That is useful.
But it is not the same as trust.
A schema can tell you whether the response is shaped correctly. It cannot tell you whether the text inside the response is hostile, stale, misleading, over-permissioned, sensitive, or trying to steer the agent.
Prompt injection enters through results
OWASP LLM01 describes indirect prompt injection as the case where an LLM accepts input from external sources, such as websites or files. OWASP recommends separating and clearly denoting untrusted content, least privilege, and human approval for high-risk actions.
That maps directly to tool results.
Keep reading with free field-guide resources.
VibeSec Advisory publishes practical research, Skills, workflow examples, MCP notes, prompt injection tests, and AI red-team lessons for builders working with agentic AI.
If a tool returns public web content, user-generated comments, repository files, ticket descriptions, email bodies, PDF text, OCR output, or an MCP resource, that content may contain instructions. The agent may interpret those instructions as part of the task unless the client and workflow make the boundary clear.
This is not theoretical.
Unit 42 reported web-based indirect prompt injection observed in the wild, including an AI-based ad review evasion case. The report describes attacker-controlled instructions embedded in web content that LLM systems later consume during summarization, content analysis, translation, or automated decision-making.
The lesson is simple: tool output can be a prompt delivery path.
Excessive agency turns bad results into bad actions
The risk gets worse when the agent can act.
OWASP LLM06 describes excessive agency in systems that call tools, skills, plugins, or extensions. It says agent-based systems often make repeated model calls where output from earlier invocations grounds and directs later invocations.
That sentence is the key.
A bad result does not need to compromise the whole system at once. It only needs to steer the next step:
- Call a broader tool.
- Read a different file.
- Include private data in an answer.
- Write memory from untrusted content.
- Send a message.
- Approve a workflow.
- Publish or modify something externally.
If the workflow lets tool results drive those actions without labels or approval, the tool boundary is too weak.
Use a Tool Result Review Record
For high-impact workflows, record the result boundary before the agent acts on it.
Minimum fields:
- Tool name and server.
- Result source system.
- Trust level: trusted, internal, partner, public, user-generated, unknown, or hostile.
- Data class: public, internal, sensitive, credential-adjacent, regulated, or unknown.
- Content types: text, structured JSON, image, audio, resource link, embedded resource, file, or API response.
- Allowed influence: answer only, summarize, cite, plan, select tools, write memory, request approval, or take action.
- Blocked influence: system instruction changes, credential requests, hidden tool calls, external communication, deletion, purchase, publish, or production modification.
- Downstream tools the result could affect.
- Approval trigger.
- Evidence kept in the trace or log.
- Reviewer decision.
This is not bureaucracy. It is how you make the model's next step reviewable.
Suggested defaults
Use conservative defaults until the workflow proves it needs more autonomy.
- Public web content can inform an answer, but it cannot approve an action.
- User-generated content can be summarized, but it cannot change instructions.
- Tool results from unknown sources cannot write memory.
- Resource links returned by tools need review before automatic follow-up fetches.
- Structured output should be schema-validated, source-labeled, and influence-limited.
- Any result that requests credentials, expands scope, changes tools, sends data, deletes data, publishes, purchases, or modifies production should stop the run.
If that feels strict, good. The agent can always ask for review. It should not invent authority from a tool response.
Evidence vs opinion
The evidence says tool-bearing agents need explicit trust boundaries:
- MCP describes model-controlled tools and tool results that can include multiple content types, resource links, and embedded resources.
- OWASP treats external source content as a prompt injection risk and recommends clearly denoting untrusted content.
- OWASP excessive agency guidance warns that tool access, repeated model calls, functionality, permissions, and autonomy can combine into damaging actions.
- Unit 42 reports real web-based indirect prompt injection activity.
- Microsoft's AI threat modeling guidance recommends mapping prompt construction, memory access, tool invocation, external data ingestion, human approval points, data flows, trust boundaries, and tool permissions.
My opinion: a tool result boundary should exist before an agent gets write access, browser access, repository access, customer data, memory, or external communication.
Without it, you are trusting the most recent observation to steer the next action.
That is not a security model.
Free next step
Test your agent: create a Tool Result Review Record for one tool your agent already uses. Pick the riskiest result type first, then decide what that result is allowed to influence.