6 min | MCP And Tooling | June 22, 2026

MCP Tool Annotations Are Hints, Not Approval Gates

MCP tool annotations can help describe risk, but they are not proof that a tool is safe. Review the server, scopes, observed behavior, approval path, and change history before trusting them.

Ryan Macomber

Editor, VibeSec Advisory

A tool that says it is read-only has not proven it is safe.

Short answer

MCP tool annotations are useful risk hints. They are not approval gates. Treat readOnlyHint, destructiveHint, idempotentHint, and openWorldHint as claims supplied by a server, then verify them against the server owner, credentials, OAuth scopes, code, observed tool calls, output handling, and downstream actions.

An untrusted annotation can trigger review. It should not bypass review.

If a client uses annotations to auto-approve tools, it needs a trust model, a change detector, a policy engine, and trace evidence. Otherwise the annotation is just a label.

Why this matters

MCP makes tools easier to connect to agents. That is useful. It also means a model can discover and call tools that touch databases, APIs, files, browsers, ticket systems, calendars, email, and internal automation.

The MCP Tools specification says tools are model-controlled. The model can discover and invoke them based on context and the user's prompt. The same spec says there should always be a human in the loop with the ability to deny tool invocations, and that clients must consider tool annotations untrusted unless they come from trusted servers.

That last part matters.

A server can say a tool is read-only. A server can say a tool is not destructive. A server can say a call is idempotent. A server can say the tool does or does not reach the open world.

Those claims may be true. They may be wrong. They may be stale. They may come from a server you should not trust.

That is why annotations belong in the review packet, not in the approval slot.

What annotations can do

The official MCP annotation guidance describes the current tool annotation set as a small risk vocabulary:

  • readOnlyHint: does the tool claim to leave its environment unmodified?
  • destructiveHint: if it modifies things, can the change destroy or overwrite something rather than only add to it?
  • idempotentHint: does repeating the same call with the same arguments have no additional effect?
  • openWorldHint: does the tool interact with external entities or open-ended content?
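
A minimal sketch of how a client might read those four hints without giving missing metadata the benefit of the doubt. The default values mirror the ones documented in the MCP schema for tool annotations at the time of writing, so check them against the spec revision you target. The names `SPEC_DEFAULTS`, `ResolvedAnnotations`, and `resolve_annotations` are hypothetical and do not come from any MCP SDK.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

# Assumption: defaults as documented in the MCP schema for ToolAnnotations.
# A missing hint falls back to the more cautious reading in three of four cases.
SPEC_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


@dataclass(frozen=True)
class ResolvedAnnotations:
    values: dict[str, bool]    # declared value, or the default when absent
    declared: frozenset[str]   # hints the server actually sent

    @property
    def missing(self) -> frozenset[str]:
        """Hints the server left out, so a reviewer can see the gaps."""
        return frozenset(SPEC_DEFAULTS) - self.declared


def resolve_annotations(raw: Mapping[str, Any] | None) -> ResolvedAnnotations:
    """Fill in defaults and record which hints were really declared."""
    raw = raw or {}
    values = dict(SPEC_DEFAULTS)
    declared = set()
    for name in SPEC_DEFAULTS:
        # Ignore non-boolean values instead of trusting malformed metadata.
        if isinstance(raw.get(name), bool):
            values[name] = raw[name]
            declared.add(name)
    return ResolvedAnnotations(values, frozenset(declared))
```

Keeping `declared` separate from `values` lets a review screen show "the server said nothing" differently from "the server said false". The resolved values are still only claims.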

Used well, those hints can improve the client experience.

They can make confirmation prompts clearer. They can help a policy engine decide which actions need review. They can help a team sort tools into allowed, review-required, and blocked categories. They can also make missing metadata obvious, which is useful because a tool with no annotations should not get the benefit of the doubt.

But a hint is still a hint.

The annotation can tell you what the server claims. It cannot prove what the tool does.

What annotations cannot do

An annotation cannot enforce least privilege.

It cannot prove that the tool's OAuth scope is read-only.

It cannot prove that a database account lacks write access.

It cannot prove that a file tool stays inside the intended root.

It cannot prove that a browser tool will ignore malicious page content.

It cannot prove that the server will not change the tool definition after approval.

It cannot prove that the tool result will not steer the agent into a riskier next step.

That is the security gap. Teams see a label and treat it like a control.

It is not a control unless trusted code enforces it.
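
As one illustration of what "trusted code enforces it" can look like, here is a sketch of a filesystem root check that a client-side wrapper could run before any file tool call goes out. It is an assumption about how a team might wire this, not something MCP provides. `confine` and `OutsideRootError` are hypothetical names.

```python
from pathlib import Path


class OutsideRootError(PermissionError):
    """Raised when a requested path escapes the approved root."""


def confine(root: Path, requested: str) -> Path:
    """Return the resolved path, or raise if it falls outside root."""
    resolved_root = root.resolve()
    # Joining with an absolute path discards the root, and resolve() collapses
    # ".." segments and follows symlinks, so both escapes fail the check below.
    candidate = (resolved_root / requested).resolve()
    if not candidate.is_relative_to(resolved_root):
        raise OutsideRootError(f"{requested!r} resolves outside {resolved_root}")
    return candidate
```

The check runs no matter what the tool's annotations say. It does not close every gap (a path can change between the check and the actual file access), so it complements restricted OS-level permissions rather than replacing them.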

The poisoning problem

Tool metadata is not harmless just because it looks like documentation.

Invariant Labs describes MCP Tool Poisoning Attacks where malicious instructions are embedded in tool descriptions that the model sees but the user may not see clearly. Their examples include tool descriptions that instruct the agent to read sensitive files, pass secrets through hidden arguments, or change how the agent uses another trusted tool.

That is not only a tool execution problem. It is an input-boundary problem.

The model reads tool descriptions. The model reads tool annotations. The model reads tool results. If those fields are supplied by a server, especially a third-party server, they are part of the agent input surface.

OWASP LLM01 makes the same general point about prompt injection: external content parsed by the model can alter behavior even when it is not human-visible. OWASP LLM06 connects the impact to excessive agency: too much functionality, too much permission, or too much autonomy lets manipulated outputs become damaging actions.

MCP annotations help name risk. They do not remove that risk.

Pay special attention to open-world tools

openWorldHint is the one I would review most carefully.

A read-only local lookup can still be risky, but an open-world tool changes the threat model. It may fetch web pages, read emails, inspect tickets, open documents, browse sites, or bring external content back into the agent loop.

That output can carry instructions.

If the same agent can also read private data and communicate externally, you are close to the lethal-trifecta pattern: private data, untrusted content, and exfiltration path in one workflow.

This is where annotation review should connect to broader agent permission review. Do not ask only, "Is this tool read-only?" Ask:

  • Does this tool read private data?
  • Does this tool see untrusted content?
  • Can this tool or another available tool communicate externally?
  • Can this tool's output influence a later write, send, delete, or memory update?
  • Would a changed tool description be detected before the next run?

If those answers create a dangerous combination, the annotation should raise friction, not remove it.
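
The first three questions can be checked across the whole tool set rather than one tool at a time. The sketch below assumes each flag was filled in by a human reviewer from scopes and observed behavior, not copied from annotations. `ReviewedTool`, `trifecta_legs`, and `is_dangerous_combination` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReviewedTool:
    name: str
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def trifecta_legs(tools: Iterable[ReviewedTool]) -> dict[str, list[str]]:
    """List which tools supply each leg of the private-data, untrusted-content,
    external-communication combination."""
    legs: dict[str, list[str]] = {
        "private_data": [],
        "untrusted_content": [],
        "external_communication": [],
    }
    for tool in tools:
        if tool.reads_private_data:
            legs["private_data"].append(tool.name)
        if tool.sees_untrusted_content:
            legs["untrusted_content"].append(tool.name)
        if tool.can_communicate_externally:
            legs["external_communication"].append(tool.name)
    return legs


def is_dangerous_combination(tools: Iterable[ReviewedTool]) -> bool:
    """True when every leg is covered by at least one available tool."""
    return all(trifecta_legs(tools).values())
```

A tool set can be dangerous even when every individual tool looks harmless, which is why the check takes the list an agent can reach in one workflow.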

Use an annotation review record

For each MCP tool, capture a small review record before allowing low-friction use:

  1. Server name and owner.
  2. Why the server is trusted.
  3. Raw tool name, description, input schema, output schema, and annotations.
  4. Annotation values and missing defaults.
  5. Claimed behavior versus observed behavior.
  6. OAuth scopes, API keys, filesystem roots, and downstream permissions.
  7. Whether the tool reads private data.
  8. Whether the tool reaches external content.
  9. Whether the tool can communicate externally.
  10. Whether the tool result can influence another tool call.
  11. Approval state: blocked, review-required, or allowed.
  12. Trace evidence from a dry run or sandbox run.
  13. Change trigger that forces re-review.

That record is not heavy process. It is the minimum evidence needed before a label becomes a decision.
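
One way to hold that record in code, sketched under assumptions: the field names are hypothetical, and the change trigger is implemented as a SHA-256 fingerprint over the raw tool definition, which is one reasonable choice rather than a required one.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(tool_definition: dict) -> str:
    """Stable hash of the raw definition (name, description, schemas, annotations)."""
    canonical = json.dumps(
        tool_definition, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class ToolReviewRecord:
    server_name: str
    server_owner: str
    trust_rationale: str
    tool_definition: dict                 # raw definition exactly as the server sent it
    missing_annotations: list[str]
    claimed_vs_observed: str
    scopes_and_permissions: list[str]     # OAuth scopes, API keys, filesystem roots
    reads_private_data: bool
    reaches_external_content: bool
    can_communicate_externally: bool
    result_influences_other_calls: bool
    approval_state: str = "review-required"   # blocked, review-required, or allowed
    trace_evidence: list[str] = field(default_factory=list)
    approved_fingerprint: str = ""        # set when a reviewer signs off

    def needs_rereview(self, current_definition: dict) -> bool:
        """Change trigger: any difference from the approved definition."""
        return fingerprint(current_definition) != self.approved_fingerprint
```

Calling `needs_rereview` with the definition returned by each fresh tool listing catches a changed description or annotation before the next run. A record with no approved fingerprint always asks for review.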

A practical policy

Use this default policy until you have a stronger one:

  • No annotations means review-required.
  • Untrusted server annotations are informational only.
  • Trusted server annotations can reduce friction only after scope and behavior review.
  • destructiveHint: true requires explicit approval.
  • openWorldHint: true requires prompt-injection review and output handling rules.
  • Tool-list changes require re-review.
  • Annotation changes require re-review.
  • Tool descriptions with hidden instructions, cross-tool instructions, secret requests, or user-deception language are blocked.
  • Auto-approval requires trace evidence and downstream enforcement outside the model.

That last line is the key.

The model should not be the thing deciding whether the model gets to take the action.
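
The default policy above can be written as ordinary code that runs outside the model. This is a sketch under assumptions: `ToolFacts` and `decide` are hypothetical names, every boolean is expected to come from human review or trusted tooling, "explicit approval" is mapped to the review-required state, and missing hints fall back to the cautious defaults documented in the MCP schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ToolFacts:
    annotations: Optional[Mapping[str, bool]]
    server_trusted: bool
    scope_and_behavior_reviewed: bool
    definition_changed: bool        # tool list or annotations differ from last review
    description_flagged: bool       # hidden, cross-tool, secret-seeking, or deceptive text
    injection_review_done: bool     # prompt-injection review and output handling rules
    has_trace_evidence: bool
    enforced_outside_model: bool


def decide(facts: ToolFacts) -> tuple[str, list[str]]:
    """Return (approval state, reasons). An empty reason list means allowed."""
    if facts.description_flagged:
        return "blocked", ["description contains instructions that must not reach the agent"]
    reasons: list[str] = []
    ann = facts.annotations or {}
    if not ann:
        reasons.append("no annotations")
    if not facts.server_trusted:
        reasons.append("untrusted server: annotations are informational only")
    if not facts.scope_and_behavior_reviewed:
        reasons.append("scope and behavior not reviewed")
    if facts.definition_changed:
        reasons.append("tool list or annotations changed since last review")
    # Assumption: destructiveHint only matters when the tool is not read-only.
    if not ann.get("readOnlyHint", False) and ann.get("destructiveHint", True):
        reasons.append("destructive: explicit approval required")
    if ann.get("openWorldHint", True) and not facts.injection_review_done:
        reasons.append("open world: prompt-injection review and output rules required")
    if not (facts.has_trace_evidence and facts.enforced_outside_model):
        reasons.append("auto-approval needs trace evidence and enforcement outside the model")
    return ("allowed" if not reasons else "review-required"), reasons
```

The function returns its reasons so the approval prompt can show why friction was added. Nothing in it asks the model for an opinion.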

Evidence versus opinion

Evidence:

  • The MCP Tools specification says tool annotations must be treated as untrusted unless they come from trusted servers.
  • The official MCP annotation guidance says annotation fields are hints, not guaranteed descriptions of behavior.
  • Invariant Labs shows how MCP tool descriptions can carry malicious instructions, and describes related attacks such as rug pulls and cross-server shadowing.
  • OWASP LLM01 frames external content parsed by models as prompt-injection risk.
  • OWASP LLM06 frames over-broad tool access as excessive agency.

Opinion:

MCP clients and internal wrappers should treat annotations as a review accelerator, not a review replacement. The useful pattern is not "this says read-only, so auto-approve it." The useful pattern is "this claims read-only, so verify the claim, record the evidence, and keep watching for change."

Free next step

Review one MCP server you already use. Pick the tool with the most attractive label, then prove whether the label matches the code, scopes, traces, and downstream actions. If you cannot prove it, keep the tool review-required.

Then browse the VibeSec Advisory Skill Library or test your agent with the same review record.
