Most agent security problems start before the first tool call.
The risky moment is not when an agent writes a bad sentence. It is when the system gives that agent a browser, file handle, MCP server, API, memory store, or message-sending tool without a clear record of what that tool can touch.
That record should exist before the connection.
Call it a tool permission manifest.
Prompts Are Not the Permission Boundary
Recent agent-security research keeps pointing at the same control surface: the tool call.
Progent treats tool calls as the enforcement point because every external action flows through a structured interface: the tool name and its typed arguments. The framework uses symbolic rules over tool names and arguments, blocks unmatched calls by default, and requires explicit approval when a policy update expands privilege.
That is a useful lesson for teams even if they never adopt Progent directly. A permission boundary has to say which calls are allowed, with which arguments, under which task context.
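The idea can be sketched in a few lines. This is not Progent's API; it is a minimal illustration, assuming hypothetical names (Rule, Policy, approved_by) and regex patterns standing in for symbolic rules. Calls that match no rule are blocked, and adding a rule requires a named approver.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Allow one tool when its arguments match the listed patterns (hypothetical)."""
    tool: str
    arg_patterns: dict = field(default_factory=dict)

    def matches(self, tool, args):
        if tool != self.tool:
            return False
        # The call must supply exactly the listed arguments, no more and no fewer.
        if set(args) != set(self.arg_patterns):
            return False
        return all(
            re.fullmatch(pattern, str(args[name])) is not None
            for name, pattern in self.arg_patterns.items()
        )


class Policy:
    """Default-deny policy: a call is allowed only if some rule matches it."""

    def __init__(self, rules):
        self._rules = list(rules)

    def is_allowed(self, tool, args):
        return any(rule.matches(tool, args) for rule in self._rules)

    def add_rule(self, rule, approved_by=None):
        # Adding a rule expands privilege, so it needs an explicit approver.
        if not approved_by:
            raise PermissionError("policy expansion requires explicit approval")
        self._rules.append(rule)


policy = Policy([
    Rule("send_email", {"to": r"[\w.+-]+@example\.com", "subject": r".{1,120}"}),
])
```

With this policy, an email to reviewer@example.com passes, an email to any other domain is blocked, and a delete_file call is blocked until someone approves a rule for it.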
AgenTRIM makes a related point from a different angle. It frames agent risk as unbalanced tool-driven agency. Too much agency means the agent keeps access to tools it does not need. Too little agency means it cannot complete the task. Its proposed control loop starts with a verified tool inventory, then filters tool access per step at runtime.
That is the part many teams skip.
They connect the tools first. Then they write policy language after something feels risky.
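A rough sketch of the inventory-then-filter order, under stated assumptions: the tool names, the step names, and the STEP_NEEDS mapping are all hypothetical, and this is not AgenTRIM's implementation. The point is only that a tool reaches the agent when it is discovered, verified, and needed for the current step.

```python
# Hypothetical verified inventory: tools someone has actually reviewed.
VERIFIED_INVENTORY = {"search_docs", "read_file", "write_file", "send_email"}

# Hypothetical mapping from task step to the tools that step needs.
STEP_NEEDS = {
    "research": {"search_docs", "read_file"},
    "draft": {"read_file", "write_file"},
    "deliver": {"send_email"},
}


def tools_for_step(step, discovered):
    """Return the tools exposed to the agent for one step.

    A tool is exposed only if it was discovered at runtime, appears in the
    verified inventory, and is needed for this step. Unknown steps get nothing.
    """
    needed = STEP_NEEDS.get(step, set())
    return set(discovered) & VERIFIED_INVENTORY & needed
```

A server that advertises an unreviewed tool does not widen access here, because the intersection drops anything missing from the verified inventory.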
The Manifest Is the Missing Middle
A practical manifest should answer basic questions before an agent can use a tool:
- Which tools can the agent see?
- Who owns it?
- What server or integration exposes it?
- What can it read?
- What can it write, send, delete, execute, or persist?
- Which argument patterns are allowed by default?
- Which calls require human approval?
- Which data classes are blocked?
- What cross-tool data flow is allowed?
- What evidence proves what happened after the call?
- Who can revoke access?
The key is argument-level specificity.
send_email is not one permission. Sending a draft to an internal reviewer is different from sending customer data to an external address. Reading a local project file is different from reading a credential file. Querying a database is different from modifying it.
If the manifest only lists tool names, it is an inventory. Useful, but incomplete.
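To make the send_email distinction concrete, here is a small sketch. The class name, the internal domain, and the data-class labels are assumptions made for illustration. The same tool name produces three different outcomes depending on its arguments.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class SendEmailPolicy:
    """Argument-level rules for a single tool (hypothetical manifest entry)."""
    internal_domain: str
    blocked_data_classes: frozenset

    def decide(self, recipient, data_classes):
        # Blocked data classes are denied regardless of recipient.
        if set(data_classes) & self.blocked_data_classes:
            return Decision.DENY
        # A recipient without a domain cannot be classified, so deny it.
        if "@" not in recipient:
            return Decision.DENY
        domain = recipient.rsplit("@", 1)[1].lower()
        if domain == self.internal_domain:
            return Decision.ALLOW
        # External recipients are not blocked outright, but a human must approve.
        return Decision.NEEDS_APPROVAL


email_policy = SendEmailPolicy(
    internal_domain="example.com",
    blocked_data_classes=frozenset({"customer_pii"}),
)
```

A name-only inventory would record send_email once. This entry records when the call is fine, when it needs a person, and when it is never acceptable.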
Tool Descriptions Are Evidence, Not Authority
MCP makes tool discovery easier. It also raises the cost of trusting metadata blindly.
One MCP security analysis points to protocol-level risks around capability attestation, message origin, and trust propagation across multiple servers. The practical takeaway is simple: a tool's claimed capability is not the same thing as a verified permission boundary.
Another recent benchmark studies tool description poisoning, where malicious instructions are hidden in the tool metadata the agent uses as its planning manual. The attack is not only in the tool's executable code. It can live in the description the model trusts while planning.
That is why the manifest needs a field for description provenance.
Who reviewed the tool description? Did anyone compare it to the code or execution trace? Is the description allowed to instruct the agent, or only describe input and output behavior?
Those questions sound boring until a tool manual quietly becomes an instruction channel.
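One way to record description provenance is to store a hash of the text a reviewer actually read, then compare it with whatever the server serves later. The sketch below assumes hypothetical field names (reviewer, may_instruct_agent) and uses SHA-256 from the Python standard library. It detects that a description changed; it does not judge whether the new text is safe.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(description):
    """Return the SHA-256 hex digest of a tool description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DescriptionReview:
    """Provenance record for one reviewed tool description (hypothetical fields)."""
    tool: str
    reviewer: str
    reviewed_sha256: str
    # Whether the description may instruct the agent, or only describe I/O.
    may_instruct_agent: bool = False


def description_matches_review(review, served_description):
    """True only if the served text is byte-for-byte what the reviewer approved."""
    return fingerprint(served_description) == review.reviewed_sha256


reviewed_text = "Reads a file and returns its text."
review = DescriptionReview(
    tool="read_file",
    reviewer="security-owner",
    reviewed_sha256=fingerprint(reviewed_text),
)
```

If the check fails, the safe default is to hide the tool and send the new description back for review.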
Logs Need to Show State Change
Agent reviews often over-index on the transcript.
The transcript is not enough.
SafeClawBench separates semantic attack acceptance, audit-visible harm evidence, and sandbox-observed state harm. That distinction matters. A model can pass a text-level check while a separate executable protocol still produces observable harm.
The important evidence is not only what the model said. It is which files changed, which messages were sent, which memory entries were updated, which database rows were modified, and which protected objects were exposed.
A permission manifest should therefore include a receipt requirement for every high-risk tool:
- tool called
- arguments supplied
- data source used
- approval state
- output returned
- state changed
- downstream tool invoked
- rollback owner
Without that receipt, the next reviewer is guessing.
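The receipt fields above map naturally onto a small record type. This is a sketch, assuming a hypothetical ToolReceipt class and a JSON-lines log; the recorded_at timestamp is an addition that does not appear in the list.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ToolReceipt:
    """One receipt per high-risk tool call, mirroring the fields listed above."""
    tool_called: str
    arguments_supplied: dict
    data_source_used: str
    approval_state: str
    output_returned: str
    state_changed: list
    downstream_tools_invoked: list
    rollback_owner: str
    # Assumption: a timestamp is added so receipts can be ordered later.
    recorded_at: str = field(default_factory=_utc_now)

    def to_json_line(self):
        """Serialize to one JSON line suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


receipt = ToolReceipt(
    tool_called="write_file",
    arguments_supplied={"path": "notes/summary.md"},
    data_source_used="search_docs result",
    approval_state="auto_allowed",
    output_returned="ok",
    state_changed=["notes/summary.md created"],
    downstream_tools_invoked=[],
    rollback_owner="docs-team",
)
```

The state_changed field is the one that matters most in review. It records what the call did to the world, not what the model said about it.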
Static Access Is Too Blunt for Agents
Traditional access control assumes the authenticated actor keeps behaving like the same actor.
Agents weaken that assumption. Their runtime beliefs and intent can shift after reading email, web pages, documents, tool outputs, memory, or MCP metadata. The SoK on trust-authorization mismatch argues that static permissions can become decoupled from runtime trustworthiness.
For working teams, that means a manifest should not be a permanent grant.
It should support downgrade, revoke, and re-approval rules:
- If the agent reads untrusted content, downgrade write tools.
- If the agent needs a new external recipient, ask for approval.
- If a tool description changes, re-review the manifest.
- If a server gains a new capability, treat that as a permission expansion.
- If a high-risk tool call fails policy, log the denial.
This is not bureaucracy. It is how you stop tool access from drifting quietly.
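Several of these rules can be sketched as session state. The names here (Session, WRITE_TOOLS, note_untrusted_read) are hypothetical, and "downgrade" is interpreted as "require approval", which is one reasonable reading rather than the only one.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical set of tools that write, send, or delete.
WRITE_TOOLS = {"write_file", "send_email", "delete_record"}


@dataclass
class Session:
    """Tracks runtime trust for one agent session (illustrative only)."""
    granted: set
    approved_recipients: set = field(default_factory=set)
    tainted: bool = False
    denials: list = field(default_factory=list)

    def note_untrusted_read(self):
        # Reading untrusted content downgrades every write tool for this session.
        self.tainted = True

    def check(self, tool, recipient: Optional[str] = None):
        if tool not in self.granted:
            # Denials are logged so the next reviewer can see them.
            self.denials.append(f"{tool}: not granted")
            return "deny"
        if self.tainted and tool in WRITE_TOOLS:
            return "needs_approval"
        if recipient is not None and recipient not in self.approved_recipients:
            return "needs_approval"
        return "allow"


session = Session(
    granted={"read_file", "send_email"},
    approved_recipients={"reviewer@example.com"},
)
```

Before any untrusted read, an email to an approved recipient is allowed. After note_untrusted_read(), the same call needs approval, while read_file stays allowed.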
The Recommendation
Before connecting an agent to tools, write the permission manifest.
Keep it short enough that someone will maintain it. Make it specific enough that it can block a bad call.
Minimum fields:
- tool name
- owner
- source server or integration
- reviewed description
- read surfaces
- write or execute surfaces
- allowed arguments
- blocked arguments
- approval trigger
- cross-tool data-flow rule
- receipt requirement
- revocation owner
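A manifest entry is easier to maintain if a check refuses to connect a tool whose entry is incomplete. The sketch below assumes the entry is a plain dictionary and that the key names are snake_case versions of the fields above. Both choices are illustrative.

```python
# Key names are assumed snake_case versions of the minimum fields listed above.
REQUIRED_FIELDS = (
    "tool_name",
    "owner",
    "source",
    "reviewed_description",
    "read_surfaces",
    "write_or_execute_surfaces",
    "allowed_arguments",
    "blocked_arguments",
    "approval_trigger",
    "cross_tool_data_flow_rule",
    "receipt_requirement",
    "revocation_owner",
)


def missing_fields(entry):
    """Return required fields that are absent or None, in manifest order.

    An explicit empty list (for example, no write surfaces) counts as an
    answer; a missing key or None does not.
    """
    return [name for name in REQUIRED_FIELDS if entry.get(name) is None]


example_entry = {
    "tool_name": "read_file",
    "owner": "platform-team",
    "source": "local filesystem integration",
    "reviewed_description": "Reads a file and returns its text.",
    "read_surfaces": ["project workspace"],
    "write_or_execute_surfaces": [],
    "allowed_arguments": {"path": "inside the project workspace"},
    "blocked_arguments": {"path": "credential files"},
    "approval_trigger": "any path outside the workspace",
    "cross_tool_data_flow_rule": "file contents may not flow to send_email",
    "receipt_requirement": "log path and byte count",
    "revocation_owner": "platform-team",
}
```

Calling missing_fields(example_entry) returns an empty list. Dropping revocation_owner from the entry would return that one field name, which is a reasonable signal to block the connection.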
The manifest will not solve prompt injection. It will not make MCP risk disappear. It will not replace sandboxing, monitoring, or human review.
It does something narrower and more useful.
It makes the authority visible before the agent uses it.
That is the starting point for governed agent workflows.