An agent is dangerous when it can read private data, absorb untrusted content, and talk to the outside world.
Short answer
Use the lethal trifecta as a fast approval test. If an agent can access private data, ingest untrusted content, and communicate externally, treat the workflow as high risk. Break at least one leg before approval. Remove unnecessary tools, label untrusted inputs, block external sends by default, require approval outside the model, and log the exact artifact a human approved.
The review shortcut
Most agent reviews look at one tool at a time.
Can this tool read files? Can this tool send messages? Can this MCP server open pull requests? Can this browser agent visit the web?
That review is useful, but it misses the combination risk.
Simon Willison calls the dangerous combination the lethal trifecta for AI agents: private data, untrusted content, and external communication.
Any one of those can be reasonable in isolation.
A support agent may need private customer records. A research agent may need to read web pages. A workflow agent may need to send a message or open a ticket.
The risk changes when the same agent can do all three.
Now an attacker-controlled input can become an instruction source. The agent can read something sensitive. Then the agent can send that sensitive data somewhere else.
That is not a vibe. It is the basic failure mode behind indirect prompt injection, tool poisoning, and many agent permission problems.
What counts as private data
Private data is any information the agent should not freely expose outside the workflow.
Examples:
- Source code in a private repository
- Internal documents
- Customer records
- Inbox contents
- Credentials or local config files
- Incident notes
- Financial data
- Roadmaps and product plans
- Security questionnaires and vendor answers
- Data from a permissioned RAG system
The question is not only "is this secret?"
The better question is "would we be comfortable if an attacker-controlled page caused the agent to quote this into an external channel?"
If the answer is no, it belongs in the private data column.
What counts as untrusted content
Untrusted content is any content the agent can read that is controlled by someone outside the trusted workflow boundary.
That includes obvious sources:
- Web pages
- Emails
- Uploaded documents
- Support tickets
- Chat messages
- Pull request comments
- Issue descriptions
- Screenshots
- API responses
- Search results
It also includes less obvious sources:
- MCP tool descriptions from a third-party server
- Tool annotations
- Tool error messages
- Tool results
- Resource links returned by tools
- Embedded resources
- Retrieved chunks from a mixed-trust knowledge base
OWASP LLM01 frames indirect prompt injection as external content altering model behavior. That is the key point. The content does not need to look like a prompt to the user. It only needs to be parsed by the model.
The model sees text, metadata, files, images, and tool outputs in one working context. If the system does not preserve trust boundaries, hostile content can compete with the user's intent.
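One way to keep the trust boundary visible is to carry provenance with every piece of context instead of concatenating raw strings. The sketch below is a minimal illustration, not a standard: `Trust`, `ContextItem`, `render_context`, `contains_untrusted`, and the bracketed marker format are all hypothetical names chosen for this example. Labeling does not stop injection by itself. It only gives a policy layer outside the model something to check.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # e.g. the system prompt or the authenticated user's request
    UNTRUSTED = "untrusted"  # e.g. web pages, emails, tool results, retrieved chunks


@dataclass(frozen=True)
class ContextItem:
    source: str  # where the text came from, e.g. "web:example.com"
    trust: Trust
    text: str


def render_context(items: list[ContextItem]) -> str:
    """Render items with explicit provenance markers (hypothetical format)."""
    blocks = []
    for item in items:
        blocks.append(
            f"[source={item.source} trust={item.trust.value}]\n{item.text}\n[end]"
        )
    return "\n".join(blocks)


def contains_untrusted(items: list[ContextItem]) -> bool:
    """True if any item in the working context came from an untrusted source."""
    return any(item.trust is Trust.UNTRUSTED for item in items)
```

A policy layer could use `contains_untrusted` to decide whether external tools stay enabled for the current turn.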
What counts as external communication
External communication is any path the agent can use to move information outside the current trust boundary.
Examples:
- Sending an email
- Posting a message
- Creating a pull request
- Opening an issue
- Calling a webhook
- Making an HTTP request
- Loading a remote image or URL
- Updating a customer record
- Writing to a shared document
- Pushing a commit
- Triggering a deployment
- Producing a link a human is likely to click
This part is easy to undercount.
A tool does not need to be named send_data to exfiltrate data. If it can write text somewhere an attacker can read, it may be an exfiltration path.
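A tool inventory can record what each tool is able to do rather than what it is called. The following is a rough sketch under assumed names: `ToolCapability`, its two flags, and `exfiltration_paths` are hypothetical, and a real inventory would likely need more capability flags than these two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCapability:
    name: str
    writes_outside_boundary: bool  # email, webhook, shared doc, pull request, commit
    fetches_remote_url: bool       # HTTP requests, remote images, link previews


def exfiltration_paths(tools: list[ToolCapability]) -> list[str]:
    """Return names of tools that can move data out, regardless of how they are named."""
    return [
        tool.name
        for tool in tools
        if tool.writes_outside_boundary or tool.fetches_remote_url
    ]
```

Under this model, a tool that only renders remote images still counts as an external communication path, because the URL it fetches can carry data.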
Why this matters for MCP and tool-connected agents
The MCP Tools specification says tools are model-controlled. The language model can discover and invoke tools automatically based on context and user prompts.
The same specification says clients must consider tool annotations untrusted unless they come from trusted servers. It also describes tool results that can contain text, images, audio, resource links, embedded resources, and structured content.
That matters because tool material is not just data plumbing. It becomes model context.
Invariant Labs showed this clearly in its MCP tool poisoning research. A malicious tool description can contain instructions that are visible to the model but hidden or simplified for the user. Their research also describes rug pulls, where tool behavior changes after approval, and tool shadowing, where one tool influences how the agent handles another tool.
The practical lesson is not "never use MCP."
The lesson is simpler: review tool descriptions, annotations, results, and changes as possible instruction sources. Do not approve an MCP server only because the install prompt looked harmless.
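One possible way to notice changes after approval is to fingerprint the tool material a reviewer actually looked at and compare it on every later connection. This is a sketch, not part of MCP or of any client: `fingerprint_tool` and `changed_tools` are hypothetical helpers, and the choice of fields (`name`, `description`, `inputSchema`, `annotations`) is an assumption about what the reviewer examined.

```python
import hashlib
import json


def fingerprint_tool(tool: dict) -> str:
    """Hash the reviewed fields of a tool definition into a stable fingerprint."""
    reviewed_fields = {
        key: tool.get(key)
        for key in ("name", "description", "inputSchema", "annotations")
    }
    # Sorted keys and fixed separators keep the JSON byte-for-byte stable.
    canonical = json.dumps(reviewed_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(approved: dict[str, str], current_tools: list[dict]) -> list[str]:
    """Return names of tools that are new or differ from their approved fingerprint."""
    changed = []
    for tool in current_tools:
        name = tool.get("name", "<unnamed>")
        if approved.get(name) != fingerprint_tool(tool):
            changed.append(name)
    return changed
```

Any name returned by `changed_tools` would go back to a human for review before the agent is allowed to call it. This catches changed descriptions, but not a server that keeps its description and changes its behavior.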
The three-column approval test
Before approving an agent, fill out this record:
Private data the agent can read:
Untrusted content the agent can ingest:
External communication paths the agent can use:
Actions that require approval outside the model:
Logs a reviewer can inspect:
If the first three fields are all non-empty, do not approve the workflow as-is.
That does not mean the workflow is useless. It means it needs a boundary change.
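The record above can also be kept as structured data so the check is mechanical. A minimal sketch, assuming hypothetical names (`ApprovalRecord`, `legs_present`, `has_trifecta`) that mirror the five fields:

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalRecord:
    private_data: list[str] = field(default_factory=list)
    untrusted_content: list[str] = field(default_factory=list)
    external_paths: list[str] = field(default_factory=list)
    approval_required_actions: list[str] = field(default_factory=list)
    logs: list[str] = field(default_factory=list)

    def legs_present(self) -> list[str]:
        """Name each trifecta leg that has at least one entry."""
        legs = {
            "private data": self.private_data,
            "untrusted content": self.untrusted_content,
            "external communication": self.external_paths,
        }
        return [name for name, entries in legs.items() if entries]

    def has_trifecta(self) -> bool:
        """True when all three legs are non-empty, so the workflow needs a boundary change."""
        return len(self.legs_present()) == 3
```

A reviewer would fill in the lists by hand. The code only enforces the rule that three non-empty columns block approval as-is.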
How to break the trifecta
1. Remove private data access
If the task does not need sensitive data, do not expose it.
Use a clean workspace. Use a limited test record. Use a redacted document set. Use scoped credentials. Keep production tokens out of the agent shell.
This is boring least privilege. It still works.
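As one small illustration of keeping production tokens out of the agent shell, the agent process can be started with an allowlisted environment instead of inheriting everything. The names `ALLOWED_ENV` and `scrubbed_env` are hypothetical, and the allowlist contents are an assumption that would differ per workflow.

```python
# Hypothetical allowlist: only the variables the agent process actually needs.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG"})


def scrubbed_env(
    environ: dict[str, str], allowed: frozenset[str] = ALLOWED_ENV
) -> dict[str, str]:
    """Return a copy of the environment containing only allowlisted variables."""
    return {key: value for key, value in environ.items() if key in allowed}
```

The result can be passed as the `env` argument of `subprocess.run` when launching the agent, so cloud keys or API tokens exported in the parent shell are not visible to it.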
2. Remove untrusted content from the action path
Sometimes the agent needs to read untrusted content, but it does not need to act directly on it.
A safer pattern is a two-step workflow:
- Summarize or classify the untrusted content into a review artifact.
- Use a separate approval step before any tool can send, write, delete, commit, deploy, or update records.
The key is that untrusted content should not be able to trigger consequential action by itself.
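The two-step pattern can be sketched as a data handoff: the step that reads untrusted content produces only a review artifact, and a separate function runs the action only when an approval decision arrives from outside. All names here (`ReviewArtifact`, `run_action`, the `execute` callback) are hypothetical. How the `approved` flag gets set, for example by a human in a review UI, is assumed and not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ReviewArtifact:
    source: str                     # where the untrusted content came from
    summary: str                    # produced by a step with no tools attached
    proposed_action: Optional[str]  # a suggestion only, never executed by step one


def run_action(
    artifact: ReviewArtifact,
    approved: bool,
    execute: Callable[[str], None],
) -> bool:
    """Execute the proposed action only if approval was recorded outside the model."""
    if artifact.proposed_action is None or not approved:
        return False
    execute(artifact.proposed_action)
    return True
```

The design choice is that nothing in the artifact can set `approved`. Hostile text can influence the summary or the suggestion, but not the decision to act.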
3. Remove external communication
If the agent only needs to reason, make the output local by default.
Draft instead of send. Prepare instead of post. Suggest instead of commit. Write to a local file instead of a shared system.
A human can still move the work forward after review.
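A "draft instead of send" tool can be as small as a function that writes to a local folder. This is an illustrative sketch. `save_draft` and the `.draft.txt` suffix are hypothetical choices, not an existing API.

```python
from pathlib import Path


def save_draft(drafts_dir: Path, name: str, body: str) -> Path:
    """Write the agent's output to a local drafts folder instead of sending it."""
    drafts_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a crafted name cannot escape drafts_dir.
    safe_name = Path(name).name
    path = drafts_dir / f"{safe_name}.draft.txt"
    path.write_text(body, encoding="utf-8")
    return path
```

If this is the only write tool the agent has, a successful injection can at worst produce a bad draft that a human reads before anything leaves the machine.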
4. Add approval outside the model
Some workflows need all three legs.
For those, approval has to sit outside the model. The model should not be the only thing deciding whether its own tool call is safe.
High-risk actions include:
- Send an external message
- Commit or push code
- Open or merge a pull request
- Trigger a deployment
- Delete data
- Update customer records
- Change permissions
- Spend money
- Call an admin API
This maps directly to OWASP LLM06: Excessive Agency. The issue is not only bad model output. The issue is too much functionality, too much permission, and too much autonomy.
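A gate of this kind can be sketched in a few lines. The version below is an assumption-heavy illustration: `HIGH_RISK_ACTIONS`, `artifact_digest`, and `ApprovalGate` are hypothetical names, the action identifiers are invented to mirror the list above, and `record_approval` is assumed to be called from a human-facing review step, never by the model. It ties each approval to a hash of the exact action and payload, so the log shows what was approved and a modified payload is not covered by an earlier approval.

```python
import hashlib
import json

# Hypothetical action identifiers mirroring the high-risk list above.
HIGH_RISK_ACTIONS = frozenset({
    "send_external_message", "push_code", "merge_pull_request",
    "trigger_deployment", "delete_data", "update_customer_record",
    "change_permissions", "spend_money", "call_admin_api",
})


def artifact_digest(action: str, payload: dict) -> str:
    """Hash the exact action and payload so approvals bind to specific content."""
    canonical = json.dumps({"action": action, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ApprovalGate:
    def __init__(self) -> None:
        self._approved: set[str] = set()
        self.log: list[dict] = []

    def record_approval(self, reviewer: str, action: str, payload: dict) -> str:
        """Called by the review step: approve one exact artifact and log it."""
        digest = artifact_digest(action, payload)
        self._approved.add(digest)
        self.log.append({"reviewer": reviewer, "action": action, "digest": digest})
        return digest

    def is_allowed(self, action: str, payload: dict) -> bool:
        """Low-risk actions pass; high-risk ones need a matching recorded approval."""
        if action not in HIGH_RISK_ACTIONS:
            return True
        return artifact_digest(action, payload) in self._approved
```

The tool runtime would call `is_allowed` immediately before executing a tool call. A production version would also need expiry and single-use approvals, which this sketch leaves out.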
A small test
Pick one agent or MCP-connected workflow.
Write the three columns.
Then ask one question:
Which leg can I remove without breaking the useful part of the workflow?
If the agent only needs to summarize incoming tickets, it probably does not need to send external messages.
If it only needs to draft a pull request description, it probably does not need repository write access.
If it only needs to review public docs, it probably does not need private customer records in context.
If removing a permission does not break the workflow, leave it removed.
Evidence versus opinion
Evidence from the sources:
- Simon Willison defines the lethal trifecta as private data, untrusted content, and external communication.
- OWASP LLM01 describes indirect prompt injection through external sources such as websites or files, with potential impacts including sensitive information disclosure, unauthorized function access, command execution, and manipulation of critical decisions.
- OWASP LLM06 describes excessive agency as damaging actions caused by unexpected, ambiguous, or manipulated LLM outputs, with root causes in excessive functionality, permissions, and autonomy.
- The MCP Tools specification describes model-controlled tools, untrusted annotations unless from trusted servers, and tool results that can carry several content types.
- Invariant Labs describes MCP tool poisoning patterns where tool descriptions can steer model behavior and users may not see the same detail the model sees.
My opinion:
The lethal trifecta should be a default first-pass review for agent approvals. It is not a full threat model. It is a fast way to catch the workflows most likely to turn prompt injection into real-world damage.
Prompting the model to ignore malicious instructions is not enough. The safer move is to remove a leg of the trifecta, narrow the tool, or force approval outside the model before the action leaves the trust boundary.
Free next step
Test one agent this week. Write the three columns, then compare the result to the AI agent tool inventory, the agent action approval matrix, and the MCP permission review. If all three legs are present, break one before you approve the workflow.