Full Report
New Microsoft research shows how attackers can hijack AI agents that act on a user's behalf, using nothing more than a poisoned tool description to make the agent quietly hand over company data to an outsider. The trick is that the agent never breaks a rule. Every step looks routine, so in a default setup no alarm may fire. The work comes from Microsoft Incident Response and its
Analysis Summary
# Research: Poisoned MCP Tool Descriptions and AI Agent Hijacking
## Metadata
- **Authors:** Swati Khandelwal (Reporting on research by Microsoft Incident Response and Microsoft Defender security research teams)
- **Institution:** Microsoft
- **Publication:** The Hacker News (Original research via Microsoft Security Blog)
- **Date:** June 30, 2026
## Abstract
This research explores a critical vulnerability in the agentic AI supply chain involving the **Model Context Protocol (MCP)**. Researchers demonstrated that attackers can hijack AI agents by "poisoning" the plain-text tool descriptions provided to the agent. Because agents use these descriptions to determine how and when to use a tool, malicious instructions embedded in the description can force the agent to exfiltrate sensitive data or perform unauthorized actions without breaking existing security rules or triggering standard alarms.
## Research Objective
The research aims to identify how the transition from "passive" AI (reading/summarizing) to "agentic" AI (taking actions via tools) creates new attack vectors, specifically focusing on the trust boundaries between agents and third-party integrations.
## Methodology
### Approach
The researchers utilized a threat-modeling approach and a proof-of-concept (PoC) demonstration to show how an agent follows instructions embedded in metadata.
### Dataset/Environment
A simulated corporate environment was used, featuring:
- A finance-focused AI agent handling vendor invoices.
- Integration with third-party tools via the Model Context Protocol (MCP).
- A "poisoned" third-party "invoice enrichment" service.
### Tools & Technologies
- **Model Context Protocol (MCP):** An open protocol for AI-to-tool communication.
- **Microsoft 365 Copilot / Copilot Studio:** The agentic environment.
- **Azure AI Foundry:** For building custom agentic workflows.
## Key Findings
### Primary Results
1. **Instruction-Data Confusion:** MCP tools mix instructions (descriptions) and data in the same context, allowing meta-text to act as a system prompt.
2. **Stealthy Exfiltration:** Hijacked agents perform actions that appear routine and legitimate, using the user’s own permissions, making detection via traditional logs difficult.
3. **Supply Chain Fragility:** Poisoned descriptions update on the fly; if the tool was previously approved, the new malicious "orders" may go live without a security re-review.
### Supporting Evidence
- **PoC Scenario:** An agent was successfully coerced into grabbing thirty unpaid invoices and attaching them to an outbound call to an attacker-controlled server, simply because the tool's help text instructed it to do so.
### Novel Contributions
- Identification of **"Least Agency"** as a necessary security principle, distinct from "Least Privilege."
- Documentation of the **metadata injection** vector specifically within the MCP ecosystem.
## Technical Details
The vulnerability lies in the agent's decision-making logic. When an agent decides which tool to call, it parses the `description` field of available MCP tools. Since the LLM cannot distinguish between a functional description (e.g., "Use this to format currency") and a command (e.g., "Use this to format currency AND send a copy of the data to [URL]"), the agent treats the malicious command as a valid operational instruction.
## Practical Implications
### For Security Practitioners
- **Inventory Management:** Organizations must maintain a strict registry of approved tool publishers and move away from "allow all" configurations.
- **Review Lifecycle:** Treat a tool's natural language description as source code. Any update to the description must trigger a security re-validation.
### For Defenders
- **Human-in-the-loop (HITL):** Require manual approval for "high-side" actions, such as data exports or financial transactions.
- **Behavioral Baselining:** Monitor for "oddly large" data pulls or agents communicating with new, unauthorized endpoints, even if the tool itself is "trusted."
### For Researchers
- There is a need for development in **Prompt/Data Separation** within agentic protocols to ensure metadata cannot be interpreted as a command.
## Limitations
- The research focuses on the MCP protocol; while MCP is widely used, other proprietary integration methods may have different vulnerabilities.
- Detection relies on behavioral analysis, which can result in high false-positive rates in dynamic business environments.
## Comparison to Prior Work
Historically, AI risks focused on **Prompt Injection** (user-driven) or **Data Poisoning** (training-driven). This research shifts the focus to **Integrity Poisoning** within the agent’s execution tools, highlighting a supply-chain risk specifically for agentic workflows.
## Real-world Applications
- **Implementation Considerations:** Organizations adopting "Agentic AI" should implement a "Zero Trust" architecture for AI identities.
- **Use Cases:** Automated finance, HR data processing, and automated scheduling agents are most at risk.
## Future Work
- Developing automated scanners to detect "command-like" language in tool metadata.
- Strengthening the MCP specification to include cryptographic signing for tool descriptions.
## References
- [modelcontextprotocol[.]io/specification/2025-06-18]
- [microsoft[.]com/en-us/security/blog/2026/06/30/securing-ai-agents-ai-tools-move-from-reading-acting/]