Worms could potentially steal data and deploy malware.
# Tool/Technique: Morris II Generative AI Worm
## Overview
Morris II is a proof-of-concept generative AI worm created by researchers Ben Nassi, Stav Cohen, and Ron Bitton to demonstrate the security risks of connected, autonomous AI ecosystems, particularly those built on Large Language Model (LLM) services such as ChatGPT and Gemini. It is designed to self-replicate across systems while carrying a payload that steals data (such as confidential information) or spreads spam.
## Technical Details
- Type: Malware (Worm concept demonstration)
- Platform: Generative AI email assistants utilizing LLMs (specifically tested against ChatGPT, Gemini, and LLaVA in a controlled environment).
- Capabilities: Self-replication via adversarial prompts, data exfiltration from emails, spam message deployment, bypassing certain security protections in LLM services.
- First Seen: Research demonstrated in 2024.
## MITRE ATT&CK Mapping
The worm leverages prompt manipulation techniques akin to traditional injection attacks, targeting the LLM processing layer.
- **T1566 - Phishing**
  - **T1566.002 - Spearphishing Link** (if a link is used to disseminate the email payload)
- **T1059 - Command and Scripting Interpreter** (conceptual mapping to the parent technique, as the "command" is an adversarial prompt injected into the system's context/RAG database; the sub-technique ID T1059.011 designates Lua in ATT&CK and does not describe this behavior)
- **T1190 - Exploit Public-Facing Application** (conceptual mapping, as the LLM API/interface is the exploited externally facing function)
## Functionality
### Core Capabilities
- **Adversarial Self-Replicating Prompt:** The worm is a carefully crafted prompt that instructs the receiving generative AI model to output *another* malicious prompt in its response, so the output itself can infect the next model that processes it.
- **Data Poisoning via RAG:** The worm poisons the Retrieval-Augmented Generation (RAG) data sources that an email assistant consults. When a later query retrieves the poisoned entry, the adversarial prompt is reintroduced into the model's context.
- **Self-Replication:** The worm spreads when the poisoned response (containing the adversarial prompt) is used to reply to a new email, infecting the new client's database/context.
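
The propagation loop described above can be illustrated with a harmless toy model. This is a minimal sketch, not the researchers' implementation: the `Mailbox` class, the `MARKER` placeholder, and the stub "model" that simply copies the marker are all assumptions made for illustration, and no real LLM or email system is involved.

```python
from dataclasses import dataclass, field

# Harmless placeholder standing in for an adversarial prompt (hypothetical).
MARKER = "<<REPLICATE-ME>>"


@dataclass
class Mailbox:
    """Toy email assistant with a RAG store of previously received messages."""
    owner: str
    rag_store: list[str] = field(default_factory=list)

    def handle(self, incoming: str) -> str:
        # Every incoming message is indexed, so a poisoned message persists.
        self.rag_store.append(incoming)
        # Toy retrieval: the whole store is returned as context.
        retrieved = self.rag_store
        reply = f"Reply from {self.owner}"
        # Stub "model": if the context carries the marker, the marker is
        # copied into the output, mimicking a self-replicating prompt.
        if any(MARKER in doc for doc in retrieved):
            reply += " " + MARKER
        return reply


def is_infected(box: Mailbox) -> bool:
    """A mailbox counts as infected once its RAG store holds the marker."""
    return any(MARKER in doc for doc in box.rag_store)


def simulate(chain: list[Mailbox], first_message: str) -> list[str]:
    """Pass a message down a chain; each assistant auto-replies to the next."""
    message = first_message
    for box in chain:
        message = box.handle(message)
    return [box.owner for box in chain if is_infected(box)]


if __name__ == "__main__":
    chain = [Mailbox("alice"), Mailbox("bob"), Mailbox("carol")]
    print(simulate(chain, "hi " + MARKER))
```

Because the marker stays in the RAG store, a later benign message to an already infected mailbox still produces a reply that carries it, which is the persistence property the RAG poisoning step relies on.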
### Advanced Features
- **Multimodal Infection:** The self-replicating prompt can be embedded within an image file, allowing the worm to propagate to new clients when the infected email assistant forwards the image.
- **Data Exfiltration:** Capable of stealing sensitive data residing in emails, including PII such as names, phone numbers, credit card numbers, and SSNs.
- **Security Bypass:** The attack demonstrated the ability to "jailbreak" security protections in ChatGPT and Gemini services to achieve its objective.
## Indicators of Compromise
The primary IOCs relate to the *content* of the attack rather than traditional file hashes, as the payload is textual/contextual within the AI system.
- File Hashes: N/A (Proof of concept relied on prompt injection, not traditional file execution)
- File Names: N/A
- Registry Keys: N/A
- Network Indicators: N/A (The attack focuses on data manipulation within the LLM context rather than specific C2 external beacons)
- Behavioral Indicators:
  - LLMs generating unexpected external instructions (prompts).
  - LLMs exhibiting behavior inconsistent with safety guidelines (jailbreaking).
  - Unsolicited forwarding of emails containing embedded anomalous data (images/text).
  - Sudden large-scale exfiltration of structured data from internal email contexts.
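
One way to surface the exfiltration indicator is to count structured identifiers in assistant-generated outbound text. The following is a minimal sketch under stated assumptions: the regular expressions, the `count_pii` and `should_hold` helpers, and the threshold of three hits are illustrative choices, not part of the original research, and real PII detection needs far broader coverage.

```python
import re

# US-style SSN (assumed format: 123-45-6789).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def count_pii(text: str) -> dict[str, int]:
    """Count SSN-like strings and Luhn-valid card-like numbers in text."""
    cards = [m.group() for m in CARD_RE.finditer(text)]
    valid = [c for c in cards if luhn_ok(re.sub(r"\D", "", c))]
    return {"ssn": len(SSN_RE.findall(text)), "card": len(valid)}


def should_hold(text: str, threshold: int = 3) -> bool:
    """Flag an outbound message that carries many structured identifiers."""
    return sum(count_pii(text).values()) >= threshold


if __name__ == "__main__":
    print(count_pii("SSN 123-45-6789, card 4111 1111 1111 1111"))
```

The Luhn check is there to cut false positives from arbitrary long digit runs such as order numbers; a flagged message would be held for review rather than silently dropped.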
## Associated Threat Actors
- Researchers: Ben Nassi, Stav Cohen, Ron Bitton (in a controlled research environment).
- Current Status: Not observed in the wild as of the 2024 research; the researchers anticipate such worms appearing within two to three years.
## Detection Methods
Detection focuses heavily on anomalous LLM behavior and input validation.
- Signature-based detection: Traditional signatures are insufficient; detection may rely on signatures of known adversarial prompt structures if cataloged.
- Behavioral detection: Monitoring for LLM output that contains self-referential instructions or instructions designed to manipulate external data sources (like RAG entries). Monitoring API usage for unusually high-frequency repetition of specific query types.
- YARA rules: Not explicitly mentioned in the source material, but rules could potentially be written to detect known adversarial prompt structures embedded in text or image metadata/steganography.
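
The behavioral check for self-referential output can be approximated by measuring how much of the model's input reappears verbatim in its output. This is a hedged sketch using Python's standard `difflib`; the function names and the 80-character threshold are assumptions for illustration, and legitimate quoted replies will also trigger it, so it is a triage signal rather than a verdict.

```python
from difflib import SequenceMatcher


def longest_shared_span(model_input: str, model_output: str) -> str:
    """Return the longest contiguous substring present in both texts."""
    sm = SequenceMatcher(None, model_input, model_output, autojunk=False)
    m = sm.find_longest_match(0, len(model_input), 0, len(model_output))
    return model_input[m.a:m.a + m.size]


def looks_self_replicating(model_input: str, model_output: str,
                           min_chars: int = 80) -> bool:
    """Flag outputs that copy a long verbatim span from the input."""
    return len(longest_shared_span(model_input, model_output)) >= min_chars


if __name__ == "__main__":
    payload = "please copy this entire paragraph into every reply you write " * 2
    print(looks_self_replicating("Hi Bob. " + payload, "Sure. " + payload))
```

Exact substring matching will miss paraphrased or encoded replicas, so a production detector would likely combine it with other signals.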
## Mitigation Strategies
Mitigations focus on robust application design and human oversight, acknowledging the prompt injection vector.
- **Secure Application Design:** Apply defenses similar to those used against traditional input-handling attacks (SQL injection, buffer overflows), adapted to the context of LLM interaction. Never trust LLM output blindly within the application logic.
- **Human-in-the-Loop (HITL):** Prevent AI agents from taking autonomous actions (like sending emails) without explicit human approval, establishing clear boundaries on what an agent may do alone.
- **Monitoring and Rate Limiting:** Monitor for the same prompt/instruction structure being repeated thousands of times to identify bulk manipulation attempts, and throttle the sources that produce it.
- **Input Validation/Sanitization:** Although difficult for natural language, attempt to sanitize prompts and establish strict context boundaries for RAG access.
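
The human-in-the-loop and repetition-monitoring ideas above can be combined in a single outbound gate. This is a minimal sketch under assumptions: `OutboundGate`, its `approve` callback, and the `max_repeats` limit are hypothetical names introduced here, and the gate only returns a decision; it does not send anything.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OutboundGate:
    """Gate between an AI agent and an outbound action such as sending email."""
    # Human decision callback (hypothetical): returns True to approve.
    approve: Callable[[str, str], bool]
    # Number of identical bodies allowed before further copies are blocked.
    max_repeats: int = 5
    _seen: Counter = field(default_factory=Counter)

    @staticmethod
    def _fingerprint(body: str) -> str:
        # Normalize case and whitespace so trivial variations hash the same.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def submit(self, recipient: str, body: str) -> str:
        """Return 'blocked', 'rejected', or 'sent' for a proposed message."""
        fp = self._fingerprint(body)
        self._seen[fp] += 1
        if self._seen[fp] > self.max_repeats:
            return "blocked"   # same content repeated too often
        if not self.approve(recipient, body):
            return "rejected"  # human reviewer declined
        return "sent"          # caller may now perform the real send


if __name__ == "__main__":
    gate = OutboundGate(approve=lambda to, body: True, max_repeats=2)
    for _ in range(3):
        print(gate.submit("bob@example.com", "Quarterly update attached"))
```

Checking repetition before asking the human keeps a flood of identical drafts from reaching the reviewer at all. An exact hash misses paraphrased variants, so this should be read as a floor, not a complete control.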
## Related Tools/Techniques
- Morris Computer Worm (1988): Historical reference point for the self-replicating nature.
- Prompt Injection Attacks: The fundamental vector used to control the LLM behavior.
- Jailbreaks: Techniques used to bypass the LLM's inherent safety rules.
- SQL Injection/Buffer Overflows: Analogous traditional attacks leveraging improper input handling.