Full Report
Fable 5 is the supposedly safe version of Anthropic’s Mythos Preview, with guardrails to ensure that it can’t be used to create cyberattacks. Well, that restriction was bypassed within days.
Analysis Summary
# Vulnerability: Guardrail Bypass (Jailbreak) in Anthropic Fable 5
## CVE Details
- **CVE ID**: Not currently assigned (typical for LLM jailbreak vulnerabilities)
- **CVSS Score**: N/A (Severity: High, because the bypass allows cyberattack generation)
- **CWE**: CWE-693 (Protection Mechanism Failure) / CWE-1039 (Automated Recognition Mechanism with Inadequate Detection or Handling of Adversarial Input Perturbations)
## Affected Systems
- **Products**: Anthropic Fable 5 AI Model
- **Versions**: Initial release version (post-Mythos Preview)
- **Configurations**: Default safety-tuned state via public or API interface
## Vulnerability Description
Fable 5 is the safety-hardened version of the "Mythos Preview" model, designed with guardrails to prevent the generation of malicious code and cyberattack plans. The vulnerability is a "jailbreak": a prompt injection or manipulation technique that bypasses the model's safety alignment. It lets users elicit responses that the model is meant to refuse, such as exploits, malware components, or social engineering templates.
## Exploitation
- **Status**: Exploited in the wild / PoC available (Bypassed within days of release)
- **Complexity**: Low (Prompt-based attacks rely on carefully worded prompts rather than technical coding skills)
- **Attack Vector**: Network (Web interface/API)
## Impact
- **Confidentiality**: High (Can be used to generate reconnaissance data or phishing content)
- **Integrity**: High (Can be used to generate code for unauthorized system modifications)
- **Availability**: Medium (Risk of generating scripts for denial-of-service (DoS) attacks)
Note: In the context of LLMs, this bypass facilitates the compromise of *external* systems rather than the model infrastructure itself.
## Remediation
### Patches
- **Vendor Action**: Anthropic typically addresses these via "Reinforcement Learning from Human Feedback" (RLHF) updates and server-side filtering improvements. Because fixes are applied globally on the backend, users should call the most current version of the Fable 5 API endpoint.
### Workarounds
- **Input Filtering**: Implement application-level pre-processing to detect known jailbreak patterns (e.g., "Do Anything Now" (DAN) style prompts).
- **Output Monitoring**: Use secondary "moderation" models to scan the AI's output for sensitive or malicious content before presenting it to the end-user.
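
The two workarounds above can be combined into one wrapper around the model call. The following is a minimal sketch, not a vetted defense. The pattern list is illustrative and far from complete, and `generate` and `moderate` are hypothetical callables standing in for the model client and a secondary moderation model; neither is part of any real API.

```python
import re
from typing import Callable

# Illustrative patterns only (assumption): a real deployment would maintain
# a larger, regularly updated list and would not rely on regexes alone.
JAILBREAK_PATTERNS = [
    re.compile(r"\bdo anything now\b", re.IGNORECASE),
    re.compile(r"\bignore (all )?(previous|prior) instructions\b", re.IGNORECASE),
    re.compile(r"\bwithout (any )?ethics\b", re.IGNORECASE),
]

INPUT_BLOCKED = "Request blocked by input filter."
OUTPUT_WITHHELD = "Response withheld by output moderation."


def looks_like_jailbreak(prompt: str) -> bool:
    """Return True if the prompt matches any known jailbreak pattern."""
    # Collapse runs of whitespace so padding or line breaks do not evade a match.
    normalized = " ".join(prompt.split())
    return any(p.search(normalized) for p in JAILBREAK_PATTERNS)


def guarded_completion(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical: calls the model
    moderate: Callable[[str], bool],  # hypothetical: True if output is unsafe
) -> str:
    """Filter the input, call the model, then screen the output."""
    if looks_like_jailbreak(prompt):
        return INPUT_BLOCKED
    response = generate(prompt)
    if moderate(response):
        return OUTPUT_WITHHELD
    return response
```

Pattern matching of this kind only catches known templates, so novel phrasings will pass the input filter. That is why the output check runs on every response rather than only on prompts that looked suspicious.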
## Detection
- **Indicators of Compromise**: Repeated attempts to use adversarial prompt templates (e.g., "Imagine you are a developer without ethics..."); unusual spikes in requests for obfuscated code or exploit methodologies.
- **Detection Methods**: Monitoring API logs for high refusal rates followed by successful interactions, which may indicate a user iterating on a jailbreak pattern.
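
The refusal-then-success signal described above can be approximated with a single pass over API logs. This sketch assumes each log entry has already been reduced to a user ID and a flag saying whether the model refused; the `LogEntry` shape and the `min_refusals` threshold are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class LogEntry(NamedTuple):
    """Hypothetical, simplified log record, ordered by time."""
    user_id: str
    refused: bool


def flag_refusal_then_success(
    entries: Iterable[LogEntry], min_refusals: int = 3
) -> Set[str]:
    """Return users whose streak of refusals was followed by a success."""
    streak = defaultdict(int)  # consecutive refusals per user
    flagged = set()
    for entry in entries:
        if entry.refused:
            streak[entry.user_id] += 1
        else:
            # A success right after a long refusal streak is the signal.
            if streak[entry.user_id] >= min_refusals:
                flagged.add(entry.user_id)
            streak[entry.user_id] = 0
    return flagged
```

A flagged user is a lead for manual review, not proof of a jailbreak: legitimate users also rephrase requests after a refusal.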
## References
- **Schneier on Security**: hxxps://www[.]schneier[.]com/blog/archives/2026/06/anthropics-fable-5-model-jailbroken-within-days[.]html
- **Cybersecurity News Report**: hxxps://cybersecuritynews[.]com/anthropics-claude-fable-5-jailbroken/