Full Report
The UK’s AI Security Institute evaluated GPT-5.5’s ability to find security vulnerabilities and found it comparable to Claude Mythos. Note that the OpenAI model is generally available. The Institute has also published its evaluation of Mythos, and a separate analysis covers a smaller, cheaper model. That model requires more scaffolding from the prompter, but the analysis finds it just as good.
Analysis Summary
# Vulnerability: Enhanced AI Capabilities in Automated Vulnerability Discovery
## CVE Details
- **CVE ID**: N/A (General capability assessment of AI models)
- **CVSS Score**: N/A
- **CWE**: N/A (Relates to automated discovery of various CWEs)
## Affected Systems
- **Products**: Large Language Models (LLMs) and various software targets.
- **Versions**:
  - OpenAI GPT-5.5
  - Claude Mythos (Preview)
  - Smaller, lower-cost "scaffolded" models
- **Configurations**: Generally available models and models utilizing specific prompting frameworks (scaffolded models).
## Vulnerability Description
Evaluations by the UK’s AI Security Institute (AISI) indicate that AI models have become markedly better at identifying security vulnerabilities. Specifically, AISI found GPT-5.5 comparable to Claude Mythos at finding vulnerabilities. A separate analysis of a smaller, less expensive model (see References) reports that it can be just as good when the user supplies more "scaffolding" (automated prompting and environmental tools). Because GPT-5.5 is generally available, sophisticated vulnerability research is becoming increasingly accessible through general-purpose AI.
## Exploitation
- **Status**: Not applicable (Assessment of discovery capabilities rather than a specific flaw).
- **Complexity**: Low to Medium (Varies based on model and scaffolding requirements).
- **Attack Vector**: Network / Local (LLMs can be applied to diverse attack surfaces).
## Impact
- **Confidentiality**: High (Increased risk of discovery of 0-day or unpatched flaws).
- **Integrity**: High (Automated identification of bypasses or injection flaws).
- **Availability**: High (Discovery of DoS or system-crashing bugs).
## Remediation
### Patches
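- No direct software patch is applicable, because this is a capability assessment rather than a flaw in a specific product. Organizations should instead assume that their software will be examined by AI-assisted tools, and find and fix flaws before others do.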
### Workarounds
- **Shift-Left Security**: Increase the use of AI-driven defensive tools to identify and fix flaws before external actors can utilize similar LLM capabilities.
- **API Rate Limiting**: Limit large-scale automated probing of public-facing endpoints.
## Detection
- **Indicators of Compromise**: High-frequency, varied, and context-aware probing patterns in web/application logs that reflect iterative "reasoning" or "trial-and-error" typical of LLM-driven agents.
- **Detection Methods**: Use of AI-based Web Application Firewalls (WAFs) and anomaly detection to identify non-human traffic patterns.
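
The probing pattern described above (high-frequency and varied) can be approximated with a simple heuristic. This is a minimal sketch only: it is a fixed-threshold rule, not an AI-based WAF, and the function name, the event tuple layout, and every threshold are hypothetical assumptions that would need tuning against real traffic.

```python
from collections import defaultdict, deque


def flag_probing_clients(events, window_s=60.0, min_requests=30, min_distinct_ratio=0.8):
    """Flag clients that send many requests to mostly distinct paths in a sliding window.

    `events` is an iterable of (timestamp_seconds, client_id, path) tuples,
    assumed to be sorted by timestamp. All thresholds are illustrative.
    """
    recent = defaultdict(deque)  # client_id -> deque of (timestamp, path)
    flagged = set()
    for ts, client, path in events:
        window = recent[client]
        window.append((ts, path))
        # Drop requests that have fallen out of the sliding window.
        while window and ts - window[0][0] > window_s:
            window.popleft()
        if len(window) >= min_requests:
            distinct = len({p for _, p in window})
            # High volume plus high path diversity suggests exploratory probing.
            if distinct / len(window) >= min_distinct_ratio:
                flagged.add(client)
    return flagged
```

For example, a client requesting 40 different paths in 40 seconds would be flagged, while a client requesting the same page 40 times would not. Legitimate crawlers can trip this rule too, so a flag is better treated as a signal for review than as proof of an LLM-driven agent.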
## References
- UK AI Security Institute Evaluation: hxxps[://]www[.]aisi[.]gov[.]uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities
- Claude Mythos Evaluation: hxxps[://]www[.]aisi[.]gov[.]uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
- Analysis of Smaller Models: hxxps[://]aisle[.]com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
- Original Post: hxxps[://]www[.]schneier[.]com/blog/archives/2026/05/openais-gpt-5-5-is-as-good-as-mythos-at-finding-security-vulnerabilities[.]html