Full Report
Researchers tried to get ChatGPT to do evil, but it didn't do a good job. LLMs are getting better at writing malware, but they're still not ready for prime time.…
Analysis Summary
# Tool/Technique: LLM-Generated Malware Scripts (Python)
## Overview
This entry summarizes code generated by large language models (LLMs) such as GPT-3.5-Turbo and GPT-4 when explicitly prompted to create malicious functionality, including process injection and anti-VM/sandbox checks. The research tested the LLMs' ability to generate code for defense evasion. The generated code, while functional in some contexts, was deemed "too unreliable and ineffective for operational deployment."
## Technical Details
- Type: Tool (Generated Code Artifacts)
- Platform: Primarily Windows (implied by the targeting of `svchost.exe`); the scripts themselves are written in Python.
- Capabilities: Attempted process injection, anti-virtual machine/sandbox artifact detection.
- First Seen: Not dated precisely; the research is described in an article with a reference date of circa November 2025.
## MITRE ATT&CK Mapping
The generated scripts directly relate to techniques aimed at system interrogation and evasion:
- **Defense Evasion**
  - T1027 - Obfuscated Files or Information
  - T1055 - Process Injection (targeting `svchost.exe`)
  - T1497 - Virtualization/Sandbox Evasion
    - T1497.001 - System Checks (virtual machine artifacts and hypervisor detection)
## Functionality
### Core Capabilities
- **Process Manipulation:** Attempted to generate Python code to inject itself into the legitimate Windows process `svchost.exe`.
- **Detection Termination:** Attempted to generate code to terminate anti-virus (AV) or Endpoint Detection and Response (EDR) related processes.
- **Environmental Checks:** Generated Python scripts designed to detect artifacts indicating the execution environment is a Virtual Machine (VM) or Sandbox.
### Advanced Features
- **Error Handling:** Scripts were developed under constraints that required error handling mechanisms, indicating an attempt at basic operational robustness.
- **Model-Specific Performance:** GPT-4 performed worse than GPT-3.5-Turbo in early environmental detection tests (e.g., AWS VDI). The later GPT-5 model showed greater sophistication but was harder to jailbreak effectively.
## Indicators of Compromise
*Note: Because these are experimental scripts targeting the researchers' internal setups, no live IOCs are available. The indicators listed reflect the intent of the generated code.*
- File Hashes: N/A (Code is highly variable based on prompting)
- File Names: N/A (Hypothetical malware name not provided)
- Registry Keys: N/A
- Network Indicators: N/A (No C2 communication mentioned in the described tests)
- Behavioral Indicators:
  - Attempts to enumerate running processes for AV/EDR names.
  - File system or registry checks associated with common virtualized environments (VMware, AWS VDI).
  - Attempted memory allocation or injection into remote processes, specifically `svchost.exe`.
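
The behavioral indicators above can be expressed as simple matching rules over endpoint telemetry. The following is a minimal sketch in Python, assuming a hypothetical event schema (dicts with `type`, `target`, and `detail` keys) and illustrative watchlists. None of these names come from the research, and real EDR telemetry will be shaped differently.

```python
# Hypothetical watchlists; the entries are illustrative placeholders,
# not values taken from the research.
SECURITY_PROCESS_NAMES = {"msmpeng.exe", "exampleedr.exe"}
VM_ARTIFACT_MARKERS = ("vmware", "virtualbox")
INJECTION_TARGETS = {"svchost.exe"}


def tag_event(event):
    """Return the set of behavioral-indicator tags that an event matches."""
    tags = set()
    kind = event.get("type")
    detail = str(event.get("detail", "")).lower()
    target = str(event.get("target", "")).lower()

    # Indicator 1: process queries that look for AV/EDR process names.
    if kind == "process_query" and target in SECURITY_PROCESS_NAMES:
        tags.add("av_edr_enumeration")

    # Indicator 2: file system or registry reads that mention VM artifacts.
    if kind in ("file_read", "registry_read") and any(
        marker in detail for marker in VM_ARTIFACT_MARKERS
    ):
        tags.add("vm_artifact_check")

    # Indicator 3: remote memory operations against a watched process.
    if kind == "remote_memory_op" and target in INJECTION_TARGETS:
        tags.add("remote_injection_attempt")

    return tags
```

A single tag is weak evidence on its own, since legitimate software also queries processes and reads registry keys. Several tags from the same process in a short window would be a more useful signal.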
## Associated Threat Actors
- **Researchers/Security Testers:** The primary users were Netskope Threat Labs researchers attempting to test LLM capabilities.
- **Hypothetical Future Actors:** The context suggests that malicious actors are currently exploring these methods, though operational success remains unproven.
- **Chinese Cyber Spies (Anthropic context):** The article notes that state actors using LLMs still require a "human in the loop," which suggests that fully autonomous attacks remain similarly immature.
## Detection Methods
- **Signature-based detection:** Traditional signatures are unlikely to catch novel LLM-generated code immediately, though signatures targeting the specific *intent* (e.g., termination of AV processes) might succeed.
- **Behavioral detection:** Monitoring for unexpected process injection attempts targeting established Windows processes like `svchost.exe`.
- **YARA rules:** Rules could be developed around recurring patterns or variable names that LLMs tend to produce in generated code, though such patterns vary with the model and prompt and are likely to be brittle.
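
As a complement to the detection approaches above, suspect Python scripts can be triaged statically before anything is executed. The sketch below is one possible approach, not a method described in the research: it parses source with the standard-library `ast` module and flags references to Win32 API names commonly associated with process injection, along with string literals naming `svchost.exe`. The watchlists and the function name `triage_python_source` are assumptions made for illustration.

```python
import ast

# Illustrative watchlists (assumptions): Win32 API names commonly associated
# with process injection, and a process name of interest.
SUSPICIOUS_NAMES = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}
SUSPICIOUS_STRINGS = ("svchost.exe",)


def triage_python_source(source):
    """Return sorted (line, finding) pairs for suspicious references in source."""
    findings = set()
    tree = ast.parse(source)  # parses only; the script is never executed
    for node in ast.walk(tree):
        # Attribute access such as kernel32.VirtualAllocEx
        if isinstance(node, ast.Attribute) and node.attr in SUSPICIOUS_NAMES:
            findings.add((node.lineno, f"api:{node.attr}"))
        # Bare names such as CreateRemoteThread(...)
        elif isinstance(node, ast.Name) and node.id in SUSPICIOUS_NAMES:
            findings.add((node.lineno, f"api:{node.id}"))
        # String literals that mention a watched process name
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            lowered = node.value.lower()
            for marker in SUSPICIOUS_STRINGS:
                if marker in lowered:
                    findings.add((node.lineno, f"string:{marker}"))
    return sorted(findings)
```

Because `ast.parse` raises `SyntaxError` on invalid input, callers would need to handle scripts that do not parse. Name-based matching is also easy to defeat with obfuscation, so this is a first-pass filter at best.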
## Mitigation Strategies
- **Guardrail Robustness:** Continuous monitoring and updating of LLM guardrails to prevent the generation of functional malicious code, as evidenced by GPT-5's tendency to produce "safer" versions.
- **Process Monitoring:** Strict controls and monitoring over API calls related to process injection (e.g., `VirtualAllocEx`, `CreateRemoteThread`).
- **Anti-VM/Sandbox Hardening:** Ensure that virtualization artifacts in analysis VMs and sandboxes are robustly hidden from, or unavailable to, unexpected processes, so that environment checks of this kind are less likely to succeed.
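
The process-monitoring idea above can be illustrated by correlating API calls per source process and target. The sketch below assumes a hypothetical, time-ordered stream of events (dicts with `source_pid`, `target_name`, and `api` keys) and alerts when one process completes an ordered call sequence against a watched target. `VirtualAllocEx` and `CreateRemoteThread` are named in this entry; `WriteProcessMemory` is an assumed intermediate step added for illustration.

```python
from collections import defaultdict

# Ordered API sequence treated as suspicious. WriteProcessMemory is an
# assumed intermediate step; the other two names appear in this entry.
INJECTION_SEQUENCE = ("VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread")


def find_injection_sequences(events, watched_targets=("svchost.exe",)):
    """Return (source_pid, target_name) pairs that completed the sequence in order.

    Each event is a hypothetical dict with 'source_pid', 'target_name', and
    'api' keys, and the events are assumed to be sorted by time.
    """
    progress = defaultdict(int)  # next expected sequence index per (source, target)
    alerts = []
    watched = {name.lower() for name in watched_targets}
    for event in events:
        target = event["target_name"].lower()
        if target not in watched:
            continue
        key = (event["source_pid"], target)
        step = progress[key]
        if step < len(INJECTION_SEQUENCE) and event["api"] == INJECTION_SEQUENCE[step]:
            progress[key] = step + 1
            if progress[key] == len(INJECTION_SEQUENCE):
                alerts.append(key)
    return alerts
```

Tracking state per (source, target) pair keeps unrelated processes from completing each other's sequences. A production rule would also need time windows and an allowlist for legitimate system components, both of which are omitted here.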
## Related Tools/Techniques
- **Thinking Robot Malware:** Mentioned experimental malware that *can* rewrite its own code to avoid detection (though currently not operational).
- **Role-based Prompt Injection:** The technique used to bypass LLM safety controls (a form of social engineering/prompt engineering).
- **Traditional Malware utilizing T1055/T1497:** Any standard malware employing process injection or virtualization evasion techniques.