Full Report
Backslash Security found that naïve prompts resulted in code vulnerable to at least four of the 10 most common vulnerabilities across popular LLMs.
Analysis Summary
This vulnerability summary is based on Backslash Security's analysis of the code that Large Language Models (LLMs) generate. Because this is a general finding about LLM behavior rather than a flaw in a single product, the report does not assign CVEs.
# Vulnerability: LLMs Frequently Produce Insecure Code by Default
## CVE Details
- CVE ID: N/A (Systemic issue identified, no specific CVE assigned in the report)
- CVSS Score: N/A (Score not provided for the general finding)
- CWE: Multiple high-severity CWEs implied, e.g., Command Injection (CWE-78), XSS (CWE-79), Path Traversal (CWE-22), and Insecure File Upload (CWE-434).
## Affected Systems
- Products: Popular LLMs, including OpenAI's GPT, Anthropic's Claude, and Google's Gemini.
- Versions: Current versions of the models tested at the time of the analysis (April 2025).
- Configurations: Code generated in response to "naïve" or simple prompts that do not explicitly specify security requirements.
## Vulnerability Description
The study found that several leading LLMs, when prompted to generate code for specific applications without explicit security instructions, default to producing insecure code. This code is often vulnerable to weaknesses among the 10 most common CWEs, including Command Injection, Cross-Site Scripting (XSS, both backend and frontend), Insecure File Upload, and Path Traversal. Even prompts that explicitly request secure code often still result in flawed output. As a result, developers who rely on AI-generated code or "vibe coding" may unknowingly integrate critical vulnerabilities into their applications.
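To illustrate two of the weakness classes named above, the following Python sketch contrasts a pattern typical of insecure output with a hardened alternative. It is not taken from the Backslash report: the function names and the echo-style command are hypothetical, chosen only to keep the example self-contained.

```python
import subprocess
import sys
from pathlib import Path


def echo_unsafe(user_input: str) -> str:
    # Insecure pattern, shown for contrast only: user input is interpolated
    # into a shell string, so input such as "x; other-command" would run
    # a second command.
    result = subprocess.run(f"echo {user_input}", shell=True,
                            capture_output=True, text=True)
    return result.stdout


def echo_safe(user_input: str) -> str:
    # Hardened pattern: no shell, arguments passed as a list, so shell
    # metacharacters in user_input are treated as plain text.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def resolve_unsafe(base: Path, filename: str) -> Path:
    # Insecure pattern: a name like "../../etc/passwd" escapes the base directory.
    return base / filename


def resolve_safe(base: Path, filename: str) -> Path:
    # Hardened pattern: normalize the path, then verify it stays under base.
    root = base.resolve()
    candidate = (root / filename).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes base directory: {filename!r}")
    return candidate
```

With these definitions, `echo_safe("hello; echo pwned")` echoes the input back as plain text instead of executing a second command, and `resolve_safe(base, "../../etc/passwd")` raises `ValueError`.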
## Exploitation
- Status: PoC available (implied: the research demonstrates that the models generate exploitable code).
- Complexity: Low (simple prompts that omit security requirements are enough to produce the vulnerable output, and the resulting code is inherently flawed).
- Attack Vector: Dependent on the specific vulnerability generated (e.g., Network for XSS/Injection, Local for file handling).
## Impact
The impact is determined by the resulting code vulnerability:
- Confidentiality: High (e.g., successful Command Injection or Path Traversal)
- Integrity: High (e.g., unauthorized data modification or system compromise)
- Availability: Medium to High (e.g., application denial of service from injection attacks)
## Remediation
### Patches
As this is a finding regarding the output behavior of foundational models, a traditional patch does not apply directly to the LLMs in a software update sense. Remediation focuses on output validation and improved prompting.
- **Vendor Resolution:** LLM providers are expected to refine training data, reinforcement learning, and safety guardrails to prevent the generation of insecure patterns.
### Workarounds
1. **Explicit Security Prompting:** Developers must specify security requirements (e.g., "Ensure this function is protected against SQL Injection using parameterized queries") in every generation prompt, even when only basic functionality is requested.
2. **Code Review Mandate:** Treat all AI-generated code as untrusted input. Require security code review and Static Application Security Testing (SAST) scans before integration into production environments.
3. **Use Security-Focused LLMs/Tools:** Leverage tools or specialized LLM pipelines designed specifically for secure code generation, if available.
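The explicit-prompting workaround can be sketched as a small helper that appends a fixed list of security requirements to every code-generation prompt. This is a minimal sketch under stated assumptions: `with_security_requirements` and the wording of `SECURITY_REQUIREMENTS` are hypothetical and do not come from the report, and no LLM API is called.

```python
# Hypothetical requirement list; adapt the wording to the project's own policy.
SECURITY_REQUIREMENTS = (
    "Validate and sanitize all user-supplied input.",
    "Use parameterized queries for any database access.",
    "Never pass user input to a shell; invoke subprocesses with argument lists.",
    "Escape output rendered into HTML to prevent XSS.",
    "Restrict file uploads by type and size, and confine file paths to an allowed directory.",
)


def with_security_requirements(task: str) -> str:
    """Append explicit security requirements to a code-generation prompt."""
    lines = [task.strip(), "", "Security requirements:"]
    lines += [f"- {req}" for req in SECURITY_REQUIREMENTS]
    return "\n".join(lines)


if __name__ == "__main__":
    print(with_security_requirements("Write a Flask endpoint that accepts a file upload."))
```

Because the study found that even security-aware prompts often still yield flawed output, a helper like this complements review and scanning rather than replacing them.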
## Detection
- **Indicators of Compromise (IoCs):** Look for code patterns matching known insecure implementations (e.g., using unsanitized user input directly in system calls, unsafe file operation functions).
- **Detection Methods and Tools:**
  * **SAST Tools:** Use Static Application Security Testing tools configured to flag the high-risk CWEs mentioned above (e.g., Command Injection, XSS vectors).
  * **Dynamic Analysis (DAST):** Run application security tests immediately after integrating AI-generated segments to catch runtime issues.
  * **Prompt Auditing:** Review the historical prompts developers have sent to code generation models to identify "naïve" prompting patterns.
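As a rough illustration of the pattern matching a SAST tool performs, the sketch below uses Python's standard `ast` module to flag a few call patterns associated with command injection. It is a toy example, not a substitute for a real scanner: the `RISKY_CALLS` set and the function names are assumptions made for this illustration, and it inspects only Python source.

```python
import ast

# Hypothetical, deliberately small rule set for illustration.
RISKY_CALLS = {"os.system", "os.popen", "eval", "exec"}


def dotted_name(node: ast.AST) -> str:
    """Return a dotted name such as 'os.system' for simple call targets."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        parent = dotted_name(node.value)
        return f"{parent}.{node.attr}" if parent else node.attr
    return ""


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, description) pairs for flagged calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in RISKY_CALLS:
            findings.append((node.lineno, name))
        elif name.startswith("subprocess."):
            # Flag only a literal shell=True keyword argument.
            for kw in node.keywords:
                if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    findings.append((node.lineno, f"{name}(shell=True)"))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "import os, subprocess\n"
        "os.system('ls ' + user)\n"
        "subprocess.run(cmd, shell=True)\n"
        "subprocess.run(['ls', path])\n"
    )
    for lineno, what in find_risky_calls(sample):
        print(f"line {lineno}: {what}")
```

The sample flags lines 2 and 3 and leaves the list-argument call on line 4 alone. A check this simple misses aliased imports and data flow, which is why dedicated SAST tooling is still needed.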
## References
- Backslash Security Analysis (Specific link not provided in the text, search for "Backslash Security AI code generation vulnerability 2025").
- Infosecurity Magazine article: Popular LLMs Found to Produce Vulnerable Code by Default