Full Report
Bug bounty programs allow security researchers to disclose vulnerabilities so that they get patched. Many of these programs pay money for reported issues. Given that there's money on the line, there's an incentive to chase a payout even if there's no real vulnerability.

LLMs are great at generating content. Unfortunately, they can create content for anything, including bug bounty reports. Security is very contextual, and subtle things can change whether something is exploitable or not. Because of this, incorrect LLM-generated reports are becoming a major issue in the security realm.

The problem with these reports is that, at a glance, they seem legitimate. Disproving one requires a large amount of context on the codebase and a deep understanding of security issues. Historically, we have assumed "good faith" research, but that assumption is starting to be abused. Triaging these issues takes a large amount of time, and some projects do not have the bandwidth to handle these security reports. So, they end up just paying a small bounty to avoid the delay and PR fallout. It's just cheaper to pay for the bug than to hire an expert to perform the true analysis.

In the case of curl, there is a large number of LLM-generated reports to handle. The curl maintainers are very technical and are usually able to identify fake reports, but it still takes time. If this keeps up, bug bounty programs may add restrictions on the users who submit reports.

What's the solution? Detectors and verification, in my opinion. A few detectors:

- Missing reproduction steps. It's common for these reports to not include reproduction steps, making the vulnerability impossible to reproduce. So, adding a hard requirement on PoCs that run would be useful.
- Illegitimate code links. It's common for reports to link to code that doesn't exist. If the linked code doesn't exist, then the report is likely trash.
- Needless complexity. These reports tend to make vulnerabilities needlessly complex.
- Styling. ChatGPT and other LLMs really like Markdown with a lot of bullets.

On the other side is verification. Platforms like HackerOne need to have better account verification. Once an account has been flagged for spam, they need to ban the account, the IP and the email going forward, sort of like the repercussions for cheating on chess websites. Eventually, the beg bounty people would likely stop reporting things altogether. This is a hard problem to solve, but it'll eventually be worked out!
Analysis Summary
# Main Topic
The proliferation of AI-generated "slop" vulnerability reports targeting bug bounty programs, leading to wasted maintainer time, strained trust, and financial incentives for fraudulent submissions.
## Key Points
- LLMs are being used to create technically convincing but factually incorrect bug bounty reports, lacking true exploitability or referencing non-existent code/functions/patches.
- The core exploitation vector is that under-resourced organizations often pay small bounties to avoid the time commitment and public relations fallout associated with deep, expert-level triage.
- The **curl project** specifically highlighted this issue, having to dedicate time to debunking numerous AI-generated reports, though their technical expertise allowed them to identify the fraud.
- Submitting such reports risks damaging researcher reputation on platforms like HackerOne, but perpetrators may still receive payouts in easier-to-triage environments. The long-term risk is the collapse of the bug bounty model if genuine researchers are frustrated or programs are abandoned.
## Threat Actors
- Unattributed malicious actors leveraging LLMs to automate the creation of fraudulent bug bounty reports.
- One specific actor linked to the **@evilginx** account was cited as using similar tactics against multiple organizations and receiving payouts in some cases.
## TTPs
- **Automated Report Generation:** Using LLMs to create reports with technical-sounding jargon, vague reproduction steps, and fabricated code references (non-existent functions, unverified patches, fake commit hashes).
- **Exploitation of Resource Constraints:** Targeting organizations lacking the necessary subject matter expertise and bandwidth to perform expert analysis ("It's cheaper to pay the bug bounty than hire an expert to perform true analysis").
- **Social Engineering/Deception:** Submitting reports that appear legitimate at a glance to pressure under-resourced teams into quick payouts.
- **Styling:** Reports often exhibit distinct LLM/ChatGPT styling, such as excessive Markdown/bullet points.
- **Lack of Proof:** Reports commonly fail to include reproducible Proofs of Concept (PoCs) that actually run.
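
The styling and missing-PoC signals above can be approximated with a few text heuristics. The following is a minimal sketch, not a tested detector: the function names, the regular expressions, and the 0.5 bullet-ratio threshold are all assumptions made for illustration. Styling alone is a weak signal, so the sketch only flags a report for closer scrutiny rather than rejecting it.

```python
import re

# Hypothetical patterns: a Markdown bullet or numbered item, and a code fence.
BULLET_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
FENCE_RE = re.compile(r"^\s*```")
POC_RE = re.compile(r"\b(poc|proof of concept|steps to reproduce)\b", re.IGNORECASE)


def report_signals(text: str) -> dict:
    """Collect simple surface signals from a report body."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    bullets = sum(1 for ln in lines if BULLET_RE.match(ln))
    fences = sum(1 for ln in lines if FENCE_RE.match(ln))
    return {
        # Fraction of non-empty lines that are list items.
        "bullet_ratio": bullets / len(lines) if lines else 0.0,
        # At least one opening and one closing fence.
        "has_code_block": fences >= 2,
        # The report at least claims to contain reproduction steps.
        "mentions_poc": bool(POC_RE.search(text)),
    }


def needs_manual_scrutiny(text: str, max_bullet_ratio: float = 0.5) -> bool:
    """Flag reports that are bullet-heavy or carry no code at all.

    The threshold is an arbitrary assumption, not a measured value.
    """
    signals = report_signals(text)
    return signals["bullet_ratio"] > max_bullet_ratio or not signals["has_code_block"]
```

A report that mentions a PoC but contains no code block would still be flagged, which matches the idea of requiring PoCs that actually run rather than PoCs that are merely described.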
## Affected Systems
- **Bug Bounty Platforms:** Specifically mentioned is **HackerOne**, which needs better account verification and enforcement against abusers.
- **Vulnerable Projects:** Open-source projects with limited triage bandwidth (e.g., Python ecosystem projects such as CPython, pip, and urllib3 are reported to be affected by the trend).
- **Victims:** Organizations, particularly under-resourced ones, that operate bug bounty programs and are susceptible to paying for fake reports.
- **Specific Example:** The **curl project** was directly targeted, though their expertise allowed them to handle the verification.
## Mitigations
- **For Projects/Maintainers (Detection):**
  - Hard requirement for runnable Proofs of Concept (PoCs) that demonstrate reproducibility.
  - Verification that linked code references (e.g., files, commits) actually exist.
  - Implementing detectors to analyze styling/complexity common in LLM outputs.
  - Committing to thorough, expert analysis rather than paying to avoid triage overhead.
- **For Platforms (Verification):**
  - Implementing stricter account verification processes (e.g., only allowing submissions from verified researchers).
  - Implementing robust account flagging, subsequent banning of the user, associated IP, and email address upon confirming spam/fraudulent activity (similar to cheat detection on chess platforms).
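
Checking that linked code references actually exist is the easiest of these mitigations to automate. Below is a minimal sketch, assuming the triager has a local checkout of the project and has already extracted the file paths, function names, and commit hashes from the report by hand. All helper names are hypothetical, the symbol check is a crude substring search, and the `.c`/`.h` suffixes are just an example for a C project.

```python
import re
import subprocess
from pathlib import Path


def file_exists_in_repo(repo_root: Path, rel_path: str) -> bool:
    """Return True if rel_path names a file inside the checkout."""
    root = repo_root.resolve()
    candidate = (root / rel_path).resolve()
    try:
        # Reject report-supplied paths that escape the checkout ("../..").
        candidate.relative_to(root)
    except ValueError:
        return False
    return candidate.is_file()


def symbol_appears_in_repo(repo_root: Path, symbol: str, suffixes=(".c", ".h")) -> bool:
    """Crude check: does the symbol name occur in any source file?

    A substring match also hits comments and longer identifiers, so a
    True result is only a hint; a False result is the useful signal.
    """
    for path in repo_root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            if symbol in path.read_text(errors="ignore"):
                return True
    return False


def commit_exists(repo_root: Path, sha: str) -> bool:
    """Ask git whether sha resolves to a commit in this checkout."""
    # Only accept plain hex so report text is never parsed as a git option.
    if not re.fullmatch(r"[0-9a-fA-F]{7,64}", sha):
        return False
    result = subprocess.run(
        ["git", "-C", str(repo_root), "cat-file", "-e", f"{sha}^{{commit}}"],
        capture_output=True,
    )
    return result.returncode == 0
```

A reference that fails any of these checks does not prove the report is fabricated (the reporter may be looking at a different version), but it is a cheap reason to ask for clarification before spending expert time on the report.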
## Conclusion
The rise of AI slop reports presents a severe challenge to the integrity and sustainability of bug bounty programs. If organizations continue to reward fakery due to triage capacity issues, genuine researchers may quit, eventually leading to the abandonment of these essential disclosure channels. The solution requires a dual approach: development of automated detectors by researchers coupled with platform-level enforcement and higher triage standards by participating organizations.