A publicly accessible database belonging to DeepSeek allowed full control over database operations, including the ability to access internal data. The exposure includes over a million lines of log streams with highly sensitive information.
# Incident Report: Publicly Exposed ClickHouse Database at DeepSeek
## Executive Summary
Wiz Research discovered a publicly accessible, unauthenticated ClickHouse database belonging to DeepSeek, an AI startup. The exposure allowed full control over database operations and revealed over a million lines of sensitive log data, including chat history, API keys, and backend details. The researchers disclosed the issue to DeepSeek, which secured the database promptly after notification.
## Incident Details
- Discovery Date: Not explicitly stated; the exposure was found quickly during an external security posture assessment.
- Incident Date: Not explicitly stated; corresponds to the period during which the database was publicly exposed.
- Affected Organization: DeepSeek
- Sector: Artificial Intelligence (AI Startup)
- Geography: China (Inferred, as DeepSeek is a Chinese AI startup)
## Timeline of Events
### Initial Access
- Date/Time: Assessment began shortly after DeepSeek garnered media attention for its AI models.
- Vector: External Reconnaissance and Port Scanning.
- Details: Passive and active discovery of subdomains identified hosts like `oauth2callback.deepseek.com` and `dev.deepseek.com`. Unusual open ports (8123 and 9000) were detected on these hosts, leading to an unauthenticated ClickHouse database.
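The port-discovery step described above can be reproduced defensively against infrastructure you own. The following is a minimal sketch, not the researchers' tooling: it attempts a plain TCP connection to 8123 and 9000 (the default ClickHouse HTTP and native-protocol ports) and reports which ones accept connections. The function name and timeout are assumptions made for illustration.

```python
from __future__ import annotations

import socket

# Default ClickHouse ports: 8123 (HTTP interface) and 9000 (native TCP protocol).
CLICKHOUSE_PORTS = (8123, 9000)


def open_ports(host: str, ports=CLICKHOUSE_PORTS, timeout: float = 2.0) -> list[int]:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            continue
    return reachable
```

Calling `open_ports("db.example.internal")` (a placeholder hostname) from outside the network perimeter should return an empty list for any host that is not meant to be public. Run checks like this only against systems you are authorized to test.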
### Lateral Movement
- None by the researchers. The exposure did, however, allow arbitrary SQL queries to be executed directly through the ClickHouse HTTP interface (the `/play` path), enabling access to, enumeration of, and exfiltration of data on the database server.
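To illustrate why an open HTTP interface amounts to query access, the sketch below builds a ClickHouse HTTP request that carries SQL in the `query` URL parameter and sends a harmless `SELECT 1` without credentials. It is intended as a self-check against your own deployment. The status-code mapping in `classify_status` is an assumption of this sketch (a 200 could come from another service, and the rejection code may differ by server version), and all function names are hypothetical.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_query_url(host: str, sql: str, port: int = 8123) -> str:
    """Build a ClickHouse HTTP-interface URL that passes `sql` in the `query` parameter."""
    return f"http://{host}:{port}/?{urlencode({'query': sql})}"


def classify_status(status: int) -> str:
    """Map an HTTP status to a rough verdict (this mapping is an assumption)."""
    if status == 200:
        return "exposed: query ran without credentials"
    if status in (401, 403):
        return "protected: credentials required"
    return "inconclusive"


def probe(host: str, port: int = 8123, timeout: float = 3.0) -> str:
    """Send `SELECT 1` with no credentials and classify the response."""
    url = build_query_url(host, "SELECT 1", port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example, auth required).
        return classify_status(err.code)
    except OSError:
        # Connection refused, timeout, or DNS failure.
        return "unreachable"
```

A result of "exposed" from an external vantage point means anyone on the internet can run queries such as `SHOW TABLES` the same way.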
### Data Exfiltration/Impact
- Over 1 million log entries from the `log_stream` table were accessible.
- Sensitive data included: Chat History, API Keys, backend details, various internal DeepSeek API endpoint references, and operational metadata.
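- Depending on the ClickHouse configuration, the access could also have permitted file-reading queries such as `SELECT * FROM file('{FileName}')` (ClickHouse's `file()` table function), indicating potential for sensitive file exfiltration or server compromise had it been exploited maliciously.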
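Plaintext secrets in log data are what turned this exposure into a credential leak. One mitigation is to scrub secret-looking values before log lines are stored. The sketch below is illustrative only: the three regular expressions are assumed formats, not DeepSeek's actual key formats, and a real deployment would need patterns matched to its own credentials.

```python
import re

# Illustrative patterns only; real key formats vary by provider and are assumptions here.
SECRET_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),              # "sk-..." style API keys
    re.compile(r"(?i)(?<=bearer )[A-Za-z0-9._\-]{16,}"),  # bearer tokens
    re.compile(r"(?i)(?<=api_key=)[^\s&]+"),              # api_key=... parameters
]


def redact(line: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a secret pattern with `placeholder`."""
    for pattern in SECRET_PATTERNS:
        line = pattern.sub(placeholder, line)
    return line
```

Applying `redact` in the logging pipeline, before records reach the database, means that a later exposure of the log table reveals placeholders rather than usable keys.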
### Detection & Response
- **Detection:** Proactive security assessment conducted by Wiz Research team.
- **Response Actions:** Wiz Research team immediately and responsibly disclosed the issue directly to DeepSeek. DeepSeek promptly secured the exposure.
## Attack Methodology
- **Initial Access:** Network scanning identified publicly accessible services on ports 8123 and 9000 (ClickHouse's default HTTP and native-protocol ports, which are unusual to find open on public-facing hosts), followed by direct, unauthenticated access to the ClickHouse HTTP interface at the `/play` path.
- **Persistence:** N/A (This was an exposure, not a sustained intrusion campaign).
- **Privilege Escalation:** The exposure inherently granted administrative control over the database, allowing for arbitrary SQL execution, which could be used for further internal access or potential privilege escalation within the database environment.
- **Defense Evasion:** Not required; no authentication or other access control protected the database ports.
- **Credential Access:** API Keys and potentially credentials were explicitly present in plaintext within the log data columns (`string.values`, `_source`).
- **Discovery:** Enumeration performed via `SHOW TABLES;` query to map accessible datasets.
- **Lateral Movement:** N/A (External exposure only).
- **Collection:** Direct querying of the accessible `log_stream` table.
- **Exfiltration:** Potential for data exfiltration existed via SQL queries (e.g., ClickHouse's `file()` table function). (Note: The researchers refrained from executing intrusive exfiltration queries.)
- **Impact:** Data exposure including proprietary and sensitive user information.
## Impact Assessment
- **Financial:** Not quantified in the report.
- **Data Breach:** Confirmed exposure of **over 1 million log entries** containing Chat History, API Secrets, backend details, and operational system information.
- **Operational:** Potential for disruption if malicious actors had executed destructive commands, but minimal operational impact reported as the access was discovered rapidly.
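- **Reputational:** Not quantified in the report. The exposure came to light while DeepSeek was under heavy media attention for its AI models, and the report notes DeepSeek's prompt response to the disclosure.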
## Indicators of Compromise
- **Network indicators (defanged):**
  - `oauth2callback[.]deepseek[.]com:9000`
  - `dev[.]deepseek[.]com:9000`
  - `oauth2callback[.]deepseek[.]com:8123`
  - `dev[.]deepseek[.]com:8123`
- **File/Database Indicators:** Presence of ClickHouse database tables, specifically `log_stream`.
- **Behavioral indicators:** Unauthenticated access to the ClickHouse HTTP interface `/play` path.
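Network indicators are conventionally shared in defanged form so that they cannot be clicked or auto-linked by accident. A minimal sketch of that convention follows, assuming the common `[.]` and `hxxp` substitutions; the helper names are hypothetical.

```python
def defang(indicator: str) -> str:
    """Rewrite a host or URL so it cannot be clicked or auto-linked."""
    scheme, sep, rest = indicator.partition("://")
    if sep:
        # Only the scheme gets the http -> hxxp substitution.
        return f"{scheme.replace('http', 'hxxp')}://{rest.replace('.', '[.]')}"
    return indicator.replace(".", "[.]")


def refang(indicator: str) -> str:
    """Reverse `defang` so the indicator can be fed to tooling."""
    restored = indicator.replace("[.]", ".")
    if restored.startswith("hxxp"):
        restored = "http" + restored[len("hxxp"):]
    return restored
```

For example, `defang("dev.deepseek.com:8123")` yields `dev[.]deepseek[.]com:8123`, and `refang` restores the original string.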
## Response Actions
- **Containment measures:** Wiz Research immediately notified DeepSeek. DeepSeek secured the exposure (presumably by disabling public access or adding authentication).
- **Eradication steps:** Not detailed, but would involve correcting the configuration that exposed the database.
- **Recovery actions:** Not detailed, but would involve auditing logs for prior malicious access between January 6, 2025, and the time of remediation.
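The log audit suggested above could start with a simple filter: keep every access record that falls inside the exposure window and originates outside trusted address ranges. The sketch below assumes a hypothetical record format (dictionaries with ISO-8601 `time` and `src_ip` fields) and placeholder trusted networks; neither comes from the report.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

# Placeholder internal ranges; substitute the networks your own services use.
TRUSTED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]


def suspicious_access(records, start, end, trusted=TRUSTED_NETWORKS):
    """Return records inside [start, end] whose source IP is outside every trusted network."""
    hits = []
    for rec in records:
        when = datetime.fromisoformat(rec["time"])
        addr = ip_address(rec["src_ip"])
        in_window = start <= when <= end
        is_trusted = any(addr in net for net in trusted)
        if in_window and not is_trusted:
            hits.append(rec)
    return hits
```

With `start=datetime(2025, 1, 6)` and `end` set to the remediation time (not stated in the report), any returned record is a candidate for closer review.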
## Lessons Learned
- The immediate security risks of rapidly adopting AI applications often stem from fundamental infrastructure misconfigurations (such as accidental external database exposure), rather than from complex, futuristic threats.
- Rapid growth in the AI sector can lead to overlooking baseline security best practices for supporting tools and infrastructure.
- Organizations that entrust sensitive data to AI providers, including startups, should hold them to security practices equivalent to those of major cloud providers.
## Recommendations
- Enforce strict network segmentation and authentication (e.g., strong passwords, IP allowlisting, TLS) on all analytical and logging databases (like ClickHouse).
- Security teams must maintain visibility across the entire technology stack used by AI engineers, including tooling and backend services.
- Prioritize basic, foundational security checks (e.g., external cloud asset inventory, port scanning) before high-level model performance assessments.
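The authentication and segmentation recommendation can be partly automated with a configuration check. The sketch below parses ClickHouse-style server and user configuration XML and flags three risky settings: a `listen_host` bound to all interfaces, a user with no password element set, and a user whose `networks` allow any address. It is a simplified illustration that ignores other authentication methods (such as LDAP or certificates) and configuration overrides; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

WILDCARD_HOSTS = {"::", "0.0.0.0"}
OPEN_NETWORKS = {"::/0", "0.0.0.0/0"}
PASSWORD_TAGS = ("password", "password_sha256_hex", "password_double_sha1_hex")


def audit_server_config(xml_text: str) -> list:
    """Flag listen_host entries that bind the server to every interface."""
    root = ET.fromstring(xml_text)
    findings = []
    for elem in root.iter("listen_host"):
        value = (elem.text or "").strip()
        if value in WILDCARD_HOSTS:
            findings.append(f"listen_host {value} binds all interfaces")
    return findings


def audit_users_config(xml_text: str) -> list:
    """Flag users with no password set or with unrestricted source networks."""
    root = ET.fromstring(xml_text)
    findings = []
    users = root.find("users")
    for user in (users if users is not None else []):
        # A user counts as password-protected if any password element is non-empty.
        if not any((user.findtext(tag) or "").strip() for tag in PASSWORD_TAGS):
            findings.append(f"user '{user.tag}' has no password")
        for ip in user.iterfind("networks/ip"):
            if (ip.text or "").strip() in OPEN_NETWORKS:
                findings.append(f"user '{user.tag}' accepts connections from any address")
    return findings
```

Running both functions in CI against deployed configuration files would surface a passwordless, world-reachable `default` user before it reaches production.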