Full Report
Cybersecurity researchers have uncovered two malicious machine learning (ML) models on Hugging Face that leveraged an unusual technique of "broken" pickle files to evade detection. "The pickle files extracted from the mentioned PyTorch archives revealed the malicious Python content at the beginning of the file," ReversingLabs researcher Karlo Zanki said in a report shared with The Hacker News.
Analysis Summary
# Tool/Technique: Malicious ML Models leveraging "Broken Pickle" Format (nullifAI)
## Overview
**nullifAI** is a novel evasion technique in which malicious Python code is embedded in Machine Learning (ML) model serialization files (specifically PyTorch models saved as pickle files) hosted on platforms such as Hugging Face. The payload is placed at the beginning of the file so that it executes during loading/deserialization, and the serialization format after it is often broken to hinder deeper analysis and automated scanning tools.
## Technical Details
- Type: Technique / Attack Vector
- Platform: Python/ML Environments (specifically targeting PyTorch model loading)
- Capabilities: Execution of arbitrary code upon deserialization of a seemingly benign ML model file; evasion of standard security scanners like Picklescan.
- First Seen: Information indicates discovery shortly before February 2025.
## MITRE ATT&CK Mapping
* [T1059.006 - Command and Scripting Interpreter: Python] (The embedded payload is Python code executed during deserialization)
* [T1608.001 - Stage Capabilities: Upload Malware] (If attackers upload these models to a public hub to compromise victims)
* [T1195 - Supply Chain Compromise]
* [T1195.001 - Supply Chain Compromise: Compromise Software Dependencies and Development Tools] (Loading and executing code from a third-party dependency/resource)
## Functionality
### Core Capabilities
- **Arbitrary Code Execution:** The primary function is to execute arbitrary Python code embedded within the model file immediately upon the Python process loading/deserializing the file.
- **Reverse Shell Payload:** The deployed malicious payload observed was a "typical platform-aware reverse shell."
- **Evasion via Format Corruption:** The malicious payload sits at the start of the pickle stream, and the stream is intentionally broken shortly after it. Because pickle opcodes are executed sequentially as they are read, the payload runs before the deserializer reaches the broken part and fails, while tools that need to parse or decompile the whole file error out and may never flag the payload.
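
The ordering described above (payload first, breakage afterward) can be illustrated with a harmless stand-in. The following is a minimal sketch, not the actual sample: the hand-written opcode string, the use of `print` as the "payload", and the helper name `load_and_observe` are all assumptions made for illustration.

```python
import contextlib
import io
import pickle

# Hand-written protocol-0 pickle (benign stand-in for a real payload):
#   c builtins.print   push the callable
#   ( S'...' t         build the argument tuple
#   R                  call print(*args)
PAYLOAD_OPCODES = b"cbuiltins\nprint\n(S'payload ran'\ntR"

# A well-formed stream would end with STOP (b"."); 0xff is not a valid opcode,
# so the stream is "broken" right after the payload.
BROKEN_STREAM = PAYLOAD_OPCODES + b"\xff"


def load_and_observe(data):
    """Unpickle data; return (text printed while loading, exception name or None)."""
    captured = io.StringIO()
    error = None
    with contextlib.redirect_stdout(captured):
        try:
            pickle.loads(data)
        except Exception as exc:  # the broken tail surfaces here
            error = type(exc).__name__
    return captured.getvalue(), error


printed, error = load_and_observe(BROKEN_STREAM)
```

Here `printed` contains the payload's output even though `error` is set: the call happened before the unpickler reached the invalid byte. A scanner that treats a parse failure as "nothing to report" would miss it.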
### Advanced Features
- **Archive Format Obfuscation:** PyTorch's default format wraps the pickle in a ZIP archive, but the identified malicious models were compressed with **7z** instead, which let them bypass security checks that only inspect ZIP-packaged pickle files.
- **Bypassing Picklescan:** The combination of payload placement and subsequent file structure breakage successfully evaded detection by Hugging Face's detection tool, Picklescan.
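
One inexpensive triage step for the container mismatch noted above is to check a model file's leading bytes. The sketch below is an illustration under assumptions: the function names and the `"raw-pickle"` label are hypothetical, and a magic-byte check only classifies the container, it does not judge the contents.

```python
import pickle

# Well-known file signatures.
SEVEN_Z_MAGIC = b"7z\xbc\xaf\x27\x1c"
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # regular and empty ZIP archives


def classify_container(header):
    """Classify a model file by its first bytes (assumed labels, see lead-in)."""
    if header.startswith(SEVEN_Z_MAGIC):
        return "7z"
    if header.startswith(ZIP_MAGICS):
        return "zip"
    # Pickle protocol 2+ begins with the PROTO opcode (0x80) and a version byte.
    # Protocol 0/1 pickles have no such header and fall through to "unknown".
    if len(header) >= 2 and header[0] == 0x80 and header[1] <= pickle.HIGHEST_PROTOCOL:
        return "raw-pickle"
    return "unknown"


def classify_file(path):
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return classify_container(fh.read(8))
```

A file that claims to be a PyTorch model but classifies as `"7z"` or `"unknown"` is a reasonable candidate for manual review before anything attempts to load it.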
## Indicators of Compromise
- File Hashes: Not provided in the summary.
- File Names: Models were found in Hugging Face repositories.
- Registry Keys: Not applicable/mentioned.
- Network Indicators: The payload connects to a **hard-coded IP address** (address not provided/defanged).
- Behavioral Indicators: Execution of code during the deserialization/loading of ML model files (e.g., PyTorch `.pt` files loaded via `torch.load()`).
## Associated Threat Actors
- No specific threat actor group was named; the report suggests these were more likely **Proof-of-Concept (PoC)** demonstrations of the supply chain vulnerability rather than being linked to established APTs.
## Detection Methods
- Signature-based detection: Standard signature checks are likely ineffective due to the novel embedding technique.
- Behavioral detection: Critical for detecting unexpected network connections (reverse shell activity) stemming from legitimate processes loading ML libraries.
- YARA rules: None mentioned in the source, but custom rules targeting serialized Python objects that contain shell commands at the beginning of 7z/Pickle archives could be developed.
- **Specific Detection:** Analysis of ML model archives (especially those compressed oddly, like 7z) for unexpected content placement prior to standard serialization markers.
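
The structural check described above can be sketched with the standard library's `pickletools`, which walks opcodes without executing them. This is a minimal illustration, not Picklescan's implementation: the `SUSPICIOUS_MODULES` list, the function name `scan_pickle`, and the heuristic for resolving `STACK_GLOBAL` (taking the two most recent string arguments) are assumptions. The key design point is that findings gathered before a parse error are kept rather than discarded.

```python
import pickle
import pickletools

# Assumed, non-exhaustive list of modules that rarely belong in model weights.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket", "sys"}


def scan_pickle(data):
    """Return (flagged imports, stream_is_broken) without executing the pickle."""
    findings = []
    recent_strings = []
    broken = False
    try:
        for opcode, arg, pos in pickletools.genops(data):
            if opcode.name in ("GLOBAL", "INST"):
                # The argument is "module name" joined by a single space.
                module, _, name = arg.partition(" ")
                findings.append((pos, module, name))
            elif opcode.name == "STACK_GLOBAL":
                # Heuristic: module and name are the last two strings pushed.
                if len(recent_strings) >= 2:
                    findings.append((pos, recent_strings[-2], recent_strings[-1]))
            elif isinstance(arg, str):
                recent_strings.append(arg)
    except Exception:
        # A malformed tail must not erase what was already seen.
        broken = True
    flagged = [f for f in findings if f[1].split(".")[0] in SUSPICIOUS_MODULES]
    return flagged, broken


# Benign demonstration input: a call to print followed by an invalid opcode.
demo_flagged, demo_broken = scan_pickle(b"cbuiltins\nprint\n(S'x'\ntR\xff")
```

A stream that is both broken and contains flagged imports before the break matches the pattern described in this report and deserves closer inspection.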
## Mitigation Strategies
- Prevention measures: Restricting library loading sources; utilizing sandboxed environments for loading models from untrusted sources.
- Hardening recommendations: Avoid `pickle` serialization where possible, and carefully review any remaining use, given its inherent risks (see the related Sleepy Pickle technique). Prefer safer serialization formats such as `safetensors`.
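
Where pickle cannot be avoided entirely, the Python documentation describes restricting which globals an unpickler may resolve by overriding `Unpickler.find_class`. The sketch below follows that pattern; the allowlist contents and the names `AllowlistUnpickler` and `restricted_loads` are assumptions chosen for illustration, and a real deployment would need an allowlist matched to its own model format.

```python
import io
import pickle

# Assumed allowlist: only globals that plain data containers need.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Drop-in analogue of pickle.loads() that enforces the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

With this in place, plain containers such as dicts and lists still load, while a stream that tries to import `builtins.print` or `os.system` is rejected before the callable is ever obtained. PyTorch's `torch.load` offers a `weights_only` option built on a similar restriction, though sandboxing remains advisable for untrusted files.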
## Related Tools/Techniques
- Sleepy Pickle (Mentioned as a related security risk associated with pickle serialization).
- General ML Model Supply Chain Attacks.