Full Report
Git is a distributed version control system used nearly everywhere. Under the hood, it tracks the entire history of a repository using three main object types: blobs for file contents, trees for directory structure, and commits for snapshot information. A blob holds the raw contents of a file; it is zlib-compressed and stored under the hash of its contents plus a short header (SHA-1 by default, with SHA-256 available as an opt-in object format). Many objects are bundled together into a single compressed file called a pack. Objects that are no longer referenced by any other object are called dangling, and they can linger as loose objects or inside packs until garbage collection removes them.

A commit represents a snapshot of the repository at a point in time. It stores a reference to a tree object, pointers to its parent commits, and metadata. When a file is removed via `git rm`, its contents can still be accessed because earlier commits still reference them and history is immutable. The data stays in the `.git/objects` folder for as long as that history exists. Additionally, the pack files can contain data that is no longer reachable by normal means.

The author wanted to target deleted files and dangling objects by walking each commit together with its parent commits. If a file had been deleted or left dangling, they dumped it to disk. From there, they ran the tool TruffleHog to check the dumped data for secrets. TruffleHog supports over 800 different secret formats! It also has an option to report only verified secrets, meaning ones it has confirmed are still valid.

My main question, which they cover, is why not just use TruffleHog from the beginning? It will often skip `.pack` files if they are too big. By unpacking these ourselves with the mechanism from above, TruffleHog can do its magic like normal.

They scanned a crazy number of projects doing this. They found the organization names by looking at various GitHub repos that list company names, by using the GitHub search, and by directly targeting repos with over 5,000 stars. All in all, they made 64K off of this research, which goes to show that novel research pays. There were a large number of false positives; in particular, dummy users for testing and canaries were very common.

Why does this happen so much? The author claims that many developers just don't understand how git works with regard to deleting files. Additionally, bad `.gitignore` files that let `.env` and binary files get committed were common as well. Overall, great research!
Analysis Summary
# Vulnerability: Persistent Secret Exposure in Git History and Dangling Objects
## CVE Details
- **CVE ID**: N/A (Note: This is a class of vulnerability related to improper data sanitization and misconfiguration rather than a single software flaw).
- **CVSS Score**: 7.5 - 9.8 (estimated; varies with the scope of the exposed secret)
- **Severity**: High to Critical (Depending on the exposed secret’s scope)
- **CWE**: CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor), CWE-312 (Cleartext Storage of Sensitive Information)
## Affected Systems
- **Products**: Git-based version control repositories (GitHub, GitLab, Bitbucket, etc.).
- **Versions**: All versions of Git (by design of its object model).
- **Configurations**:
  - Publicly accessible repositories.
  - Repositories where sensitive files (e.g., `.env`, `config.json`) were committed and subsequently deleted via `git rm` without rewriting history.
  - Repositories with "dangling objects" or unreferenced blobs remaining in the `.git/objects` directory.
## Vulnerability Description
The vulnerability stems from a fundamental misunderstanding of Git internals. Git is a content-addressable filesystem that tracks history immutably. When a file is deleted using standard commands, its content remains stored as a **blob** within the `.git` directory to allow for historical recovery.
Furthermore, Git optimizes storage by compressing loose objects into **.pack** files. Even if a developer "removes" a secret from the current branch, the secret remains in:
1. **Commit History**: Previous snapshots of the tree.
2. **Dangling Blobs**: Objects no longer referenced by any branch/tag but not yet garbage-collected.
3. **Pack Files**: Compressed archives that may contain unreferenced data that standard secret scanners often skip due to file size or complexity.
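To make the content-addressable storage described above concrete, the following is a minimal sketch of how a blob's object ID and loose-object encoding can be reproduced outside of Git. It assumes the default SHA-1 object format; the function names are illustrative and not part of any Git API.

```python
import hashlib
import zlib
from typing import Tuple


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Return where a loose object with this ID lives inside the repository."""
    # The first two hex characters name the directory, the rest name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def encode_loose_object(content: bytes) -> bytes:
    """Return the zlib-compressed bytes Git would write for a loose blob."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Inflate a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    object_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return object_type.decode("ascii"), body
```

Because the ID depends only on the content, deleting a file from the working tree does not change or remove the object that earlier commits point to. Anyone who can read `.git/objects` can inflate the file and recover the original bytes.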
## Exploitation
- **Status**: Exploited in the wild (Demonstrated by security researchers for Bug Bounty).
- **Complexity**: Low
- **Attack Vector**: Network (Public Internet)
- **Technique**: Automated cloning of repositories, unpacking `.pack` files using `git-unpack-objects`, and scanning all blobs—referenced or not—using tools like TruffleHog.
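The unpacking step in the technique above could look roughly like the sketch below. It is not the researcher's actual tooling: the helper names are hypothetical, it assumes a non-bare throwaway clone with `git` on the `PATH`, and it discards the original pack files once their objects have been exploded into loose objects.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List


def find_pack_files(repo: Path) -> List[Path]:
    """List the .pack files of a non-bare repository, sorted by name."""
    pack_dir = repo / ".git" / "objects" / "pack"
    return sorted(pack_dir.glob("*.pack"))


def unpack_all(repo: Path) -> int:
    """Explode every pack into loose objects; return the number of packs handled.

    Only run this on a disposable clone: the original packs are not restored.
    """
    packs = find_pack_files(repo)
    with tempfile.TemporaryDirectory() as staging:
        for pack in packs:
            # git unpack-objects skips objects that already exist in the
            # repository, so the pack and its index are moved out first.
            moved = Path(staging) / pack.name
            shutil.move(str(pack), str(moved))
            index = pack.with_suffix(".idx")
            if index.exists():
                shutil.move(str(index), str(Path(staging) / index.name))
            with moved.open("rb") as stream:
                subprocess.run(
                    ["git", "unpack-objects", "-q"],
                    cwd=repo,
                    stdin=stream,
                    check=True,
                )
    return len(packs)
```

After `unpack_all(Path("some-clone"))` finishes, every object that was in a pack exists as an individual zlib-compressed file under `.git/objects`, which a scanner can then read without having to parse large packs itself.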
## Impact
- **Confidentiality**: Total (Exposure of API keys, cloud credentials, database strings, and SSH keys).
- **Integrity**: High (Leaked credentials often allow unauthorized modification of production environments).
- **Availability**: High (Potential for attackers to delete cloud infrastructure via leaked administrative tokens).
## Remediation
### Patches
There is no "patch" for Git, as this is intended functionality. Remediation involves administrative action.
### Workarounds / Best Practices
1. **History Scrubbing**: Use tools like `git-filter-repo` or BFG Repo-Cleaner to permanently remove sensitive files from all commits.
2. **Secret Rotation**: If a secret is ever committed, it must be considered compromised. **Revoke and rotate the secret immediately.**
3. **Local Prevention**: Use pre-commit hooks (e.g., the `pre-commit` framework) to scan staged changes for secrets before a commit is created.
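As a rough illustration of the local prevention step, the sketch below shows the kind of check a pre-commit hook could run over staged text. The two patterns are deliberately simplistic assumptions chosen for illustration; a real hook would rely on a dedicated scanner such as TruffleHog rather than a hand-written list.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship hundreds of detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A hook script could pass the output of `git diff --cached` to `find_secrets` and exit with a non-zero status when the list is not empty, which makes Git abort the commit.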
## Detection
- **Indicators of Compromise**: Unauthorized access logs for cloud providers (AWS, Azure, GCP) or SaaS platforms following a repository update.
- **Detection Methods**:
  - **TruffleHog**: Scans Git history for secrets and can verify whether a discovered secret is still valid.
  - **Git FSCK**: Run `git fsck --dangling` to identify unreferenced objects that might contain sensitive data. Adding `--unreachable --no-reflogs` also lists objects that are kept alive only by reflog entries.
  - **Manual Inspection**: Monitor `.git/objects/pack` for unusually large or old data.
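The `git fsck` check above can be automated along the lines of the following sketch. The helper names are hypothetical, and the code assumes `git` is on the `PATH` and that `repo` points at a local clone; only the parsing function is pure Python.

```python
import subprocess
from typing import List, NamedTuple


class LooseEnd(NamedTuple):
    state: str      # "dangling" or "unreachable"
    kind: str       # "blob", "commit", "tree" or "tag"
    object_id: str


def parse_fsck(output: str) -> List[LooseEnd]:
    """Extract '<state> <type> <id>' lines from git fsck output."""
    found = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] in ("dangling", "unreachable"):
            found.append(LooseEnd(*parts))
    return found


def unreachable_blobs(repo: str) -> List[str]:
    """Return IDs of blobs that no branch, tag or other ref can reach."""
    result = subprocess.run(
        ["git", "fsck", "--unreachable", "--no-reflogs"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return [e.object_id for e in parse_fsck(result.stdout) if e.kind == "blob"]


def read_blob(repo: str, object_id: str) -> bytes:
    """Return the raw contents of a blob so it can be handed to a scanner."""
    result = subprocess.run(
        ["git", "cat-file", "blob", object_id],
        cwd=repo,
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Writing each result of `read_blob` to a scratch directory gives a scanner plain files to work on, which mirrors the dump-then-scan approach described in the report.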
## References
- **How Git Works Internally**: [https://octobot.medium.com/how-git-internally-works-1f0932067bee](https://octobot.medium.com/how-git-internally-works-1f0932067bee)
- **TruffleHog Scanner**: [https://github[.]com/trufflesecurity/trufflehog](https://github[.]com/trufflesecurity/trufflehog)
- **Author's Research Blog**: [https://medium[.]com/@sharon.brizinov/how-i-made-64k-from-deleted-files-a-bug-bounty-story-c5bd3a6f5f9b](https://medium[.]com/@sharon.brizinov/how-i-made-64k-from-deleted-files-a-bug-bounty-story-c5bd3a6f5f9b)