Full Report
This post assumes a passing familiarity with what a Distinguishing Attack on a cryptographic hash is, as well as the high level composition of Bitcoin block headers and mining them. tldr: To distinguish between an ideal random permutation hash and SHA256, hash a large amount (~2^80) of candidate 1024 bit blocks twice, as done in Bitcoin. Ensure that the bits of the candidate blocks are sparsely set (much fewer than the 512 mean expected), according to the Bitcoin protocol, discarding candidate blocks that do not meet the Bitcoin “difficulty” standard (where the resultant hashes start with a the large number of 0’s). With the remaining set of valid input candidates (467369 when this analysis was done), observe a particular set of 32 bits in the input block (located where Bitcoin has the nonce, input bits 607-639). Note that the mean number of bits set in the nonce field is skewed to the left, i.e. fewer than the expected value of 16 bits set (estimated mean 15.428).
Analysis Summary
# Research: A distinguisher for SHA256 using Bitcoin (mining faster along the way)
## Metadata
- Authors: frans (SensePost Author)
- Institution: SensePost
- Publication: SensePost Blog
- Date: 16 October 2017
## Abstract
This research explores the possibility of distinguishing the SHA-256 hash function, as used in Bitcoin mining, from an ideal random permutation (IRP) by observing statistical biases in inputs that satisfy Bitcoin's difficulty targets. By hashing a corpus of $\sim 2^{80}$ candidate block headers and filtering those that meet the current difficulty (producing a starting sequence of zeros), the research focuses on the statistical distribution of bits within the remaining valid input space, specifically targeting the 32-bit nonce field. The key finding is a statistically significant deviation from expected randomness in the nonce bits, suggesting a weak distinguishing characteristic exploited by the specific input construction required by Bitcoin.
## Research Objective
The primary objective is to empirically demonstrate a **distinguishing attack** against the SHA-256 function when used under the specific constraints imposed by the Bitcoin mining protocol (i.e., hashing sparsely set 1024-bit inputs that result in low-entropy outputs due to high difficulty). A secondary objective is to determine if these observed biases can be leveraged to optimize Bitcoin mining speed.
## Methodology
### Approach
The methodology involves a large-scale empirical test:
1. **Input Generation:** Candidate 1024-bit block headers are generated. These inputs are characterized by having sparsely set bits, adhering to the non-average sparsity expected in many cryptographic inputs.
2. **Two-Pass Hashing:** Each candidate block header is hashed twice, mirroring the double-SHA256 operation inherent in Bitcoin block processing.
3. **Difficulty Filtering:** Candidate blocks are retained only if their resulting hash meets the current Bitcoin difficulty standard (i.e., starts with a required large number of leading zeros).
4. **Statistical Analysis:** For the subset of valid inputs ($\sim 467,369$ examples at the time of analysis), the distribution of bits in the nonce field (bits 607-639 of the 1024-bit input) is meticulously examined.
### Dataset/Environment
The study utilized a massive dataset of hash operations, estimated to involve hashing $\sim 2^{80}$ candidate 1024-bit blocks. The "dataset" analyzed comprised the $\sim 467,369$ input blocks that successfully met the Bitcoin difficulty target during the test period.
### Tools & Technologies
The specific tools are not detailed, but the process requires high-performance hashing infrastructure capable of managing $2^{80}$ hashing operations and specialized statistical analysis tools to observe bit distributions.
## Key Findings
### Primary Results
1. **Distinguishability Demonstrated:** The research concludes that it is theoretically possible to distinguish between SHA-256 and an ideal random permutation using this input construction method after approximately $2^{80}$ hash operations.
2. **Nonce Bias:** The mean number of set bits within the 32-bit nonce field of successful inputs is significantly lower than the expected random average of 16 bits (estimated mean of 15.428 achieved, or 15.701 for a specific test set).
3. **Histogram Skew:** The distribution histogram of set bits across the nonce field is skewed towards the left, meaning partitions corresponding to fewer set bits are taller than expected relative to the right side of the mean of 16.
4. **Individual Bit Correlation:** Specific bits within the nonce show a propensity to be zero more than 50% of the time (e.g., Bit 7 of the nonce was 0 55.9% of the time for difficulty 1000+). Furthermore, the incidence of multiple specific bits being zero simultaneously is much higher than predicted by independence (e.g., bits 5, 6, and 7 being zero occurred 17% of the time, versus an expected 16.2% derived from conditional probabilities, or 12.5% if fully independent).
### Supporting Evidence
The observed mean number of set bits (15.701) for a large set of nonces showed a "highly significant deviation" from the expected value of 16. The correlation among bits 5, 6, and 7 being zero strongly deviates from independence checks.
### Novel Contributions
The core innovation lies in applying the framework of a **distinguishing attack** to a real-world, high-volume cryptographic application (Bitcoin mining) under specific input constraints (sparse inputs and high-output entropy requirements dictated by difficulty). It directly links these statistical biases to the practical non-randomness of the nonce field based on the required hash output.
## Technical Details
The observed bias is hypothesized to result from the *combination* of two factors:
1. **Sparsity of Input Bits:** The requirement for the input blocks to have few bits set overall (due to the nature of the 1024-bit block structure and sparse assignment of initial elements).
2. **Output Constraint:** The constraint that the resulting double-SHA256 hash must start with many zeros (high difficulty).
This coupling between sparse input selection and high-order zero-bits in the output seems to induce a feedback effect that pushes the exploitable variable (the nonce) toward inputs that contain fewer set bits than random chance would suggest. Historically, early mining software resetting the nonce to 0 may have introduced initial bias, but late-stage optimizations removed this, leaving the observed bias attributable to the hash function's interaction with the difficulty requirements.
## Practical Implications
### For Security Practitioners
The findings suggest that the SHA-256 function, while robust against generic attacks, exhibits input-dependent statistical deviations when subjected to extreme constraints (like those found in proof-of-work systems). This underscores the importance of considering specific application contexts when assessing cryptographic security margins.
### For Defenders
A potential optimization for miners is identified: **skipping the testing of nonces where specific bits (like bits 5, 6, or 7) are set to 1.** By avoiding 87.5% of candidate nonces based on this statistical insight, a miner could theoretically achieve a mining speed improvement or hashing efficiency gain (reported as a potential 36% speedup in hashing effort expended vs. hashes found).
### For Researchers
This work serves as a strong empirical foundation for investigating side-channel or bias-based attacks against specific cryptographic uses of SHA-256, rather than challenging the algorithm's fundamental theoretical properties against truly random inputs.
## Limitations
The primary technical limitation mentioned is the massive computational overhead required ($\sim 2^{80}$ hashes) to observe the necessary confidence level for the distinguishing effect. The observed speedup is based on an exploitation of a *conditional* bias—it does not imply SHA-256 is insecure against arbitrary inputs. Furthermore, the analysis acknowledges that early, non-optimized Bitcoin software might have artificially inflated initial biases, though later analysis suggested the observed deviation persisted past these initial transient states.
## Comparison to Prior Work
The work builds upon the generic definition of a distinguishing attack on a cryptographic hash (differentiating it from an IRP). It deviates from purely theoretical cryptographic analysis by grounding the experiment within the extremely specific, non-ideal inputs generated by the Bitcoin mining process (1024-bit block headers, double hashing, extreme output sparsity).
## Real-world Applications
- **Mining Optimization:** Directly applicable to increasing the profitability of Bitcoin mining operations by reducing the search space effectively explored.
## Future Work
- Quantifying the exact computational cost versus the realized speed benefit of the suggested nonce exclusion optimization.
- Investigating whether similar input/output constraints in other proof-of-work systems relying on SHA-256 exhibit comparable statistical biases.
## References
- Bitcoin Block Hashing Algorithm documentation.
- General concepts of Distinguishing Attacks.
- Previous analyses on Bitcoin mining algorithms (e.g., Righto's work cited for background on mining).