Full Report
In Solidity, msg.data is the complete calldata of the incoming call, laid out as defined by the ABI. Using a hash of it in a cryptographic operation is a bad idea. But why?

The original issue existed in the V1 ABI encoder. In the standard (non-packed) encoding, every static value occupies a full 32-byte word. Types narrower than 32 bytes use only part of that word: for a uint8, the 31 high-order bytes are padding, and the V1 decoder drops them without checking that they are zero. Since the truncation only touches unused bytes, it does not affect any of the actual values, but it does change the hash. According to a Solidity GitHub issue, this only works with the V1 ABI encoder and not with V2, which is the default in Solidity 0.8.0+.

Regardless, there are other ways to abuse this. For instance, you can append arbitrary data to the end of msg.data; the decoder simply ignores it, but it changes the hash. Additionally, some things, like dynamic data types, can be encoded in effectively unlimited ways. Overall, an interesting Solidity quirk that many people may not consider. Thanks for calling this out.
Analysis Summary
# Vulnerability: Deterministic msg.data Hashing Inconsistency
## CVE Details
- **CVE ID**: N/A (General Logic Error/Design Pattern Flaw)
- **CVSS Score**: 5.3 (Medium) - *Estimated based on integrity impact*
- **CWE**: CWE-345: Insufficient Verification of Data Authenticity
## Affected Systems
- **Products**: Smart Contracts written in Solidity.
- **Versions**: Primarily impacts contracts compiled with the **ABI Encoder V1** (the default before Solidity 0.8.0). The trailing-data and dynamic-type variants are not limited to V1.
- **Configurations**: Contracts that use `keccak256(msg.data)` or similar hashing of the raw input buffer for cryptographic signatures, replay protection, or state commitments.
## Vulnerability Description
The vulnerability arises from the assumption that a specific set of input parameters will always result in a unique `msg.data` hash. In Solidity, several factors allow multiple different byte sequences to represent the same logical input:
1. **ABI Encoder V1 Padding**: In the V1 encoder, data is padded to 32-byte words. For smaller types (like `uint8`), the 31 high-order bytes of the word are discarded by the decoder without validation, yet they are still included in the hash. Manipulating these "unused" bytes changes the hash without changing the transaction's result.
2. **Trailing Data (Calldata Over-allocation)**: Arbitrary garbage data can be appended to the end of a transaction's `msg.data`. ABI decoding reads only the bytes that the function's parameters require and ignores the rest, but the extra data still alters the hash of `msg.data`.
3. **Dynamic Type Malleability**: Certain dynamic data types have multiple valid encoding patterns (non-canonical representations) that resolve to the same values but produce different raw byte sequences.
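
The three cases above can be modeled off-chain. The following is a minimal Python sketch, not a reproduction of the Solidity decoder: the selector `deadbeef` is a hypothetical placeholder, the two decoders are simplified models of the lenient behavior described above, and `hashlib.sha3_256` stands in for `keccak256` (Python's standard library ships SHA3-256, which uses different padding from Keccak-256). The point does not depend on the hash function: the raw bytes differ, so any hash over them differs.

```python
import hashlib

# Hypothetical 4-byte selector. A real selector is the first 4 bytes of
# keccak256 over the function signature.
SELECTOR = bytes.fromhex("deadbeef")


def stand_in_hash(data: bytes) -> str:
    # Stand-in for keccak256(msg.data); SHA3-256 is NOT Keccak-256.
    return hashlib.sha3_256(data).hexdigest()


def word(n: int) -> bytes:
    # One 32-byte big-endian ABI word.
    return n.to_bytes(32, "big")


def decode_uint8_lenient(calldata: bytes) -> int:
    # Simplified model of a V1-style decoder: read only the low-order byte
    # of the first argument word and ignore everything else.
    return calldata[4:36][-1]


def decode_bytes(calldata: bytes) -> bytes:
    # Simplified model of decoding a single `bytes` argument: follow the
    # offset, read the length, then read that many payload bytes.
    args = calldata[4:]
    offset = int.from_bytes(args[0:32], "big")
    length = int.from_bytes(args[offset:offset + 32], "big")
    return args[offset + 32:offset + 32 + length]


value = 7
uint8_variants = {
    "canonical": SELECTOR + word(value),
    "dirty padding": SELECTOR + b"\xff" * 31 + bytes([value]),
    "trailing data": SELECTOR + word(value) + b"\xaa" * 32,
}

payload = b"hi".ljust(32, b"\x00")
bytes_variants = {
    "canonical offset": SELECTOR + word(32) + word(2) + payload,
    # Same value, but the offset skips over one extra word.
    "shifted offset": SELECTOR + word(64) + word(0) + word(2) + payload,
}

for name, data in uint8_variants.items():
    print(name, decode_uint8_lenient(data), stand_in_hash(data)[:16])
for name, data in bytes_variants.items():
    print(name, decode_bytes(data), stand_in_hash(data)[:16])
```

Every `uint8` variant decodes to the same value and both `bytes` variants decode to the same payload, while each variant produces a different hash.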
## Exploitation
- **Status**: Theoretical/PoC available (Commonly discussed in smart contract security audits).
- **Complexity**: Low.
- **Attack Vector**: Network (Smart Contract Interaction).
## Impact
- **Confidentiality**: None.
- **Integrity**: Medium (The same logical call can be submitted under many different `msg.data` hashes, which allows bypass of replay protection or invalidation of commitments if the hash is used as a unique identifier).
- **Availability**: None.
## Remediation
### Patches
- **Upgrade Compiler**: Use **Solidity 0.8.0 or higher**, which enables the **ABI Encoder V2** by default. This version has stricter validation and reduces (though does not entirely eliminate) encoding malleability.
### Workarounds
- **Avoid Hashing Raw msg.data**: Instead of hashing `msg.data`, hash the specific parameters directly (e.g., `keccak256(abi.encode(param1, param2))`).
- **Use abi.encode over abi.encodePacked**: `abi.encode` is generally safer for preventing collisions between different parameter types, though it does not prevent trailing data issues if hashing the entire buffer.
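
The first workaround can be illustrated with an off-chain model. This Python sketch rests on stated assumptions: the selector is hypothetical, the decoder is a simplified model of lenient decoding, and `hashlib.sha3_256` stands in for `keccak256`. It compares an identifier derived from the raw buffer with one derived from the decoded parameter, re-encoded as the single 32-byte word that `abi.encode` would produce for a `uint8`.

```python
import hashlib

SELECTOR = bytes.fromhex("deadbeef")  # hypothetical selector


def decode_uint8_lenient(calldata: bytes) -> int:
    # Simplified model: only the low-order byte of the first word is used.
    return calldata[4:36][-1]


def raw_id(calldata: bytes) -> str:
    # Models keccak256(msg.data): depends on every byte that was sent.
    return hashlib.sha3_256(calldata).hexdigest()


def param_id(calldata: bytes) -> str:
    # Models keccak256(abi.encode(param)): depends only on the decoded
    # value, re-encoded canonically as one 32-byte big-endian word.
    value = decode_uint8_lenient(calldata)
    return hashlib.sha3_256(value.to_bytes(32, "big")).hexdigest()


canonical = SELECTOR + (7).to_bytes(32, "big")
with_trailing = canonical + b"\x00" * 32
with_dirty_padding = SELECTOR + b"\xff" * 31 + b"\x07"

calls = [canonical, with_trailing, with_dirty_padding]
print(len({raw_id(c) for c in calls}), "distinct raw identifiers")
print(len({param_id(c) for c in calls}), "distinct parameter identifiers")
```

In this model the raw-buffer identifier changes with every variant, while the parameter-based identifier is the same for all three.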
## Detection
- **Indicators of compromise**: Multiple transactions to the same function that decode to identical parameters but carry different raw calldata.
- **Detection methods and tools**:
  - **Static Analysis**: Use tools like **Slither** or **Mythril** to flag usage of `msg.data` inside hashing functions.
  - **Manual Code Review**: Inspect any logic where `keccak256(msg.data)` is used as a key in a mapping or as a unique identifier for a signature.
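
As a first pass before manual review, a crude text search can shortlist candidate lines. The sketch below is a hypothetical helper, not part of Slither or Mythril and not a substitute for them. It works line by line with a regular expression, so it misses aliased uses such as copying `msg.data` into a variable and hashing that variable later. The `SAMPLE` contract is invented for illustration.

```python
import re

# Matches a hash call whose argument list mentions msg.data on the same line.
HASH_OF_MSG_DATA = re.compile(
    r"\b(?:keccak256|sha256|ripemd160)\s*\([^;]*\bmsg\.data\b"
)


def find_msg_data_hashes(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if HASH_OF_MSG_DATA.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Invented example contract, used only to exercise the scanner.
SAMPLE = """\
contract Vault {
    mapping(bytes32 => bool) used;
    function claim(uint8 amount) external {
        bytes32 id = keccak256(msg.data);
        require(!used[id]);
        used[id] = true;
    }
    function safe(uint8 amount) external {
        bytes32 id = keccak256(abi.encode(msg.sender, amount));
    }
}
"""

for lineno, text in find_msg_data_hashes(SAMPLE):
    print(f"line {lineno}: {text}")
```

Each hit still needs a manual check of whether the resulting hash is used as a mapping key, a signature digest, or a replay-protection identifier.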
## References
- **Solidity Github Issue Trace**: hxxps[://]github[.]com/ethereum/solidity/issues
- **Solidity Documentation (ABI Spec)**: hxxps[://]docs[.]soliditylang[.]org/en/v0.8.0/abi-spec[.]html
- **SWC Registry (CWE Mapping)**: hxxps[://]swcregistry[.]io/docs/SWC-101