SP1 and zkVMs: A Security Auditor's Guide

Full Report

SP1 is a zero-knowledge virtual machine (zkVM) that lets developers prove the execution of arbitrary programs that compile to RISC-V. In practice, most of these programs are written in Rust. Developers write standard Rust instead of a domain-specific circuit language, and the zkVM generates the cryptographic proof. The goal of the post is to prime security auditors to review code that uses SP1.

The SP1 architecture is as follows:

1. Compile the code into a RISC-V ELF binary.
2. Execute the program in the zkVM. This generates the STARK proof to be used later.
3. Optimize and verify the proof. This is the mathematical verification that the code ran as intended.

The system consists of two components: the prover and the verifier. The prover executes the guest program and generates the ZK proof. The verifier takes in a proof and validates it cryptographically. The proof should come from the prover, but a malicious actor can submit whatever they want. If verification succeeds, the claimed computation has occurred.

The system is also separated into Host and Guest. The Host is a standard machine that executes code, such as the machine you're using to view this website. The Guest program runs inside a VM that is completely isolated from everything else: no Internet access, no databases, nothing. In the codebase, host and guest code are somewhat intertwined, which makes the distinction important to keep in mind while reading.

The first security note is that all input data is untrusted. Input passed from the Host to the Guest must be validated inside the Guest: range checks, length checks, business logic constraints, and so on. On this note, only Guest code is proven, not code running on the Host. So, if there's a check in the Host that's not in the Guest, you probably have a bug.

SP1 uses 32-bit RISC-V. When coming from or using 64-bit systems, this can cause issues.
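To make the 32-bit concern concrete, here is a minimal sketch in plain Rust. It is not SP1 code: `u32` stands in for `usize` on a 32-bit RISC-V target, and the function names `unchecked_len` and `checked_len` are hypothetical. It only shows how a 64-bit value supplied by the host can silently shrink when narrowed with `as`, and how `try_from` surfaces the problem instead.

```rust
// Sketch only: `u32` stands in for `usize` on a 32-bit guest target (assumption).
// The function names are hypothetical and not part of SP1.

/// Narrowing with `as` silently drops the upper 32 bits.
fn unchecked_len(len_from_host: u64) -> u32 {
    len_from_host as u32
}

/// Narrowing with `try_from` returns `None` when the value does not fit.
fn checked_len(len_from_host: u64) -> Option<u32> {
    u32::try_from(len_from_host).ok()
}

fn main() {
    // A length just above 2^32, as a 64-bit host might legitimately produce.
    let len: u64 = (1u64 << 32) + 5;

    // The unchecked cast wraps around to a small, plausible-looking value.
    assert_eq!(unchecked_len(len), 5);

    // The checked conversion rejects the value instead of truncating it.
    assert_eq!(checked_len(len), None);
    assert_eq!(checked_len(42), Some(42));
}
```

When auditing, any `as usize` or `as u32` applied to host-supplied data inside guest code is worth a second look for this reason.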
For instance, integer truncations and overflows should always be checked when dealing with usize values. On top of this, many dependencies pulled into SP1-compiled code were never meant to run in that environment. This can lead to similar integer issues, operating system calls, unsafe code, and many other weird quirks.

When using SP1, data can be committed to become a public output. Naturally, since we're doing zero-knowledge proofs, the public information should be carefully audited. For instance, disclosing someone's age would be inappropriate.

Another issue, one that is weird to me, is Verification Key Management. In SP1, each program generates two keys: one for the prover and another for the verifier. Each guest program must have a unique verification key derived from its binary, and older key versions must not be accepted.

There are cases where information cannot be computed within the proof and is instead supplied statically as part of the output. For instance, a Merkle proof can be generated, but its validity is determined by the block hash associated with it. So, the block hash must be validated separately from the program. For SP1, you would want to make this a committed output value for external validation.

The most common vulnerability is around "underconstrained circuits". This is simply insufficient validation of state transitions in a program. It's basic logic validation, like most other things. According to the post, practical knowledge of STARKs/SNARKs isn't necessary for auditing SP1 programs, unlike other cryptographic primitives.

A solid introduction to reviewing SP1 programs. I feel like this demystified a lot of terminology as well, which I really appreciated.
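As an illustration of the "insufficient validation of state transitions" point, here is a rough sketch of what fully constrained guest logic might look like. It is plain Rust with no SP1 dependency, and the `Account`, `TransitionError`, and `apply_transfer` names are hypothetical, not taken from the post. The assumption is that in a real guest the inputs would be read from the host and the resulting state committed as public output; here that wiring is left out so the checks themselves are visible.

```rust
// Sketch only: hypothetical types and names, no SP1 APIs involved.

#[derive(Debug, Clone, PartialEq)]
struct Account {
    balance: u64,
    nonce: u64,
}

#[derive(Debug, PartialEq)]
enum TransitionError {
    BadNonce,
    InsufficientFunds,
    Overflow,
}

/// Applies a transfer only if every condition on the transition holds.
/// Dropping any one of these checks would leave the transition
/// underconstrained: the proof would still verify for an invalid state.
fn apply_transfer(
    sender: &Account,
    receiver: &Account,
    amount: u64,
    nonce: u64,
) -> Result<(Account, Account), TransitionError> {
    // The supplied nonce must match the sender's current nonce.
    if nonce != sender.nonce {
        return Err(TransitionError::BadNonce);
    }
    // Checked arithmetic: no underflow on the sender, no overflow on the receiver.
    let sender_balance = sender
        .balance
        .checked_sub(amount)
        .ok_or(TransitionError::InsufficientFunds)?;
    let receiver_balance = receiver
        .balance
        .checked_add(amount)
        .ok_or(TransitionError::Overflow)?;
    let next_nonce = sender.nonce.checked_add(1).ok_or(TransitionError::Overflow)?;

    Ok((
        Account { balance: sender_balance, nonce: next_nonce },
        Account { balance: receiver_balance, nonce: receiver.nonce },
    ))
}

fn main() {
    let alice = Account { balance: 100, nonce: 7 };
    let bob = Account { balance: 20, nonce: 0 };

    // A valid transfer updates both balances and bumps the sender's nonce.
    let (a, b) = apply_transfer(&alice, &bob, 30, 7).unwrap();
    assert_eq!(a, Account { balance: 70, nonce: 8 });
    assert_eq!(b, Account { balance: 50, nonce: 0 });

    // Invalid inputs from the host are rejected rather than proven.
    assert_eq!(apply_transfer(&alice, &bob, 30, 6), Err(TransitionError::BadNonce));
    assert_eq!(apply_transfer(&alice, &bob, 101, 7), Err(TransitionError::InsufficientFunds));
}
```

The audit question for each such function is whether every check lives in the guest, since a check performed only on the host is not covered by the proof.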

Analysis Summary