LeftoverLocals: Listening to LLM responses through leaked...

Full Report

GPUs are fast, highly parallel co-processors designed for high-throughput graphics and machine learning workloads. A GPU is made up of compute units, each containing multiple processing elements, and all compute units share access to global memory. Some GPUs also have a region called local memory that is tied to a given compute unit. It acts as a software-managed cache for the processing elements within that compute unit, for cases where global memory is too slow. The execution model is different from that of most programs. A GPU program, called a kernel, is written for a compute framework such as OpenCL, Metal or Vulkan, and has a single entry-point function that is executed by many invocations in parallel.

The vulnerability is that the local memory of a compute unit is not properly cleared between kernels launched by different users. As a result, it's possible to steal information across program runs! For instance, if one process runs a GPU job, a malicious process can run its own job directly afterwards and read whatever the first job left behind in local memory. The rest of the post covers doing this on specific platforms and actually extracting the leaked information on each of them. The most interesting point is that all applications on various platforms (like Android) have access to the GPU, which makes any application a potential attacker. Although this is interesting, I don't think it's worth putting into this post but may be worth coming back to.

Disclosure was coordinated with GPU providers such as Apple, AMD, ARM and many others. Many of these issues were fixed, which is awesome. Overall, a good post for a relatively simple bug. I personally felt that it was theatricized too much with the name, impact, images, etc. I just love when bugs are talked about and explained :)
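To make the uncleared local memory issue concrete, here is a toy simulation in plain Python. It is only a sketch of the idea and runs entirely on the CPU: `ComputeUnit`, `victim_kernel`, `listener_kernel` and `leak` are hypothetical names made up for illustration, not part of any GPU API or of the original report's code. The `clear_between_kernels` flag stands in for whatever a patched driver or hardware might do.

```python
class ComputeUnit:
    """Toy model of one GPU compute unit with a fixed-size local memory."""

    def __init__(self, local_words=16, clear_between_kernels=False):
        self.local_memory = [0] * local_words
        self.clear_between_kernels = clear_between_kernels

    def launch(self, kernel, *args):
        # Assumption: a fixed platform zeroes local memory before each kernel.
        if self.clear_between_kernels:
            self.local_memory = [0] * len(self.local_memory)
        return kernel(self.local_memory, *args)


def victim_kernel(local_memory, secret):
    # The victim stages its data in local memory and never cleans up.
    for i, value in enumerate(secret[:len(local_memory)]):
        local_memory[i] = value


def listener_kernel(local_memory):
    # The attacker writes nothing and simply dumps what is already there.
    return list(local_memory)


def leak(compute_unit, secret):
    # Run the victim, then the attacker, back to back on the same compute unit.
    compute_unit.launch(victim_kernel, secret)
    dumped = compute_unit.launch(listener_kernel)
    return bytes(dumped).rstrip(b"\x00")


if __name__ == "__main__":
    secret = b"llm token"
    print(leak(ComputeUnit(), secret))
    print(leak(ComputeUnit(clear_between_kernels=True), secret))
```

In this model the attacker recovers the victim's bytes only when local memory is left untouched between launches; with clearing enabled, the dump is all zeroes.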

Analysis Summary