Asahi GPU Hacking [Hackaday]

[Alyssa Rosenzweig] has been tirelessly working on reverse engineering the GPU built into Apple’s M1 architecture as part of the Asahi Linux effort. If you’re not familiar, that’s the project adding support to the Linux kernel and userspace for the Apple M1 line of products. She has made great progress, and even got primitive rendering working with her own open source code, just over a year ago.

Trying to mature the driver, however, has hit a snag. For complex rendering, something in the GPU breaks, and the frame is simply missing chunks of content. Some clever testing discovered the exact failure trigger — too much total vertex data. Put simply, it’s “the number of vertices (geometry complexity) times amount of data per vertex (‘shading’ complexity).” That… almost sounds like a buffer filling up, but on the GPU itself. This isn’t a buffer that the driver directly interacts with, so all of this sleuthing has to be done blindly. The Apple driver doesn’t have corrupted renders like this, so what’s going on?
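The failure trigger quoted above is just a product, so either factor can push a frame over the edge. Here is a minimal sketch of that arithmetic. The function names and the 500,000-byte capacity are made up for illustration; the article does not give the real size of the GPU-side buffer.

```python
def total_vertex_data(vertex_count, bytes_per_vertex):
    """Total vertex data: geometry complexity times per-vertex data."""
    return vertex_count * bytes_per_vertex


def overflows(vertex_count, bytes_per_vertex, capacity_bytes):
    """True when the total vertex data exceeds a (hypothetical) fixed capacity."""
    return total_vertex_data(vertex_count, bytes_per_vertex) > capacity_bytes


# Hypothetical capacity, chosen only to make the example readable.
CAPACITY = 500_000

print(overflows(10_000, 16, CAPACITY))  # simple geometry, simple shading
print(overflows(10_000, 64, CAPACITY))  # same geometry, more data per vertex
print(overflows(40_000, 16, CAPACITY))  # more geometry, same data per vertex
```

The second and third cases produce the same total, which is why the bug looked like it depended on both geometry and shading complexity at once.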

[Alyssa] gives us a quick crash course on GPU design, primarily the difference between desktop GPUs using dedicated memory and mobile GPUs with unified memory. The M1 falls into that second category, using a tilebuffer to cache render results while building a frame. That tilebuffer is a fixed size. There’s the overflow that breaks the frame rendering. So how is the driver supposed to handle this? The traditional answer is to just allocate a bigger buffer, but that’s not how the M1 works. Instead, when the buffer fills up, the GPU triggers a partial render, which consumes the data in the buffer. The problem is that the partial render is getting sent to the screen rather than getting properly blended with the rest of the render. Why? Back to capturing the commands used by Apple’s driver.
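The flush-when-full behavior can be sketched with a toy model. This is an illustration under loud assumptions, not how the hardware is actually programmed: the function name, the per-draw sizes, and the capacity are all invented, and real GPUs track far more state than a single counter.

```python
def count_partial_renders(draw_sizes, capacity):
    """Count how many times a toy fixed-size buffer must be flushed mid-frame.

    draw_sizes: amount of buffer space each draw needs (hypothetical units).
    capacity:   fixed size of the buffer (hypothetical units).
    """
    used = 0
    partial_renders = 0
    for size in draw_sizes:
        if size > capacity:
            raise ValueError("a single draw larger than the buffer is not modelled")
        if used + size > capacity:
            # Buffer is full: render what we have so far, then start over.
            partial_renders += 1
            used = 0
        used += size
    return partial_renders


print(count_partial_renders([10, 10], 100))          # small frame, fits in one go
print(count_partial_renders([40, 40, 40, 40], 100))  # complex frame, flushed mid-way
```

A simple frame never triggers a flush, which fits the observation that only complex renders came out broken.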

The driver does something odd: it sets two separate load and store programs. Knowing that the render buffer gets moved around mid-render, this starts to make sense. One function is for a partial render, the other for the final. Omit setting up one of these, and when the GPU needs the missing function, it dereferences a null pointer and rendering explodes. So, supply the missing functions, get the configuration just right, and rendering completes correctly. Finally! Victory never tastes as sweet as when it comes after chasing down a mystifying bug like this.
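To see why two sets of programs matter, here is a toy model of a frame rendered in several passes. Everything in it is hypothetical: `render`, `reload`, `clear`, and `store` are stand-ins invented for this sketch, the "framebuffer" is just a set of labels, and a Python exception plays the role of the null pointer dereference.

```python
def reload(framebuffer):
    """Load program that brings back what earlier passes already drew."""
    return set(framebuffer)


def clear(framebuffer):
    """Load program that starts from an empty tile, discarding earlier passes."""
    return set()


def store(tile):
    """Store program that writes the tile back out."""
    return set(tile)


def render(passes, partial_programs, final_programs):
    """Render a frame split into passes; every pass but the last is 'partial'."""
    framebuffer = set()
    for index, geometry in enumerate(passes):
        is_final = index == len(passes) - 1
        programs = final_programs if is_final else partial_programs
        if programs is None:
            # Stand-in for the GPU dereferencing a null pointer.
            raise RuntimeError("missing load/store programs for this pass")
        load_program, store_program = programs
        tile = load_program(framebuffer)
        tile |= set(geometry)
        framebuffer = store_program(tile)
    return framebuffer


frame = [["sky", "ground"], ["bunny"]]  # second pass exists only due to overflow

print(sorted(render(frame, (reload, store), (reload, store))))  # complete frame
print(sorted(render(frame, (reload, store), (clear, store))))   # chunks missing
print(sorted(render([["sky"]], None, (reload, store))))         # no overflow, no crash
```

In this model a frame that fits in one pass never touches the partial programs, so leaving them unset only blows up once the frame gets complex enough to need them.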

Need more Asahi Linux in your life? [Hector Martin] did an interview on FLOSS Weekly just this past week, giving us the rundown on the project.