
Even if you don't care a lot about Apple, this is still a great read.

If you're a layman, it can be hard to find information on how graphics works that is technical enough (uses terms like "user space" and "kernel"), yet simple and high-level enough for somebody who doesn't know much. There is material like that throughout the piece.

Here's the first example:

> In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.

> Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.



On a tangent from that quote, I'm curious how much extra perf we could squeeze from GPUs if the applications driving them ran in kernel mode (picture an oldschool boot-from-floppy game, but in the modern day as a unikernel). The "GPU driver" would then be just a straight kernel API with no context switching and no serialized userspace/kernelspace protocol: the application could directly build kernel-trustable data structures and hand them off to be rendered.

Presumably there was an era of console games that did things this way, back before game consoles had OSes. But since that would be about 20 years ago now (the GameCube/PS2 era), it'd be somewhat hard to judge from it what the perf margin for modern devices would be, since the modern rendering pipeline is so different from what it was back then.


> On a tangent from that quote, I'm curious how much extra perf we could squeeze from GPUs if the applications driving them were running in kernel mode (picture an oldschool boot-from-floppy game, but in the modern day as a unikernel)

Presumably you're quite seasoned, so you may already know this: Windows itself put a lot of its graphics rendering in kernel space (GDI moved into the kernel with NT 4.0). It saw considerable performance gains from doing so, but suffered two decades of severe rendering bugs.


It would probably be more practical to map the GPU into userspace, than to put the application in kernel space.


I don't know about it being practical to map the GPU into userspace. Most systems have only the one GPU, and handing it off to be managed by userspace (presumably via IOMMU allocation, like what happens when you "pass through" a GPU to a particular virtual machine) means the kernel now can't use it. So the game can draw to the screen, but the OS can't. Which sucks for startup/shutdown, and any time things go wrong.

And yeah, if you imagine having to write a game as an OS kernel driver, that's probably impractical. You don't want to have to program a game against kernel APIs.

But imagine instead, taking a regular OS with protected memory, and:

• Stripping out the kernel logic, run during context switches and interrupts, that de-elevates userland processes down to "ring 3" (or whatever the equivalent is on other ISAs).

• Ensuring the kernel is mapped into every process's address space at a known position.

• Developing a libc where all the functions that make syscalls have been replaced with raw calls to the mapped OS kernel functions that those syscalls would normally end up invoking.

So you're still developing "userland" applications (i.e. there are still separate processes that each have their own virtual address space), but there's no syscall overhead. A series of synchronous kernel calls to e.g. allocate O(N) tiny video-memory buffers would then be just as fast as (or faster than) what mechanisms like io_uring enable on Linux.


FWIW, the Nintendo Switch has a userspace GPU driver (it's a microkernel architecture), but client apps talk to it over IPC.


> I'm curious how much extra perf we could squeeze from GPUs if the applications driving them were running in kernel mode [...] could rely on directly building kernel-trustable data structures and handing them off to be rendered.

Probably not much. Applications already build those data structures directly in userspace and hand them off to be rendered by the hardware; the kernel's intervention is minimal, and AFAIK mostly concerns memory allocation and queue management.


Exactly. In fact, Mantle/Vulkan/Metal/DX12 are based on the realization that, with full GPU MMUs, userspace can only crash its own context, so it's safe to expose console-like APIs on desktop systems and have the kernel driver mediate userspace's access to the GPU only about as much as the kernel mediates access to the CPU.


That's the premise behind this amusing presentation: https://www.destroyallsoftware.com/talks/the-birth-and-death...


You could start anew from an OS like TempleOS, with a flat memory space and direct access to hardware. But it will be hard to compare until you have real-world scenarios with modern apps on both platforms.



