
Even if you don't care a lot about Apple, this is still a great read.

If you're a layman, it can be hard to find information on how graphics works that is technical enough (uses terms like "user space" and "kernel"), yet simple and high-level enough for somebody who doesn't know much. There is material like that throughout the piece.

Here's the first example:

> In every modern OS, GPU drivers are split into two parts: a userspace part, and a kernel part. The kernel part is in charge of managing GPU resources and how they are shared between apps, and the userspace part is in charge of converting commands from a graphics API (such as OpenGL or Vulkan) into the hardware commands that the GPU needs to execute.

> Between those two parts, there is something called the Userspace API or “UAPI”. This is the interface that they use to communicate between them, and it is specific to each class of GPUs! Since the exact split between userspace and the kernel can vary depending on how each GPU is designed, and since different GPU designs require different bits of data and parameters to be passed between userspace and the kernel, each new GPU driver requires its own UAPI to go along with it.



On a tangent from that quote, I'm curious how much extra perf we could squeeze from GPUs if the applications driving them ran in kernel mode (picture an oldschool boot-from-floppy game, but in the modern day as a unikernel). The "GPU driver" would then be just a straight kernel API with no context switching and no serialized userspace/kernelspace protocol: the application could directly build kernel-trustable data structures and hand them off to be rendered.

Presumably there was an era of console games that did things this way, back before game consoles had OSes. But since that would be about 20 years ago now (the GameCube/PS2 era), it'd be somewhat hard to judge from it what the perf margin for modern devices would be, since the modern rendering pipeline is so different from what it was back then.


> On a tangent from that quote, I'm curious how much extra perf we could squeeze from GPUs if the applications driving them were running in kernel mode (picture an oldschool boot-from-floppy game, but in the modern day as a unikernel)

Presumably you're quite seasoned, so you may already know this: Windows itself put a lot of its graphics rendering in kernel space (GDI moved into the kernel with NT 4.0). It saw considerable performance gains from doing so, but suffered two decades of severe rendering bugs.


It would probably be more practical to map the GPU into userspace, than to put the application in kernel space.


I don't know about it being practical to map the GPU into userspace. Most systems have only the one GPU, and handing it off to be managed by userspace (presumably via IOMMU allocation, like what happens when you "pass through" a GPU to a particular virtual machine) means the kernel now can't use it. So the game can draw to the screen, but the OS can't. Which sucks for startup/shutdown, and any time things go wrong.

And yeah, if you imagine having to write a game as an OS kernel driver, that's probably impractical. You don't want to have to program a game against kernel APIs.

But imagine instead, taking a regular OS with protected memory, and:

• Stripping out the kernel logic, run during context switches and interrupts, that de-elevates userland processes down to "ring 3" (or whatever the equivalent is on other ISAs).

• Ensuring the kernel is mapped into every process's address space at a known position.

• Developing a libc where all the functions that make syscalls have been replaced with raw calls to the mapped OS kernel functions that those syscalls would normally end up invoking.

So you're still developing "userland" applications (i.e. there are still separate processes that each have their own virtual address space), but there's no syscall overhead. A series of synchronous kernel calls to e.g. allocate O(N) tiny video-memory buffers would then be just as fast as (or faster than) what mechanisms like io_uring enable on Linux.


FWIW, the Nintendo Switch has a userspace GPU driver (it's a microkernel architecture), but client apps talk to it over IPC.


> I'm curious how much extra perf we could squeeze from GPUs if the applications driving them were running in kernel mode [...] could rely on directly building kernel-trustable data structures and handing them off to be rendered.

Probably not much. Applications already build those data structures directly in userspace and hand them off to be rendered by the hardware; the kernel's intervention is minimal, and AFAIK mostly concerns memory allocation and queue management.


Exactly. In fact, Mantle/Vulkan/Metal/DX12 are based on the realization that, with full GPU MMUs, userspace can only crash its own context, so it's safe to expose console-like APIs on desktop systems and have the kernel driver mediate userspace's access to the GPU only about as much as the kernel mediates access to the CPU.


That's the premise behind this amusing presentation: https://www.destroyallsoftware.com/talks/the-birth-and-death...


You could start anew from an OS like TempleOS, with a flat memory space and direct access to hardware. But it will be hard to compare until you have real-world scenarios with modern apps on both platforms.



