
This is about 3D rendering, to be precise; I believe 2D acceleration goes through the same lower layers but the higher ones are very different.

Incidentally, one thing I noticed when I was trying to port Linux GPU drivers to Windows some time ago is what appeared to be an excessive amount of indirection; there are so many layers and places where things could be simpler.



2D acceleration is generally done through the same APIs, specifically OpenGL and Vulkan. Classically, the X compositor would use the GLX_EXT_texture_from_pixmap extension to import an X pixmap representing a window surface into OpenGL, where it can be used like any other texture. For the Wayland compositor, I believe you'd use EGL_WL_bind_wayland_display to bind a Wayland surface to an EGLImage, and then glEGLImageTargetTexture2DOES (can't believe I have that function name memorized) to bind that EGLImage to an OpenGL texture, where it can be used in the same way. Vulkan has similar extensions.
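
Roughly, the Wayland-compositor side of that path looks like the sketch below (assuming an already-initialized EGLDisplay and GLES context; the import_wl_buffer name is mine, the extension entry points are fetched at runtime, and error handling plus the one-time nature of the display bind are glossed over):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>
    #include <wayland-server.h>

    GLuint import_wl_buffer(EGLDisplay egl, struct wl_display *wl_dpy,
                            struct wl_resource *wl_buf)
    {
        /* Extension entry points (EGL_WL_bind_wayland_display,
           EGL_KHR_image_base, GL_OES_EGL_image). */
        EGLBoolean (*bind_wl_display)(EGLDisplay, struct wl_display *) =
            (EGLBoolean (*)(EGLDisplay, struct wl_display *))
            eglGetProcAddress("eglBindWaylandDisplayWL");
        PFNEGLCREATEIMAGEKHRPROC create_image =
            (PFNEGLCREATEIMAGEKHRPROC) eglGetProcAddress("eglCreateImageKHR");
        PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_to_texture =
            (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

        /* Let EGL see the client's buffers (normally done once at startup). */
        bind_wl_display(egl, wl_dpy);

        /* Wrap the client's wl_buffer in an EGLImage... */
        EGLImageKHR img = create_image(egl, EGL_NO_CONTEXT,
                                       EGL_WAYLAND_BUFFER_WL,
                                       (EGLClientBuffer) wl_buf, NULL);

        /* ...and use it as the storage of an ordinary GL texture, which the
           compositor can then sample like any other. */
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        image_to_texture(GL_TEXTURE_2D, (GLeglImageOES) img);
        return tex;
    }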

On the client side, I think most Linux apps still draw their UIs on CPU, usually accelerated with SIMD. Firefox and Chrome (I think SkiaGL is enabled on Linux?) are exceptions; they use OpenGL and/or Vulkan to draw their UI. Video playback is a different beast and in theory relies on vendor-specific extensions to decode the video in hardware. However, the last time I looked at Linux video decoding (which was years ago), the drivers were awful and interfacing with each vendor's APIs was a huge pain, and so most apps just did video decoding on CPU. (Besides, the Linux ecosystem prefers open codecs, and hardware has only recently gotten support for non-patent-encumbered video formats.)


> However, the last time I looked at Linux video decoding (which was years ago), the drivers were awful and interfacing with each vendor's APIs was a huge pain, and so most apps just did video decoding on CPU.

Nowadays VA-API is near universally supported, and any half-decent video player uses it to do hardware decoding.
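
To give a sense of what that looks like underneath (players normally get this for free via FFmpeg or GStreamer), here's a rough sketch of bringing up VA-API on a DRM render node and listing the codec profiles the driver advertises; the render node path is just the common default, and error handling is omitted:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <va/va.h>
    #include <va/va_drm.h>

    int main(void)
    {
        /* Render nodes don't need display-server privileges. */
        int fd = open("/dev/dri/renderD128", O_RDWR);
        VADisplay dpy = vaGetDisplayDRM(fd);

        int major, minor;
        vaInitialize(dpy, &major, &minor);
        printf("VA-API %d.%d, driver: %s\n", major, minor,
               vaQueryVendorString(dpy));

        /* Enumerate the profiles the driver claims to handle. */
        int max = vaMaxNumProfiles(dpy), n = 0;
        VAProfile *profiles = malloc(max * sizeof *profiles);
        vaQueryConfigProfiles(dpy, profiles, &n);
        for (int i = 0; i < n; i++)
            printf("supported profile id: %d\n", profiles[i]);

        free(profiles);
        vaTerminate(dpy);
        return 0;
    }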


On the client side, Qt has good GPU support, but only for QML. All QML is drawn on the GPU by default (except text, I think, which uses HarfBuzz), while Qt Widgets are drawn on the CPU. However, things like KDE's Wayland compositor use direct OpenGL calls for faster composition.

Firefox has WebRender running on top of ANGLE, which is a generic OpenGL layer that translates GL calls into native platform calls. ANGLE is a Google project and is used as a GL backend by Skia, which Chromium uses to render everything. IIRC Qt/QML also uses ANGLE on Windows.


Why are toolkits still rendered on the CPU?


It's a ton of effort to write a GPU vector renderer that's both compatible with existing apps and faster than the CPU. Switching to SkiaGL would probably be the easiest approach to migrate to GPU rendering, but Skia is notoriously difficult to use outside of Google's codebases. (The running joke being "the recommended way to build Skia is to get a job at Google, but there are some workarounds available if for some reason that isn't practical.")


I love this joke! We use Skia as a PDF renderer and it does take a bit of plumbing to get it in, plus you have to track it more often than we'd like (that's not a fault of the build environment, but rather that it doesn't have a stable API), plus we have local mods.


High-quality text rendering on GPU is surprisingly tricky and inefficient, unless you're using something simple like a glyph cache.


Does something like Signed Distance Fields help, or is it just an added optimization rather than a completely different way of doing it?


Only for certain fonts at certain sizes, and only if you have the SDF generated ahead of time, both of which mean that the technique isn't general enough to render arbitrary fonts in .otf format.

Rasterizing small glyphs is super fast anyway, there's not much of a need to accelerate it if you can just cache the glyph bitmaps.
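
In code, the glyph-cache idea amounts to rasterizing each glyph once on the CPU and then letting the GPU draw only textured quads. A sketch using FreeType (upload_to_atlas is a hypothetical helper; atlas packing, glyph metrics, and quad drawing are elided):

    #include <ft2build.h>
    #include FT_FREETYPE_H

    /* Hypothetical helper: copies an 8-bit coverage bitmap into a texture
       atlas, e.g. with glTexSubImage2D under the hood. */
    void upload_to_atlas(const unsigned char *pixels,
                         unsigned int width, unsigned int rows, int pitch);

    void cache_glyph(FT_Face face, unsigned long codepoint)
    {
        /* CPU rasterization: cheap at small sizes, and only done once per
           glyph/size combination. */
        FT_Load_Char(face, codepoint, FT_LOAD_RENDER);
        FT_Bitmap *bmp = &face->glyph->bitmap;

        /* Park the bitmap on the GPU; later frames just reference it. */
        upload_to_atlas(bmp->buffer, bmp->width, bmp->rows, bmp->pitch);
    }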


QML uses it, but it doesn't look as good as FreeType with LCD subpixel rendering and light hinting... if that is what you like (it is what I like).


I believe GTK4 uses the GPU by default.


I think Cairo used OpenGL too. And X.Org itself had stuff like Glamor, XRender...


Cairo's OpenGL backends never really made it out of the experimental phase and were rarely if ever used, as far as I know.


The most viable approach these days for 2D is to use the 3D hardware. There's no standard, usable API for 2D-accelerated drawing the way there is for 3D, nor does it quite make sense for there to be one.

(No, OpenVG is not viable. No, Xrender is not viable. cairo and Skia both use the 3D hardware in combination with a CPU render engine.)


For the most part that's true, but simple 2D compositing is a bit of a different beast, because it can sometimes be done at scanout time, saving a blit. Last I checked, (non-Android) Linux rarely makes use of this except for the mouse cursor. But in general you can save a good bit of energy and memory bandwidth on HiDPI displays if you try to use 2D hardware layers where you can. You can virtually never use them for the UI itself, because they're far too limited, but the windowing system can often use them to composite windows together. It'd be nice if Wayland compositors made more use of this, e.g. to avoid having to blit the foreground window every frame.
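
On Linux the mechanism for this is KMS planes. As a minimal illustration, the legacy plane API lets a compositor put a client buffer (already wrapped in a DRM framebuffer) straight onto an overlay instead of blitting it; real compositors use atomic commits, and the various ids are assumed to come from prior drmModeGetPlaneResources/drmModeGetPlane discovery:

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int show_buffer_on_overlay(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                               uint32_t fb_id, int32_t x, int32_t y,
                               uint32_t w, uint32_t h)
    {
        /* Present the whole framebuffer, unscaled, at (x, y) on the CRTC.
           Source coordinates are in 16.16 fixed point. */
        return drmModeSetPlane(drm_fd, plane_id, crtc_id, fb_id, 0,
                               x, y, w, h,              /* destination rect */
                               0, 0, w << 16, h << 16); /* source rect      */
    }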


I used to work on mobile graphics and the Android HWC stack.

The scanout-time hardware was often less useful than you might think: it only paid off in dynamic scenes where the GPU is otherwise idle (playing video, possibly with a static UI overlay, was the premier use case).

For static scenes it's more efficient to render everything out to a buffer (using the GPU, as the scanout overlay pipes often had limited feedback capability) and just output that with the overlays disabled. It didn't take many frames for that to be worth it.

For apps that were animating or otherwise updating their windows, most UI toolkits used the GPU for widget rendering anyway. And the scanout pipes often didn't hook into the (relatively large) system caches like the GPU did, so there were times when it was again faster to composite the screen on the GPU into a single scanout buffer than to flush already-cached data and then have the scanout hardware read it back over the memory bus.

And they weren't as cheap as people thought: one stat I remember was that the total die area of the GPU on the OMAP4 platform was smaller than that of the display pipes. Though that is now a pretty old chip, and one that always had a bit of a focus on "multimedia".


I think your information is quite outdated. The HWC overlay planes are heavily used; you can see this trivially just by doing a 'dumpsys SurfaceFlinger' or grabbing a systrace/Perfetto trace. When it falls back to GPU composition it's very obvious, as there's a significant hit to latency and more GPU contention.

The overlay capabilities of modern Snapdragons are also quite absurd. They support upwards of a dozen overlays now and even have FP16 extended-sRGB support. Some HWCs (like the one in the Steam Deck) even have per-plane 3D LUTs for HDR tone mapping (e.g. https://github.com/ValveSoftware/gamescope/blob/master/src/d... )

The composition is bandwidth-heavy of course, but for static scenes there's a cache after the HWC in the form of panel self-refresh.


CRTC planes and scanout-time compositing make sense, and Wayland compositors do use them, even for non-cursor surfaces. It's simply not something an application can rely on as a general-purpose, guaranteed mechanism (though see the recent GtkSurfaceOffload work for the latest attempt at it).

Personally, I don't see it as a "2D drawing API": it doesn't accelerate anything special about 2D, only blits and transforms, which a 3D API will eat for breakfast.


What happened to those VESA "2D accelerated" APIs that were on every SVGA card in the mid-'90s? They made a huge difference and were well supported on Windows and X11.


That stuff has been obsolete for quite a while as the general 3D capabilities are more than enough to saturate all the GPU's memory bandwidth.


If it's a VESA standard and still supported, it might be useful as a fallback for hardware that doesn't have its own driver.

Edit: But actually, I couldn't find references to anything similar besides VBE/AF, which even when current got almost no support directly in hardware, so folks had to resort to hardware-specific DOS TSRs. I'm not sure if there's anything newer than that.


GPU manufacturers stopped putting 2D functionality in their chips.


The basic display interface used by UEFI and low-level boot loaders these days is called GOP, the Graphics Output Protocol. It replaced VESA.


Xrender is hardware accelerated and cairo uses Xrender as a backend. Why is Xrender not "viable"?


Xrender is hardware-accelerated on an increasingly small number of devices, and even SNA, the flagship hardware-accelerated implementation in the Intel driver, fell back to software rasterization extremely frequently [0]. In practice it wasn't worth it, and it was extremely buggy, which is why it fell into disrepair.

The semantics of Xrender simply don't match with what modern GPUs give you, even ones with 2D pipelines.

[0] https://gitlab.freedesktop.org/search?search=sna_pixmap_move...
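
(For reference, the XRender model under discussion is essentially the X server compositing Pictures on the client's behalf; a minimal libXrender sketch, with composite_over being just an illustrative name and error handling omitted:)

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    void composite_over(Display *dpy, Drawable src_win, Drawable dst_win,
                        int w, int h)
    {
        XRenderPictFormat *fmt =
            XRenderFindVisualFormat(dpy, DefaultVisual(dpy, DefaultScreen(dpy)));

        /* Wrap plain drawables in Pictures... */
        Picture src = XRenderCreatePicture(dpy, src_win, fmt, 0, NULL);
        Picture dst = XRenderCreatePicture(dpy, dst_win, fmt, 0, NULL);

        /* ...then ask the server to blend src OVER dst; whether that runs on
           the GPU or in Pixman is entirely up to the driver. */
        XRenderComposite(dpy, PictOpOver, src, None, dst,
                         0, 0, 0, 0, 0, 0, w, h);
    }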


Honestly I think XRENDER could be a viable API (the core idea is similar to WebRender, which Firefox uses to great effect), but the existing implementations are not well optimized and issue tons of draw calls through obsolete OpenGL APIs. They are slower than just drawing on the CPU. You would essentially need a complete rewrite.

The bigger issue is that there's little reason to farm vector graphics rendering out to the window server in the first place. The main reason would be to avoid a window blit on HiDPI displays. But the tradeoff is that the XRENDER API is all you get, and apps usually have more sophisticated needs than what it can provide. For instance, browsers can't really use XRENDER nowadays because there's no way to describe CSS 3D transforms in it. And if you use it, you're at the mercy of the window server to implement it reasonably, which is not a safe assumption. (A lot of the reason Chrome on Linux was faster than Firefox in the early days is that Firefox used XRENDER, while Chrome rendered on CPU. I remember at least one engineer at Mozilla who was bitter about that, after putting in all the work to make Firefox use it only to have it be a net loss.)

In any case, you can avoid the window blit by simply using scanout compositing, as detailed in my other reply, so there really is no compelling reason to reinvent XRENDER.


Well, there was once a hardware-accelerated API for 2D drawing on Windows (DirectDraw), but it effectively died in Windows Vista when desktop composition was added. It was still supported for application use, but only through emulation.

But if there were an API for 2D acceleration that was actually supported (and could be used simultaneously with desktop composition), it could be added to something like SDL, and suddenly applications would support it.


It's slow as hell; today you need to use WineD3D's ddraw.dll along with the WineD3D loader in the same folder as your 2D game.


Was the complexity there for backwards compatibility, or was it just needlessly complex for reasons sane people will never understand?



