
This is about 3D rendering, to be precise; I believe 2D acceleration goes through the same lower layers but the higher ones are very different.

Incidentally, one thing I noticed when I was trying to port Linux GPU drivers to Windows some time ago is what appeared to be an excessive amount of indirection; there are so many layers and places where things could be simpler.



2D acceleration is generally done through the same APIs, specifically OpenGL and Vulkan. Classically, the X compositor would use the GLX_EXT_texture_from_pixmap extension to import an X pixmap representing a window surface into OpenGL, where it can be used like any other texture. For the Wayland compositor, I believe you'd use EGL_WL_bind_wayland_display to bind a Wayland surface to an EGLImage, and then glEGLImageTargetTexture2DOES (can't believe I have that function name memorized) to bind that EGLImage to an OpenGL texture, where it can be used in the same way. Vulkan has similar extensions.
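
Roughly, the Wayland-compositor side of that path looks like the sketch below (assuming an already-initialized EGLDisplay and GLES context; the import_wl_buffer name is mine, the extension entry points are fetched at runtime, and error handling plus the one-time nature of the display bind are glossed over):

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <GLES2/gl2.h>
    #include <GLES2/gl2ext.h>
    #include <wayland-server.h>

    GLuint import_wl_buffer(EGLDisplay egl, struct wl_display *wl_dpy,
                            struct wl_resource *wl_buf)
    {
        /* Extension entry points (EGL_WL_bind_wayland_display,
           EGL_KHR_image_base, GL_OES_EGL_image). */
        EGLBoolean (*bind_wl_display)(EGLDisplay, struct wl_display *) =
            (EGLBoolean (*)(EGLDisplay, struct wl_display *))
            eglGetProcAddress("eglBindWaylandDisplayWL");
        PFNEGLCREATEIMAGEKHRPROC create_image =
            (PFNEGLCREATEIMAGEKHRPROC) eglGetProcAddress("eglCreateImageKHR");
        PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_to_texture =
            (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
            eglGetProcAddress("glEGLImageTargetTexture2DOES");

        /* Let EGL see the client's buffers (normally done once at startup). */
        bind_wl_display(egl, wl_dpy);

        /* Wrap the client's wl_buffer in an EGLImage... */
        EGLImageKHR img = create_image(egl, EGL_NO_CONTEXT,
                                       EGL_WAYLAND_BUFFER_WL,
                                       (EGLClientBuffer) wl_buf, NULL);

        /* ...and use it as the storage of an ordinary GL texture, which the
           compositor can then sample like any other. */
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        image_to_texture(GL_TEXTURE_2D, (GLeglImageOES) img);
        return tex;
    }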

On the client side, I think most Linux apps still draw their UIs on CPU, usually accelerated with SIMD. Firefox and Chrome (I think SkiaGL is enabled on Linux?) are exceptions; they use OpenGL and/or Vulkan to draw their UI. Video playback is a different beast and in theory relies on vendor-specific extensions to decode the video in hardware. However, the last time I looked at Linux video decoding (which was years ago), the drivers were awful and interfacing with each vendor's APIs was a huge pain, and so most apps just did video decoding on CPU. (Besides, the Linux ecosystem prefers open codecs, and hardware has only recently gotten support for non-patent-encumbered video formats.)


> However, the last time I looked at Linux video decoding (which was years ago), the drivers were awful and interfacing with each vendor's APIs was a huge pain, and so most apps just did video decoding on CPU.

Nowadays VA-API is near universally supported, and any half-decent video player uses it to do hardware decoding.
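
To give a sense of what that looks like underneath (players normally get this for free via FFmpeg or GStreamer), here's a rough sketch of bringing up VA-API on a DRM render node and listing the codec profiles the driver advertises; the render node path is just the common default, and error handling is omitted:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <va/va.h>
    #include <va/va_drm.h>

    int main(void)
    {
        /* Render nodes don't need display-server privileges. */
        int fd = open("/dev/dri/renderD128", O_RDWR);
        VADisplay dpy = vaGetDisplayDRM(fd);

        int major, minor;
        vaInitialize(dpy, &major, &minor);
        printf("VA-API %d.%d, driver: %s\n", major, minor,
               vaQueryVendorString(dpy));

        /* Enumerate the profiles the driver claims to handle. */
        int max = vaMaxNumProfiles(dpy), n = 0;
        VAProfile *profiles = malloc(max * sizeof *profiles);
        vaQueryConfigProfiles(dpy, profiles, &n);
        for (int i = 0; i < n; i++)
            printf("supported profile id: %d\n", profiles[i]);

        free(profiles);
        vaTerminate(dpy);
        return 0;
    }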


On the client side, Qt has good GPU support, but only for QML. All QML is drawn on the GPU by default (except text, I think, which uses HarfBuzz), while Qt Widgets are drawn on the CPU. However, things like KDE's Wayland compositor use direct OpenGL calls for faster composition.

Firefox has WebRender running on top of ANGLE, which is a generic OpenGL layer that translates GL calls into native platform calls. ANGLE is a Google project and is used as a GL backend by Skia, which Chromium uses to render everything. IIRC Qt/QML also uses ANGLE on Windows.


Why are toolkits still rendered on the CPU?


It's a ton of effort to write a GPU vector renderer that's both compatible with existing apps and faster than the CPU. Switching to SkiaGL would probably be the easiest approach to migrate to GPU rendering, but Skia is notoriously difficult to use outside of Google's codebases. (The running joke being "the recommended way to build Skia is to get a job at Google, but there are some workarounds available if for some reason that isn't practical.")


I love this joke! We use Skia as a PDF renderer and it does take a bit of plumbing to get it in, plus you have to track it more often than we'd like (that's not a fault of the build environment, but rather that it doesn't have a stable API), plus we have local mods.


High-quality text rendering on GPU is surprisingly tricky and inefficient, unless you're using something simple like a glyph cache.


Does something like Signed Distance Fields help, or is it just an added optimization rather than a completely different way of doing it?


Only for certain fonts at certain sizes, and only if you have the SDF generated ahead of time, both of which mean that the technique isn't general enough to render arbitrary fonts in .otf format.

Rasterizing small glyphs is super fast anyway, there's not much of a need to accelerate it if you can just cache the glyph bitmaps.
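
In code, the glyph-cache idea amounts to rasterizing each glyph once on the CPU and then letting the GPU draw only textured quads. A sketch using FreeType (upload_to_atlas is a hypothetical helper; atlas packing, glyph metrics, and quad drawing are elided):

    #include <ft2build.h>
    #include FT_FREETYPE_H

    /* Hypothetical helper: copies an 8-bit coverage bitmap into a texture
       atlas, e.g. with glTexSubImage2D under the hood. */
    void upload_to_atlas(const unsigned char *pixels,
                         unsigned int width, unsigned int rows, int pitch);

    void cache_glyph(FT_Face face, unsigned long codepoint)
    {
        /* CPU rasterization: cheap at small sizes, and only done once per
           glyph/size combination. */
        FT_Load_Char(face, codepoint, FT_LOAD_RENDER);
        FT_Bitmap *bmp = &face->glyph->bitmap;

        /* Park the bitmap on the GPU; later frames just reference it. */
        upload_to_atlas(bmp->buffer, bmp->width, bmp->rows, bmp->pitch);
    }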


QML uses it, but it doesn't look as good as FreeType with LCD subpixel rendering and light hinting... if that is what you like (it is what I like).


I believe GTK4 uses the GPU by default.


I think Cairo used OpenGL too. And X.Org itself had stuff like Glamor, XRender...


Cairo's OpenGL backends never really made it out of the experimental phase and were rarely if ever used, as far as I know.


The most viable approach these days for 2D is to use the 3D hardware. There's no standard, usable API for 2D-accelerated drawing the way there is for 3D, nor does it quite make sense for there to be one.

(No, OpenVG is not viable. No, Xrender is not viable. cairo and Skia both use the 3D hardware in combination with a CPU render engine.)


For the most part that's true, but simple 2D compositing is a bit of a different beast, because it can sometimes be done at scanout time, saving a blit. Last I checked, (non-Android) Linux rarely makes use of this except for the mouse cursor. But in general you can save a good bit of energy and memory bandwidth on HiDPI displays if you try to use 2D hardware layers where you can. You can virtually never use them for the UI itself, because they're far too limited, but the windowing system can often use them to composite windows together. It'd be nice if Wayland compositors made more use of this, e.g. to avoid having to blit the foreground window every frame.
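
On Linux the mechanism for this is KMS planes. As a minimal illustration, the legacy plane API lets a compositor put a client buffer (already wrapped in a DRM framebuffer) straight onto an overlay instead of blitting it; real compositors use atomic commits, and the various ids are assumed to come from prior drmModeGetPlaneResources/drmModeGetPlane discovery:

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int show_buffer_on_overlay(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                               uint32_t fb_id, int32_t x, int32_t y,
                               uint32_t w, uint32_t h)
    {
        /* Present the whole framebuffer, unscaled, at (x, y) on the CRTC.
           Source coordinates are in 16.16 fixed point. */
        return drmModeSetPlane(drm_fd, plane_id, crtc_id, fb_id, 0,
                               x, y, w, h,              /* destination rect */
                               0, 0, w << 16, h << 16); /* source rect      */
    }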


I used to work on mobile graphics and the Android HWC stack.

The scanout-time hardware was often less useful than you might think: it only paid off in dynamic scenes where the GPU is otherwise idle (playing video, possibly with a static UI overlay, was the premier use case).

For static scenes it's more efficient to render everything out to a buffer (using the GPU, as the scanout overlay pipes often had limited feedback capability) and just output that with the overlays disabled. It didn't take many frames for that to be worth it.

For apps that were animating or otherwise updating their windows, most UI toolkits used the GPU for widget rendering anyway. And the scanout pipes often didn't hook into the (relatively large) system caches like the GPU did, so there were times when it was again faster to composite the screen on the GPU into a single scanout buffer than to flush already-cached data and then have the scanout hardware read it back over the memory bus.

And they weren't as cheap as people thought: one stat I remember was that the total die area of the GPU on the OMAP4 platform was smaller than that of the display pipes. Though that is now a pretty old chip, and one that always had a bit of a focus on "multimedia".


I think your information is quite outdated. The HWC overlay planes are heavily used; you can see this trivially just by doing a 'dumpsys SurfaceFlinger' or grabbing a systrace/Perfetto trace. When it falls back to GPU composition it's very obvious, as there's a significant hit to latency and more GPU contention.

The overlay capabilities of modern Snapdragons are also quite absurd. They support upwards of a dozen overlays now and even have FP16 extended-sRGB support. Some HWCs (like the one in the Steam Deck) even have per-plane 3D LUTs for HDR tone mapping (e.g. https://github.com/ValveSoftware/gamescope/blob/master/src/d... )

The composition is bandwidth-heavy of course, but for static scenes there's a cache after the HWC in the form of panel self-refresh.


CRTC planes and scanout-time compositing make sense, and Wayland compositors do use them, even for non-cursor surfaces. It's simply not something an application can rely on as a general-purpose, guaranteed mechanism (though see the recent GtkSurfaceOffload work for the latest attempt at it).

Personally, I don't see it as a "2D drawing API": it doesn't accelerate anything special about 2D, only blits and transforms, which a 3D API will eat for breakfast.


What happened to those VESA "2D accelerated" APIs that were on every SVGA card in the mid-'90s? They made a huge difference and were well supported on Windows and X11.


That stuff has been obsolete for quite a while as the general 3D capabilities are more than enough to saturate all the GPU's memory bandwidth.


If it's a VESA standard and still supported, it might be useful as a fallback for hardware that doesn't have its own driver.

Edit: But actually, I couldn't find references to anything similar besides VBE/AF, which even when current got almost no support directly in hardware, so folks had to resort to hardware-specific DOS TSRs. I'm not sure if there's anything newer than that.


GPU manufacturers stopped putting 2D functionality in their chips.


The basic display interface used by UEFI and low-level boot loaders these days is called GOP, the Graphics Output Protocol. It replaced VESA.


Xrender is hardware accelerated and cairo uses Xrender as a backend. Why is Xrender not "viable"?


Xrender is hardware-accelerated on an increasingly small number of devices, and even SNA, the flagship hardware-accelerated implementation in the Intel driver, fell back to software rasterization extremely frequently [0]. In practice it wasn't worth it, and it was extremely buggy, which is why it fell into disrepair.

The semantics of Xrender simply don't match with what modern GPUs give you, even ones with 2D pipelines.

[0] https://gitlab.freedesktop.org/search?search=sna_pixmap_move...
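
(For reference, the XRender model under discussion is essentially the X server compositing Pictures on the client's behalf; a minimal libXrender sketch, with composite_over being just an illustrative name and error handling omitted:)

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    void composite_over(Display *dpy, Drawable src_win, Drawable dst_win,
                        int w, int h)
    {
        XRenderPictFormat *fmt =
            XRenderFindVisualFormat(dpy, DefaultVisual(dpy, DefaultScreen(dpy)));

        /* Wrap plain drawables in Pictures... */
        Picture src = XRenderCreatePicture(dpy, src_win, fmt, 0, NULL);
        Picture dst = XRenderCreatePicture(dpy, dst_win, fmt, 0, NULL);

        /* ...then ask the server to blend src OVER dst; whether that runs on
           the GPU or in Pixman is entirely up to the driver. */
        XRenderComposite(dpy, PictOpOver, src, None, dst,
                         0, 0, 0, 0, 0, 0, w, h);
    }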


Honestly I think XRENDER could be a viable API (the core idea is similar to WebRender, which Firefox uses to great effect), but the existing implementations are not well optimized and issue tons of draw calls through obsolete OpenGL APIs. They are slower than just drawing on the CPU. You would essentially need a complete rewrite.

The bigger issue is that there's little reason to farm vector graphics rendering out to the window server in the first place. The main reason would be to avoid a window blit on HiDPI displays. But the tradeoff is that the XRENDER API is all you get, and apps usually have more sophisticated needs than what it can provide. For instance, browsers can't really use XRENDER nowadays because there's no way to describe CSS 3D transforms in it. And if you use it, you're at the mercy of the window server to implement it reasonably, which is not a safe assumption. (A lot of the reason Chrome on Linux was faster than Firefox in the early days is that Firefox used XRENDER, while Chrome rendered on CPU. I remember at least one engineer at Mozilla who was bitter about that, after putting in all the work to make Firefox use it only to have it be a net loss.)

In any case, you can avoid the window blit by simply using scanout compositing, as detailed in my other reply, so there really is no compelling reason to reinvent XRENDER.


Well, there was once a hardware-accelerated API for 2D drawing on Windows (DirectDraw), but it effectively died in Windows Vista when desktop composition was added. It was still supported for application use, but only through emulation.

But if there were an API for 2D acceleration that was actually supported (and could be used simultaneously with desktop composition), it could be added to something like SDL, and suddenly applications would support it.


It's slow as hell; today you need to use WineD3D's ddraw.dll along with the WineD3D loader in the same folder as your 2D game.


Was the complexity there for backwards compatibility, or was it just needlessly complex for reasons sane people will never understand?



