DirectX 11 vs. DirectX 12 oversimplified (littletinyfrogs.com)
112 points by snake_case on Feb 2, 2015 | 85 comments


It really is oversimplified.

"Creating dozens of light sources simultaneously on screen at once is basically not doable unless you have Mantle or DirectX 12. Guess how many light sources most engines support right now? 20? 10? Try 4. Four. Which is fine for a relatively static scene. "

For my Master's degree project at uni I had a demo written in OpenGL with over 500 dynamic lights, running at 60fps on a GTX 580. Without Mantle, or DX12. How? Deferred rendering, that's how. You could probably add a couple thousand more and it would still be fine.

"Every time I hear someone say “but X allows you to get close to the hardware” I want to shake them. None of this has to do with getting close to the hardware. It’s all about the cores"

Also not true. I work with console devkits every single day, and the reason we can squeeze so much performance out of relatively low-end hardware is that we get to make calls which you can't make on PC. A DirectX call to switch a texture takes a few thousand clock cycles. A low-level hardware call available on the PlayStation platform will do the same texture switch in a few dozen instructions. The numbers are against DirectX, and that's why Microsoft is slowly letting devs access the GPU on the Xbox One without the DirectX overhead.


100% agree. Article seems very misinformed.

> For my Master's degree project at uni I had a demo written in OpenGL with over 500 dynamic lights, running at 60fps on a GTX 580. Without Mantle, or DX12. How? Deferred rendering, that's how.

Indeed. For fixed-function forward rendering you do have those limitations, though. However, it's nowhere near as low as 4: you can expect at least 8 and most often 16 lights. The catch is that DirectX 12 will do nothing for that limitation.

At the end of the day all that this comes down to is decreasing the cost of one thing:

    Draw()
Which has immense amounts of CPU overhead due to abstractions. Anything else DX12 might do is really just a bonus.
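
To illustrate the shape of the problem, here is a purely hypothetical sketch (the Renderer/Mesh/Draw names are illustrative, not a real API): the loop itself is trivial, but in DX11-era APIs every Draw() call drags the driver through state validation and command translation on the CPU, so with thousands of objects the CPU becomes the bottleneck long before the GPU does.

    // Hypothetical sketch of why per-draw CPU cost dominates. "Renderer",
    // "Mesh" and "Draw" are illustrative names, not a real API.
    #include <vector>

    struct Mesh { /* vertex/index buffers, material, ... */ };

    struct Renderer {
        // In DX11-era APIs this call hides thousands of CPU cycles of driver
        // work; the point of DX12/Mantle is to do most of that work once, up
        // front, when the pipeline state is built.
        void Draw(const Mesh&) { /* submit one draw call to the API */ }
    };

    void renderScene(Renderer& r, const std::vector<Mesh>& scene) {
        for (const Mesh& m : scene)   // 10,000 objects => 10,000 driver round trips
            r.Draw(m);
    }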


This type of direct access is fine on a console, but DirectX is primarily designed as an abstraction layer on top of arbitrary hardware. PC hardware isn't as predictable as the Xbox One, and yeah, it adds clock cycles. On PC you don't want to get that deep into the hardware: it has the potential to break compatibility, and because dev teams generally don't have the ability to test 5,000 different combinations of CPUs, GPUs and drivers, you try to keep as much compatibility as you can.


What he really seems to mean by "real lights" is actual shadow-casting lights. I doubt you were rendering 500 shadow maps along with your 500 lights.


His examples don't really fit that though. For small ranged lights (lightsaber) or lights that are relatively short lived and intense (explosions), you can totally get away with not casting shadows. Also I don't see how threads would really help with rendering more shadows, other than providing a generalized speedup. The cost of the shadow map is really more in the GPU shaders than any sort of CPU calculation.

For stencil shadows, threads might help, since the CPU has to calculate and upload the shadow volumes frequently (unless you're extruding them in a geometry shader). Stencil shadows are pretty niche though; you only use them when you need pixel-perfect precision. Shadow maps are vastly more popular.


>you can totally get away with not casting shadows.

His entire point is that you CAN do this, but it all adds up to making it not look real.


And yet none of what he talks about does anything to help that. I'm not against oversimplifying, but what he's saying is entirely bogus: shadows being expensive has nothing to do with how many threads the CPU can issue commands from.

He's setting up this notion that the problem with graphics is a lack of threading, which is ridiculous. Graphics code is incredibly parallel, and his assertion that graphics work spends most of its time constrained by waiting for the CPU just does not add up to me. Except in poorly written systems, I just haven't seen this be the case often at all.


Even without deferred rendering, why the limitation? I'm sure there is a limit in the standard pipeline, but can't I just pass as much light info to the shaders as I want now? The result is just slightly longer shaders.


There is no limitation per se, but with per-pixel lighting (and pretty much all lighting is done per-pixel nowadays) you have to run your shader program exactly 2,073,600 times for a Full HD frame... and that's every frame, so 60 times a second. Now imagine doing the lighting calculations for one light source... not too complex, a few multiplications and that's all... but if you have several lights, it quickly becomes way too much for any graphics card. A "slightly longer" program that has to run over 2 million times a frame is not really acceptable.
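
Spelled out as rough arithmetic (illustrative numbers only):

    \begin{aligned}
    1920 \times 1080 &= 2{,}073{,}600 \ \text{fragment shader runs per frame} \\
    2{,}073{,}600 \times 60 &\approx 1.24 \times 10^{8} \ \text{runs per second} \\
    1.24 \times 10^{8} \times N_{\text{lights}} &\approx 10^{9} \ \text{light evaluations per second for } N_{\text{lights}} = 8
    \end{aligned}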


I think he meant that most shaders only receive four light sources. I'm not a low-level graphics guy, but my understanding is that, deferred or not, most shaders will only receive four light sources.


This hasn't been true for a long time, and DX12 doesn't really change anything here. DX11 was already able to handle very large numbers of constants. These days the limit on how many lights you support in a shader is largely a matter of controlling shader cost rather than one of constant space limits. Deferred renderers are also very popular now and don't really handle lights in the same way anyway.

DX12 does offer some potential CPU-side performance benefits when it comes to updating large numbers of constants efficiently, which may well help performance when dealing with lots of dynamic lights, but it's not adding any new capabilities beyond what DX11 offers.
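
For illustration, a minimal sketch of pushing a large per-light constant array with plain D3D11 (assuming an existing device and immediate context, and a LightData struct that matches the pixel shader's cbuffer layout; error handling omitted):

    // Minimal D3D11 sketch: a dynamic constant buffer holding an array of
    // lights, re-uploaded each frame. Assumes device/context already exist and
    // LightData matches the shader's cbuffer layout.
    #include <d3d11.h>
    #include <cstring>

    struct LightData { float position[4]; float color[4]; };  // 16-byte aligned
    static const UINT kMaxLights = 256;                       // 8 KB, well under the 64 KB cbuffer limit

    ID3D11Buffer* CreateLightBuffer(ID3D11Device* device) {
        D3D11_BUFFER_DESC desc = {};
        desc.ByteWidth      = sizeof(LightData) * kMaxLights;
        desc.Usage          = D3D11_USAGE_DYNAMIC;            // CPU-writable every frame
        desc.BindFlags      = D3D11_BIND_CONSTANT_BUFFER;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
        ID3D11Buffer* buffer = nullptr;
        device->CreateBuffer(&desc, nullptr, &buffer);
        return buffer;
    }

    void UploadLights(ID3D11DeviceContext* ctx, ID3D11Buffer* buffer,
                      const LightData* lights, UINT count) {
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        ctx->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
        std::memcpy(mapped.pData, lights, sizeof(LightData) * count);
        ctx->Unmap(buffer, 0);
        ctx->PSSetConstantBuffers(0, 1, &buffer);             // bound to slot b0
    }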


a "light source" is nothing else than a collection of variables when passed in to the shader. Vector3 for the position, a float for the intensity, Vector3 for the colour and so on. Each one of them can be passed in as a uniform - with OpenGL 3.0 you should be able to pass in at the very least 1024 uniforms, but in most implementations the limit is much higher. So if you were only passing in the position of the light, you should be able to give at least 1024 lights to any shader.


Is this a direct effect of the design of PC hardware, or could we theoretically build a PC OS that would let you do a texture switch as efficiently as on PS/XBOX/etc?


>For my Masters degree project at uni I had a demo written in OpenGL with over 500 dynamic lights,

I don't know what kind of 3D engine you wrote, but as an FPS gamer I have quite a bit of experience with 3D engines. From Doom 3 to Alan Wake, some of the worst performance hits occur in scenes with heavy use of dynamic lights. Did OpenGL 4/DX11 fix this?

Which of the modern APIs have you actually used? How do Mantle, DX12, Metal, and OpenGL Next compare?

EDIT: "as a non-3D graphics programmer, who's tried to build FPS levels for fun and familiar with all the modern engines". This article is for those of us with interest but not experts in the field, right?


No, shaders and deferred rendering fixed this. Shaders mean you can design anything you can imagine instead of using the old, deprecated fixed-function OpenGL that only guaranteed eight lights.

Someone figured out deferred rendering, a technique that allows lots of lights. Lots of games use it; I believe one of the first was Killzone.

http://www.slideshare.net/guerrillagames/the-rendering-techn...

Here's a live demo of deferred rendering

http://threejs.org/examples/webgldeferred_pointlights.html

It's using only OpenGL 2.1 features (which is all that's needed to emulate OpenGL ES 2.0 which WebGL is based on). To do deferred rendering efficiently all you really need is support for multiple render targets.


Pure OpenGL 4.0. It allowed me to do tessellation and use deferred rendering. This is the key: rendering the lights in a separate pass is crucial to performance, and like I've said, it allows you to have thousands of dynamic lights in the same scene. Neither Doom 3 nor Alan Wake could use this technique. I mostly work on consoles nowadays, so I use none of these APIs. On PS3/PS4 you have to do everything manually: instead of using a nice OpenGL API call to bind and send a vertex buffer object, you have to allocate memory for it yourself and copy it over manually to the address you want. That's where the speed lies. I have done some Xbox One programming, but that's mostly regular DirectX at the moment; I haven't had a chance to play with the DX12 stuff.

There's quite a good explanation of how deferred rendering works here: http://gamedevelopment.tutsplus.com/articles/forward-renderi...
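
A minimal sketch of the two-pass structure described in that article, assuming a G-buffer framebuffer with the usual attachments and already-compiled shader programs; drawSceneGeometry() and drawFullScreenQuad() are placeholder helpers, not a real API:

    #include <GL/glew.h>

    void drawSceneGeometry();   // placeholder helper
    void drawFullScreenQuad();  // placeholder helper

    void renderFrame(GLuint gBufferFBO, GLuint geometryProgram, GLuint lightingProgram) {
        // Pass 1: rasterize the scene once, writing surface attributes (not
        // final colours) into multiple render targets. Cost scales with
        // geometry, not with the number of lights.
        glBindFramebuffer(GL_FRAMEBUFFER, gBufferFBO);
        const GLenum targets[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
        glDrawBuffers(2, targets);                     // MRT: e.g. albedo + normals
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glUseProgram(geometryProgram);
        drawSceneGeometry();

        // Pass 2: shade only the visible pixels, looping over the lights in the
        // fragment shader (or drawing light volumes), so the cost no longer
        // scales with (number of objects) x (number of lights).
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glUseProgram(lightingProgram);
        drawFullScreenQuad();
    }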


Great link, and thanks for the explanation. So by using a deferred rendering algorithm you can reduce the complexity to O(m+n) (m = number of surfaces, n = number of lights), where forward lighting renderers have complexity O(m*n). The trade-offs seem to be higher memory usage and not working well on older hardware, with anti-aliasing, or with transparent objects, which explains why many modern engines like Unreal 3 and the IW engine don't utilize it.
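
With made-up but plausible numbers the gap is obvious:

    \begin{aligned}
    \text{forward:}\quad  m \cdot n &= 1000 \times 100 = 100{,}000 \ \text{surface-light shading combinations per frame} \\
    \text{deferred:}\quad m + n     &= 1000 + 100 = 1{,}100
    \end{aligned}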


In case you're curious about the downvotes, it's probably because anyone who has ever done engine programming will laugh at your assertion that "as an FPS gamer" you know anything relevant about graphics rendering and technology.

It's like somebody claiming they can comment meaningfully on light bulb manufacturing standards because they've seen the lighting in a bunch of made-for-TV specials.


"As an frequent airplane passenger, I am qualified to both build build and pilot large aircraft."


Aren't you mixing forward shading and deferred shading? http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09...


"Last Fall, Nvidia released the Geforce GTX 970. It has 5.2 BILLION transistors on it. It already supports DirectX 12. Right now. It has thousands of cores in it. And with DirectX 11, I can talk to exactly 1 of them at a time."

That's not how it works: the app developer has no control over individual GPU cores (even in DX12). At the API level you can say "draw this triangle" and the GPU itself splits the work across multiple GPU cores.


Moreover the actual GPU doesn't work like that either. GPUs do not have the capability to run more than one work-unit-thing at a time. They have thousands of cores, yes, but much more in a SIMD-style fashion than in a bunch of parallel threads. They cannot split those cores up into logical chunks that can then individually do independent things.

The whole post isn't just oversimplified, it's just wrong. Across the board wrong wrong wrong. The point of Mantle, of Metal, and of DX12 is to expose more of the low level guts. The key thing is that those low level guts aren't that low level. The threading improvements come because you can build the GPU objects on different threads, not because you can talk to a bunch of GPU cores from different threads.

The majority of CPU time these days in OpenGL/DirectX is spent validating and building state objects. DX12 and the others now let you take lifecycle control of those objects: re-use them across frames, build them on multiple threads, etc. Then talking to the GPU is a simple matter of handing over an already-validated, immutable object. Which is fast. Very fast.
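
A purely conceptual sketch of that workflow (Device, PipelineDesc and PipelineState here are hypothetical stand-ins, not the real D3D12 types): pay the validation cost once, possibly on worker threads, and reuse the resulting immutable objects every frame.

    // Conceptual sketch only; the types are hypothetical stand-ins.
    #include <future>
    #include <memory>
    #include <vector>

    struct PipelineDesc  { /* shaders, blend/raster/depth state, ... */ };
    struct PipelineState { /* immutable, pre-validated state object */ };

    struct Device {
        std::shared_ptr<PipelineState> createPipelineState(const PipelineDesc&) {
            // The expensive validation/compilation happens here, once, at load time.
            return std::make_shared<PipelineState>();
        }
    };

    std::vector<std::shared_ptr<PipelineState>>
    buildPipelines(Device& dev, const std::vector<PipelineDesc>& descs) {
        std::vector<std::future<std::shared_ptr<PipelineState>>> jobs;
        for (const auto& d : descs)                 // build on multiple threads
            jobs.push_back(std::async(std::launch::async,
                                      [&dev, &d] { return dev.createPipelineState(d); }));
        std::vector<std::shared_ptr<PipelineState>> pipelines;
        for (auto& j : jobs) pipelines.push_back(j.get());
        return pipelines;                           // cache and reuse across frames
    }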


Yeah, I'd have to agree it's hard to describe this post in more generous terms than just flat-out wrong. DX12 is making it more efficient to spread CPU-side rendering work across multiple cores, but it's not about letting individual CPU cores talk to individual GPU cores. That isn't even really a coherent concept.

The whole digression on lighting is mostly just wrong too. Deferred renderers have been rendering with hundreds of dynamic lights for years. DX12 may make it a bit more efficient to deal with the large amount of constant data that needs to be updated when dealing with hundreds of dynamic lights, but it isn't introducing any fundamental changes to dynamic lighting.


Most GPUs group together individual ALUs into larger units (sometimes called compute units or clusters or SMs) and each compute unit can run independent work.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-re...


I believe that is true, but I haven't seen this exposed via DX. Does DX12 expose this functionality? Does CUDA or OpenGL expose this type of control?


You can have multiple CUDA kernels running simultaneously on the same GPU, but you have no direct control over which SM(X) core is assigned to which kernel, AFAIK. So it actually works pretty similarly to multithreaded programming on a CPU if you take away thread affinity. In general I find a good way to approximate a top-of-the-line NVIDIA GPU is to think of it as an ~8-core CPU with a vector length of 192 for single precision and 96 (half the full length) for double precision. It has high memory bandwidth, with the limitation that memory accesses need to hit 32 neighbours simultaneously in order to make full use of that performance. The CUDA programming model is set up precisely so that the programmer doesn't have to handle this manually - (s)he just needs to be aware of it. I.e. you program everything scalar, map the thread indices to your data accesses, and make sure that the first thread index (x) maps to the fastest-varying index of your data.
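
As a CPU-side analogy of that last convention (plain C++, not CUDA): the inner "x" loop plays the role of threadIdx.x and maps to the fastest-varying index of a row-major array, which is what lets the hardware coalesce the 32 neighbouring accesses.

    // CPU-side analogy only, not CUDA code.
    #include <vector>

    void scaleImage(std::vector<float>& pixels, int width, int height, float s) {
        for (int y = 0; y < height; ++y)        // analogue of the block/thread y index
            for (int x = 0; x < width; ++x)     // analogue of threadIdx.x (fastest varying)
                pixels[y * width + x] *= s;     // consecutive x => consecutive addresses
    }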


> In general I find a good way to approximate a top-of-the-line NVIDIA GPU is to think of it as an ~8-core CPU with a vector length of 192 for single precision and 96 (half the full length) for double precision.

But that's not entirely correct either. Yes you can use it like that, but you can also use it as a single core with a vector length of 1536.

In the context of over-simplification these are better thought of as single-core processors. The reason being that if you have a method foo() that you need to run 10,000 times, it doesn't matter if you use 1 thread, 2 threads, or 8 threads - the total time it will take to complete the work will be identical. This is very different from an 8-core CPU, where using 8 threads will be 8x faster than using 1 thread (blah blah won't be perfectly linear etc, etc).


No, this is something the GPU front end controls.


You still have to be aware of it when optimizing the shaders and workloads though. On consoles where the hardware is fixed this is easily profiled.

The GPU is creating threads and tasks internally, and it's not always easy to balance this workload so that no part of the GPU becomes saturated while later stages in the chip's pipeline sit idle waiting for work.

The PowerVR chips we're working with have dozens and dozens of different profile metrics corresponding to the different areas of its pipeline, each one being a potential bottleneck.

You could do something as silly as render a ball with 12k vertices instead of 24 and expect the vertex processing to be much slower, but after profiling you find out it's the fragment part lagging way behind because the data sequencer is overloaded trying to generate fragment tasks. In both cases you're rendering about the same number of pixels.

With unified shader architectures, it's very frequent for vertex and fragment tasks from different draw calls to overlap simultaneously. We're even seeing tasks from different render targets overlapping! Such as fragment tasks from the shadow pass still running when the solid geometry pass is processing its vertices.


This is fascinating. I would love to know more. You should write an article about advanced optimization for mobile GPUs or something.


>build them on multiple threads

That was the point of the blog post. Maybe described wrong, but that seems to be the point: the possibility of uploading stuff to the GPU from multiple threads on the CPU side, and the ability of the GPU to store those uploads in parallel. Maybe even render shadow maps in parallel?

I wonder, is there already an OpenGL equivalent of this? One of the main hurdles of OpenGL was issuing all the calls from the main thread; if that is gone now, that would be awesome.


Uploads are already done asynchronously; that's driver optimization 101 level stuff. It's also one of the very few operations that has an independent core to handle it (the copy engine).

Rendering shadow-maps in parallel would be pointless. If you render 2 at the same time, then each map gets half the GPU so an individual render takes twice as long, resulting in the same total time as if you gave each render 100% of the GPU and rendered in sequence.

> I wonder, is there already an OpenGL equivalent of this? One of the main hurdles of OpenGL was issuing all the calls from the main thread; if that is gone now, that would be awesome.

Yes, the NV_command_list extension:

http://www.slideshare.net/tlorach/opengl-nvidia-commandlista...


It isn't possible to render shadow maps in parallel in DX12, I believe.


> Cloud computing is, ironically, going to be the biggest beneficiary of DirectX 12. That sounds unintuitive but the fact is, there’s nothing stopping a DirectX 12 enabled machine from fully running VMs on these video cards. Ask your IT manager which they’d rather do? Pop in a new video card or replace the whole box.

Can someone please comment on that? Everything I know about GPUs suggests that even if it were possible to run VMs on them, performance would be terrible. All those tiny GPU cores are not designed to do much branch prediction, fancy out-of-order execution, prefetching, etc., and your average software totally depends on that.


If you managed to run a VM on a GPU it would perform like ass.

GPUs are incredibly slow at doing the sort of work that a CPU does. If they were fast at doing general purpose work, then CPUs would be doing the same things.

GPUs are extremely fickle with memory access patterns, branches, etc... It takes a lot of care to get them to run fast on a given workload, and that workload better be identically parallel (as in, every "thread" takes the same branches, with memory access in a uniform offset+thread-id order, etc..).


I believe the author is referring to accessing the GPUs from VMs, i.e. if you have a 16-core machine with a 1024-core GPU, you can split it up so it looks like 4 machines, each with 4 cores and a 256-core GPU.

The sentence that you snipped seems to refer to this. "Right now, this isn’t doable because cloud services don’t even have video cards in them typically (I’m looking at you Azure. I can’t use you for offloading Metamaps!)"


It's already possible to partition GPUs to share them across multiple VMs. Intel's got modified versions of Xen and KVM to support this for Haswell and later IGPs, and AMD and NVidia both have proprietary solutions.

Furthermore, this has nothing to do with rendering APIs; it's all in the driver and hypervisor.


I have no doubt. I was just trying to clarify what the original author may have been intending to say.


Yes, running the VM itself would be silly, but machine learning etc. is awesome. We get access to and time-sharing of these resources much more easily (or at all) now. At Graphistry, we are using AWS G2 instances for previously impossibly big interactive data visualizations and streaming them into stock browsers. It's pretty ridiculously cool :) Always happy to chat about GPU clouds and related fun things.


Perhaps, but that's not what the author said:

"there’s nothing stopping a DirectX 12 enabled machine from fully running VMs on these video cards."


Modern GPUs have 32-64 independent cores, each of which can run a different code path. Each core then runs the same code path on several different inputs in parallel. It's probably not possible with current tools, but architecture-wise, completely dividing the cores between VMs should be doable.


I know they have different cores, but, as I pointed out, these cores are very slow for most code that would normally run on a CPU. That is precisely the reason why so many cores fit - a CPU uses the available space to get around the memory wall.

muizelaar's comment pointed out that the author probably wanted to say that the GPU can't be reasonably accessed _from_ VMs, yet. That makes much more sense.


The GPU on the card (Geforce GTX 970) mentioned in the article has 13 independent SMM cores. Each of them has 128 marketing "cores", for a total of 1664.


How relevant is DirectX11/12 these days now that OpenGL is so successful thanks to mobile and web?


Very much.

But OpenGL is also targeting and benefiting from these advances. Also, since the article is mind-bogglingly dumb, watch this: http://gdcvault.com/play/1020791/ (It might be that the DX12 announcement the author read was the source of the misinformation, but anyway, this submission should have been flagged hours ago.)


The PC game space is still pretty much pure DX. Same for most of the hardcore dev stuff.

OpenGL is busy winning some ground, but at the level the article is talking about... slow gains.


No games console has first class support for OpenGL.

EDIT: Rephrased it better.


You know that the biggest esports game (as in prize money) runs fine with OpenGL on Linux?

With Valve developing Source 2 and having ported almost all Source games, and other AAA titles like Borderlands shipping with first-class Linux support, OpenGL is very much alive in gaming.


The keyword was consoles.


I believe most PC games are written against DirectX, and Xbox games obviously are as well.


OpenGL ES 2 (which is what mobile runs) and WebGL are roughly DirectX 9 equivalent. And on Windows, at least, Chrome implements WebGL on top of DirectX rather than an OpenGL substrate: https://code.google.com/p/angleproject/

In terms of feature parity, the latest DirectX can only be compared to desktop OpenGL (although OpenGL ES 3.0 is up and coming).


Firefox and IE both implement WebGL using DirectX too. It's a sad state of affairs when OpenGL drivers are so flaky that it's less trouble to write an elaborate cross-API abstraction layer and shader translator, than to just use OpenGL directly with a nearly 1:1 mapping.


I too would like to know the relevance. From a Wikipedia article [1]:

In general, Direct3D is designed to virtualize 3D hardware interfaces. Direct3D frees the game programmer from accommodating the graphics hardware. OpenGL, on the other hand, is designed to be a 3D hardware-accelerated rendering system that may be emulated in software. These two APIs are fundamentally designed under two separate modes of thought.

[1] http://en.wikipedia.org/wiki/Comparison_of_OpenGL_and_Direct...


You can run Direct3D in software; Microsoft even provides a software implementation of DX11 (albeit the subset that works on DX10.1 hardware).

https://msdn.microsoft.com/en-us/library/windows/desktop/gg6...


That quote is referring to early versions from over ten years ago. If you read the end of the section you linked, you'll find a description of how they provide basically the same functionality. While one may provide a convenience method over the other, any functionality in one generally finds its way to the other.


It's relevant to pretty much all PC gaming. Not everyone is enamored with Candy Crush and its ilk.


"There is no doubt in my mind that support for Mantle/DirectX12/xxxx will be rapid because the benefits are both obvious and easy to explain, even to non-technical people."

Except that this won't only depend on the programmers, but on the management too, and seeing what those guys do at EA, Ubisoft, Activision, etc., I am not sure it will be properly utilized for the first wave of DX12 games.


EA's "big" engine which is now Frostbite from Dice already has Mantle support, it will likely be able to implement the DX12 side of it pretty quickly.


and here I am, barely managing to find a decent, simple 3D C++ rendering library that runs on a Mac with Xcode.

Edit: 3D

It's pretty crazy that there's no dead-simple 3D rendering library that supports just bare polygons. When I look at an OpenGL tutorial I just flee when I see how much boilerplate code is necessary to make the mouse move a camera.



I meant 3D


SDL supports OpenGL. So you can have SDL handle all the boring stuff with window handles, keyboard and mouse movement, system events etc, and you are given an empty OpenGL context so you can draw literally anything you want in it. It doesn't get much simpler than that.
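
For example, a minimal sketch assuming SDL2 is installed (error checking omitted):

    #include <SDL.h>

    int main(int argc, char* argv[]) {
        (void)argc; (void)argv;
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window* window = SDL_CreateWindow("demo",
            SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
            800, 600, SDL_WINDOW_OPENGL);
        SDL_GLContext gl = SDL_GL_CreateContext(window);

        bool running = true;
        while (running) {
            SDL_Event e;
            while (SDL_PollEvent(&e))           // keyboard/mouse/system events
                if (e.type == SDL_QUIT) running = false;
            // ... issue any OpenGL calls you like here ...
            SDL_GL_SwapWindow(window);
        }

        SDL_GL_DeleteContext(gl);
        SDL_DestroyWindow(window);
        SDL_Quit();
        return 0;
    }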


Nah, you still need to use OpenGL when using SDL, to make a projection matrix, VBOs, all those fun things.


"a decent, simple 3D C++ rendering library " - 3D rendering doesn't really get any more simple than a blank OpenGL context window. If you want "simple" as in "easy to use" then Unity/Unreal Engine are both very easy to use.


Those are massive engines, with plenty of features I don't need. I can't rely on those engines because they're made to build full-fledged games, and they often force paradigms or methods of working I don't really like. Not to mention the freedom aspect, and how those engines are supported.

> 3D rendering doesn't really get any more simple than a blank OpenGL context window.

OpenGL is pretty low level. It's a bare, standard bridge to make use of a GPU. What I mean is that there are either very high-level engines or a very low-level graphics API, which requires a lot of work if you want to make anything decent with it. There are things like GLFW and GLEW, but nothing like a very simple-to-use 3D rendering library that just does the bare minimum (a camera, quaternions, some text rendering, inputs) and offers simpler access to what OpenGL has to offer.

3D engines are always being made obsolete. Except for Irrlicht and OGRE 3D, which are still quite fat in my opinion, there is no general-purpose, lightweight 3D renderer that isn't trying to do it all. This kind of engine could serve as a thin wrapper over OpenGL, just to avoid dealing with the sheer quantity of OpenGL calls.


I've used libgdx [1] before; it's still fully supported and is being developed further. It is still fairly low level, but it has just enough in it to make things like draw calls, VBOs and shader management easier. That being said, I still ended up writing my own shaders for lighting and shadows; as far as I know there isn't anything in between libraries like this and huge engines like Unity/UE.

[1] http://libgdx.badlogicgames.com/


Java? No thanks.


If you are willing to go Objective-C (or Objective-C++):

https://developer.apple.com/library/mac/documentation/3DDraw...


I'd recommend bgfx (https://github.com/bkaradzic/bgfx), if you want something simple and cross platform. In many ways it reminds me of a 3D SFML.


Thanks for the suggestion, I'll try it, though SFML works right out of the box.

The bgfx examples seem to contain a lot of boilerplate code... It seems to be an impressive engine; it's recent, so everything is up to date with current graphics APIs. Pretty cool.

I'm more about making something simple and quick without the bells and whistles.


I'm not a hardware guy, so I don't know a ton about this; the oversimplification was a bit necessary for me. With that said, I'm very excited for DX12. Is it available for download on Windows 7 64-bit yet?


It's about redesigning the current idiotic graphics pipeline API so the driver and the application don't have to use inefficient abstractions that are already outdated on both ends. (The GPU is much more complex and sophisticated than it used to be. It's not just a fancy framebuffer with a few megabytes of RAM so you can do z-culling and a 3D-to-2D projection. And nowadays, with virtual texturing and megatextures, and with the bottleneck being the latency of pushing these abstract objects (which are getting replaced by driver-compiled shader programs anyway) to the GPU, the whole thing is ripe for a bit of fundamental change.)

Take some time to watch these. I'm also not a hardware guy, but these problems are far, far from the hardware, and the solutions are very educational from a distributed systems standpoint.

http://gdcvault.com/play/1020791/

http://www.slideshare.net/tlorach/opengl-nvidia-commandlista...


There's a private preview for big name developers (Unity already has DX12 implemented) but it's not going to be public until Windows 10 launches.

Also, it's going to be exclusive to Windows 10. Not the end of the world since 7 and 8 owners get a free upgrade.


No, and it won't be.


With the performance gains of Mantle and DirectX 12, where does OpenGL exactly fall within all of this?


One step behind as usual, sadly. Not far behind... but behind. And in a two-horse race that's a difficult pitch...


A next generation API comparable to Mantle and D3D12 is being developed inside the Khronos Group: https://www.khronos.org/news/press/khronos-group-announces-k...


Everyone is hoping that it doesn't turn into another Longs Peak.


Do you need to pay royalties to Microsoft in order to use DX12/DX11 in a commercial game?


Sort of; it only runs on Windows.


No


nice

oh wait it only runs on "windows"


Everybody seems to be really critical of the article. But he does say that it is an "extreme oversimplification". And he is a video game developer. Maybe he is mostly working as the CEO of Stardock now (maybe not), but I have a hard time believing it is just "wrong". I felt the criticisms were unfair. But I'm a web developer, so maybe I'm just missing it completely.


An extreme oversimplification can be expected to get the technical details wrong but the conclusion right. This article gets the high-level consequences wrong too, and claims that there will be benefits to things that are completely unrelated.


I am a 3D dev and this article makes my head hurt. It has a bit of right and a lot of wrong.



