> In general I find a good way to approximate a top-of-the-line NVIDIA GPU is to think of it as an ~8-core chip with a vector length of 192 for single precision and 96 (half of full length) for double precision.
But that's not entirely correct either. Yes you can use it like that, but you can also use it as a single core with a vector length of 1536.
In the context of over-simplification these are better thought of as single-core processors. The reason is that if you have a method foo() that you need to run 10,000 times, it doesn't matter whether you use 1 thread, 2 threads, or 8 threads - the total time to complete the work will be identical. This is very different from an 8-core CPU, where using 8 threads will be roughly 8x faster than using 1 thread (yes, yes, scaling won't be perfectly linear, etc.).
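A toy back-of-the-envelope model makes this concrete. Assume the whole GPU has 1536 single-precision lanes (the figure from above) and processes work in "waves" of that width; the lane count, item count, and the `waves` helper are all illustrative, not any real API. However you carve the 10,000 calls across host threads, each thread gets a proportional slice of both the work and the lanes, so the elapsed wave count never changes:

```python
import math

LANES = 1536   # single-precision vector length of the whole GPU (toy figure)
N = 10_000     # number of foo() invocations

def waves(n_items, n_threads, lanes=LANES):
    # Split the items and the lanes evenly across n_threads.
    # The threads' slices run concurrently on the same hardware,
    # so elapsed time is the wave count of one slice.
    items_per_thread = math.ceil(n_items / n_threads)
    lanes_per_thread = lanes // n_threads
    return math.ceil(items_per_thread / lanes_per_thread)

for t in (1, 2, 8):
    print(t, "threads ->", waves(N, t), "waves")  # same count every time
```

The point of the sketch is that dividing by the thread count cancels out: ceil((N/t) / (LANES/t)) is just ceil(N/LANES), which is why adding host threads buys you nothing once the device is saturated.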