I once converted a simulation to Cython from plain old Python.
Because it fit in the CPU cache the speedup was around 10000x on a single machine (numerical simulations, amirite?).
Because it was so much faster, all the code required to split it across a bunch of servers in a map-reduce job could be deleted, since it only needed a couple of cores on a single machine for a ms or three.
Because it wasn't a map-reduce job, I could take it out of the worker queue and just handle it on the fly during the web request.
Sometimes it's worth it to just step back and experiment a bit.
Yeah, back when I was in gamedev land and multi-core CPUs started coming on the scene it was "Multithread ALL THE THINGS". Shortly thereafter people realized how nasty cache invalidation is when two cores are contending over one cache line (false sharing). So you can have the same issue show up even in a single-machine scenario.
Good understanding of data access patterns and the right algorithm go a long way in both spaces as well.
Even earlier, when SMP was hitting the server room but still far from the desktop, there was a similar phenomenon of breaking everything down to use ever finer-grain locks ... until the locking overhead (and errors) outweighed any benefit from parallelism. Over time, people learned to think about expected levels of parallelism, contention, etc. and "right size" their locks accordingly.
Computing history's not a circle, but it's damn sure a spiral.
> Computing history's not a circle, but it's damn sure a spiral.
I think that is my new favorite phrase about computing history. Everything old is new again. There's way too much stuff we can extract from past for current problems. It's kind of amazing.
I wish I could take credit for it. I know I got it from someone else - could have been lots of people - but fun fact: one famous proponent of the idea (though obviously not in a computing context) was Lenin.
Lenin's views on this come directly from Marx - it's the concept of "dialectical materialism" (though Marx did not use that term) - and from Hegel. Specifically, Marx noted in Capital that (paraphrased) capitalist property is the first negation (the antithesis) of feudalism (the thesis), but that capitalist production necessarily leads to its own negation in the form of communism - the negation of the negation (the synthesis) - basically applying Hegelian dialectics to "the real world".
In that example the idea is that feudalism represents a form of "community property", which though owned by a feudal lord in the final instance is in practice shared. Capitalism then "negates" that by making the property purely private, before communism negates that again and reverts to shared property but with different specific characteristics.
The three key principles of Hegel's dialectics come from Heraclitus (the idea of inherent conflict within a system pulling it apart), Aristotle (the paradox of the heap; the idea that quantitative changes eventually lead to qualitative change), and Hegel himself (the "negation of the negation" that was popularised by Marx in Capital; the idea that, driven by inherent conflict, qualitative changes will first change a system substantially, before reversing much of the nature of the initial change, but with specific qualitative differences).
The idea is known to have been arrived at independently by others in the 19th century too (e.g. at least one correspondent of Marx's came up with it on his own), and it's quite possible variations of it significantly predate both Marx and Hegel as well.
I usually think of it more as a three-dimensional spiral like a spring or a spiral staircase. Technically that's a helix, but "history is a helix" just doesn't sound as good for some reason.
What kind of games? I always thought that e.g. network-synced simulation code in RTSes or MMOs would be extremely amenable to multithreading, since you could just treat it like a cellular automaton: slice the board up into tiles, assign each tile to a NUMA node, and have the simulation-tick algorithm output which units or particles have traversed into neighbouring tiles during the tick, so that they're pushed as messages to that neighbouring tile's NUMA node before the next tick.
(Obviously this wouldn't work for FPSes, though, since hitscan weapons mess the independence-of-tiles logic all up.)
This was back in the X360/PS3 era and mostly FPS/Third-person(think unreal engine 3 and the like).
What's really ironic is that the PS3 actually had the right architecture for this: the SPUs only had 256 KB of directly addressable memory, so you had to DMA everything in/out, which forced you to think about memory locality, vectorization, cache contention, etc. However, the X360 hit first, so everyone just wrote for its giant unified memory architecture and then got brutalized when they tried to port to PS3.
What's even funnier in hindsight is all that work you'd do to get the SPUs to hum along smoothly translated to better cache usage on the X360. Engines that went <prev gen> -> PS3 -> X360 ran somewhere between 2-5x faster than the same engine that went <prev gen> -> X360 -> PS3. We won't even talk about how bad the PC -> PS3 ports went :).
Hadoop has its time and place. I love using hive and watching the consumed CPU counter tick up. When I get our cluster to myself and it steps 1hr of CPU every second it's quite a sight to see.
Yeah, I got a bit of a shock when I first used our one and my fifteen minute query took two months of CPU time. Thought maybe I'd done something wrong until I was assured that was quite normal.