This is cool, but missing a LOT of details between steps 4 and 5, which is the meat of the quicksort. Actually, the first and last elements of step 4 would be swapped, which means the order depicted in step 5 is incorrect.
I'd guess if you care more about speed than memory it might be faster to just move elements into new array - sequence through old array appending to start/end of new array according to pivot comparison. You'd be moving every element vs leaving some in place with a swap approach, but the simplicity of the code & branch prediction might win out.
I'm pretty sure the swapping is a fundamental part of the quicksort algorithm, not a mere implementation detail. That's the reason quicksort is an in-place algorithm.
Actually you're right, it is an implementation detail. The original isn’t mistaken, it’s just showing the lo-to-hi partitioning pass rather than the from-both-ends version I had in mind when I implemented quicksort before.
shame, shame, I should have double-checked before posting.