It's basically a byte- or bit-shuffling filter (very fast, SIMD-optimized) in front of several modern compressors (lz4, zstd, their own codec), with a self-describing header. So if you have an array of 100 8-byte values, the result of shuffling is the 100 1st bytes, followed by the 100 2nd bytes, and so on.
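A minimal sketch of that byte-shuffle transform in Python (illustrative only; the helper names are mine, not the library's API):

    import struct

    def shuffle_bytes(values, width=8, fmt="<q"):
        # Pack n fixed-width values, then emit all 1st bytes, all 2nd bytes, ...
        raw = b"".join(struct.pack(fmt, v) for v in values)
        n = len(values)
        return bytes(raw[j * width + i] for i in range(width) for j in range(n))

    def unshuffle_bytes(blob, width=8, fmt="<q"):
        # Inverse transform: regroup byte planes back into per-value records.
        n = len(blob) // width
        raw = bytes(blob[i * n + j] for j in range(n) for i in range(width))
        return [struct.unpack_from(fmt, raw, j * width)[0] for j in range(n)]

    values = list(range(1000, 1100))   # 100 small positive 8-byte ints
    shuffled = shuffle_bytes(values)
    # The high-order byte planes become long runs of zeros -- easy for lz4/zstd.
    assert unshuffle_bytes(shuffled) == values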
It shines when values are of fixed size with lots of similar bits, e.g. positive integers of the same magnitude. It's not so good for doubles, where the bits change a lot. Also, if storing diffs, it helps to take the diff from the initial value in a chunk rather than from the previous value, so that deltas change sign less often (a sign change flips most of the high bits).
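A rough sketch of that delta point, assuming a simple rising-but-noisy integer series (the exact delta scheme depends on your encoder):

    def deltas_from_previous(values):
        # Each delta is relative to the previous value.
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]

    def deltas_from_first(values):
        # Each delta is relative to the first value of the chunk.
        first = values[0]
        return [first] + [v - first for v in values[1:]]

    series = [100, 103, 101, 106, 104, 109, 108, 112]
    print(deltas_from_previous(series))  # [100, 3, -2, 5, -2, 5, -1, 4]  sign flips
    print(deltas_from_first(series))     # [100, 3, 1, 6, 4, 9, 8, 12]   all positive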
In my own use case, for the same data, C# decimal (a 16-byte struct) compressed much better than doubles (final absolute blob size), even though decimal takes 2x more memory uncompressed.
If data items share few similar bits/bytes, then it's the underlying compressor that matters.