> That's a savings of something like 10% over the whole dataset.
Sorry, I'm probably misunderstanding you - I'm not sure about that "10%": what does it refer to?
Anyway, one thing I really like in ClickHouse is being able to chain codecs (I personally like to think of the Delta/DoubleDelta/Gorilla/T64 codecs as "encodings" and the general-purpose LZ4/LZ4HC/ZSTD codecs as "compression codecs").
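Just to make the "chaining" concrete, a minimal made-up sketch - table name, column names and types are only for illustration:

    CREATE TABLE example_metrics
    (
        ts      DateTime CODEC(DoubleDelta, ZSTD(3)),  -- encoding chained with a general-purpose compression codec
        user_id UInt64   CODEC(Delta, ZSTD(1)),
        counter UInt64   CODEC(T64)                    -- encoding alone, no compression codec on top
    )
    ENGINE = MergeTree
    ORDER BY ts;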
I don't have a strong math background, so for Delta & DoubleDelta I liked this explanation ( https://altinity.com/blog/2019/7/new-encodings-to-improve-cl... ), which describes "delta" as tracking the "distance" and "doubledelta" the "acceleration" between consecutive column values.
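The way I understand it, for a made-up sequence like 10, 20, 30, 45, Delta keeps the first value plus the differences (10, 10, 10, 15), while DoubleDelta keeps the differences of those differences (10, 10, 0, 5) - so it shines when values grow at a roughly constant rate.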
In the end I tested, on relatively small datasets (different ones per use case), all combinations of the encodings (Delta/DoubleDelta/Gorilla/T64) and ZSTD compression levels (mostly 1/3/9), roughly comparing the resulting per-column sizes (a size-check sketch is after this list); I ignored LZ4*.
- "Delta"&"DoubleDelta" were often interesting (but in general for my data using "Delta"+ZSTD was already good enough compared to the rest).
- "Gorilla" somehow never gave me any benefits if compared to other codecs and/or compression algos.
- "T64" is a bit a mistery for me, anyway in some tests it delivered excellent results compared to the other combinations, therefore I'm currently using just T64 for some columns, and for some other columns as T64+ZSTD(9).
EDIT: sorry, I think I got it - you probably meant something like "just by doing that on that specific column, the overall storage needs were reduced by 10% for the whole dataset", right?