
Really interesting.

I was trying to implement a compression algorithm selection heuristic in some file format code I am developing. I found this to be too hard for me to reason about so basically gave up on it.

Feels like this blog post is getting there, but there could be a more detailed set of equations that actually calculates this from some other parameters.

Keeping the code completely flexible and running a full-load production test with the desired parameters to find the best tuning is an option, but it is also very difficult.

I also read this previously, which I find similar:

https://rocksdb.org/blog/2021/12/29/ribbon-filter.html




the compression algorithm you select for your data is quite dependent on the dataset you have. the equations in this blog post don't help you choose which compression to use, but rather "how much" and when to compress. I would be curious to formalize the math for different compression algorithms though... might be a good follow up post!

I was calculating timings and compression ratio for each array with each algorithm. Then I would save the “best” one to use for next chunks of data.
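A minimal sketch of that measure-and-remember approach, assuming Python and only the standard-library codecs. The `CODECS` table, the `measure` and `pick_best` helpers, and the time-budget rule are all hypothetical choices for illustration, not what the original code did:

```python
import bz2
import lzma
import time
import zlib

# Hypothetical candidate set: stdlib codecs only, chosen for illustration.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def measure(chunk: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds spent)} for one chunk."""
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        compressed = compress(chunk)
        elapsed = time.perf_counter() - start
        results[name] = (len(chunk) / len(compressed), elapsed)
    return results


def pick_best(results: dict, max_seconds: float) -> str:
    """Assumed rule: best ratio within the time budget, else the fastest codec."""
    within = {name: ratio for name, (ratio, secs) in results.items()
              if secs <= max_seconds}
    if within:
        return max(within, key=within.get)
    return min(results, key=lambda name: results[name][1])
```

The winner for one chunk can then be cached and reused for the next chunks, re-measuring occasionally in case the data distribution drifts. What "best" means is exactly the open question here, so `max_seconds` is just one possible knob.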

But it is hard to decide how to judge the cpu vs disk/network tradeoff like you explain in the article.

I was a bit curious whether I could make an API where the user enters some parameters at the top level and the system adjusts this calculation accordingly.

But I had some issues with this because the hardware budget is used by all parts of the system, not only by the compression code.

As an example, the network is very fast inside a data center but can be slow and expensive when connecting to a user. The application knows which case it is executing, but it is hard to connect that part of the code to the compression selection logic cleanly.

Also, in the network case, it might make sense to keep the data large but CPU time low until I hit the limit, but nothing matters once I hit the limit.
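One rough way to put numbers on this, assuming a purely additive model where total time is compression time plus transfer time (no pipelining, no decompression cost, no monetary cost). The `Option` type, the helper names, and every figure below are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    compressed_bytes: int  # payload size after compression
    cpu_seconds: float     # time spent compressing


def total_seconds(opt: Option, bandwidth_bytes_per_s: float) -> float:
    """Assumed cost model: compress first, then send over the link."""
    return opt.cpu_seconds + opt.compressed_bytes / bandwidth_bytes_per_s


def cheapest(options, bandwidth_bytes_per_s: float) -> Option:
    """Pick the option with the lowest modeled end-to-end time."""
    return min(options, key=lambda o: total_seconds(o, bandwidth_bytes_per_s))


# Made-up numbers for a 100 MB payload.
options = [
    Option("none", 100_000_000, 0.0),
    Option("fast", 50_000_000, 0.2),
    Option("strong", 30_000_000, 3.0),
]

for label, bandwidth in [("10 Gbit/s", 1.25e9),
                         ("1 Gbit/s", 1.25e8),
                         ("10 Mbit/s", 1.25e6)]:
    print(label, cheapest(options, bandwidth).name)
```

With these invented figures the model picks no compression on the fastest link, the fast codec on the middle one, and the strong codec on the slow one. The application would only need to pass in a bandwidth estimate, which keeps the "which network am I on" knowledge out of the compression code itself.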

Would be cool to have a mathematical framework to put some numbers in and be able to reason about the whole picture



