* If it's simple transforms, use CLI tools (a couple of one-liners are sketched after this list).

* If it requires aggregation and it's small, use CLI tools as well.

* If it's data you're using over and over again, load it into a database first and do the cleaning there (ELT).

* If it's 2 TB of data or under, still use bzip2: its block-based format gives you splittable streams you can fan out to GNU parallel (see the sketch after this list).

* If it requires massive aggregations or windows, use spark|flink|beam.

* If you need to repeatedly process the same giant dataset, use spark|flink|beam.

* If the data is highly structured and you mainly need aggregations and filtering on a few columns, use columnar DBs.
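
For the first two bullets, a couple of one-liners along these lines usually do the job (just a sketch; orders.csv and the column numbers are placeholders):

    # simple transform: project two columns out of a CSV, skipping the header row
    awk -F, 'NR > 1 {print $2 "," $5}' orders.csv > slim.csv

    # small aggregation: most frequent values in column 3
    cut -d, -f3 orders.csv | sort | uniq -c | sort -rn | head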
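
And a rough sketch of the bzip2 + GNU parallel pattern, assuming the data is kept as many independent .bz2 chunks (chunks/*.bz2 and the column layout are made up):

    # each chunk is decompressed and aggregated on its own core, then the partial sums are merged
    ls chunks/*.bz2 \
      | parallel 'bzcat {} | awk -F, "{s += \$3} END {print s}"' \
      | awk '{total += $1} END {print total}'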

I've been using Dlang with LDC a lot because of its fast compile-time regex and built-in JSON support. Python 3 + pandas is also a good choice if you don't want to use awk.



Before reaching for spark, etc:

Sort is good for aggregations that fit on disk (TBs these days, I guess)
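
E.g. a group-by count where GNU sort spills to temp files instead of holding everything in RAM (events.tsv and the key column are placeholders):

    # cap the in-memory buffer with -S; everything past that goes to temp files on disk
    cut -f1 events.tsv | LC_ALL=C sort -S2G --parallel=4 | uniq -c | sort -rn | head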

Perl does well too if the output fits in a hashtable in DRAM, so 10s (or maybe 100s?) of GBs.
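
Something like this; the working set is one hash entry per distinct key, not per row (file and columns made up):

    # sum column 3 grouped by column 1, entirely in memory
    perl -F'\t' -lane '$sum{$F[0]} += $F[2];
        END { print "$_\t$sum{$_}" for sort keys %sum }' events.tsv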


For bzip2, why not just use pbzip2? Frankly, I wish distros would replace the stock bzip2 with pbzip2 (I think it's drop-in compatible).
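
For what it's worth, it takes the same basic flags, so swapping it into a pipeline usually just looks like this (file names are placeholders):

    pbzip2 -p8 -c data.tsv > data.tsv.bz2     # compress on 8 cores
    pbzip2 -dc data.tsv.bz2 | wc -l           # decompress its own output in parallel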



