
Doesn't this read the entire input into memory? `uniq` and `comm` don't (and don't need to), so they can work on inputs bigger than available memory.


Yes, in the current `setop` implementation. Holding the input in memory keeps the script fast and short, and lets it work on unsorted inputs without calling `sort` first.

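For comparison, a typical POSIX `uniq` implementation streams the input and only compares each line with the one before it, so it needs to hold just two lines at a time. The tradeoff is that it only drops adjacent duplicates, so the input has to be presorted if you want every duplicate removed.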
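To make the tradeoff concrete, here is a rough Python sketch of the two strategies. It is not the `setop` source or any real `uniq` source; the function names are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Set


def unique_in_memory(lines: Iterable[str]) -> Iterator[str]:
    """Drop every repeated line, in any input order.

    Memory grows with the number of distinct lines, because each one
    is remembered in a set.
    """
    seen: Set[str] = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def unique_adjacent(lines: Iterable[str]) -> Iterator[str]:
    """Drop a line only when it equals the line just before it.

    Memory stays constant (one remembered line), but non-adjacent
    duplicates survive unless the input is sorted first.
    """
    previous: Optional[str] = None
    for line in lines:
        if line != previous:
            yield line
        previous = line
```

On the unsorted input `b, a, b`, the first function yields `b, a` while the second yields `b, a, b`. On the sorted input `a, b, b`, both yield `a, b`.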

An interesting upgrade could be to add a `setop` option flag that tells the script the inputs are already sorted and/or deduped. The script could then stream the inputs the way `uniq` and `comm` do, which would give the memory savings you're describing.
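As a sketch of what that streaming path might look like, here is an intersection over two inputs that are assumed to be already sorted and deduped. This is hypothetical, not existing `setop` code, and `intersect_sorted` is a made-up name. It also assumes both inputs were sorted in plain byte/code point order (e.g. `LC_ALL=C sort`), since that is how Python compares strings.

```python
from typing import Iterable, Iterator


def intersect_sorted(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Yield lines present in both inputs.

    Assumes each input is sorted ascending and has no duplicates.
    Only one line from each side is held at a time, so memory use
    does not depend on input size.
    """
    left_iter, right_iter = iter(left), iter(right)
    a = next(left_iter, None)
    b = next(right_iter, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a = next(left_iter, None)
            b = next(right_iter, None)
        elif a < b:
            # a cannot appear later in right, so skip it
            a = next(left_iter, None)
        else:
            # b cannot appear later in left, so skip it
            b = next(right_iter, None)
```

Passing two open file objects works as well as passing lists, since both are consumed line by line. Union and difference follow the same two-cursor pattern, which is roughly what `comm` does with its columns.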



