I've been a pipeline junkie for a long time, but I've only recently started to get into awk. The thing I can do with awk but not with other tools is write stateful filters, which accumulate information in associative arrays as they go.
For example, if you want to do uniq without sorting the input, that's:
awk '{ if (!($0 in seen)) print $0; seen[$0] = 1; }'
This works best if the number of unique lines is small, either because the input is small, or because it is highly repetitive. Made-up example, finding all the file extensions used in a directory tree:
find /usr/lib -type f | sed -rn 's/^.*\.([^/]*)$/\1/p' | awk '{ if (!($0 in seen)) print $0; seen[$0] = 1; }'
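If you also want counts per extension, the same associative-array trick extends naturally -- a sketch:

find /usr/lib -type f | sed -rn 's/^.*\.([^/]*)$/\1/p' | awk '{ count[$0]++ } END { for (e in count) print count[e], e }'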
That uniq filter is easily tweaked, e.g. to uniquify by a part of the string. Say you have a log file formatted like this:
2019-03-03T12:38:16Z hob: turned to 75%
2019-03-03T12:38:17Z frying_pan: moved to hob
2019-03-03T12:38:19Z frying_pan: added butter
2019-03-03T12:38:22Z batter: mixed
2019-03-03T12:38:27Z batter: poured in pan
2019-03-03T12:38:28Z frying_pan: tilted around
2019-03-03T12:39:09Z frying_pan: FLIPPED
2019-03-03T12:39:41Z frying_pan: FLIPPED
2019-03-03T12:39:46Z frying_pan: pancake removed
If you want to see the first entry for each subsystem:
awk '{ if (!($2 in seen)) print $0; seen[$2] = 1; }'
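On the log above, that should print just the first line for each of hob, frying_pan, and batter:

2019-03-03T12:38:16Z hob: turned to 75%
2019-03-03T12:38:17Z frying_pan: moved to hob
2019-03-03T12:38:22Z batter: mixed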
Or the last (although this won't preserve input order):
awk '{ seen[$2] = $0; } END { for (k in seen) print seen[k]; }'
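If you do want to keep input order for the last-entry case, one way (a sketch) is to number the keys as they first appear:

awk '{ if (!($2 in idx)) { idx[$2] = ++n; key[n] = $2 } last[$2] = $0 } END { for (i = 1; i <= n; i++) print last[key[i]] }'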
I don't think there's another simple tool in the unix toolkit that lets you do things like this. You could probably do it with sed, but it would involve some nightmarish abuse of the hold space as a database.
awk '{ if (!($2 in seen)) print $0; seen[$2] = 1; }'
You can even shorten this a bit! "awk '!seen[$2]++'" does the same thing -- a pattern with no action makes awk print the whole line whenever the pattern evaluates to true. It's definitely more code-golfy than being explicit about what's actually going on, though.
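Spelled out, that one-liner is roughly:

awk '{ if (!seen[$2]) print $0; seen[$2]++ }'

The increment happens after the test, so the line only prints the first time its key turns up.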
I did a lightning talk on awk last year and found this great article series from 2000 on all the powers of awk (including network access, but not yet email :) ).
I admire your work. Clever usage of Unix tools is very handy. But for parsing text, do you really see awk and Unix tools as a better solution than a simple Python script?
Although I admit that the key argument for Unix tools is that they don't get updated. That sounds awful, but think about it: once it works, it works everywhere, no matter the OS type, version, or packages installed. That is something experienced programmers always want from their solutions.
Python is fantastic for little (or large!) bits of logic, but its handling of input is clunky enough to put me off for tiny things. AFAIK the boilerplate you need before you can work on the fields of each line is:
import sys
for line in sys.stdin:
    fields = line.split()
    # now you can do your logic
If you want to use regular expressions, that's another import.
Python also doesn't play well with others in a pipeline. You can use python -c, but you can't use newlines inside the argument (AFAICT), so you're very limited in what you can do.
Parsing text is what a lot of these scripts/mini-pipelines do.
The key argument for *nix tools is that they do one thing and only one thing extremely well. At a meta level these tools are units of functionality, and you're actually doing functional programming, on the command line, without realizing it.
^ agree -- I've seen lots of folks [newer users mostly] turn to grep when really what they wanted was sed. It's just a matter of learning which screwdriver is for which type of screw.
"I don't think there's another simple tool in the unix toolkit that lets you do things like this."
Perl can, since it borrowed a fair amount from awk. It's also almost as commonly already installed. The one-liner equivalents to what you showed are pretty similar, for example: https://news.ycombinator.com/item?id=19294575
Though, I concede it falls outside the realm of "simple tool".
There's a wonderful quote about things like this in the Unix Hater's Handbook:
> However, since joining this discussion, a lot of Unix supporters have sent me examples of stuff to “prove” how powerful Unix is. These examples have certainly been enough to refresh my memory: they all do something trivial or useless, and they all do so in a very arcane manner.
So I assume you would use some sort of 'grep XX | sort | uniq' (I still do) to get unique lines as output. Is this awk line now your default, or do you find yourself using both for convenience?
Do you alias these awk commands on all the machines you work on? Or, to put it another way: I haven't found a nice way to keep my custom aliases in sync across different machines -- perhaps you have a recommendation or workflow that is really sweet?
I still default to sort -u (or sort | uniq -c if I need counts), partly from habit, but partly because it's often useful to have the output sorted anyway.
I have a script on my path called huniq ('hash uniq') that contains that awk program. I prefer scripts to aliases because they play better with xargs and so on.
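Assuming huniq is just a thin wrapper around the one-liner above, it might look something like this:

#!/bin/sh
# huniq: print each distinct line the first time it appears, preserving input order
exec awk '{ if (!($0 in seen)) print $0; seen[$0] = 1; }' "$@"

Passing "$@" through to awk means it works on filename arguments as well as stdin.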
I have a Mercurial repository full of little scripts like this, and other handy things, which lives on Bitbucket, and which I clone on machines I do a lot of work on. In principle, whenever I make changes to it I should commit and push them, then pull them down on other machines, but I'm pretty slack about it. It still helps more than not having anything, though.