This is something I've noticed in the last 8-10 years: the rise of the python/js/java paradigm everywhere. Some of the associated LDIF (json) I enjoy much more than XML and flat files, but the misapplication of tools is becoming an epidemic. When I can write: awk -F "," '{for (x = 1 ; x <= NF ; x++) {if ($x ~ /[0-9]+/) {a[x] = a[x] + $x}}} END { for (p in a) {printf "%d = %d\n",p,a[p]}}' to sum columns in 5 seconds, and people are scrambling with libraries to do matrix operations, I tend to scratch my head and walk away. The aversion to the command line also bothers me, but I don't run into it as much in my field.
import pandas as pd
data = pd.read_csv(filename)
print(data.sum())
and have the same result, I'm going to do the one that is faster to write, takes fewer characters, and lets me understand what's going on.
And don't get me wrong, I've written some gnarly pipelined bash before, although I'm by no means an expert, but that doesn't mean it's always the right tool for the job.
I was going to be the long haired *nix geek here but I have no hair and the world is moving on. I can't pick bones with python for data science/analysis and personal convenience. _However_, as a principal engineer, if someone were to say that we need python and pandas for an operation like this on a trivial dataset, where python + pandas were not already provisioned, the answer would be no.
>When I can write: awk -F "," '{for (x = 1 ; x <= NF ; x++) {if ($x ~ /[0-9]+/) {a[x] = a[x] + $x}}} END { for (p in a) {printf "%d = %d\n",p,a[p]}}'
To be fair, what the matrix libraries do is provide readability and clarity.
Show a programmer unfamiliar with awk your statement and they're going to be spending quite a while parsing it.
Show a programmer unfamiliar with numpy/pandas some matrix multiplication and they may understand it intuitively for the most part without even having to look up references.
edit: I'm actually an awk noob myself but after rereading your code for a second, it makes quite a bit of sense. "For every field in every row, if it contains a digit, add it to that column's running total." So perhaps not much is gained by the library.
>The aversion to the command line is also something that bothers me but I don't run into it as much in my field.
The company I recently started with is really big on Splunk.
The fact that they're proud enough of coming up with the tagline "Taking the sh out of IT" to print it on branded t-shirts featured in their training material was a hint that I wouldn't be a huge fan of the product, personally.
Abstracting things away is great, but something about IT pros being proud of avoiding the command line rubs me entirely the wrong way.