Excel is fine for opening up a table and doing some quick numberwang.
But, as soon as you have to take your piece of work and start making little variations and tacking bits on, or running it on different bits of data, or God forbid you want to actually test your code (and make no mistake, code is what you are making), well, it all involves rather a lot of clicking and opportunity for fuck-ups.
Of course, R itself hits limitations pretty quickly. While technically it's a general purpose programming language, trying to use it on non-tabular data or to assemble even a medium-sized program leads to pain.
Clearly Excel isn't always the best tool if you are free to choose among R, Python, C, or Java. But if you're not a developer, Excel may be the only tool that you and all your colleagues are guaranteed to have. Even if I persuaded ICT to install R on my work PC, my colleagues wouldn't have it, so the work wouldn't be reproducible.
I find that approaching Excel as if it were a database or a programming language tends to reduce the replication problem.
For example to analyse a daily file:
* set up a data connection to an exemplar file
* refresh the connection with the new file daily
* set connection option to copy formula down so each row is identical
* point to a separate spreadsheet for lookups
* use a recorded macro to paste the calculated values into a cumulative spreadsheet
* do pivots and change over time graphs in the cumulative spreadsheet
I don't think Excel is the best tool for this workflow, but a logical approach makes it "good enough".
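For comparison, here's what the refresh-and-append part of that workflow looks like scripted. A minimal sketch in Python with pandas, not the commenter's actual setup: file names, column names, and the `total` formula are all invented for illustration.

```python
# Sketch: read one daily file, apply the same formula to every row,
# and append the calculated values to a cumulative table -- the
# scripted equivalent of "refresh connection, copy formula down,
# macro-paste into the cumulative spreadsheet".
import pandas as pd
from pathlib import Path

def append_daily(daily_path: str, cumulative_path: str) -> pd.DataFrame:
    daily = pd.read_csv(daily_path)
    # One formula for the whole column, so no copied-down cell can drift.
    daily["total"] = daily["qty"] * daily["price"]
    if Path(cumulative_path).exists():
        cumulative = pd.concat([pd.read_csv(cumulative_path), daily],
                               ignore_index=True)
    else:
        cumulative = daily
    cumulative.to_csv(cumulative_path, index=False)
    return cumulative
```

Running it each day on the new file replaces the clicking; the pivots and change-over-time graphs would still read from the cumulative file.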
>But, as soon as you have to take your piece of work and start making little variations and tacking bits on, or running it on different bits of data, or God forbid you want to actually test your code (and make no mistake, code is what you are making), well, it all involves rather a lot of clicking and opportunity for fuck-ups.
This isn't an accurate reflection of where Excel is currently. Excel can run OLAP cubes at this point, which is a huge multiplier for its utility as a quick BI solution. Comparisons to programming languages really miss the entire point of why Excel is so useful.
Yes, and Excel also has a subset of Power BI's functionality for data cleaning. One of the great things about Power BI's data cleaning is that it presents a summary of all the operations performed on the data in a linear format, so you can easily see everything that was done to the file.
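That linear, inspectable list of cleaning steps is the key idea. As a sketch of the same pattern in plain Python (the step names and data are made up, and this is not how Power BI implements it): keep each operation as a named entry in an ordered list, so anyone can read exactly what was done, in what order.

```python
# Each cleaning operation is a (name, function) pair in an explicit,
# ordered list -- readable and replayable, like an "applied steps" pane.
steps = [
    ("strip whitespace", lambda rows: [[c.strip() for c in r] for r in rows]),
    ("drop blank rows",  lambda rows: [r for r in rows if any(r)]),
    ("uppercase codes",  lambda rows: [[r[0].upper()] + r[1:] for r in rows]),
]

def clean(rows):
    for name, op in steps:
        rows = op(rows)  # apply each step in the order listed
    return rows
```

The point isn't the implementation, it's that the transformation history is a first-class, visible thing rather than a trail of manual edits.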
My rough rule of thumb is:
* Checking if it's got vaguely sensible data in? `column` (https://linux.die.net/man/1/column).
* Just looking, maybe aggregating a column or doing a simple summarise? Excel (or LibreOffice, why not?).
* Giving outputs to someone else or might need to do it more than once? R.
* Need sensible data structures and useful abstractions? Python.
* Need more speed? C or Java or whatever.
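To make the "might need to do it more than once" tier concrete, here's the sort of job it covers, sketched in Python rather than R only to keep these examples in one language. The data and column names are invented.

```python
# A repeatable one-column summarise: the kind of task that's fine
# to click through once in Excel, but worth scripting the moment
# it has to run again on a new file.
import pandas as pd

data = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "sales":  [10, 20, 30, 40],
})
summary = data.groupby("region")["sales"].sum()
print(summary.to_dict())  # {'north': 40, 'south': 60}
```

Rerunning the script on tomorrow's data gives the same steps every time, which is exactly the reproducibility that hand-edited spreadsheets lose.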