I do scientific computing, and Python is one language I never actually got around to learning for some reason. However, as a long-time hobby, I do have an interest in programming languages so I like exploring things like Haskell, Clojure, Lisp, etc.
One language I'm really excited about for scientific computing, though, is Julia. From a language-design perspective, it's beautiful: it was actually thought out rather than kludged together. I've been trying to gradually use it more and more in my research, but the one problem I've found so far is the large mental context switch I make going from my usual languages to Julia. It's hard to tell which Julia code will be the most performant because there are many ways of doing the same task. I saw someone in the comments on this page mention that you can hand-tune the generated LLVM output from within the REPL itself. I imagine this would be very useful if I can get around to learning it (anyone know a good tutorial?)
The big question for me is whether Julia will be able to maintain its "purity" as it gains adoption.
R probably started out "beautiful" and "thought out" but has lost that edge with years of community-driven development. That's also what makes it so damn useful -- you can pretty much find anything on CRAN, often multiple implementations of it.
R is actually one of the purest languages out there; it basically says "I have vectors; they can have missing values, be nested, and can have other vectors as attributes. And I have functions with lexical scoping. Now go and build the rest as you like." So people did, some better, some worse -- but the beautiful part is that all those approaches work together and just do the job.
Once all of Bioconductor is ported to Julia, I will probably shed some joyous tears.
R is really nice for some things, because of how easy it is to work with table-like data, and operations like applying a function over a vector or a matrix are cake. However, it's also slow, leaky (and sometimes seems to use more memory after gc()), and often difficult to debug because so many things fail silently. If it weren't for so many standard bioinformatics packages being available only for R, I'd probably be using Python. Once we stop running microarrays, I'll probably be using R much less.
Deliberately-straightforward -- agreed. But "highly proficient" after writing one or two scripts? That's quite a stretch.
For instance, one of the questions I give in phone screens is for the candidate to write a program to count the number of occurrences of unique words in a text file. The "after writing one or two Python scripts" approach is something like this:
counts = {}
f = open('test.txt')
lines = f.read().split('\n')
for line in lines:
    for word in line.split(' '):
        if word:
            word = word.lower()
            if word in counts.keys():
                counts[word] += 1
            else:
                counts[word] = 1
f.close()
count_items = [(count, word) for word, count in counts.items()]
count_items.sort()
for count, word in reversed(count_items):
    print word, count
Whereas the "highly proficient" (and much simpler and more Pythonic) approach might look something like this:
import collections
counts = collections.Counter()
with open('test.txt') as f:
    for line in f:
        for word in line.lower().split():
            counts[word] += 1
for word, count in counts.most_common():
    print word, count
lines = [line for line in open("bible.txt")]
words = [word for line in lines for word in line.split()]
counts = {word:0 for word in words}
for word in words:
    counts[word] += 1
No imports needed. Linear time. A bit inefficient in the dictionary comprehension, but easy to read.
The "lines=" and "words=" can be compressed into one line, but I figure this is a bit easier to read for people who aren't familiar with nested list comprehensions.
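For what it's worth, the same no-imports idea also works in a single pass, with dict.get supplying the default instead of the zero-filling comprehension (a sketch; the sample words list here is just made-up data standing in for the file contents):

```python
# Count words in one pass; dict.get(word, 0) supplies the default,
# so no pre-zeroing dictionary comprehension is needed.
words = "the cat and the hat".split()  # stand-in for the real word list

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(counts["the"])  # 2
```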
I'm not too familiar with Python, but this is case-sensitive, unlike benhoyt's examples. Would this be case-insensitive?
lines = [line for line in open("bible.txt")]
words = [word.lower() for line in lines for word in line.split()]
counts = {word:0 for word in words}
for word in words:
    counts[word] += 1
Yeah, those are nice -- and may actually be more efficient on smaller files, as you're only doing the lower() once on a big string. However, for big files you don't necessarily want to read the whole thing in at once.
One nitpick: it's Pythonic (I think) to just name the list of words "words" rather than "word_list".
Yes, that's a classic tradeoff; a proficient programmer will have to pick one.
Personally I always read entire files into memory first unless I have reason to believe memory will be an issue or need to program defensively against malicious/careless input. The code is always much cleaner and easier to read and if you need to do a second pass on the data you don't need to re-read it from disk.
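To make the tradeoff concrete, here's a rough sketch of both styles (using io.StringIO as a stand-in for a real file handle, with made-up sample text):

```python
import collections
import io

# io.StringIO stands in for a real open file; the text is made up.
f = io.StringIO("The cat\nsat on the mat\n")

# Whole-file style: slurp once, then work on the data in memory.
counts = collections.Counter(f.read().lower().split())

# The streaming alternative would be:
#     for line in f:
#         counts.update(line.lower().split())
# which never holds more than one line in memory at a time.

print(counts["the"])  # 2
```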
Here's mine, for what it's worth, since this is one of the Google Python assignments. Admittedly, I'm not very proficient at all.
def get_file_words(filename):
    file = open(filename, 'rU')
    words = {}
    for line in file:
        for word in [word.lower() for word in line.split()]:
            if not word in words.keys():
                words[word] = 1
            else:
                words[word] += 1
    return words

def print_words(filename):
    wordcount = get_file_words(filename)
    for word in wordcount:
        print word, wordcount[word]
I'd like to jump in with a little R here -- it's not all that difficult in "that" either!
con <- file('text.txt')
open(con)
text <- readLines(con, n = -1L)  # n is the number of lines to read; -1L means read all of it
words <- strsplit(text, split = " ")
counts <- table(unlist(words))
close(con)
I put this in because the good thing about R is that it provides functions for many such mathematical operations. And along with this, I'll say something any self-respecting Pythoner will know: less is better than more.
Smashing tons of crap together isn't necessarily "highly proficient". In some cases it makes things harder to read and/or maintain, and many times certainly harder to edit.
Except that's wrong: all you're doing is counting the number of unique words, and you didn't even consider Foo and foo as the same in your example. Part of proficiency is understanding the problem.
The straightforward design makes it fast to go from zero to having a working proficiency. But I'm not sure about "highly proficient": even experienced Pythoners get tripped up by things such as mutable function arguments, and it's often not clear why some simple-looking code is running slowly, and how to speed it up.
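The mutable-default-argument trap, for anyone who hasn't hit it yet -- a minimal sketch:

```python
# The default list is evaluated ONCE, at function definition time,
# not on each call -- so it's shared across calls.
def append_item(item, items=[]):
    items.append(item)
    return items

first = append_item(1)
second = append_item(2)   # silently reuses the same list
print(second)             # [1, 2], not [2]

# The usual fix: default to None and allocate inside the function.
def append_item_fixed(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
```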
What are your thoughts on Clojure's suitability as a go-to language for scientific computing (except in lower-level, high-performance scenarios that might recommend Julia)? I think it has serious potential in this field. It's not there yet, but it's getting there, and the core.matrix standardization helps a lot.
I agree with collyw; Lisp is too much for most scientists, who just care about getting their research done and can't be bothered to learn all this weird FP/paren stuff [1]. Supplying an obvious, familiar syntax that looks like math written on paper for, e.g., matrix operations is critical. This is actually one of the core tensions in Julia development, imho: balancing having a sane, well-designed language (from a programmer's perspective) vs. having a tool that allows scientists to quickly and easily crank out results.
[1]: Note; I am not insinuating that Lisp is bad, I like it personally. Just relaying the response you will get from most practicing scientists who are not trained as programmers.
Just relaying the response you will get from most practicing scientists who are not trained as programmers.
Is it that hard of a gap to cross? I was a math major who took a couple of CS courses, and wouldn't say I was "trained as a programmer". I found Scheme pretty easy from the get-go.
I tend to think that anyone who can get a PhD in the physical sciences can become a half-decent programmer-- if the desire is there, the intelligence and work ethic being established (one hopes, at least) by the degree.
It's less a matter of ability and more a matter of having the time/motivation. If you took a bunch of practicing natural scientists and/or engineers and forced them somehow to enroll in a course teaching Scheme/Clojure/etc., would many of them do well? Almost certainly. If you gave them the choice between a Lisp and something like Julia or MATLAB to use in their everyday work when they need to do some computation? They'll probably choose the latter because it's easy and familiar and doesn't have a huge up-front time investment; scientists tend to be very busy people.
>The combination of NumPy/SciPy, MatPlotLib, pandas and statsmodels had effectively replaced R for me, and I hadn’t even noticed
I am surprised that he "hadn't noticed" the switch from plots in R to MatPlotLib. I am a long-time MatPlotLib user and I STILL find myself noticing all the time just how painful it can be (irregular data model, weird function names, the insanity that is the documentation). Then I go to the project page and feel guilty, because the guy who started the project (which I am using for free) died, and all the finished plots look so beautiful.
For me, Python has everything I need, among which there are many things that R or Matlab have not. If I should summarize what makes Python so suited for the things I do (and did when I was still doing research) it's the following:
Python is an easy to use scripting language that can be integrated with number-crunching C/C++ code and for which a scientific standard library _with a vibrant community_ exists.
Also, I haven't written a piece of 2.x code in half a year, which is of course only possible because scipy and matplotlib are 3.x ready.
Even inside of scientific computing, the picture is quite a bit more nuanced. This post is basically about the author's personal migration to Python as a user of other people's scientific programming packages. In doing interviews with people inside of companies, there's fairly little actual use of Python for scientific computing – lots of Python for data preparation, but R and Matlab (not to mention Simulink) still dominate for the actual scientific part. And of course, there's the bizarre blind spot that the SciPy community has to the fact that they are really doing scientific computing in C – literally every single package you use that's scalable and performant is actually written in C. This is true of R and Matlab too, of course.
Yeah, I'm a Python convert like the author, though coming mostly from Matlab rather than R, and everyone in my field reacts with surprise when I tell them I prefer Python. They're open-minded, and I'm hoping to convert a few myself, but I don't think the mass migration has happened yet.
Regarding your second comment- you're correct of course, but what makes this a "blind spot"? After all, if the user is writing code in Python, they're doing scientific computing in Python, regardless of what the Python library calls behind the scenes. In my experience, a lot of people doing scientific computing--particularly those more interested in the science than the computing--couldn't care less about what's going on behind the curtain. Any moment they have to think about implementation is a moment not thinking about science and therefore a waste of time. So it's actually a benefit for an ecosystem to hide the underlying mechanics--calling it a "bizarre blind spot" seems to imply they're doing something wrong.
I call it a "bizarre blind spot" because it seems like there's a silent consensus to never talk about this basic fact. It's a bit surreal attending SciPy and hearing all of these people talking about scientific computing in Python when almost every single person in the room spends the vast majority of their time and energy writing C code.
I disagree that the separation between implementation and user-land that's enforced by two-language designs like C/Python or C/R is socially beneficial:
1. If your high-level code doesn't perform fast enough (or isn't memory efficient enough), you're basically stuck. You either live with it or you have to port your code to a low-level language. Not impossible, but not ideal either.
2. When there are problems with some package, most users are not in a position to identify or fix those problems – because of the language boundary. If the implementation language and the user language are the same, anyone who encounters a problem can easily see what's wrong and fix it.
3. Basically a corollary of 2, but having the implementation language and user language be the same is great for "suckering" users into becoming developers. In other words, this isn't just a one-time benefit: as users use the high-level language, they automatically become more and more qualified to contribute to the ecosystem itself. It is crucial to understand that this does not happen in Python. You can use NumPy until the cows come home and you will be no more qualified to contribute to its internals than you were when you started.
These benefits aren't just hypothetical – this is what is actively happening with Julia, where almost all of its high-performance packages are written in Julia. In fact, I never realized just how important these social effects were until experiencing them first hand. The author of the article wrote:
> It turns out that the benefits of doing all of your development and analysis in one language are quite substantial.
It turns out that it is even more beneficial to not only do development and analysis, but also build libraries in one language. Of course, Julia has a lot of catching up to do, but it's hard to not see that the author's own logic implies that it eventually will catch up and surpass two-language systems for scientific computing.
> You can use NumPy until the cows come home and you will be no more qualified to contribute to its internals than you were when you started.
Just for whatever it's worth, as an occasional contributor to numpy who is an absolutely terrible C programmer, there's a _lot_ you can contribute with pure python. Yes, the core of the functionality is in C, but most of the user-facing functionality isn't.
That having been said, I completely agree on the benefits of Julia.
However, I'd argue that Julia has the potential to compete with (or replace) the scientific python ecosystem for a completely different reason: It's more seamless to call C/Fortran functions from Julia than from Python. (Though Cython and f2py make it pretty easy in Python.)
There's an awful lot of very useful, well-designed, very well-tested scientific libraries written in C and Fortran. It's far better (i.m.o.) to have a higher-level language be able to call them seamlessly than to have a high-level language where reimplementing them is a better option. (Julia does wonderfully in this regard. Python does pretty well, but not as well, i.m.o.)
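For comparison, here's roughly what the Python side of that looks like with the standard-library ctypes module (a sketch; I'm assuming the system math library is findable under its usual name, which is itself platform-dependent):

```python
import ctypes
import ctypes.util

# Locate and load the C math library. The lookup (and even the
# library's file name) varies by platform, which is part of why
# this is clunkier than Julia's built-in ccall.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# ctypes needs the C signature spelled out by hand:
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```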
Also, from what I've seen, I think the Julia and scientific python ecosystems are more complementary than competing, at the moment. There seems to be a lot of cross-pollination of ideas and collaboration, which is a very good thing.
(...And I just realized who I'm replying to... Well, ignore most of what I said. You know all of that far, far better than I do! Julia is _really_ interesting and useful, by the way!)
I completely agree that Julia and SciPy are complementary rather than competing. I've attended the SciPy conference for several years and it's great – I love the Python and SciPy communities. It's definitely crucial to both be able to easily call existing C and Fortran libraries and write code in the high-level language that's as fast as it would have been in C. You don't want to reimplement things like BLAS, LAPACK and FFTW – but you do want to be able to implement new libraries without coding in Fortran or C, and more importantly, be able to write them in a very generic, reusable fashion.
I'd just like to add that what I love about Julia is that it actually lets you go deeper than C code. For high-performance computing it's easy to hit a wall with C (i.e. with SIMD vector instructions), and it's fairly difficult to jump the barrier to programming assembly. Julia makes it easy to muck around with the generated LLVM IR code as well as native assembly code. You can go as deep as you want without leaving the Julia REPL.
Thanks for this reply. I thought the "bizarre blind spot" comment was some sort of (absurd) thought that numpy users were unaware that C was being used under the hood.
> it eventually will catch up and surpass two-language systems for scientific computing.
Assuming that, like hardware engineers, scientists have a fair bit of general-purpose scripting to do, Julia will itself be part of a different kind of two-language solution unless it is up-to-snuff w.r.t. said general-purpose scripting. This implies libraries and good interaction with OS utilities. Any thoughts on whether or not this will be an issue with Julia?
(aside: there has been some discussion of moving to libgit2 for performance reasons)
Until recently, the startup time somewhat precluded use for general scripting. However, on the trunk the system image is statically compiled for fast startup, so scripting usage is viable.
>Regarding your second comment- you're correct of course, but what makes this a "blind spot"? After all, if the user is writing code in Python, they're doing scientific computing in Python, regardless of what the Python library calls behind the scenes.
What he means is, they can't expand the core primitives provided for them in Python itself, so they are constrained by what's given if they want performance. Unlike with, say, Julia.
I think Nimrod could really make inroads in those cases. It's almost as easy to write as (and in fact looks a lot like) Python, with lightweight type annotations, and its runtime characteristics are those of C, since that's what it compiles via.
Literally every single C programmer in the world is writing x86 or ARM machine code. C programmers have this bizarre blind spot where they don't realize that, though. Isn't that weird?
Although I would say in academia I've seen a rapid expansion of Python as both a Matlab replacement and for doing non-HPC work, at least in the field I work in (computational chemistry and biomolecular simulation).
I know that I use python because of how easy it is to code. I can focus wholly, totally on the logic of my code without ever worrying about if I misplaced a semi-colon or left out some weird punctuation.
Python frees me to code and not worry about things that get in the way of coding. That's why it's eating other language's lunches, the freedom is almost intoxicating.
It's ironic that you say that, given that if you don't get the whitespace correct, you'll have a syntax error. That's one of the big reasons Python rubs me the wrong way: white space is semantic.
Getting the indentation right should be the least of your worries if you have a good editor (and don't do something like mix spaces and tabs, which I think everyone is in general agreement with across all languages).
When was the last time that you manually typed out 4 (or 2, or 8, etc) spaces to indent a line of code vs. just hitting tab and letting the editor handle inserting those spaces (or the editor automatically indenting when you hit enter on the previous line)?
As an aside, all of the people that I've met in person that get red in the face over the idea of white space being semantic are the sort of people that write code like this:
sub function1 { return map { $_[2]->do_something($_) } @{shift->(@_)[0]} }
I'm sorry, but I can't get worked up about not being able to write code like that.
Note: the above code is a reasonable approximation of actual code I encountered by an actual person that would get visibly upset about Python's semantic white space.
P.S. The two '$_'s in the map block actually refer to two different variables, and this is one of the reasons I remember that bit of code. It makes no sense to mix usage like that because it becomes confusing.
> Getting the indentation right should be the least of your worries if you have a good editor
I never understood that. The whole problem for me is that, with indentation being the only thing denoting blocks, the editor can't know for sure how things should be indented, since it's not simply cosmetic.
I haven't written a whole lot of Python but how do you even refactor python code? In C I can just copy paste a block of code from anywhere to anywhere (no matter the coding style in the source and destination file and the level of indentation) and then hit C-M-\ in emacs and have it reindent everything properly. In Python you have to make sure that everything is at the level of indentation it belongs to. If you refactor huge chunks of code it's easy to miss one fubar tab and have code subtly broken and introduce weird regressions.
Also, regarding the OP and "focusing only on your code", I think we all feel that way about the language we're the most familiar with. For me that's C and I can't say I've had a "missing semicolon" compilation error in months of heavy use. Once you're used to the syntax it becomes automatic.
> If you refactor huge chunks of code it's easy to miss one fubar tab and have code subtly broken and introduce weird regressions.
That is just one of the many issues that can come up when refactoring large sections of code, and is one reason that people write tests. For people to point it out as the entire reason that they can't use Python seems to be making a mountain out of a molehill.
For example, I don't like that I can't use "if $?" in Ruby and I need to explicitly write "if $? != 0", but I don't go around bashing Ruby on that basis. For some reason people feel that need to do that with Python though. I don't really understand it.
I'm not making a mountain out of a molehill but I don't understand why the python crowd refuse to acknowledge that having significant whitespace does cause some issues and the benefits are completely subjective.
Sure unit tests will catch the error but in most other languages the error wouldn't have been introduced in the first place.
I think in the end the problem is that I've been writing in C-style languages a long enough time that I don't even "see" the brackets and semicolon anymore. As such I don't find any advantage to the python way. To me it's sacrificing convenience for aesthetics.
I'm not saying that you are, just that these arguments/discussions generally make a mountain out of a molehill, when people go on at length about how much they hate Python (though they've never used it) because of its semantic white space.
> I don't understand why the python crowd refuse to acknowledge that having significant whitespace does cause some issues and the benefits are completely subjective.
All language decisions are trade-offs that come with some downside. I wrote Perl for 4 years, and I would still get tripped up by its break/continue syntax (next/last) on occasion since I learned to program in C/C++.
I currently am working heavily in both JavaScript and Python, and I don't have any indentation issues swapping between the two languages (the issue I come across the most is probably switching between under_scores and camelCase for names). I could probably count on one hand the number of times I've had an indentation error in Python.
> Sure unit tests will catch the error but in most other languages the error wouldn't have been introduced in the first place.
But this is essentially the same argument as static typing over dynamic typing, and that doesn't get the same amount of flak as white space in Python.
> I'm not making a mountain out of a molehill but I don't understand why the python crowd refuse to acknowledge that having significant whitespace does cause some issues and the benefits are completely subjective.
Here's what you wrote:
> That's one of the big reasons Python rubs me the wrong way: white space is semantic.
Defending python against a vague accusation like that is most certainly not "refusing to acknowledge that having significant whitespace does cause some issues"
That said, the benefits are not completely subjective. Whether you prefer them or not is somewhat subjective, but the benefits can be clearly described. Specifically: you don't need braces or begin-end tags for code blocks; correct code will always be indented based on the same concrete rules; code can be moved from one block to another just by changing the indentation. Those are objective traits, not subjective ones.
Indeed, that is a problem when you are copy-pasting huge blocks of code. In deeply nested code it can be difficult to determine whether the nesting should be, say, 28 or 32 spaces. In practice, most people shy away from writing such code because too many levels of nesting are hard to follow. People also prefer to write atomic 5-15 line functions, in which keeping track of the nesting levels is trivial.
Many C# and Java-heads complain that Python lacks support for auto-completion. Which is true, the language makes it so you can't have as sophisticated auto-completion as is available for the aforementioned languages in Visual Studio and Eclipse. But it's not so bad because Python developers are trained to prefer shorter names instead of OverlyLongJavaNames such as "getattrs" instead of "GetAllAttributes".
Btw have you noticed that on this site, the only thing that indicates how the comment threads are structured is how the individual comments are indented?
> Many C# and Java-heads complain that Python lacks support for auto-completion. Which is true
Isn't that an IDE issue and not a language issue? I have no problem with the auto-completion in ipython, for instance, though even the ipython notebook is only useful for writing small amounts of code. PyDev works well too for larger projects, albeit a bit sluggishly.
I think all the modes for python in Emacs implement dedent and indent functions which behave nicely. I hacked one of them to leave selection active after the operation and now - it's still more than one command - I do C-y for paste, C-x C-x to select what was pasted and C-M-> or C-M-< to adjust indent.
On the other hand it's almost impossible to make a mistake with indentation with python-mode (and similar, I'm using all-in-one-plus-your-cat elpy package) when writing code - the enter key indents automatically (instead of having to press tab additionally) and it indents one level more after statements which need it. And backspace removes one level of indentation. I can't remember if I ever had a problem with indentation in Python in Emacs; although I know I had some with CoffeeScript.
So in short - it's all an editor support issue and some relatively trivial rules give you an experience as streamlined as in langs without significant whitespace.
Also, I'm a lisper so - paredit. Knowing about it and using it makes you realize how broken every other syntax is and how hard it is to edit ;)
(Replying just because of Emacs, the rest is uninteresting anyway - in my experience once you cross the border of 5-8 known langs you can easily accommodate to any syntax)
Lazy me never bothered to use proper indent/dedent function, since TAB will cycle through indentation levels.. but I'm happy I read your comment since cycling isn't efficient.
P.S. Although it seems vanilla python-mode binds them to 'C-c >' and 'C-c <'.
Just like monads, sexps have that curse: once you get how paredit and the like work, you just can't explain how awesome it is to non-lispers.
Any good editor should be able to figure out the indents when pasting. I'm not an emacs user, but I'd be surprised if there wasn't a plugin with smart python pasting.
My point is that it's not always possible for the editor to know what the indentation is supposed to be because it can't know what the code is supposed to do.
Suppose you have code like this:
[...]
if a:
    b
c
[...]
And then you paste some snippet you got from somewhere else between b and c:
[...]
if a:
    b
pasted_snippet
c
[...]
The editor cannot know how to indent that properly. It's not a problem in most other languages.
Again, I'm not trying to say it's a deal breaker and Python is useless as a result, I just think it's a small mistake in the design of the language. It's like non-breaking switch/case in C, it doesn't make the language unusable but it is an annoyance.
In that example, a good editor should indent pasted_snippet at least to the first indent level. If you wanted it to be part of the if statement then you could just select the pasted block (both Vim and Emacs should be able to do this with a single command) and indent it by one more.
Python's use of semantic white space is more a function of its heritage than anything. It's based on ABC [1].
> The editor cannot know how to indent that properly. It's not a problem in most other languages.
It will have a pretty good idea. If you paste the snippet, then hit 'tab', odds are high that a good editor (I use emacs python-mode) will Do The Right Thing on the first try, although sometimes you'll have to hit tab again or backspace a couple of times to get the right indent level.
Occasionally I need to use a keyboard macro to fix the indent after a paste, but this is very easy to do and doesn't happen too often, really. I'm sure by now, with Python's popularity, there are more advanced indentation-management tools, but I still just use emacs keyboard macros.
It's a very small price to pay for the huge benefits of semantic whitespace.
Generally the worst case scenario for copy/paste is that I'm using emacs in a terminal window and I forget to switch to fundamental-mode before pasting. Because the terminal is handling the paste, not emacs, python-mode treats it all as if it had been entered one line at a time and auto-indents everything after a colon, then pasting in lines that already have indentation and the result is a complete mess. (but then I just undo it all and re-paste it the right way) GUI emacs doesn't have this problem.
The worst case scenario I can think of for semantic whitespace (outside of copy/paste) is accidentally changing the indent level of a piece of code without realizing it, where a syntax error doesn't result, meaning there's now a logic flaw in your program you don't know about. Python is more susceptible to that sort of regression error than most languages. That said, usually that sort of mistake WILL cause a syntax error and be easily fixed.
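A contrived sketch of that failure mode -- dedent one line and the code still parses, but quietly does something different:

```python
# Intended behavior: collect only the positive numbers, tagged.
def tag_positives(numbers):
    tagged = []
    for n in numbers:
        if n > 0:
            label = "+%d" % n
            tagged.append(label)
    return tagged

# The same function after `tagged.append(label)` is accidentally
# dedented one level: still valid Python, but now every iteration
# appends, reusing the label from the last positive number seen.
def tag_positives_broken(numbers):
    tagged = []
    for n in numbers:
        if n > 0:
            label = "+%d" % n
        tagged.append(label)
    return tagged

print(tag_positives([1, -2, 3]))         # ['+1', '+3']
print(tag_positives_broken([1, -2, 3]))  # ['+1', '+1', '+3']
```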
> The two '$_'s in the map block actually refer to two different variables
Yeah, but that's Perl 101. Grabbing an element of array @a is $a[0], which is wholly different than plain $a.
That's not really a style thing, it's a Perl thing, for better or for worse.
> I'm sorry, but I can't get worked up about not being able to write code like that.
Why? It looks more or less reasonable to me. Most of the weirdness there is a factor of Perl's unfortunate lack of named function parameters. @_ and $_[] are simply a fact of life when writing Perl.
It's even a single expression which means you could probably write a similar one liner in python (since their wimpy "lambda" only does expressions)
> Yeah, but that's Perl 101. Grabbing an element of array @a is $a[0], which is wholly different than plain $a.
>
> That's not really a style thing, it's a Perl thing, for better or for worse.
That code confused me even after working in nothing but Perl for 4 years. It was confusing because I wasn't used to people having an array (@_ in this case) and a scalar ($_) named the same thing, and used in close proximity. The issue could have been avoided by (e.g.) assigning $_[2] to a variable first, which would have made the code more readable. It's not "Perl 101" to write code that is intentionally obtuse.
> Why? It looks more or less reasonable to me. Most of the weirdness there is a factor of Perl's unfortunate lack of named function parameters. @_ and $_[] are simply a fact of life when writing Perl.
That code could be written in a way that was easier to read and maintain. For example, what is 'shift' intended to be? The only information we have is that it's supposed to be the first argument to function1.
Edit:
> Most of the weirdness there is a factor of Perl's unfortunate lack of named function parameters
This was in a code base with a source filter to provide function parameters. That could have been written as:
sub function1($arg1, $arg2, $arg3) {
}
in that code base, but the developer in question chose not to.
Even without said source filter, you can still name the function parameters:
I think I've only come across Perl code that uses "shift->method()" to access "this" once. Maybe I just haven't looked at enough code on CPAN, or maybe I just remember that instance because it was an entire file that was meant to be production code but could have been an entry in a Perl golf competition.
I understand that in Perl OO code, the first argument is 'this', but most code I've come across takes the time to actually name variables because the aim isn't write-once, read-never.
You are being kind or your coworker isn't that bad - in your example the whitespace is consistent. My experience is that the whitespace is totally arbitrary: totally inconsistently placed 0 - n spaces with random indentation levels.
Sloppily formatted code is Edward Bear code. All the bumping makes it hard to think about how it works (or, more often, why it doesn't work).
"Here is Edward Bear, coming downstairs now, bump, bump, bump, on the back of his head, behind Christopher Robin. It is, as far as he knows, the only way of coming downstairs, but sometimes he feels that there really is another way, if only he could stop bumping for a moment and think of it." - http://www.gurteen.com/gurteen/gurteen.nsf/id/L001362/
> My experience is that the whitespace is totally arbitrary: totally inconsistently placed 0 - n spaces with random indentation levels.
This is usually due to:
1) Lack of a consistent style guide.
2) Lack of style guide enforcement.
3) A language where white space doesn't matter.
If you think about this critically though, these random indentation changes would either break all of the code (e.g. it wouldn't run, or would run but not work correctly) or it would make code maintenance a nightmare. Yet there are plenty of Python shops out there, and we don't hear horror stories of Python white space maintenance nightmares. Either the Python community is doing a good job of hiding these issues, or they really aren't issues in practice.
Of the people I've talked to in-person about Python semantic white-space, the common threads are either:
1) It's different than what I'm used to.
2) It's cramping my style. My code is art, and restricting how I can structure my code is an affront to my very being.
3) Python's semantic whitespace is so wonderful I'm baffled that most other programming languages don't have it. It's like they're coding with one eye shut: why not do this wonderful thing that makes everything easier?
> 2) It's cramping my style. My code is art, and restricting how I can structure my code is an affront to my very being.
To be fair, there are quite a few common cases where Python's syntax is rather inelegant: the need to break if-then statements out into multiple lines, or the need for explicit 'return' statements, which also have to go on their own line. These things are only indirectly related to whitespace but do sometimes put a lower bound on expressiveness.
> To be fair, there are quite a few common cases where Python's syntax is rather inelegant
Now we're stepping out of the realm of "semantic white space" though. The people I've known to get (literally) red in the face over Python haven't actually used the language and can't do much more than regurgitate stuff like, "but white space!"
> The need to break out if-then statements into multiple lines, the need for explicit 'return' statements which also have to go on their own line.
Unless I'm missing something those are poor examples of the limits of semantic white space:
Actually after a quick search I realized my problem was ignorance. I did not know about PEP 308 (conditional expressions) which is a workaround for the issue I described.
And yes technically it's a statement/expression issue not a whitespace issue, though they're indirectly related.
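For reference, a minimal sketch of what a PEP 308 conditional expression looks like (the function and names here are purely illustrative):

```python
def classify(x):
    # A conditional expression folds the if/else into a single
    # expression, so it can live inside a lambda or a return statement.
    return "big" if x > 5 else "small"

print(classify(7))  # big
print(classify(3))  # small
```

This addresses the single-line if-then case, though statements (like assignments or loops) still can't appear inside it.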
The problem is larger multi-people projects. You need very strict editor/whitespace discipline, and you don't want to commit some sort of re-indent which touches every line of a file. That and the refactor issue is the main issue people have with the whitespace thing.
It's easy enough if you have only one editor on one computer that you use with only one language. If you mix and match a half-dozen editors on multiple computers running different OSes coding in different languages, then it gets messier. Especially when all of the editors have different ways to set preferences for whether to use spaces or tabs, how much space per tab/indent, and whether those preferences are for this session, this language, or permanent.
> Getting the indentation right should be the least of your worries if you have a good editor (and don't do something like mix spaces and tabs, which I think everyone is in general agreement with across all languages).
Nope. Tabs are for indentation and spaces are for alignment. It's precisely because of non-good (or maybe non-smart) editors that people can't be bothered acknowledging or practicing this distinction and end up using spaces for both.
I was mainly talking about mixing tabs and spaces for indentation. Many people set their tabs equivalent to different numbers of spaces, so mixing them becomes an issue if the previous developer had tabs set to 2 spaces but you have them set to 4.
I disliked the whitespace thing at first, but after using it for a while I got used to it. Then when I had to go back to change some Perl code I realised the real beauty of it - you never get the problem of having unmatched braces when moving blocks of code around.
Eh, this argument is at least 10 years old now, isn't it? I don't even remember the last time I had a problem with whitespace in Python. Just use a decent editor (vim!) and you're golden. :)
It's ironic that you say that, given that if you don't get the whitespace correct, you'll have a syntax error. That's one of the big reason Python rubs me the wrong way: white space is semantic.
Not to pile on but this is exactly backwards. The whitespace saves keystrokes and errors because it's giving a semantic meaning to something that programmers put in their code anyway.
Now that I've actually written some code in ruby, I can't comprehend how rubyists aren't driven insane by the 'end' tokens. I'm sure this is a newbie mistake but on multiple occasions I've written something like this:
def foo(x)
  if x > 5
    puts x
  else
    puts 5
  end
foo(7)
foo(3)
Any experienced ruby programmers should see the error pretty quickly. But it LOOKS pretty good. Here is the error you get:
What's on line 9? That's just the end of the file. Nowhere near where the syntax error actually is. In this trivial example, tab-checking the indentation using my editor will reveal the error. But this is a trivial example. In more substantial code this technique is not nearly as effective. In erb templates it's harder still.
That never happened to me in python, even as a newbie. If I forget a colon I almost always know instantly because the editor will try to indent my code dramatically wrong. And if I inadvertently delete the colon at a later time without updating the indentation (this happens a lot) I get an instant syntax error that points me right at the line where the colon is missing. It's an error you can fix in your sleep. You don't forget end tokens because there are no end tokens to forget. The block ends when the indentation level decreases.
I do all of my scientific computing in python these days.
However, I think it's interesting to compare popularity using stackoverflow (which isn't a great metric, as most scientists aren't aware that it exists):
Semi-useless Stackoverflow Popularity Metric
--------------------------------------------
Searching for questions tagged "[r]":
* 45,119 questions
Searching for questions tagged "[matlab] or [simulink]":
* 27,044 questions
Searching for questions tagged "[numpy] or [scipy] or [pandas] or [matplotlib]":
* 18,745 questions
Searching for questions tagged "[julia-lang]":
* 95 questions
Searching for questions tagged "[python]":
* 255,603 questions
Sure, python isn't as widely used for scientific computing as R or matlab (as evidenced by the third item above), but there's a lot to be said for using a very widely-used language for scientific computing. This is doubly true once you branch out from the "core" scientific code. Building a deployable desktop application is a lot easier in python than in matlab (Done it, partly through java. Don't want to again.) or R (Never tried. Might be easier than I think.).
I'm quite aware of MatlabCentral, and it's a _very_ active community. Similarly, there are other forums for scipy, etc (mostly mailing lists). Traditionally, this is where the majority of questions were answered in the scientific python community, though lately stackoverflow has gained popularity.
I wasn't claiming it was a complete sample. I treated it as a rough random sample, but it's obviously not completely unbiased, either.
I do think it's fair to say that usage of Python as a whole (of which the scientific Python community is a very small part) is larger than Matlab usage as a whole. That alone is not a good reason to choose a tool, but it does have some advantages. That's the point I was trying to get at.
I saw a post on Perl losing ground to Python based on Stack Overflow posts. Then I remembered: Perl Monks is far better than Stack Overflow for Perl. I would far rather see the discussions that are encouraged there than the "closed as not constructive" crap on Stack Overflow. (Though to be fair, I do see Python getting used more and more, and hear less of Perl.)
I find Lua more interesting than Python. It has all the simplicity, all of the power, none of the indentation, and is quite a nice portable tool.
That said, I do wonder at times what it is about Lua that makes so many people not-interested in it, when .. from my naive point of view .. it's an almost perfect language for rapid development. I don't have that feeling about Python, quite so much ..
I love, love, love Lua. I use it for everything. That being said, there's a lot I'd like to use it for that I can't. I'm not a great programmer by a longshot, I'm a hacker in the most traditional of senses (ie, not a hacker who builds billion-dollar wildly successful startups. I write one-off programs to solve a need or to automate a task).
Lua doesn't have a lot of features. This is great because it's small and simple to learn, but it makes some jobs harder. As I'm not a great programmer, there are features that people could write themselves, but that's beyond my skill level. I'm constantly pushing tasks to the OS level, which makes my code one step above a Bash script. Other times I'll push something to C, which turns friendly code into a death trap.
Lua needs some love and care from the community and it would be perfect. Python has that love and care, but it's still a mess (IMO) at the core language.
"That said, I do wonder at times what it is about Lua that makes so many people not-interested in it, when .. from my naive point of view .. its an almost perfect language for rapid development."
Probably because the core language is only a small part of what determines a language's usefulness. Python has heaps of tools and libraries, is used in many software packages as the scripting language, and is generally well-known by scientists. Who cares about syntax, really. It's the ecosystem that makes a language powerful.
That's why I find it curious that more hackers don't adopt Lua, and the Lua VM, for a lot of projects - you can put the VM anywhere. ANY. Where. It takes less than a day to get the Lua VM inserted in a project, and from that point on you have a powerful engine for productive development..
They don't adopt it because the ecosystem is so much smaller, as I said. Chicken and egg, and Python had the first mover advantage. I embedded Python from C++ in less than a day too, that is not the problem. It's the design of the API for the integration which takes the work.
I'm a user of both, fan of both, but the lightness of Lua is a big plus.
The main downside, IMO, is that the Lua user base is smaller so there's more need to roll your own solutions for things that Python already has several libraries for.
For a personal project, sure, but if it's a work project that has to get to the goal line along with ten other things, right NOW!, then it's nice to have some drop-in libraries that "just work".
Lua the language and LuaJIT the third-party VM are great. But there is no general-purpose standard library. As in C, only a very limited set of libraries ships with Lua - that's fine. Though I really miss a good website with a list of additional up-to-date libraries and a community around it.
Lua needs a better project website, like PHP's, where you can post code examples as comments. At the moment the Lua website is stuck in 1999. The wiki software is bad, the only way to stay in touch is the mailing list, etc. The LuaForge website is outdated. Finding various libraries is a pain. Many libraries are hosted on servers that don't exist anymore, or are for Lua 4 or 5.1 but not 5.2.
A lot of users use Lua in closed source software as embedded language (video games, Adobe programs, etc.)
Imagine if all those libraries that you can find for Python were available for Lua - it would be great.
From 1977 until 1992, Brazil had a policy of strong trade barriers (called a market reserve) for computer hardware and software. In that atmosphere, Tecgraf's clients could not afford, either politically or financially, to buy customized software from abroad. Those reasons led Tecgraf to implement the basic tools it needed from scratch.
I hope you're wrong about Lua, which I've never used. But allow me to say that I have to use Tcl, and I hate it. I want to like it, the syntax is clean and code has a nice look to it.
But the everything-is-a-string [1] semantics is awkward to deal with. As with shell scripting, a lot of what you do amounts to solving problems with quoting. I find Tcl hard to debug, and I don't like the scoping (upvar!).
I'd much rather use Emacs Lisp. In fact, Cadence uses a language of their own, called Skill, for some of their tools. It's so close to Emacs Lisp it may as well be identical, and it's far easier to deal with than Tcl.
[1] That's no longer true under the hood. But Tcl behaves as if it's true.
I think Lua improves on the Tcl "everything-is-a-string" semantics a great deal: everything is a table, or a string, or a number.
The great thing about Lua is that tables - or, rather, metatable programming - is really, really powerful. I'm finding it difficult to think of an example of a common, powerful data structure that we all know and love, which can't be implemented with Lua tables/metatables. Lists, hashes, arrays, tries, trees, all of these basic things work so well in the context of the Lua table.
However, you have to learn what a table is, in Lua. You have to learn how to use it effectively. A lot of times, folks don't take the effort to understand how metatables can be used to turn your average Lua table into object-oriented constructs (classes), queues, stacks, etc. But if you do reach at least this milestone in learning Lua: watch out! You won't want to use any other language, ever again! :)
As much as I like scikit-learn and pandas, Python likely won't be replacing my R code for quite some time, and I'll continue to hop between the two of them.
R is, first and foremost, a language for statistical analysis, and that's really where it shines. Python is getting better (it used to be "you want to do...what?"), but it doesn't have nearly the package infrastructure R does for advanced statistics. It is, for most statistical computing tasks, not even on the radar for a number of my colleagues.
I mostly agree with this article, but we are not there yet. I work with scientists who love the IPython Notebook technology. Some claim the IP[y]: Notebook to be the best thing since the Mosaic web browser and the most important development in scientific computing in a decade. I tend to agree, it is a revolutionary technology and the idea of executable papers is tantalizing. But there are also big problems. In particular, setting up a Python environment with all the necessary libraries is a real pain in the neck even with technologies like pip. For a fee, companies like Enthought are making good progress at taking the pain away (though what happens when you have awkward custom dependencies?). Cloud solutions for preconfigured IP[y]: Notebook servers are another exciting possibility, but not ideal if you work with big data, where you want your data local to your Python environment.
Also, as I understand it, taking advantage of multicore parallelism is not trivial because of the Python Global Interpreter Lock. I have also worked in JVM environments, where parallel computing is becoming significantly easier, and I don't see that happening in Python anytime soon. I would love to be proven wrong, of course.
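To be fair, the GIL only blocks thread-level parallelism for CPU-bound work; the usual workaround is process-based parallelism. A minimal sketch with the stdlib multiprocessing module (the worker function here is just illustrative):

```python
from multiprocessing import Pool

def square(n):
    # Each worker process has its own interpreter and its own GIL,
    # so CPU-bound work can run in parallel across cores.
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

It works, but sharing state between processes takes more effort than JVM-style shared-memory threading, which is presumably the point being made above.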
Re: setting up environment, I love Anaconda for that reason. Wget the installer, run it, all done - you have a fully featured Python environment ready to go, including Numpy, Scipy, Scikit-learn and much more. Running IPython and the IPython Notebook is then trivial.
If you need anything else, you can use their own Conda package manager, or you can just use pip as usual.
@dkersten What happens in the scenario where you have other Python distributions on your system? Does Anaconda keep things nicely compartmentalized like a virtualenv?
Yes, Anaconda is completely self-contained (you can actually install it anywhere you like, for example, your home directory) and does not interfere with the rest of your system at all. You can also run virtualenv from Anaconda too, if you want to have isolated Anaconda environments without installing it multiple times (which you could also do, if you wanted..)
I like using it to make sure I have a very easy to install consistent environment between the various computers I use (including EC2 instances).
He looks at the contention that Python is killing R--based on various data sources--and ultimately concludes:
"While the original argument is certainly defensible, then, I find it ultimately unpersuasive. The evidence isn’t there, yet at least, to convince me that R is being replaced by Python on a volume basis. With key packages like ggplot2 being ported, however, it will be interesting to watch for any future shift."
My only question mark from this is matplotlib. I tried it five or six years ago and it seemed clunky to use and install. And worse, I couldn't seem to just throw up a plot; I recall there being a lot of settings required. And the plots didn't look good by default; you had to fool with fonts, font sizes, etc.
Does anyone know if it's improved a lot since then? Otherwise I'm not seeing how it could hold a candle to R's plotting abilities and ease of use.
Also, the code examples given on AstroML work well for figuring out how to make publication-quality figures in matplotlib:
http://www.astroml.org/book_figures/
For things that only need a one or two line command set, Python and R are probably similar. Once things get a bit more complex, matplotlib may not seem as "friendly" to those used to ggplot.
But this is about perspective. Coming from a Python background, I would rather stay in Python and work with matplotlib, seaborn, or even ggplot.py, than try to work my data management code into the R model.
Yep, ggplot is the one thing that I keep coming back to R for, plus the odd statistical model I can't find in statsmodels - which is rarer and rarer.
One of the many awesome features of IPython - the interactive python shell and notebook, is that you can call code blocks in R just by prefacing with %%R. So my plotting habits are usually first to try the python port of ggplot, and if that can't handle my situation I just jump into R without having to switch windows or do any complicated data transfer.
It's worth mentioning that matplotlib is designed to mimic Matlab's plotting API, so for people coming from Matlab there's very little change, plus there's all the benefits of the other plotting libraries others have mentioned.
You could try my veusz plotting GUI / plotting package as an alternative to matplotlib: http://home.gna.org/veusz/ I think the output looks nicer than matplotlib by default, and you can have a nice GUI and scriptable interface.
I have a Python background and recently signed for the Coursera course on R that just started (https://www.coursera.org/course/compdata) because I wanted to get a small taste of R and see how it differed from Python's scientific computing stack.
So right now I'm not far enough in the learning curve to see all the benefits R provides. Is it worth investing time in R now if I'm already pretty familiar with a good amount of the Python ecosystem? Or, would it make more sense to continue on in Python?
If you are serious about data analysis you should probably at least be able to read R (and maybe Matlab), as lots of algorithms were released in, and only exist in, one of those languages.
You could get by with a more general statistics course that happened to use R.
I would say it depends on precisely what scientific work you need to do. E.g. for phylogenetic statistics there are some nice R packages that bundle simulation techniques and measures that are so far not implemented as conveniently, or at all, in Python. So you need to explore what packages/libraries are out there that fulfill your needs (and also consider how much time/skill/interest you have to code your own packages/libraries where needed). R is still really popular for stats/prototyping/data viz and a useful language to have up your sleeve.
Edit: in response to "why isn't there a movement around using JavaScript for scientific computing", which I thought was an excellent question (which has crossed my mind on occasion).
Typed arrays and real support for integers are crucial features for scientific computing. Although JavaScript recently got some support for typed arrays, they are pretty awkward to use and there still isn't any support for 64-bit integers, let alone even larger types (128-bit). There's also no support for true multidimensional arrays, meaning you're left to simulate them for yourself like in C. Oh, and let's not forget how awful basic things like equality are in JavaScript. Given how awkward numbers, typed arrays and multidimensional arrays are in JS and how essential all of these things are for scientific computing, I think you can see why this hasn't happened.
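For contrast, the kind of first-class typed, multidimensional array support being described here is a one-liner in NumPy:

```python
import numpy as np

# A true 2-D array of 64-bit integers: dtype and shape are explicit.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a[1, 2])        # element at row 1, column 2 -> 6
print(a.sum(axis=0))  # column sums -> [12 15 18 21]
```

In JavaScript you would have to fake the second dimension with manual index arithmetic over a flat typed array, and there is no int64 element type at all.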
You don't have to simulate multidimensional arrays in C. It's just easier than wrapping your head around the weird syntax required to pass them around: http://pastebin.com/JTjQMfxr
What does this have to do with Scientific Computing? I don't think anyone would say that, in the realm of Scientific Computing, there is a problem with Python being slow.
to those downvoting the parent to this comment ... honestly, do you disagree? If so I must be missing some trick, and if so I would love to hear what it is.
problem is typically copying and pasting from an editor into a terminal window running python... usually for code blocks that have not just one but at least two (or more) levels of indentation
IPython and pasting with %cpaste usually helps, but what if I want plain Python, not IPython?
anyway it's a problem I don't have with any other language and so that's why I'm complaining about it