
Even inside of scientific computing, the picture is quite a bit more nuanced. This post is basically about the author's personal migration to Python as a user of other people's scientific programming packages. In doing interviews with people inside of companies, I've found fairly little actual use of Python for scientific computing – lots of Python for data preparation, but R and Matlab (not to mention Simulink) still dominate for the actual scientific part. And of course, there's the bizarre blind spot the SciPy community has to the fact that they are really doing scientific computing in C – literally every single package you use that's scalable and performant is actually written in C. This is true of R and Matlab too, of course.


Yeah, I'm a Python convert like the author, though coming mostly from Matlab rather than R, and everyone in my field reacts with surprise when I tell them I prefer Python. They're open-minded, and I'm hoping to convert a few myself, but I don't think the mass migration has happened yet.

Regarding your second comment: you're correct of course, but what makes this a "blind spot"? After all, if the user is writing code in Python, they're doing scientific computing in Python, regardless of what the Python library calls behind the scenes. In my experience, a lot of people doing scientific computing--particularly those more interested in the science than the computing--couldn't care less about what's going on behind the curtain. Any moment they have to think about implementation is a moment not thinking about science, and therefore a waste of time. So it's actually a benefit for an ecosystem to hide the underlying mechanics--calling it a "bizarre blind spot" seems to imply they're doing something wrong.


I call it a "bizarre blind spot" because it seems like there's a silent consensus to never talk about this basic fact. It's a bit surreal attending SciPy and hearing all of these people talking about scientific computing in Python when almost every single person in the room spends the vast majority of their time and energy writing C code.

I disagree that the separation between implementation and user-land that's enforced by two-language designs like C/Python or C/R is socially beneficial:

1. If your high-level code doesn't perform fast enough (or isn't memory efficient enough), you're basically stuck. You either live with it or you have to port your code to a low-level language. Not impossible, but not ideal either.

2. When there are problems with some package, most users are not in a position to identify or fix those problems – because of the language boundary. If the implementation language and the user language are the same, anyone who encounters a problem can easily see what's wrong and fix it.

3. Basically a corollary of 2, but having the implementation language and user language be the same is great for "suckering" users into becoming developers. In other words, this isn't just a one-time benefit: as users use the high-level language, they automatically become more and more qualified to contribute to the ecosystem itself. It is crucial to understand that this does not happen in Python. You can use NumPy until the cows come home and you will be no more qualified to contribute to its internals than you were when you started.
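To make point 1 concrete, here's a sketch (the function and its name are made up purely for illustration): a custom element-wise kernel that the numeric libraries don't happen to ship as a primitive. In pure Python it runs at interpreter speed, and the only way to make it fast is to cross the language boundary and rewrite it in C:

```python
# Illustrative only: a hypothetical element-wise operation that isn't
# a built-in primitive of your numeric library. This pure-Python loop
# runs at interpreter speed; making it fast means rewriting it in C
# (or Cython), i.e. hitting exactly the wall point 1 describes.

def clipped_relu(xs, cap=1.0):
    """Apply max(0, min(x, cap)) element-wise over a plain list."""
    out = []
    for x in xs:
        if x < 0.0:
            out.append(0.0)
        elif x > cap:
            out.append(cap)
        else:
            out.append(x)
    return out

print(clipped_relu([-1.0, 0.5, 2.0]))  # [0.0, 0.5, 1.0]
```

In a one-language system the same loop can be compiled to fast native code as-is, which is what makes the "stuck" scenario above avoidable.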

These benefits aren't just hypothetical – this is what is actively happening with Julia, where almost all of its high-performance packages are written in Julia. In fact, I never realized just how important these social effects were until experiencing them firsthand. The author of the article wrote:

> It turns out that the benefits of doing all of your development and analysis in one language are quite substantial.

It turns out that it is even more beneficial to not only do development and analysis, but also to build libraries in one language. Of course, Julia has a lot of catching up to do, but it's hard not to see that the author's own logic implies that it will eventually catch up and surpass two-language systems for scientific computing.


> You can use NumPy until the cows come home and you will be no more qualified to contribute to its internals than you were when you started.

Just for whatever it's worth, as an occasional contributor to numpy who is an absolutely terrible C programmer, there's a _lot_ you can contribute with pure python. Yes, the core of the functionality is in C, but most of the user-facing functionality isn't.

That having been said, I completely agree on the benefits of Julia.

However, I'd argue that Julia has the potential to compete with (or replace) the scientific Python ecosystem for a completely different reason: it's more seamless to call C/Fortran functions from Julia than from Python. (Though Cython and f2py make it pretty easy in Python.)
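For comparison, here is what the Python side of that looks like using only the standard-library ctypes module, calling cos from the C math library. This is a sketch assuming a Unix-like system where find_library can locate libm:

```python
import ctypes
import ctypes.util

# Locate and load the C math library (libm). Path resolution is
# platform-dependent, which is part of why this feels less seamless
# than calling C from Julia.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# ctypes assumes int arguments/returns by default, so the C signature
# must be declared by hand before calling.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

The rough Julia equivalent is a one-liner with the signature inline, e.g. `ccall((:cos, "libm"), Cdouble, (Cdouble,), 0.0)`.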

There's an awful lot of very useful, well-designed, very well-tested scientific libraries written in C and Fortran. It's far better (IMO) to have a high-level language that can call them seamlessly than a high-level language where reimplementing them is the better option. (Julia does wonderfully in this regard. Python does pretty well, but not as well, IMO.)

Also, from what I've seen, I think the Julia and scientific Python ecosystems are more complementary than competing, at the moment. There seems to be a lot of cross-pollination of ideas and collaboration, which is a very good thing.

(...And I just realized who I'm replying to... Well, ignore most of what I said. You know all of that far, far better than I do! Julia is _really_ interesting and useful, by the way!)


:-)

I completely agree that Julia and SciPy are complementary rather than competing. I've attended the SciPy conference for several years and it's great – I love the Python and SciPy communities. It's definitely crucial to both be able to easily call existing C and Fortran libraries and write code in the high-level language that's as fast as it would have been in C. You don't want to reimplement things like BLAS, LAPACK and FFTW – but you do want to be able to implement new libraries without coding in Fortran or C, and more importantly, be able to write them in a very generic, reusable fashion.


I'd just like to add that what I love about Julia is that it actually lets you go deeper than C code. For high-performance computing it's easy to hit a wall with C (i.e. with SIMD vector instructions), and it's fairly difficult to jump the barrier to programming assembly. Julia makes it easy to muck around with the generated LLVM IR code as well as native assembly code. You can go as deep as you want without leaving the Julia REPL.


Thanks for this reply. I thought the "bizarre blind spot" comment was some sort of (absurd) thought that numpy users were unaware that C was being used under the hood.

> it eventually will catch up and surpass two-language systems for scientific computing.

Assuming that, like hardware engineers, scientists have a fair bit of general-purpose scripting to do, Julia will itself be part of a different kind of two-language solution unless it is up to snuff w.r.t. said general-purpose scripting. This implies libraries and good interaction with OS utilities. Any thoughts on whether or not this will be an issue with Julia?


Julia is designed to be a good general-purpose language. There are already a bunch of database drivers, a simple web framework, etc., etc.: http://docs.julialang.org/en/release-0.2/packages/packagelis...


In addition to what jamesjporter mentioned, Julia has IMHO a very nice, clean shell interaction paradigm for this very use case ("glue"):

http://julialang.org/blog/2012/03/shelling-out-sucks/

One of the best examples of this is the package manager's concise wrapping of git CLI commands:

https://github.com/JuliaLang/julia/blob/master/base/pkg/git....

(aside: there has been some discussion of moving to libgit2 for performance reasons)

Until recently, the startup time somewhat precluded use for general scripting. However, on trunk the system image is statically compiled for fast startup, so scripting usage is now viable.


WRT shell integration, a follow-up post details the safe (no code injection) and straightforward Julia implementation:

http://julialang.org/blog/2013/04/put-this-in-your-pipe/


Fair enough. And interesting, I hadn't thought of some of this.

SciPy users are certainly doing scientific computing in Python, but it is surprising to get that attitude from the developers.


>Regarding your second comment- you're correct of course, but what makes this a "blind spot"? After all, if the user is writing code in Python, they're doing scientific computing in Python, regardless of what the Python library calls behind the scenes.

What he means is, they can't expand the core primitives provided for them in Python itself, so they are constrained by what's given if they want performance. Unlike with, say, Julia.


I think Nimrod could really make inroads in those cases. It's almost as easy to write as (and in fact looks a lot like) Python, with lightweight type annotations, and its runtime characteristics are those of C, since that's what it compiles to.

http://www.nimrod-lang.org


Literally every single C programmer in the world is writing x86 or ARM machine code. C programmers have this bizarre blind spot where they don't realize that, though. Isn't that weird?


You've missed the point. NumPy and friends are a mass of C code with some Python bindings.


Which is completely irrelevant, since the users of it are writing python code.


>Which is completely irrelevant

Except for the reasons that the people who brought it up mentioned, which is why they brought it up.


Although I would say that in academia I've seen a rapid expansion of Python, both as a Matlab replacement and for doing non-HPC work, at least in the field I work in (computational chemistry and biomolecular simulation).



