From the original article[1] (not the highscalability link spam):
>This led them to a model where the application is essentially piping all of its data through a single connection that is held open for as long as it is needed.
So it sounds like they moved from a request/response driven architecture to a streaming architecture. They also moved away from Rails and adopted an event-driven approach.
It is a well-known fact that it is currently almost impossible to stream anything with Rails, so adopting an event-driven approach made sense. I can see how just these two factors alone would contribute hugely to performance.
Sure, but is it unreasonable to wonder why they need such an architecture? LinkedIn is basically a CRUD app, and while they have a wall now and yadda yadda, I wonder how much of this re-architecture was really necessary, versus simple refactorings and sysadmin attention to the nuts and bolts.
The ability to stream from an event-driven reactor loop actually makes a lot of sense when it comes to raw performance.
If you throw a Postgres database into the mix, which supports asynchronous queries, one can pretty much beat 20 Passenger instances serving similar requests. The problem, though, is that doing non-blocking IO does not reduce database load. So it is likely they re-architected that bit as well.
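To make the non-blocking database point concrete, here is a minimal sketch of driving one of Postgres's asynchronous queries from an EventMachine reactor using the pg gem's async API. The database name, table, and handler names are made up for illustration; this is just the general shape of the technique, not how LinkedIn did it.

    require 'eventmachine'
    require 'pg'

    # Handler that watches libpq's socket; the reactor is only woken
    # when the database has bytes for us, so nothing ever blocks.
    module PgWatcher
      attr_accessor :pg, :on_result

      def notify_readable
        pg.consume_input
        return if pg.is_busy        # result not complete yet, keep waiting
        result = pg.get_result      # the finished result set
        pg.get_result               # drain libpq's trailing nil
        detach
        on_result.call(result)
      end
    end

    EM.run do
      pg = PG.connect(dbname: 'example_db')            # hypothetical database
      pg.send_query('SELECT count(*) FROM members')    # hypothetical table

      watcher = EM.watch(pg.socket_io, PgWatcher)
      watcher.pg = pg
      watcher.on_result = lambda do |result|
        puts "members: #{result.getvalue(0, 0)}"
        EM.stop
      end
      watcher.notify_readable = true
    end

While the query is in flight the reactor is free to serve other connections, which is where the win over a pool of blocking Passenger workers comes from; the database itself still does the same amount of work, as noted above.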
Indeed, shifting to a streaming architecture can vastly reduce the amount of churn your application sees. This makes great sense if it fits the use case of your product. However, one need not throw out a huge existing code/knowledge base in order to accomplish this shift. I think that it's perfectly acceptable to shift to something like EventMachine + some HTTP layers to implement a streaming server using a good chunk of your existing codebase and developer experience. Node.js is the new hotness and so is the prospect of rewriting your application cleaner, faster and better... but often, the less you have to rewrite or learn, the more likely it is that your project will be successful.
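For what it's worth, the "EventMachine + some HTTP layers" streaming idea can be sketched in a few lines of raw EM. This toy ignores the incoming request and just holds the connection open, pushing chunked data down it as events occur; a real deployment would sit behind Thin or another EM-based HTTP layer.

    require 'eventmachine'

    # One long-lived connection per client; data is pushed as it happens
    # instead of being pulled one request/response cycle at a time.
    class StreamConnection < EM::Connection
      def post_init
        send_data "HTTP/1.1 200 OK\r\n" \
                  "Content-Type: text/plain\r\n" \
                  "Transfer-Encoding: chunked\r\n\r\n"
        # Push a chunk every second for as long as the socket stays open.
        @timer = EM.add_periodic_timer(1) { stream_chunk("tick #{Time.now}\n") }
      end

      def stream_chunk(payload)
        send_data "#{payload.bytesize.to_s(16)}\r\n#{payload}\r\n"
      end

      def unbind
        EM.cancel_timer(@timer) if @timer
      end
    end

    EM.run do
      EM.start_server '0.0.0.0', 8080, StreamConnection
    end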
If I had the choice I would certainly have picked EM. But I am not sure how much of the Rails code can be ported over to EM's event-driven model. I think the only things that can be reliably ported are the models.
Now, they apparently did evaluate EM, and according to them Node.js performed 20 times better than EM/Twisted in certain tests, hence they went ahead with Node.js. I am as curious as anyone else to see what those tests were.
The em-synchrony project (https://github.com/igrigorik/em-synchrony) has patched versions of the mysql gem, and they say they can run most of the Rails stack. They use a fiber model to un-event the program flow a bit. Ilya is brilliant, really. Still, I would be a little leery of betting LinkedIn on a bunch of monkey-patched client code. The challenge in porting to EM is that so many Ruby network clients will block; with Node.js that is impossible. I think for a rewrite it wouldn't matter, though: you'd use the right libraries and design your app around the framework, just like they had to do switching to Node.
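The fiber trick em-synchrony relies on can be shown with nothing but EventMachine and the standard library: park the current fiber while the reactor does the work, then resume it from the callback, so the calling code reads as if it were blocking even though the loop never stops. (async_sleep is a made-up helper name; em-synchrony does the equivalent inside its patched clients.)

    require 'eventmachine'
    require 'fiber'

    # Made-up helper showing the "un-eventing" pattern: yield the current
    # fiber to the reactor, resume it from the timer callback.
    def async_sleep(seconds)
      fiber = Fiber.current
      EM.add_timer(seconds) { fiber.resume }
      Fiber.yield                  # hand control back to the reactor
    end

    EM.run do
      Fiber.new do
        puts "before"
        async_sleep(1)             # reads as blocking, but the loop keeps running
        puts "after one reactor-scheduled second"
        EM.stop
      end.resume
    end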
If you <3 EventMachine, check out Celluloid. It's a vastly more sensible approach to asynchronous code in Ruby, IMO: it uses the Actor Model (from Erlang/OTP, etc.) to great effect.
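For comparison, here is a minimal Celluloid sketch (class and method names invented for the example): each actor runs in its own thread with a mailbox, and callers can fire work off asynchronously or take a future instead of wiring up callbacks.

    require 'celluloid'

    class FeedFetcher
      include Celluloid            # turns instances into actors

      def fetch(user_id)
        sleep 1                    # stand-in for real IO
        "feed for user #{user_id}"
      end
    end

    fetcher = FeedFetcher.new
    future  = fetcher.future.fetch(42)   # kicks off work without blocking
    puts "doing other things..."
    puts future.value                    # blocks only when the result is needed
    fetcher.async.fetch(7)               # fire-and-forget variant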
[1]: http://arstechnica.com/information-technology/2012/10/a-behi...