This article misses the point of the cloud. The argument for "live migrations" ...

t0mas88 · on Jan 10, 2015

Spot on. A random server could go down at any moment, either through a hardware failure or because your cloud provider decides it needs maintenance. The latter actually gives you an up-front warning; the former is just "bang". So any well-designed system, whether it runs on a cloud or on your own hardware, should be able to deal with this, since it's the reality of servers: sometimes one breaks.

All of the bigger clouds (Amazon, Rackspace, SoftLayer) handle their security-related restarts this way: one area/region/DC at a time, carefully making sure that well-designed systems don't notice any downtime.

What Verizon does is a totally different thing, and something most "important but not life-threatening" systems are not designed for: taking down the whole cloud, with all servers, at the same time.
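To make the difference concrete, here is a minimal sketch of the capacity left during maintenance under each approach. The server names, zone names, and both helper functions are made up for illustration; this is not how any particular provider schedules restarts.

```python
from collections import defaultdict


def zone_batches(servers):
    """Group server names into one restart batch per zone (hypothetical helper)."""
    by_zone = defaultdict(set)
    for name, zone in servers.items():
        by_zone[zone].add(name)
    return [by_zone[zone] for zone in sorted(by_zone)]


def min_capacity_during_maintenance(servers, batches):
    """Lowest number of servers still up while the batches are restarted one by one."""
    all_names = set(servers)
    return min(len(all_names - batch) for batch in batches)


# Assumed fleet: six servers spread evenly over three zones.
servers = {
    "web-1": "zone-a", "web-2": "zone-a",
    "web-3": "zone-b", "web-4": "zone-b",
    "web-5": "zone-c", "web-6": "zone-c",
}

# Rolling maintenance: one zone at a time.
rolling = min_capacity_during_maintenance(servers, zone_batches(servers))

# Whole-cloud maintenance: every server in a single batch.
all_at_once = min_capacity_during_maintenance(servers, [set(servers)])

print(rolling, all_at_once)
```

With one zone restarted at a time, four of the six servers stay up throughout; with a single all-servers batch, nothing is left to serve traffic.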

devonkim · on Jan 10, 2015

What hasn't really been said well is that incidents are constantly happening at large cloud providers, and you don't need a maintenance period like this to experience failures. I work in a place where people are still used to individual machines, addressed by IP alone, being up constantly, and who aren't practicing high-availability architectures in their cloud environments. I had to explain to an architect that VMs can simply hiccup in AWS or any other provider, and that just relaunching the instance is fine. The idea of random failures may be foreign, but I was certainly a bit surprised that someone with such a rudimentary understanding of infrastructure wouldn't be familiar with the idea of "have you tried turning it off and on again?"

With that said, I have zero hope that most enterprises will manage a decently solid failover strategy for their cloud applications, because an awful lot of them can barely get anything set up on even one cloud provider in the most basic layout. Half the folks I'm aware of don't even have cross-AZ redundancy, let alone cross-region failover. When people are turning off 3-4 VMs to save money, they're in no position to think about cross-provider load balancing.
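For what it's worth, the cross-AZ part is not much logic. A rough sketch, assuming a hypothetical placement list of zone names rather than any real provider API:

```python
from itertools import cycle


def place_across_zones(instance_count, zones):
    """Assign instances to zones round-robin (hypothetical placement helper)."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def survivors_after_zone_loss(placement, lost_zone):
    """Count the instances that are not in the zone that went down."""
    return sum(1 for zone in placement if zone != lost_zone)


def tolerates_single_zone_loss(placement, min_healthy=1):
    """True if losing any one zone still leaves at least min_healthy instances."""
    return all(
        survivors_after_zone_loss(placement, zone) >= min_healthy
        for zone in set(placement)
    )


# Assumed zone names, for illustration only.
spread = place_across_zones(4, ["az-1", "az-2"])
single_zone = ["az-1"] * 4

print(tolerates_single_zone_loss(spread))       # spread over two zones
print(tolerates_single_zone_loss(single_zone))  # everything in one zone
```

The same four VMs either survive a zone outage or don't, depending only on where they were placed. The hard part in practice is everything around it (data replication, load balancing, cost), which the sketch ignores.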

Most of the enterprises I'm familiar with would throw Verizon more and more money in the blind hope that it'll make their service more reliable. They'll do this to an incredible dollar amount, partly because it'll still be cheaper than their old, traditional in-house IT that cost multiple orders of magnitude more but got way worse availability. Stuff like this scheduled maintenance shows how foolish it is to believe that someone will magically become ultra-reliable by virtue of you paying them to be.