Did you read my post? I understand what ephemeral storage is, and that giving an...

illumin8 · on Feb 12, 2017

So, the solution to some customers not understanding the economics and functionality of ephemeral storage is to charge all customers for a minimum of 25 hours of use, even if they only use the instance for a single hour? That seems crazy.

Look, AWS is trying to balance the economics of a large, shared, multi-tenant platform. It would be great if they had enough excess capacity around to keep ephemeral instance hardware unused for 24 hours after the customer terminates or stops the instance, but frankly, that's an edge case, and they would be forcing other customers to subsidize your edge case by charging everyone more.

cookiecaper · on Feb 12, 2017

>So, the solution to some customers not understanding the economics and functionality of ephemeral storage

Let me stop you there. In our case, it wasn't that we didn't understand what ephemeral storage was or how it functioned, or that it would get cleared if the instance was stopped (though I've frequently met people who are confused over whether instance storage gets wiped when a machine is stopped or when it's terminated; it gets wiped when an instance is stopped).

The issue was that someone typed "sudo shutdown -h now" out of habit instead of "sudo shutdown -r now" (and yes, something like "sudo reboot" should've been used instead to prevent such mistakes). Stopping an instance, which is what happens when you "shut down", can have other ramifications that are annoying, like getting a different IP address when it's started back up, but those annoyances are usually pretty easy to recover from, not a big deal. Much different ball park from getting your stuff wiped.

Destroying consumer data IS a big deal. It's ALWAYS a big deal. If your system allows users to destroy their data without being 1000% clear about what's happening, your system's design is broken. High-cost actions like that should require multiple confirmations.

Even the behavior of the `rm` command has been adjusted to account for this (though it could be argued that it hasn't been adjusted far enough); for the last several years, an extra flag has been required to remove the filesystem root.

>is to charge all customers for a minimum of 25 hours of use, even if they only use the instance for a single hour? That seems crazy.

One of several potential solutions. It doesn't seem crazy to me; at least, not in comparison to making a platform with such an abnormal design that something which is an innocent, non-destructive command everywhere else can unexpectedly destroy tons of data.

The ideal solution would be for Amazon to fix their design so that this is fully transparent to the user. Instance storage should be transmitted into a temporary EBS disk on shutdown and automatically re-applied to a new instance store when it's spun back up (it's OK if this happens asynchronously). The EBS disk would follow conventional EBS disk termination policies; that data shouldn't be deleted except at times that the EBS root disk would also be deleted (typically on instance termination, unless special action is taken to preserve it).

That could be an optional extension, but it should be on by default -- that is, you could start an instance store at a lower cost per hour if you disabled this functionality, similar to reduced redundancy storage in S3, etc. Almost every company would be thrilled to pay the extra few cents per hour to safeguard against the accidental destruction of virtually any quantity of data that might be important.

>Look, AWS is trying to balance the economics of a large, shared, multi-tenant platform. It would be great if they had enough excess capacity around to keep ephemeral instance hardware unused for 24 hours after the customer terminates or stops the instance, but frankly, that's an edge case, and they would be forcing other customers to subsidize your edge case by charging everyone more.

A redemption fee would punish the user who made the mistake for failing to account for Amazon's flawed design. Under this model, such fees should be at least high enough to make up the cost incurred by Amazon in keeping the hardware idle.

This way Amazon can punish people who impugn upon its bad design choices by making them embarrass themselves before their bosses when they have to explain why the AWS bill is $300 higher this month or whatever, and the data won't be gone. Winners all around.

illumin8 · on Feb 13, 2017

A redemption fee is a good idea, but it would still take engineering effort to build such a feature, so the opportunity cost is that other features customers need wouldn't get built.

Another thing I'd like to point out is that you really need to plan for ephemeral storage to fail. All it takes is a single disk drive failure in your physical host, and you've lost data. If you are using ephemeral storage at all, you should definitely have good, reliable backups, or the data should be protected in other ways (like HDFS replication).