Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Did you read my post? I understand what ephemeral storage is, and that giving another instance access to that physical device without scrubbing it is insecure. That's not the point. There's no reason that AWS needs to delete that data the instant a stop command is issued.

AWS gets paid the big bucks to abstract such concerns away in a pleasant manner. The device with customer data can sit in reserve, attached to the customer's account, for a cooldown period (of maybe 24 hours?) that would allow the customer to redeem it. AWS could even charge a fee for such data redemptions to compensate for the temporary utilization of the resource, or they could say ephemeral instances will always cost your use + 1 day. They can put a quota on the number of times you can hop ephemeral nodes.

They could do basically anything else, because basically anything else is better than accidentally deleting data that you need due to a counterintuitive vendor-specific quirk that conflicts with established conventions and habits and then being told "Sorry, you should've read the docs better."

This is an Amazon-specific thing that bucks established convention and converts the otherwise-harmless habits of sysadmins into potential data loss events. It's very bad to do this ever (looking at you, killall Linux v killall Solaris), but it's especially bad to do it on a new platform like AWS where you know lots of people are going to be carrying over their established habits and learning the lay of the land. It is not reasonable for Amazon to tell the users that they just have to suck it up and read the docs more thoroughly next time.

This is not like invoking rm on your system or database root, which is a multi-decade danger that everyone is aware of and acclimated to accounting for, and which has multiple system-level safeguards in place to prevent it: user access control, safe-by-default versions of rm that have been distributed with most major distributions lately, etc., and for which thorough backup and replication solutions exist to provide remedies when inevitable accidents do happen.

The point is that just instantly deleting that data ASAP and providing 0 chance for recovery is wanton recklessness, and there's no excuse for it. Security is not an excuse because there's no reason they have to reallocate the storage the instant the node is stopped.

If such deletions could only be triggered from the EC2 console after removing a safeguard similar to Termination Protection, that may be more reasonable, but allowing a shutdown command from the CLI to destroy the data is patently irresponsible.

Good system design considers that humans will use the system, that humans make mistakes, and it will provide safeguards and forgiveness. Ephemeral storage fails on all of those fronts. Yes, technically, it's the user's fault for mistakenly pressing the buttons that make this happen. But that doesn't matter. The system needs to be reasonably safe. AWS's implementation of ephemeral storage is neither safe nor reasonable.

Amazon has done a good job of tucking ephemeral storage away. It used to be the default on certain instance sizes. As another commenter points out, it now requires one to specifically launch an AMI with instance-backed storage. It's good that they've made it harder to get into this mess, but it's bad that they continue to mistreat customers this way, especially when their prices are so exorbitant.



So, the solution to some customers not understanding the economics and functionality of ephemeral storage is to charge all customers for a minimum of 25 hours of use, even if they only use the instance for a single hour? That seems crazy.

Look, AWS is trying to balance the economics of a large, shared, multi-tenant platform. It would be great if they had enough excess capacity around to keep ephemeral instance hardware unused for 24 hours after the customer terminates or stops the instance, but frankly, that's an edge case, and they would be forcing other customers to subsidize your edge case by charging everyone more.


>So, the solution to some customers not understanding the economics and functionality of ephemeral storage

Let me stop you there. In our case, it wasn't that we didn't understand what ephemeral storage was or how it functioned, or that it would get cleared if the instance was stopped (though I've frequently met people who are confused over whether instance storage gets wiped when a machine is stopped or when it's terminated; it gets wiped when an instance is stopped).

The issue was that someone typed "sudo shutdown -h now" out of habit instead of "sudo shutdown -r now" (and yes, something like "sudo reboot" should've been used instead to prevent such mistakes). Stopping an instance, which is what happens when you "shut down", can have other ramifications that are annoying, like getting a different IP address when it's started back up, but those annoyances are usually pretty easy to recover from, not a big deal. Much different ball park from getting your stuff wiped.

Destroying consumer data IS a big deal. It's ALWAYS a big deal. If your system allows users to destroy their data without being 1000% clear about what's happening, your system's design is broken. High-cost actions like that should require multiple confirmations.

Even the behavior of the `rm` command has been adjusted to account for this (though it could be argued that it hasn't been adjusted far enough); for the last several years, an extra flag has been required to remove the filesystem root.

>is to charge all customers for a minimum of 25 hours of use, even if they only use the instance for a single hour? That seems crazy.

One of several potential solutions. It doesn't seem crazy to me; at least, not in comparison to making a platform with such an abnormal design that something which is an innocent, non-destructive command everywhere else can unexpectedly destroy tons of data.

The ideal solution would be for Amazon to fix their design so that this is fully transparent to the user. Instance storage should be transmitted into a temporary EBS disk on shutdown and automatically re-applied to a new instance store when it's spun back up (it's OK if this happens asynchronously). The EBS disk would follow conventional EBS disk termination policies; that data shouldn't be deleted except at times that the EBS root disk would also be deleted (typically on instance termination, unless special action is taken to preserve it).

That could be an optional extension, but it should be on by default -- that is, you could start an instance store at a lower cost per hour if you disabled this functionality, similar to reduced redundancy storage in S3, etc. Almost every company would be thrilled to pay the extra few cents per hour to safeguard against the accidental destruction of virtually any quantity of data that might be important.

>Look, AWS is trying to balance the economics of a large, shared, multi-tenant platform. It would be great if they had enough excess capacity around to keep ephemeral instance hardware unused for 24 hours after the customer terminates or stops the instance, but frankly, that's an edge case, and they would be forcing other customers to subsidize your edge case by charging everyone more.

A redemption fee would punish the user who made the mistake for failing to account for Amazon's flawed design. Under this model, such fees should be at least high enough to make up the cost incurred by Amazon in keeping the hardware idle.

This way Amazon can punish people who impugn upon its bad design choices by making them embarrass themselves before their bosses when they have to explain why the AWS bill is $300 higher this month or whatever, and the data won't be gone. Winners all around.


A redemption fee is a good idea, but it would still take engineering effort to build such a feature, so the opportunity cost is that other features customers need wouldn't get built.

Another thing I'd like to point out is that you really need to plan for ephemeral storage to fail. All it takes is a single disk drive failure in your physical host, and you've lost data. If you are using ephemeral storage at all, you should definitely have good, reliable backups, or the data should be protected in other ways (like HDFS replication).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: