
> In any case, it is a huge security issue, because file systems use this command to efficiently clear freed blocks to zeros.

Do file systems directly issue SCSI commands? I would've thought they tell the storage driver to do something and the driver would do it with the most efficient means available.



If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

And yes, some do - VMware ESX, for example, uses what it calls VAAI, a set of optional (standardized) SCSI functionality like WRITE SAME, COMPARE AND WRITE (iirc), and server-side copy.


Ah, blame tennis, my favorite game!

Is there an alternative non-optional strategy for achieving secure delete (or revocation semantics of some kind)? If not, this is a fundamental capability that you can't paper over by slapping an abstraction layer on top, any more than you could turn a 1TB HDD into a 2TB HDD with an abstraction layer, and in that case the bug is very much in the hard drive / standards, not in the operating system.


> Is there an alternative non-optional strategy for achieving secure delete

Issue normal data writes of blocks filled with zeros. The same path that gets regular data onto the drive just fine will of course also work for data that happens to be all zeros.
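For concreteness, a rough userspace-style sketch of that fallback in C, assuming a Linux block device and a made-up byte range; in reality the filesystem driver does the equivalent in the kernel with block-layer I/O rather than pwrite():

    /* Illustrative only: clear a byte range on a block device by issuing
     * ordinary writes of zero-filled buffers, i.e. the fallback when
     * WRITE SAME is unavailable. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)                    /* 1 MiB of zeros per write */

    int zero_range(const char *dev, off_t start, off_t len)
    {
        static unsigned char zeros[CHUNK];     /* static => zero-initialized */
        int fd = open(dev, O_WRONLY);
        if (fd < 0) { perror("open"); return -1; }

        for (off_t off = start; len > 0; ) {
            size_t n = len < CHUNK ? (size_t)len : CHUNK;
            ssize_t w = pwrite(fd, zeros, n, off);
            if (w < 0) { perror("pwrite"); close(fd); return -1; }
            off += w;
            len -= w;
        }
        fsync(fd);                             /* make sure the zeros reach the drive */
        close(fd);
        return 0;
    }

The point is just that nothing special is required of the drive: ordinary WRITE commands carrying zero-filled buffers achieve the same end result as WRITE SAME, only with more data pushed over the bus.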


Oh, so WRITE SAME doesn't come with "no wear leveling" semantics? That makes emulation much more reasonable.


I think the only way to get "no wear leveling" is the ATA Secure Erase command. You only need that for devices that do wear leveling in the first place, which the drive in question doesn't anyway, so it's a bit moot.


Would that work on a filesystem that supports sparse files?


We're talking about the filesystem driver itself issuing the write.

The above is a discussion about whether the filesystem driver or the block device driver would issue the SCSI commands.

This would never happen from userspace.


Why would you need to overwrite blocks of a sparse file? Which blocks would you be overwriting?


> If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

Is it though? There is probably a big drawback in terms of resource consumption if this is not supported. Not all environments may be ok with this.


That sounds like a performance issue, not a security issue?


The (potential) security issue is that the operating system fails to clear disk space, possibly incorrectly assuming it has actually been cleared.


Except that it receives an error, so if it's assuming that, it's wrong.

This command is not mandated to be supported. Therefore, if an OS assumes it is supported, that's an OS problem, not the drive's.


I don't see how this could occur. Either you do WRITE SAME and it works or it doesn't and you have to manually issue WRITE commands.


It likely flows down from the same kind of code that supports TRIM and other free-space-clearing features. Filesystems won't (usually) issue the SCSI commands themselves.


It seems like a bug in a storage driver in that case (if it's actually getting triggered by it)... if a command isn't available, it should be falling back to one that is, right?


Maybe? I don't know if there's command discovery for SCSI that would let them know what's supposed to be supported. If there is, maybe the drive advertises support and confuses the system when the command doesn't actually work.


When you talk to disks via smartctl, the tool reports the specification versions they support. There's an ATA Version and a SATA Version field for SATA disks. I was unable to get details on a SAS disk, but it was identified as a SAS drive successfully.

I assume these standards define mandatory and optional commands for certifying disks as compatible with the spec.

If the command is optional, then it's OK, but if it's not, then there's a bug fix that WD needs to make.


For SAS drives I would recommend sg3_utils [1]. You can basically query what the drive supports via `sg_opcodes`.

smartctl isn't really designed to handle the SCSI protocol, I think. It can do basic things, but for anything deeper you're better off with sg3_utils.

[1] http://sg.danny.cz/sg/sg3_utils.html


Thanks for the reply and the utility. I'll take a look into it. Since I'm familiar with smartctl from my server-management roles, it came to mind, so I shared it. I never expected it to handle anything beyond what it needs to fetch SMART and other diagnostic data.

Thanks again. :)


> I don't know if there's a command Discovery for scsi that would let them know if things are supposed to be supported.

The OP shows errors that are reported to the OS by the drive when it attempts to use the command. Even if it can't pre-determine support for the command, it can fall back upon receiving an error.
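To make that concrete, here's a rough sketch (illustrative only, using the Linux SG_IO passthrough, with placeholder LBA and block size) of issuing WRITE SAME(16) and detecting the "not supported" error so a caller can fall back to plain zero-filled writes:

    /* Sketch: try WRITE SAME(16) for a range of zeroed blocks via SG_IO.
     * Returns 0 on success, 1 if the drive reports ILLEGAL REQUEST /
     * INVALID COMMAND OPERATION CODE (caller should fall back to plain
     * writes), -1 on other failures. */
    #include <fcntl.h>
    #include <scsi/sg.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define BLOCK_SIZE 512                       /* placeholder logical block size */

    int write_same_zero(int fd, uint64_t lba, uint32_t nblocks)
    {
        unsigned char cdb[16] = { 0x93 };        /* WRITE SAME(16) opcode */
        unsigned char data[BLOCK_SIZE] = { 0 };  /* the one block to repeat */
        unsigned char sense[32] = { 0 };
        sg_io_hdr_t io;
        memset(&io, 0, sizeof(io));

        for (int i = 0; i < 8; i++)              /* big-endian LBA, bytes 2..9 */
            cdb[2 + i] = (unsigned char)(lba >> (56 - 8 * i));
        for (int i = 0; i < 4; i++)              /* block count, bytes 10..13 */
            cdb[10 + i] = (unsigned char)(nblocks >> (24 - 8 * i));

        io.interface_id = 'S';
        io.cmd_len = sizeof(cdb);
        io.cmdp = cdb;
        io.dxfer_direction = SG_DXFER_TO_DEV;
        io.dxferp = data;
        io.dxfer_len = sizeof(data);
        io.sbp = sense;
        io.mx_sb_len = sizeof(sense);
        io.timeout = 60000;                      /* ms */

        if (ioctl(fd, SG_IO, &io) < 0)
            return -1;

        /* Assuming fixed-format sense data: key in byte 2 (low nibble),
         * additional sense code in byte 12. 0x5/0x20 means the opcode
         * itself is not supported. */
        if (io.status != 0 && (sense[2] & 0x0f) == 0x05 && sense[12] == 0x20)
            return 1;

        return io.status == 0 ? 0 : -1;
    }

This is roughly the shape of the decision the disk driver has to make; as I understand it, Linux's sd driver does something like this, turning off WRITE SAME for a device after the first such error and emulating it with regular writes.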


Yes, it's called "REPORT SUPPORTED OPERATION CODES". If you have sg3_utils installed, sg_opcodes can be used to get the list of supported operations.
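As an illustration, roughly what sg_opcodes does under the hood is issue REPORT SUPPORTED OPERATION CODES and read the SUPPORT field out of the response. A sketch via Linux SG_IO (field offsets per SPC as best I recall; treat it as illustrative, not authoritative):

    /* Sketch: ask whether the drive supports a given opcode (e.g. 0x93,
     * WRITE SAME(16)) using REPORT SUPPORTED OPERATION CODES
     * (MAINTENANCE IN 0xA3, service action 0x0C). Illustrative only. */
    #include <fcntl.h>
    #include <scsi/sg.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int opcode_supported(int fd, unsigned char opcode)
    {
        unsigned char cdb[12] = { 0 };
        unsigned char resp[512] = { 0 };
        unsigned char sense[32] = { 0 };
        sg_io_hdr_t io;
        memset(&io, 0, sizeof(io));

        cdb[0] = 0xA3;                                 /* MAINTENANCE IN */
        cdb[1] = 0x0C;                                 /* REPORT SUPPORTED OPERATION CODES */
        cdb[2] = 0x01;                                 /* reporting options: this opcode only */
        cdb[3] = opcode;
        cdb[8] = (unsigned char)(sizeof(resp) >> 8);   /* allocation length, */
        cdb[9] = (unsigned char)(sizeof(resp) & 0xff); /* big-endian         */

        io.interface_id = 'S';
        io.cmd_len = sizeof(cdb);
        io.cmdp = cdb;
        io.dxfer_direction = SG_DXFER_FROM_DEV;
        io.dxferp = resp;
        io.dxfer_len = sizeof(resp);
        io.sbp = sense;
        io.mx_sb_len = sizeof(sense);
        io.timeout = 10000;                            /* ms */

        if (ioctl(fd, SG_IO, &io) < 0 || io.status != 0)
            return -1;                                 /* the query itself failed */

        /* SUPPORT field is the low 3 bits of byte 1 of the response;
         * 011b = supported in conformance with a SCSI standard. */
        return (resp[1] & 0x07) == 0x03;
    }

From the command line, `sg_opcodes /dev/sdX` dumps the whole table of operations the drive reports.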


Why do you need discovery? The command itself returns an illegal-opcode error; that's sufficient, right?


It's not always safe to simply try an opcode to see if it's valid, because it might trigger something else... like a firmware update (which has happened).


Thanks, I suppose that answers the question of "why not try the opcode instead of doing command discovery". Though what I was really trying to understand was, "if you've already issued the command {for whatever reason}, and it returns invalid opcode, then shouldn't you fall back to an alternative command?" Because at that point, you have enough information to know you can do so safely. It seems to me that that's what the storage driver needs to do, irrespective of any command discovery or lack thereof beforehand.


There can be other reasons for a command to fail than "opcode not supported", even if that's the error code returned. I wouldn't trust cheaper hard drives to handle that properly either.


What would such a reason be? How likely is this to happen? If you have such a mistrust of the response then you can never trust anything, right? How do you know the drive isn't lying about everything else too? At some point you gotta trust something means what it says...


The trust is in what the drive identifies as supported.

The issue is that some command opcodes may be doing double duty on a different drive. Famously, a few CD-ROM drive vendors reused the "clear buffer" command to instead mean "update firmware". Linux used support for "clear buffer" to detect whether a drive was a CD-ROM or a CD-RW drive. As a result, using one of those specific CD-ROM drives under Linux would quickly brick it permanently.

You can't trust the response because at that point the damage is likely already done. Even if you get a response, you might not know what it means.

That applies to any command that the drive does not advertise support for via the appropriate SAS and SATA mechanisms. In some rare cases you might maintain a manual whitelist of commands that drives support beyond what they advertise, but you should never try to discover that automatically at runtime.


> You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.

I still don't get this. If the damage is already done, then how is issuing the fallback going to change things? Again: I'm not arguing about whether discovery should be done or not. All I'm saying is, if the device says invalid opcode, you should use the fallback, whether or not there was any discovery that led you to use the initial opcode.


You don't know what state the drive is in anymore. The safest option is to reset the device entirely and start it back up again. If it comes back, you can use your fallback.

But it's much easier to rely on what is known to work instead of issuing potentially unsupported commands, to the point that there's no reason to have a fallback other than "reset and rediscover what the drive supports".

I don't get why you would even want to use a fallback command on a drive that is in a potentially unknown or undefined state.

If discovery says a command is supported and issuing it still produces an invalid-opcode error, the drive is faulty, end of story. The SAS and SATA standards are very clear on what is permitted and what is forbidden, and that falls very far on the side of "not allowed".


Is this just a theoretical thing, or have there been actual drives that lied about invalid opcodes on a read and then proceeded to destroy the drive if you issued a fallback read? I have a hard time believing a hard drive would behave like a C compiler if I'm being honest...


As I mentioned earlier, there was a series of CD-ROM drives that, upon receiving an unsupported command (and this was before you could discover support), would interpret all further data as firmware for an update and brick the device. If you issued a fallback read, the device would become bricked; if you reset the bus and reinitialized the device, everything was fine.

Discovery has of course improved this, so we know what a hard drive can and cannot do. Hard drives that lie about what they support shouldn't carry the SATA or SAS seals and trademarks, as drives must be certified by those entities to use them.


Oh wow, it would interpret every subsequent command as firmware data? I didn't realize that, that's completely nuts. Thanks for sharing that!


Well, if it's a security concern, perhaps the code author should check the response code rather than blindly assuming it worked.


No, it's indirect via whatever the operating system's disk abstraction is.



