
> In any case, it is a huge security issue, because file systems use this command to efficiently clear freed blocks to zeros.

Do file systems directly issue SCSI commands? I would've thought they tell the storage driver to do something and the driver would do it with the most efficient means available.



If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

And yes, some do - VMware ESX, for example, uses what it calls VAAI, a set of optional (standardized) SCSI functionality like WRITE SAME, COMPARE AND WRITE (iirc), and server-side copy.


Ah, blame tennis, my favorite game!

Is there an alternative non-optional strategy for achieving secure delete (or revocation semantics of some kind)? If not, this is a fundamental capability that you can't paper over by slapping an abstraction layer on top, any more than you could turn a 1TB HDD into a 2TB HDD with an abstraction layer, and in that case the bug is very much in the hard drive / standards, not in the operating system.


> Is there an alternative non-optional strategy for achieving secure delete

Issue normal data writes of blocks filled with zeros. The same path that gets regular data onto the drive just fine will of course also work for data that happens to be all zeros.
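For concreteness, a rough userspace-style sketch of that fallback in C, assuming a Linux block device and a made-up byte range; in reality the filesystem driver does the equivalent in the kernel with block-layer I/O rather than pwrite():

    /* Illustrative only: clear a byte range on a block device by issuing
     * ordinary writes of zero-filled buffers, i.e. the fallback when
     * WRITE SAME is unavailable. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK (1 << 20)                    /* 1 MiB of zeros per write */

    int zero_range(const char *dev, off_t start, off_t len)
    {
        static unsigned char zeros[CHUNK];     /* static => zero-initialized */
        int fd = open(dev, O_WRONLY);
        if (fd < 0) { perror("open"); return -1; }

        for (off_t off = start; len > 0; ) {
            size_t n = len < CHUNK ? (size_t)len : CHUNK;
            ssize_t w = pwrite(fd, zeros, n, off);
            if (w < 0) { perror("pwrite"); close(fd); return -1; }
            off += w;
            len -= w;
        }
        fsync(fd);                             /* make sure the zeros reach the drive */
        close(fd);
        return 0;
    }

The point is just that nothing special is required of the drive: ordinary WRITE commands carrying zero-filled buffers achieve the same end result as WRITE SAME, only with more data pushed over the bus.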


Oh, so WRITE SAME doesn't come with "no wear leveling" semantics? That makes emulation much more reasonable.


I think the only way to get "no wear leveling" is the ATA Secure Erase command. You only need that for devices that do wear leveling in the first place, which the drive in question doesn't anyway, so it's a bit moot.


Would that work on a filesystem that supports sparse files?


We're talking about the filesystem driver itself issuing the write.

The above is a discussion about whether the filesystem driver or the block device driver would issue the SCSI commands.

This would never happen from userspace.


Why would you need to overwrite blocks of a sparse file? Which blocks would you be overwriting?


> If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

Is it though? There is probably a big drawback in terms of resource consumption if this is not supported. Not all environments may be ok with this.


That sounds like a performance issue, not a security issue?


The (potential) security issue is that the operating system fails to clear disk space, possibly incorrectly assuming it has actually been cleared.


Except that it receives an error, so if it's assuming that, it's wrong.

This command is not mandated to be supported. Therefore, if an OS assumes it is supported, that's an OS problem, not the drive's.


I don't see how this could occur. Either you do WRITE SAME and it works or it doesn't and you have to manually issue WRITE commands.


It likely flows down from the same kind of code that supports TRIM and other free-space-clearing features. Filesystems won't (usually) issue the SCSI commands themselves.


It seems like a bug in a storage driver in that case (if it's actually getting triggered by it)... if a command isn't available, it should be falling back to one that is, right?


Maybe? I don't know if there's command discovery for SCSI that would let them know what's supposed to be supported. If there is, maybe the drive advertises support and confuses the system when the command doesn't actually work.


When you talk to disks via smartctl, the tool reports the specification versions they support. There's an ATA Version and a SATA Version field for SATA disks. I was unable to get details on a SAS disk, but it was identified as a SAS drive successfully.

I assume these standards define mandatory and optional commands for certifying disks as compatible with the spec.

If the command is optional, then it's OK, but if it's not, then there's a bug fix that WD needs to make.


For SAS drives I would recommend sg3_utils [1]. You can basically query what the drive supports via `sg_opcodes`.

smartctl isn't really designed to handle the SCSI protocol, I think. It can do basic things, but for anything deeper you're better off with sg3_utils.

[1] http://sg.danny.cz/sg/sg3_utils.html


Thanks for the reply and the utility. I'll take a look into it. Since I'm familiar with smartctl from my server-management roles, it came to mind, so I shared it. I never expected it to handle anything beyond what it needs to fetch SMART and other diagnostic data.

Thanks again. :)


> I don't know if there's a command Discovery for scsi that would let them know if things are supposed to be supported.

The OP shows errors that are reported to the OS by the drive when it attempts to use the command. Even if it can't pre-determine support for the command, it can fall back upon receiving an error.
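To make that concrete, here's a rough sketch (illustrative only, using the Linux SG_IO passthrough, with placeholder LBA and block size) of issuing WRITE SAME(16) and detecting the "not supported" error so a caller can fall back to plain zero-filled writes:

    /* Sketch: try WRITE SAME(16) for a range of zeroed blocks via SG_IO.
     * Returns 0 on success, 1 if the drive reports ILLEGAL REQUEST /
     * INVALID COMMAND OPERATION CODE (caller should fall back to plain
     * writes), -1 on other failures. */
    #include <fcntl.h>
    #include <scsi/sg.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define BLOCK_SIZE 512                       /* placeholder logical block size */

    int write_same_zero(int fd, uint64_t lba, uint32_t nblocks)
    {
        unsigned char cdb[16] = { 0x93 };        /* WRITE SAME(16) opcode */
        unsigned char data[BLOCK_SIZE] = { 0 };  /* the one block to repeat */
        unsigned char sense[32] = { 0 };
        sg_io_hdr_t io;
        memset(&io, 0, sizeof(io));

        for (int i = 0; i < 8; i++)              /* big-endian LBA, bytes 2..9 */
            cdb[2 + i] = (unsigned char)(lba >> (56 - 8 * i));
        for (int i = 0; i < 4; i++)              /* block count, bytes 10..13 */
            cdb[10 + i] = (unsigned char)(nblocks >> (24 - 8 * i));

        io.interface_id = 'S';
        io.cmd_len = sizeof(cdb);
        io.cmdp = cdb;
        io.dxfer_direction = SG_DXFER_TO_DEV;
        io.dxferp = data;
        io.dxfer_len = sizeof(data);
        io.sbp = sense;
        io.mx_sb_len = sizeof(sense);
        io.timeout = 60000;                      /* ms */

        if (ioctl(fd, SG_IO, &io) < 0)
            return -1;

        /* Assuming fixed-format sense data: key in byte 2 (low nibble),
         * additional sense code in byte 12. 0x5/0x20 means the opcode
         * itself is not supported. */
        if (io.status != 0 && (sense[2] & 0x0f) == 0x05 && sense[12] == 0x20)
            return 1;

        return io.status == 0 ? 0 : -1;
    }

This is roughly the shape of the decision the disk driver has to make; as I understand it, Linux's sd driver does something like this, turning off WRITE SAME for a device after the first such error and emulating it with regular writes.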


Yes, it's called "REPORT SUPPORTED OPERATION CODES". If you have sg3_utils installed, sg_opcodes can be used to get the list of supported operations.
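As an illustration, roughly what sg_opcodes does under the hood is issue REPORT SUPPORTED OPERATION CODES and read the SUPPORT field out of the response. A sketch via Linux SG_IO (field offsets per SPC as best I recall; treat it as illustrative, not authoritative):

    /* Sketch: ask whether the drive supports a given opcode (e.g. 0x93,
     * WRITE SAME(16)) using REPORT SUPPORTED OPERATION CODES
     * (MAINTENANCE IN 0xA3, service action 0x0C). Illustrative only. */
    #include <fcntl.h>
    #include <scsi/sg.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int opcode_supported(int fd, unsigned char opcode)
    {
        unsigned char cdb[12] = { 0 };
        unsigned char resp[512] = { 0 };
        unsigned char sense[32] = { 0 };
        sg_io_hdr_t io;
        memset(&io, 0, sizeof(io));

        cdb[0] = 0xA3;                                 /* MAINTENANCE IN */
        cdb[1] = 0x0C;                                 /* REPORT SUPPORTED OPERATION CODES */
        cdb[2] = 0x01;                                 /* reporting options: this opcode only */
        cdb[3] = opcode;
        cdb[8] = (unsigned char)(sizeof(resp) >> 8);   /* allocation length, */
        cdb[9] = (unsigned char)(sizeof(resp) & 0xff); /* big-endian         */

        io.interface_id = 'S';
        io.cmd_len = sizeof(cdb);
        io.cmdp = cdb;
        io.dxfer_direction = SG_DXFER_FROM_DEV;
        io.dxferp = resp;
        io.dxfer_len = sizeof(resp);
        io.sbp = sense;
        io.mx_sb_len = sizeof(sense);
        io.timeout = 10000;                            /* ms */

        if (ioctl(fd, SG_IO, &io) < 0 || io.status != 0)
            return -1;                                 /* the query itself failed */

        /* SUPPORT field is the low 3 bits of byte 1 of the response;
         * 011b = supported in conformance with a SCSI standard. */
        return (resp[1] & 0x07) == 0x03;
    }

From the command line, `sg_opcodes /dev/sdX` dumps the whole table of operations the drive reports.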


Why do you need discovery? The command itself returns an illegal-opcode error; that's sufficient, right?


It's not always safe to simply try an opcode to see if it's valid, because it might trigger something else... like a firmware update (which has happened).


Thanks, I suppose that answers the question of "why not try the opcode instead of doing command discovery". Though what I was really trying to understand was, "if you've already issued the command {for whatever reason}, and it returns invalid opcode, then shouldn't you fall back to an alternative command?" Because at that point, you have enough information to know you can do so safely. It seems to me that that's what the storage driver needs to do, irrespective of any command discovery or lack thereof beforehand.


There can be other reasons for a command to fail than "opcode not supported", even if that's the error code returned. I wouldn't trust cheaper hard drives to handle that properly either.


What would such a reason be? How likely is this to happen? If you have such a mistrust of the response then you can never trust anything, right? How do you know the drive isn't lying about everything else too? At some point you gotta trust something means what it says...


The trust is in what the drive identifies as supported.

The issue is that some command opcodes may be doing double duty on a different drive. Famously, a few CD-ROM drive vendors reused the "clear buffer" command to instead mean "update firmware". Linux used support for "clear buffer" to detect whether a drive was a CD-ROM or a CD-RW drive. As a result, using one of those specific CD-ROM drives under Linux would quickly brick it permanently.

You can't trust the response because at that point the damage is likely already done. Even if you get a response, you might not know what it means.

That applies to any command that the drive does not advertise support for via the appropriate SAS and SATA mechanisms. In some rare cases you might maintain a manual whitelist of commands that drives support beyond what they advertise, but you should never try to discover that automatically at runtime.


> You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.

I still don't get this. If the damage is already done, then how is issuing the fallback going to change things? Again: I'm not arguing about whether discovery should be done or not. All I'm saying is, if the device says invalid opcode, you should use the fallback, whether or not there was any discovery that led you to use the initial opcode.


You don't know what state the drive is in anymore. The safest option is to reset the device entirely and start it back up again. If it comes back, you can use your fallback.

But it's much easier to rely on what is known to work instead of issuing potentially unsupported commands, to the point that there's no reason to have a fallback other than "reset and rediscover what the drive supports".

I don't get why you would even want to use a fallback command on a drive that is in a potentially unknown or undefined state.

If discovery says a command is supported and issuing it still produces an invalid-opcode error, the drive is faulty, end of story. The SAS and SATA standards are very clear on what is permitted and what is forbidden, and that falls very far on the side of "not allowed".


Is this just a theoretical thing, or have there been actual drives that lied about invalid opcodes on a read and then proceeded to destroy the drive if you issued a fallback read? I have a hard time believing a hard drive would behave like a C compiler if I'm being honest...


As I mentioned earlier, there was a series of CD-ROM drives that, upon receiving an unsupported command (and this was before you could discover support), would interpret all further data as firmware for an update and brick the device. If you issued a fallback read, the device would become bricked; if you reset the bus and reinitialized the device, everything was fine.

Discovery has of course improved this, so we know what a hard drive can and cannot do. Hard drives that lie about what they support shouldn't carry the SATA or SAS seals and trademarks, as drives must be certified by those entities to use them.


Oh wow, it would interpret every subsequent command as firmware data? I didn't realize that, that's completely nuts. Thanks for sharing that!


Well, if it's a security concern, perhaps the code author should check the response code rather than blindly assuming it worked.


No, it's indirect via whatever the operating system's disk abstraction is.



