Expanding on your points: only AMD, SPARC, and MIPS avoided Meltdown bugs; Meltdown-class flaws were also found in ARM and in IBM's POWER and mainframe/Z designs. All of these vendors have out-of-order speculative-execution designs that are vulnerable to Spectre.
> If anything, the "capture 90% of the market." by Intel as you put it only ensures that these problems will usually be discovered on their processor's first.
Precisely. I haven't looked at the MDS set of flaws yet, but take the first version of Foreshadow/L1TF: it targets the Intel SGX enclave, so by definition designs from other companies that don't have SGX won't be vulnerable. But as you note, this doesn't mean they don't have their own design- or implementation-specific bugs.
Intel played fast-and-loose with correctness to eke out performance gains. They cut corners that they shouldn't have cut and now their customers are paying the price. Other companies cut similar corners, but Intel did it on a gratuitous scale.
Hard disagree. Correctness means conformance with the spec. The spec says, for example, you get a page fault if you access privileged memory. That’s what Intel does. The spec doesn’t make any guarantees beyond that.
The software stack is at fault for building their security model on assumptions about what the hardware did that aren’t guaranteed in the spec. The software assumes the existence of this magical isolation that the CPU never promised to provide.
You have zero proof of that, and your explanation also violates Hanlon's razor ("Never attribute to malice that which is adequately explained by stupidity").
I don't think it was malice, but I think it is perfectly fair to say Intel was playing fast and loose here, and they deserve all the flak they're getting as a consequence.
Vulnerabilities like Meltdown and the latest "MDS" vulnerabilities are absolutely a design decision.
Think about Meltdown, for example. The problem here was that data from L1$ memory was forwarded to subsequent instructions during speculative execution even when privilege/permission checks failed. Now, given the way an L1$ has to be designed, the TLB lookup happens before reading data from L1$ memory (because you need to know the physical address in order to determine whether there was a cache hit and, if so, which line of L1$ to read), and the TLB lookup gives you all the permission bits required to do the permission check.
Now you have a choice about what to do when the permission check fails. Either you read the L1$ memory anyway, forward the fact that the permissions check failed separately, and rely on later logic to trigger an exception when the instruction would retire.
This is clearly what Intel did.
Or you can treat this situation more like a cache miss: don't read from L1$ and don't execute subsequent instructions. This seems to be what AMD has done, and while it may be slightly more costly, it can't be that much more expensive, because you have to track the exception status either way.
The point is, at some place in the design a conscious choice was made to allow instructions to continue executing even though a permissions check has failed.
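The two choices can be sketched in a toy model (Python; every name here is invented for illustration, and nothing resembles real pipeline logic — the point is only the difference in observable footprint):

```python
# Toy model of the two design choices for a load whose privilege check fails.
CACHE_LINE = 64

def speculative_load(addr, memory, perms, forward_on_fault, touched_lines):
    """forward_on_fault=True  ~ Meltdown-style: data is forwarded anyway and a
                                dependent access leaves a secret-dependent footprint.
       forward_on_fault=False ~ treat it like a miss: dependent work is squashed.
       Either way, the fault is tracked and raised at retirement."""
    fault = not perms.get(addr, False)      # privilege check fails
    if fault and not forward_on_fault:
        return "fault"                      # no data forwarded, no footprint
    secret = memory[addr]                   # data forwarded despite the fault
    # Dependent instruction: index a probe array by the secret value.
    touched_lines.add(secret * CACHE_LINE)  # observable via cache timing
    return "fault" if fault else secret

memory = {0xfff0: 42}       # a "kernel" byte
perms  = {0xfff0: False}    # user mode may not read it

leak = set()
assert speculative_load(0xfff0, memory, perms, True, leak) == "fault"
assert leak == {42 * CACHE_LINE}   # the footprint reveals the secret

no_leak = set()
assert speculative_load(0xfff0, memory, perms, False, no_leak) == "fault"
assert no_leak == set()            # nothing for an attacker to measure
```

In both variants the architectural result is identical (a fault at retirement); only the microarchitectural side effect differs, which is exactly why the spec-conformance argument above doesn't settle the question.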
The MDS flaws seem similar, though it's a bit harder to judge what exactly was going on there since it's even deeper in microarchitectural details.
And ARM, and IBM, both the POWER and mainframe/Z teams.
Although this design decision of Intel's was made in the early 1990s, shipping in the first Pentium Pro in 1995. At the time, AMD's competition was a souped-up 486 targeting Intel's superscalar Pentium, although I assume they were also working on their first out-of-order K5, which first shipped in 1996.
Nah, the K6 was not an AMD design, it was NexGen. (I had a Nx686 motherboard back in the day. I was young and wasted way too much money on exotic parts. I also had an Ark Logic video card about the same time, it might've been the Hercules Stingray Pro but my memory is a bit fuzzy after 25 years.)
> The AMD K6 is a superscalar P5 Pentium-class microprocessor, manufactured by AMD, which superseded the K5. The AMD K6 is based on the Nx686 microprocessor that NexGen was designing when it was acquired by AMD. Despite the name implying a design evolving from the K5, it is in fact a totally different design that was created by the NexGen team, including chief processor architect Greg Favor, and adapted after the AMD purchase.
That's interesting, and relevant in that the NexGen team was joined by DEC Alpha people to do the K7, but the K6 is not AMD's first out-of-order speculative execution x86 design, per Wikipedia the K5 was an internal project that "was based upon an internal highly parallel 29k RISC processor architecture with an x86 decoding front-end."
I'm saying Intel's design decisions are found in so many other vendors (Spectre in every one I've looked at) that it represents an industry-wide blind spot. Which calls for different kinds of "accountability", unless you're a rent seeker.
> Now the way that an L1$ has to be designed, the TLB lookup happens before reading data from L1$ memory (because you need to know the physical address in order to determine whether there was a cache hit and if so, which line of L1$ to read)
With VIPT caches (which AFAIK both AMD and Intel use for the L1 data cache), the TLB lookup and the L1 cache lookup happen in parallel. The cache lookup uses the bits of the memory address corresponding to the offset within the page (which are unchanged by the translation from virtual to physical addresses) to know which line of the L1 cache to read. Only later, after both the TLB and the L1 cache have returned their results, is the L1 tag compared with the corresponding bits of the physical address returned by the TLB.
This is only partially correct. A VIPT cache uses the offset into the page to know which set of the L1 cache to read. You still need the result of the TLB, via the physical tag comparison, to know which way within that set (if any) actually holds the line.
This is clear from the fact that a page is 4KB while the L1 data cache is 32KB, so the page offset cannot contain enough information to tell the processor which part of L1 to read.
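The bit arithmetic behind this works out as follows, assuming a typical 32 KiB, 8-way, 64-byte-line L1D (other configurations differ; these numbers are just a common example):

```python
# VIPT L1D geometry for an assumed 32 KiB / 8-way / 64 B-line configuration.
PAGE_SIZE  = 4096
CACHE_SIZE = 32 * 1024
WAYS       = 8
LINE       = 64

SETS        = CACHE_SIZE // (WAYS * LINE)   # 64 sets
OFFSET_BITS = LINE.bit_length() - 1         # 6 bits: byte within the line
SET_BITS    = SETS.bit_length() - 1         # 6 bits: set index

def set_index(vaddr):
    return (vaddr >> OFFSET_BITS) & (SETS - 1)

def phys_tag(paddr):
    return paddr >> (OFFSET_BITS + SET_BITS)

# The set index uses only address bits 6..11, all inside the 12-bit page
# offset, so it is identical for the virtual and the physical address and
# can be computed before (or in parallel with) the TLB lookup...
assert OFFSET_BITS + SET_BITS <= PAGE_SIZE.bit_length() - 1
assert set_index(0x12345) == set_index(0x12345 % PAGE_SIZE)

# ...but picking the matching way among the 8 lines in that set requires
# comparing stored tags against the physical tag from the TLB.
assert phys_tag(0x45000) != phys_tag(0x65000)  # same set, different lines
```

Note that each *way* is 32 KiB / 8 = 4 KiB, exactly one page, which is why the set index still fits in the page offset even though the whole cache is larger than a page.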
Ok, here's an "argument" version: I argue that it is more likely that these problems were due to taking the wrong side of trade-offs based on incomplete knowledge, weak foresight, and poor judgement. I argue that there is little or no evidence that executives or engineers at Intel actively didn't care about their reputation or their customers.
(Somewhat ironically, I think "Hanlon's razor is just a saying, not an argument" is just a saying, not an argument.)
> Hanlon’s razor only applies to individual persons, never to corporations or groups.
Why?
Do you think that the only reason for applying Hanlon's razor is some sort of moral principle (that only applies to individuals)?
I think the reason to apply Hanlon's razor is that stupidity is far more widespread than malice, and so that should be my prior. This is a purely epistemic argument, with no moral component.
I also think that larger groups always contain more stupidity than smaller groups, and I think it grows super-linearly. On the other hand, the effect of group size on malice (and on benevolence) is very complicated and unpredictable. Do you disagree with one of these beliefs?
If you agree that groups are almost always stupider than people but only sometimes more malicious, I think it's clear that I should be at least as eager to apply the razor to groups as to people.
In conclusion, an apposite quotation:
> Moloch! Nightmare of Moloch! Moloch the loveless!
Agreed. A big corporation working on products for a prolonged period of time has a different level of awareness of its work than a single individual does.
Take the 737 Max as an example: it's absolutely malice, because the level of incompetence you'd have to have to let a plane into the air that can pitch all the way down on the basis of a single(!) faulty sensor would disqualify anyone from ever building a plane.
An aviation problem such as that is still comparatively easy for a layman to understand. When it comes to CPU microarchitectures, I'm not so sure. But I trust they have professionals designing their chips, so my default is to assume they are competent, and therefore that malice has occurred; they'd have to prove the opposite.