Hacker News

I think you meant the Intel Compiler? Yes. The Intel compiler consistently produces the highest-performing binaries on Intel processors, often by a big margin. Intel MKL used to be the highest-performing math library, and may still be. As a result, most performance-critical software, such as scientific applications, is compiled with ICC.


This is an overstatement. ICC consistently compiles the slowest and produces the largest binaries. It also defaults to something close to -ffast-math, which may or may not be appropriate. If your app benefits from aggressive inlining and vectorization at the expense of potentially huge increases in code size, ICC is likely to do well for you. However, I've seen lots of cases where well-vectorized code is faster with GCC or Clang, including some very important cases using Intel intrinsics. (Several such cases reported to/acknowledged by Intel; some have been fixed over the years, but these observations are not uncommon.)

BLIS is used by AMD and is a good open alternative to MKL (for BLAS) across many platforms. https://github.com/flame/blis/blob/master/docs/Performance.m...


I have been hearing about the superiority of Intel's compiler for a couple of decades now. Back when GCC was a tiny baby compared to what it is now, and when Clang/LLVM didn't even exist.

I wonder if this Intel compiler 'superiority' is still the case today, or if this is just a meme at this point.


For matrix-manipulation-based Fortran scientific codes, ifort/MKL can give +30% compared to gfortran. It's difficult to disentangle where the speedup comes from, but certainly, as jedbrown alludes to, the Intel compilers seem to make a better go of poorly optimised / badly written code.

For C-based software, it's a much closer-run thing, and often sticking with GCC avoids weird segfaults when mixing Intel- and GCC-compiled Linux libraries.


> This is an overstatement.

To be generous...

Where do you typically see lack of inlining and vectorization with GCC? I'm curious because most times people have said GCC wouldn't vectorize code that I've been able to try, it would, at least if allowed -ffast-math a la Intel (as in BLIS now).


Can you explain "BLIS is used by AMD"? In what way do they use it?


It's their official BLAS [1] since 2015 when they moved away from their proprietary ACML implementation [2].

[1] https://developer.amd.com/amd-aocl/blas-library/

[2] https://developer.amd.com/open-source-strikes-again-accelera...


Amusingly, OpenBLAS significantly beat the bought-in ACML, on DGEMM, over the six(?) generations of Opteron I had available. AMD learnt.


The fact that MKL is the highest performing library has nothing to do with the quality of icc's output.

The idea that icc produces faster binaries is a myth; it may have been true 25 years ago.


So what are they compiled with for non-Intel processors?


The HPC codes I worked on we would compile with gcc, clang, icc, and whatever vendor compiler was installed (Cray, PGI, something even worse). Then we'd benchmark the resulting binaries and make a recommendation based on speed, assuming the compiled binaries gave correct results; when they didn't, that triggered further debugging to find out whether we had undefined (or implementation-defined) behavior or had managed to find a compiler bug. For codes that are memory-bandwidth dominated the results are pretty much a toss-up. For compute-bound codes Intel would often win.

You can do the same when your machine has non-intel CPUs that are supported by a lot of compilers. If you are on power9 or arm the compiler list gets shorter. And a lot of supercomputers start to contain accelerators (often, but not always Nvidia GPUs) in which case there is often only one supported compiler and you have to rewrite code until that compiler is happy and produces fast code.



