> C programmers have more freedom to arrange data in memory to exploit locality
I think just about everyone here is underestimating the impact of this one "feature". Most optimization folklore is grounded in 1970s computer hardware, where CPU speed and memory speed were roughly in balance. That's not the case anymore... memory access times absolutely dominate on modern hardware. Just look up how many instruction cycles a cache miss costs on your favorite system. It's appalling.
C and C++ have been able to stay ahead of the curve because they allow strict control over memory layout. OCaml is often brought up as a competitor, but outside a couple of special cases (float arrays and all-float records), storing a floating-point value in OCaml means boxing it behind a pointer. Never mind trying to build an aggregate structure that bundles floats with other types... everything gets boxed, and the cache never stands a chance.
C and C++ are among the few current mainstream languages that let you bundle your data exactly as you need it, in as small a space and in appropriately sized chunks, so that memory access doesn't grind your program down. Of course you can fail to take advantage of this ability and write slow code in C or C++, in which case you'll match the benchmarks of your other favorite languages and maybe write a blog post about it. That would be missing the point, though...
People don't realize what a power tool C and C++ are because they haven't taken them out for a spin at that level. If you've done serious API work and wondered why you're busy byte-packing, it's because there's some highly optimized code somewhere that you're feeding. You can allocate a big honking chunk of memory and create your own world in there. Boxing, for all its goodness, is a performance killer.