But it's an apples-to-oranges comparison, between functions that skip whitespace, validate the input, and report error conditions, and a function that assumes that it has exactly N digits and does not need to check. Code that doesn't have to solve the most expensive part of the real problem can run much faster.
It would have been more interesting to try to optimize something closer to the problem the library routines are solving: skip leading whitespace, error if the first character isn't a digit, accumulate digits until a non-digit is reached. Can the article writer beat the standard functions? By how much?
Optimization is often about specializing, and therefore avoiding work you don't happen to need in your particular case. The article acknowledges that it's giving up some flexibility. It could maybe have been louder about it.
This would be different if we were benchmarking to compare libraries or hardware or languages or whatnot, or even calling the new implementation "better" without qualifiers.
Does all that stuff really wind up costing meaningful performance?
If you say:
for (i = 0; chars[i] != '\0'; i++)
    if ( notWhitespace(chars[i]) && isDigit(chars[i]) )
        doWork(chars[i]);
speculative execution will give you near full performance when the input is clean, as long as you didn't write it in a way that invites mispredicts. You'll only suffer the cost of checking when your input is not clean.
That’s more elaborate than BM_Naive from the article, which takes over 10 times the time of their last version.
The reason: the next-to-last version of the code grabs 8 characters into an unsigned long, parses them, grabs 8 more characters, parses them, multiplies the first number by 100,000,000, and adds it to the second.
So, it does 2 iterations per 16-digit number, not 16.
The last result further improves on that by grabbing all 16 digits in one go.
Absolutely. Even with no other considerations, you're spending more ALU operations per character in that validation than the naive solution does for parsing. That probably won't drastically affect latency for a single parse if you don't have branch mispredictions and whatnot, but I'd be shocked if it didn't reduce throughput by 2x even for validation as simple as that.
Yes, so an effort to write an optimized version and benchmark it should include such checks (plus, of course, checking for reaching the end of the input).
Yeah. If you have that much control over your input and parsing is your bottleneck, you might as well use a binary format, mmap the file, cast the returned pointer to your data structure, and boom: zero time spent on parsing.