I'm not doing mod 10 operations; I'm printing left to right. I also put each digit in its own case:, and the digit printing is part of a larger state machine transmitting various things. No characters are stored; one of these blocks ran whenever the UART was ready to accept a new character. I didn't want to do the entire conversion to a string all at once, and this is a very fast way to extract digits in order.
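A minimal sketch of that kind of one-digit-per-call state machine. It assumes a 16-bit value printed as five fixed-width digits; the names digit_tx_t, digit_tx_start and digit_tx_next are hypothetical. It uses plain division by constants rather than the multiply-shift expression, and it is not the original firmware code.

```c
#include <stdint.h>

/* Hypothetical transmit state: nothing is buffered, only the
   unprinted remainder and the index of the next digit are kept. */
typedef struct {
    uint16_t rem;   /* part of the value not yet printed */
    uint8_t  state; /* 0 = ten-thousands ... 4 = ones, 5 = done */
} digit_tx_t;

static void digit_tx_start(digit_tx_t *tx, uint16_t value) {
    tx->rem = value;
    tx->state = 0;
}

/* Call once each time the UART can take another character.
   Returns the next ASCII digit (most significant first),
   or -1 once all five digits have been produced. */
static int digit_tx_next(digit_tx_t *tx) {
    uint16_t d;
    switch (tx->state) {
    case 0: d = tx->rem / 10000u; tx->rem = (uint16_t)(tx->rem - d * 10000u); break;
    case 1: d = tx->rem / 1000u;  tx->rem = (uint16_t)(tx->rem - d * 1000u);  break;
    case 2: d = tx->rem / 100u;   tx->rem = (uint16_t)(tx->rem - d * 100u);   break;
    case 3: d = tx->rem / 10u;    tx->rem = (uint16_t)(tx->rem - d * 10u);    break;
    case 4: d = tx->rem;          tx->rem = 0;                                break;
    default: return -1;           /* nothing left to send */
    }
    tx->state++;
    return '0' + (int)d;
}
```

Each call does one constant division and one subtraction, so the work is spread across UART-ready events instead of being done in a single conversion. Leading zeros are emitted in this sketch; suppressing them would need one more flag in the state.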
Your “d = ((uint32_t)n * 53687 + 8208) >> 29;” is no faster than “d = n / 10000;”. Modern compilers already apply these multiply-shift tricks when dividing by a constant.
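For anyone who wants to check that the two expressions at least agree, here is a small sketch. It assumes n is a 16-bit unsigned value (the thread does not state its type; the cast to uint32_t only suggests it is narrower than 32 bits), and the function names are made up for illustration. It checks equivalence only; speed would still have to be measured or read off the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* The hand-written multiply-shift from the comment above.
   53687 is floor(2^29 / 10000); 8208 is a rounding correction. */
static uint32_t div10000_mulshift(uint16_t n) {
    return ((uint32_t)n * 53687u + 8208u) >> 29;
}

/* The obvious version; the compiler is free to lower this
   to its own multiply-shift sequence. */
static uint32_t div10000_plain(uint16_t n) {
    return n / 10000u;
}

/* Exhaustively compare both versions over the assumed 16-bit range. */
static void check_all_16bit(void) {
    for (uint32_t n = 0; n <= 65535u; n++) {
        assert(div10000_mulshift((uint16_t)n) == div10000_plain((uint16_t)n));
    }
}
```

The largest intermediate product, 65535 * 53687 + 8208, still fits in 32 bits, which is why the trick is safe for this range; for wider n the constants would have to change.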
I was writing for an ARM-based microcontroller. The compiler may well have been GCC. I should have timed it against the obvious use of division. I was not aware that they would do such things for division by arbitrary large numbers. If that's so, then why is TFA worried about the speed of itoa?
> I was not aware that they would do such things for division by arbitrary large numbers.
They don’t; 10000 just isn’t large enough to count :-) But yeah, apparently 32-bit division by 10k gets the same treatment on GCC for ARM as well: https://godbolt.org/g/ebJqrw
> why is TFA worried about the speed of itoa?
Because itoa is slow.
I once needed to read / write large (100MB+) G-code files. The profiler showed that the program spent the majority of its time in standard library routines such as itoa and atof.