Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The crucial point, however, is that while FP multiplication is faster than integer division, converting between a floating point and an integer is very slow: cvtsi2ss and cvtss2si have a latency of 6 cycles each. This adds a latency of 12 cycles for each of these multiplications.

For divisions by a constant value that don't easily decompose into shifts you can fall back to multiplication by a magic constant which is the integer reciprocal. (This is also something compilers do and is what's being explained in the article.)



Multiplying by the integer reciprocal only works if the dividend is an integer multiple of the divisor.

What's being explained in the article is multiplying by a fraction the value of which is close to the rational reciprocal of the divisor, and where the denominator of the fraction is an integer power of two (so dividing by the denominator can be done with a shift).

The fraction in this case is (2938661835 + 2^32) / 2^37.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: