Their formula for quantizing gradients involves quite a bit of extra computation, and it's not clear to me how much it would complicate an actual implementation (in software or hardware). To me, the most interesting question is whether we can train using just 8 bits everywhere (weights, activations, and gradients) without all those acrobatics. If so, we could get another significant (and free!) speedup on GPUs.
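For concreteness, here is roughly what I mean by training "without the acrobatics": just naive per-tensor 8-bit quantization with stochastic rounding applied to the gradients. This is a hedged illustration of that baseline, not the paper's formula; the function names and the choice of stochastic rounding are my own.

```python
import numpy as np

def quantize_int8(x, rng):
    """Naive per-tensor int8 quantization with stochastic rounding.

    Illustrative sketch only -- not the paper's gradient-quantization scheme.
    """
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        return np.zeros(x.shape, dtype=np.int8), 1.0
    scaled = x / scale
    # Stochastic rounding: round up with probability equal to the
    # fractional part, so the quantized value is unbiased in expectation.
    floor = np.floor(scaled)
    q = floor + (rng.random(x.shape) < (scaled - floor))
    q = np.clip(q, -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Round-trip a fake "gradient" tensor through 8-bit quantization.
rng = np.random.default_rng(1)
g = rng.normal(size=1000).astype(np.float32)
q, s = quantize_int8(g, rng)
err = float(np.abs(dequantize(q, s) - g).max())
```

The point of the sketch is how little machinery this needs: one scale per tensor, one clip, one rounding step. The open question is whether something this simple is accurate enough for the gradients, which is exactly where the paper spends its extra computation.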