
AFAIK, the main bottleneck on training is memory bandwidth. Distributed GPU compute has multiple orders of magnitude less bandwidth than an equivalent number of colocated GPUs, because the GPUs don't share a physical bus or high-speed interconnect, only a network connection. This work improves on that, but the fundamental limitation remains.
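
As a rough back-of-the-envelope sketch of that gap (the figures are my own assumptions, loosely H100-class hardware and a ~1 Gbps internet link, not numbers from the linked work):

    # Back-of-the-envelope interconnect comparison (all figures approximate/assumed)
    hbm_gb_s      = 3350       # on-device HBM3 bandwidth, H100-class GPU
    nvlink_gb_s   = 900        # GPU-to-GPU NVLink within one node
    internet_gb_s = 1.0 / 8    # ~1 Gbps commodity internet link -> 0.125 GB/s

    print(f"NVLink vs internet link: {nvlink_gb_s / internet_gb_s:,.0f}x")   # ~7,200x
    print(f"HBM    vs internet link: {hbm_gb_s / internet_gb_s:,.0f}x")      # ~26,800x

Even with generous assumptions about the network, the gap between a datacenter interconnect and an internet link is three to four orders of magnitude, which is why approaches like this focus on reducing how much has to be communicated rather than making the link faster.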

