Hacker News

The slowest part is loading the weights into VRAM, in my experience. I haven't benchmarked that yet. What kind of benchmark would you like to see?


I would like to see time to first inference for typical models (llama-7b first token, SDXL 1 step, etc.).
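A minimal sketch of that kind of cold-start measurement: time the weight load and the first inference step separately. The harness below is framework-agnostic; the loader and inference calls in the usage comment are hypothetical placeholders, not a specific library's API.

```python
import time

def time_to_first_result(load_fn, infer_fn):
    """Measure cold-start latency: weight loading plus one inference step."""
    t0 = time.perf_counter()
    model = load_fn()                # e.g. load llama-7b weights into VRAM
    t_load = time.perf_counter() - t0

    t1 = time.perf_counter()
    infer_fn(model)                  # e.g. generate the first token, or run 1 SDXL step
    t_infer = time.perf_counter() - t1

    return {"load_s": t_load,
            "first_infer_s": t_infer,
            "total_s": t_load + t_infer}

# Usage sketch (placeholder calls; substitute your actual framework's
# load and generate functions):
# stats = time_to_first_result(
#     load_fn=lambda: load_model("llama-7b"),
#     infer_fn=lambda m: m.generate("Hello", max_tokens=1),
# )
# print(stats)
```

Reporting load and first-step time separately matters here, since the comment above suggests the load dominates.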




