Hacker News

The slowest part is loading the weights into VRAM, in my experience. I haven't benchmarked that yet. What kind of benchmark would you like to see?


I would like to see time to first inference for typical models (llama-7b first token, SDXL 1 step, etc.).
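A minimal sketch of that kind of cold-start measurement: time the weight load and the first inference step separately. The harness below is framework-agnostic; the loader and inference calls in the usage comment are hypothetical placeholders, not a specific library's API.

```python
import time

def time_to_first_result(load_fn, infer_fn):
    """Measure cold-start latency: weight loading plus one inference step."""
    t0 = time.perf_counter()
    model = load_fn()                # e.g. load llama-7b weights into VRAM
    t_load = time.perf_counter() - t0

    t1 = time.perf_counter()
    infer_fn(model)                  # e.g. generate the first token, or run 1 SDXL step
    t_infer = time.perf_counter() - t1

    return {"load_s": t_load,
            "first_infer_s": t_infer,
            "total_s": t_load + t_infer}

# Usage sketch (placeholder calls; substitute your actual framework's
# load and generate functions):
# stats = time_to_first_result(
#     load_fn=lambda: load_model("llama-7b"),
#     infer_fn=lambda m: m.generate("Hello", max_tokens=1),
# )
# print(stats)
```

Reporting load and first-step time separately matters here, since the comment above suggests the load dominates.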




