
You'll be able to run 72B models w/ large context, lightly quantized, with decent-ish performance, like 20-25 tok/sec. The best of the bunch are maybe 90% of a Claude 3.5.
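A "lightly quantized" 72B model still needs a lot of unified memory. Here's a rough back-of-envelope sketch; the bits-per-weight figures are approximations for common GGUF quant levels (my assumptions, not measurements), and KV cache for a large context adds several GB on top:

```python
# Rough memory-footprint estimate for a 72B-parameter model at various
# quantization levels. Bits-per-weight values are approximate averages
# for common GGUF quants (assumptions, not exact figures).

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("Q8_0", 8.5), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    print(f"{name}: ~{model_size_gb(72, bpw):.0f} GB")
```

So a 4-bit-ish quant of a 72B model lands in the 40-50 GB range just for weights, which is why you need a high-memory machine to run one comfortably.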

If you need to do some work offline, or if the place you work blocks access to cloud providers for some reason, it's not a bad way to go, really. Note that if you're on battery, heavy LLM use can drain it in an hour.



