
You can already do that on most desktop GPUs (even going as far back as previous-gen NVIDIA cards like the GTX 1050/1060/1070, for example).

You'll need a model that can work with tools, like Llama 3.2 (https://huggingface.co/meta-llama), serve it, hook up MCP servers, add a speech-to-text (STT) interface, and you're cooking. A rough sketch of the tool-calling part is below.
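A minimal sketch, assuming you already have an OpenAI-compatible local server (e.g. llama.cpp's server or vLLM) running on port 8000; the model name and the tool definition here are placeholders, not anything specific to the setup above:

    from openai import OpenAI

    # Point the client at a local OpenAI-compatible server (assumption:
    # llama.cpp / vLLM serving a Llama 3.2 instruct model on port 8000).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    # A hypothetical tool the model may choose to call.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Return the current local time.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]

    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": "What time is it?"}],
        tools=tools,
    )

    # If the model decided to call the tool, the call shows up here.
    print(resp.choices[0].message.tool_calls)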



Even a bottom-of-the-barrel Intel N95 has audio acceleration features that help with speech-to-text, but the LLM inference part will still be far from efficient.
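For reference, the STT side really is light enough to run on CPU; a sketch using the openai-whisper package (the model size and audio filename are placeholder choices):

    import whisper

    # Small Whisper models transcribe acceptably on CPU; "base" is a
    # placeholder choice, as is the audio file name.
    model = whisper.load_model("base")
    result = model.transcribe("mic_capture.wav")
    print(result["text"])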

Plus, you need to keep the card in a "ready" state; you can't idle or fully suspend it between requests.
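If you serve through Ollama, for instance, its keep_alive parameter controls how long the model stays resident in VRAM; a sketch, assuming a default local Ollama install on port 11434:

    import requests

    # Preload the model and pin it in memory indefinitely (keep_alive=-1),
    # so the card stays "ready" instead of unloading between requests.
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2", "keep_alive": -1},
    )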



