
I don’t mean to undermine the effort and improvement, but the usability of these models usually isn’t what their benchmarks suggest.

Last time there was hype about a GLM coding model, I tested it on some coding tasks and it wasn’t usable compared with Sonnet or GPT-5.

I hope this one is different.




