
DPO is pretty good as well.

I think the '7b beating 70b' result is mostly because Mistral was likely trained on considerably more tokens than is Chinchilla-optimal for its size. So was llama-70b, but not to the same degree.
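To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 20 training tokens per parameter as "Chinchilla optimal" (an approximation, not an exact constant), and it assumes "llama-70b" means Llama 2 70B with its reported 2T training tokens. Mistral's token count isn't public as far as I know, so the 7B figure below is purely hypothetical. The function names are mine.

```python
# Rule-of-thumb approximation of the Chinchilla result (assumption):
# compute-optimal training uses roughly 20 tokens per model parameter.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: int) -> int:
    """Approximate compute-optimal token count for a model of n_params."""
    return TOKENS_PER_PARAM * n_params


def overtraining_ratio(n_params: int, n_tokens: int) -> float:
    """How many times past the rule-of-thumb optimum a model was trained."""
    return n_tokens / chinchilla_optimal_tokens(n_params)


# Assumption: Llama 2 70B, reported as trained on 2T tokens.
llama_70b = overtraining_ratio(70_000_000_000, 2_000_000_000_000)

# Hypothetical: a 7B model trained on the same 2T tokens.
# (Not Mistral's actual figure, which is undisclosed.)
hypothetical_7b = overtraining_ratio(7_000_000_000, 2_000_000_000_000)

print(f"70B on 2T tokens: {llama_70b:.1f}x the rule-of-thumb optimum")
print(f"7B on 2T tokens:  {hypothetical_7b:.1f}x the rule-of-thumb optimum")
```

Under those assumptions the 70B model is only modestly past the optimum, while a 7B model on a similar token budget would be roughly ten times further past it, which is the "not to the same degree" gap.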


