whimsicalism on Dec 10, 2023, on: Mistral "Mixtral" 8x7B 32k model [magnet]

DPO is pretty good as well. I think the "7b beating 70b" result is mostly because Mistral was likely trained on considerably more tokens than is Chinchilla-optimal for its size. So was llama-70b, but not to the same degree.
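To make the "more tokens than Chinchilla optimal" comparison concrete, here is a rough sketch of the arithmetic. It assumes the common rule of thumb of about 20 training tokens per parameter as the Chinchilla-optimal budget, and the publicly reported 2T training tokens for Llama 2 70B. Mistral has not published its token count, so the 7B figure below is a purely hypothetical placeholder, as are the function and variable names.

```python
# Rough rule of thumb: ~20 training tokens per parameter is Chinchilla-optimal.
# This is an approximation, not an exact constant.
TOKENS_PER_PARAM = 20.0


def overtraining_factor(params: float, tokens: float,
                        tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Return how many times the Chinchilla-optimal token budget was used."""
    optimal_tokens = tokens_per_param * params
    return tokens / optimal_tokens


# Llama 2 70B: 70e9 parameters, reported 2e12 training tokens.
llama_70b = overtraining_factor(params=70e9, tokens=2e12)

# HYPOTHETICAL: the 7B token count is not public; 4e12 is only an
# illustrative placeholder, not a published figure.
hypothetical_7b = overtraining_factor(params=7e9, tokens=4e12)

print(f"llama-70b: {llama_70b:.2f}x Chinchilla-optimal")
print(f"hypothetical 7b: {hypothetical_7b:.2f}x Chinchilla-optimal")
```

Under these assumptions, a 7B model needs far fewer tokens than a 70B model to pass its Chinchilla-optimal budget, so the same order of training data overshoots that budget by a much larger factor for the smaller model.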