I want to jump in and correct your usage of "LLaMA Laws" (even you are using it ...

npsomaratna · on Aug 17, 2023

Isn't the Chinchilla estimate considered to be wrong now?

https://espadrine.github.io/blog/posts/chinchilla-s-death.ht...

FanaHOVA · on Aug 16, 2023

Yep, +1. That's why I used the quotes. :) Thanks for expanding!

arugulum · on Aug 16, 2023

Yep I understood that you were using it informally, just trying to keep things informative for other folks reading too.

swyx · on Aug 16, 2023

there frankly needs to be a paper calling this out tho, because at this point there are a bunch of industry models following “llama laws” and nobody’s really done the research, its all monkey see monkey do

arugulum · on Aug 16, 2023

But what would they be calling out?

If industry groups want to run a training run based on the configurations of a well-performing model, I don't see anything wrong with that. Now, if they were to claim that what they are doing is somehow "optimal", then there would be something to criticize.

swyx · on Aug 16, 2023

poor choice of words, i probably mean sketching out the curves/doing ablation studies in a comprehensive way like the chinchilla paper did.

arugulum · on Aug 16, 2023

Makes sense! But expensive...