It would be interesting to see whether these abilities would go away if you subjected the large model to dropout as you continued training, until it was reduced to the size of the small model.
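
A minimal sketch of that experiment, assuming a PyTorch model and swapping in iterative magnitude pruning as a stand-in for permanently dropping units (train_step and evaluate_ability are hypothetical callbacks supplied by the experimenter, not real APIs):

    import torch
    import torch.nn.utils.prune as prune

    def shrink_and_probe(model, train_step, evaluate_ability,
                         rounds=10, amount=0.2, recovery_steps=1000):
        # Iteratively prune a fraction of the remaining Linear weights,
        # keep training so the model can adapt, then re-test the ability.
        layers = [(m, "weight") for m in model.modules()
                  if isinstance(m, torch.nn.Linear)]
        for r in range(rounds):
            for module, name in layers:
                prune.l1_unstructured(module, name, amount=amount)
            for _ in range(recovery_steps):
                train_step(model)            # continued training after pruning
            score = evaluate_ability(model)  # probe the emergent ability
            kept = (1 - amount) ** (r + 1)   # fraction of weights remaining
            print(f"round {r}: {kept:.1%} weights left, score = {score:.3f}")

If the score holds up as the effective parameter count approaches the small model's, that would support the retention hypothesis below.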

I think once an "ability" is learned by the model, it is useful for compressing information, and so is more likely than not (>50%) to be retained.


