It would be interesting to see whether these abilities go away if you subjected the large model to dropout/pruning as you continued training, until it was reduced to the size of the small model.
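A minimal sketch of that experiment, using PyTorch's built-in magnitude pruning as the shrinkage mechanism (the toy model, data, and schedule here are illustrative assumptions, not a claim about the actual setup):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Stand-in "large" model; in the real experiment this would be the big LM.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(256, 16), torch.randn(256, 1)

for step in range(5):
    # Each round, prune 20% of the remaining weights (by L1 magnitude)
    # in every Linear layer, then keep training so the network can
    # adapt to the reduced capacity before the next cut.
    for layer in model:
        if isinstance(layer, nn.Linear):
            prune.l1_unstructured(layer, name="weight", amount=0.2)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Fraction of first-layer weights zeroed after repeated pruning rounds;
# the question is whether the "ability" (here, just the loss) survives.
sparsity = float((model[0].weight == 0).float().mean())
print(f"first-layer sparsity: {sparsity:.2f}, final loss: {loss.item():.3f}")
```

After each pruning round you would re-evaluate the benchmark that exhibited the emergent ability, and watch for the point in the sparsity schedule where it disappears (if it does).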
I think once an "ability" is learned by the model, it is useful for compressing information, and so is more likely than not (>50%) to be retained.