The claim is not that they are fundamentally different or similar. The claim is that one doesn't need much data to get instruction-following behavior from a raw autoregressive LLM. K-shot prompting shows that the capability to follow instructions is already present in the base model. Fine-tuning just keeps the model in that frame all the time, without needing a K-shot prompt.
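To make the K-shot part concrete, here is a rough sketch of what such a prompt might look like when built for a base model. The function name `build_k_shot_prompt`, the `Instruction:`/`Response:` labels, and the example pairs are all made up for illustration; any consistent format would serve the same purpose.

```python
def build_k_shot_prompt(examples, instruction):
    """Format K (instruction, response) pairs followed by a new instruction.

    Hypothetical helper: the labels and layout are an assumed convention,
    not a format required by any particular model.
    """
    blocks = [
        f"Instruction: {inst}\nResponse: {resp}"
        for inst, resp in examples
    ]
    # Leave the last block open so a base model continues with a response.
    blocks.append(f"Instruction: {instruction}\nResponse:")
    return "\n\n".join(blocks)


# Illustrative K=2 demonstrations (invented for this sketch).
examples = [
    ("Translate 'cat' to French.", "chat"),
    ("Give the capital of Japan.", "Tokyo"),
]
prompt = build_k_shot_prompt(examples, "Name a prime number greater than 10.")
print(prompt)
```

The demonstrations only set up the pattern; the model's next-token prediction does the rest. Fine-tuning on a modest set of such pairs would aim to make the demonstrations unnecessary.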