15k is not the full training corpus. The model is pretrained on huge swaths of internet text; the 15k examples are just the fine-tuning corpus that teaches it to follow instructions. Facts like world capitals are already in the model weights from that pretraining.

With the raw LLM, you can get the capital of Mongolia with the prompt "The capital of Mongolia is", i.e. plain text completion. Fine-tuning lets you get at the same information by asking a question or giving a command, e.g. "Tell me the capital of Mongolia".
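
A rough sketch of the two prompting styles using the Hugging Face transformers pipeline, if you want to try it. gpt2 is just standing in for any base model here (a bigger base model will recall the fact more reliably), and the instruct checkpoint name is a placeholder, not a real model ID:

    from transformers import pipeline

    # Base model: phrase the fact as a prefix and let it complete the text.
    base = pipeline("text-generation", model="gpt2")
    print(base("The capital of Mongolia is", max_new_tokens=5)[0]["generated_text"])

    # Instruction-tuned model: ask directly, chat-style. Swap a real
    # instruct checkpoint in for the placeholder name below.
    # chat = pipeline("text-generation", model="SOME-INSTRUCT-MODEL")
    # print(chat([{"role": "user", "content": "Tell me the capital of Mongolia"}],
    #            max_new_tokens=20))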


