You're getting at a deep point of disagreement - should we expect a modern or near-future LLM to be limited by the intelligence of the people who generated its training data? I don't think anyone claims to have a provably correct answer. There's one intuition that says yes (why should it be impossible to make new insights from data collected by people who didn't have those insights?) and another that says no (how can a statistical average of N people's most likely responses be smarter than any of those N?)