LLMs are trained, as others have mentioned, first to just learn the language at all costs: ingest any and all strings of text generated by humans until you can generate text that is indistinguishable from what a human might have written.
As a happy side effect, the language you've now learned happens to embed quite a few statements of fact and examples of high-quality logical reasoning. But crucially, the language itself isn't a representation of reality or of good reasoning. It isn't meant to be. It's a way to store and communicate arbitrary ideas, which may be wrong or bad or both. Thus the problem for these researchers becomes: how do we tease out and surface the parts of the model that produce factually accurate, well-reasoned statements, and dampen everything else?
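To make that first phase concrete: the pretraining objective really is just next-token prediction over whatever text you've scraped. A rough sketch of that loss, assuming a PyTorch-style model that maps token ids to next-token logits (the names here are just illustrative):

    import torch.nn.functional as F

    def next_token_loss(model, tokens):
        # tokens: (batch, seq_len) integer ids from any human-written text.
        # Predict token t+1 from everything up to and including token t.
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)                    # (batch, seq_len - 1, vocab)
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),  # flatten all positions
            targets.reshape(-1),
        )

Nothing in that loss cares whether the text is true or well reasoned; it only rewards sounding like what humans actually wrote.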
Animal learning isn't like this. We don't require language at all to represent and reason about reality. From the beginning, we have multimodal sensory experience and direct interaction with the physical world, not just recorded images of it or writing about it. Whatever it is humans do, I think we at least innately understand that language isn't truth or reason. It's just a way to encode arbitrary information.
One way or another, we all grok that there is a hierarchy of evidence, and even what counts as evidence in the first place. Going into the backyard to find where your dog left the ball, or reading a physics textbook, is a fundamentally different form of learning than reading the Odyssey or the published manifesto of a mass murderer. We're still "learning" in the sense that our brains now contain more information than they did before, but we know some of these things are representations of reality and some are not. We have access to the world beyond the shadows in the cave.
Humans can carve the world up into domains with a fixed set of rules and then do symbolic reasoning within them. LLMs can't seem to do this in a formal way at all -- they just occasionally get it right when the domain happens to be encoded in their language learning.
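(To spell out what I mean by "a fixed set of rules": in a formal domain, conclusions follow mechanically from the rules and nothing else. A toy forward-chaining sketch, with made-up rules and facts:)

    # Illustrative only: derive everything the fixed rules entail.
    rules = [
        ({"rainy", "outside"}, "wet"),
        ({"wet", "cold"}, "miserable"),
    ]
    facts = {"rainy", "outside", "cold"}

    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True

    print(facts)  # {'rainy', 'outside', 'cold', 'wet', 'miserable'}

An LLM can often imitate this kind of derivation, but nothing in it enforces the rules.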
You can't feed an LLM a formal language grammar (e.g. SQL) and then have it generate only output with valid syntax.
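(If you want a guarantee of validity, you end up checking the output outside the model. A rough post-hoc check using SQLite's own parser, with a made-up schema and queries:)

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

    def parses_ok(query):
        # EXPLAIN makes SQLite parse and plan the statement without
        # executing it, so malformed SQL surfaces as an exception.
        try:
            conn.execute("EXPLAIN " + query)
            return True
        except sqlite3.OperationalError:
            return False

    print(parses_ok("SELECT name FROM users WHERE id = 1"))  # True
    print(parses_ok("SELEC name FORM users"))                # False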
It's awfully confusing to me that people think current LLMs (or multi-modal models etc) are "close" to AGI (for whatever various definitions of all those words you want to use) when they can't do real symbolic reasoning.
Though I'm not an expert and happy to be corrected...
Adult humans can do symbolic reasoning, but lower mammals mostly cannot: even ones that share most of our brain structure are much worse at it, if they can do it at all. Children need to learn it, along with a lot of the other things we consider a natural part of human intelligence.
That all points towards symbolic reasoning being a pretty small algorithmic discovery compared to the general ability to pattern match and do fuzzy lookups, transformations, and retrievals against a memory bank. It's not like our architecture is so special that we burned most of our evolutionary history selecting for these abilities; they're very recent innovations, and thus must be relatively simple, given the core set of abilities that our close ancestors already have.
The thing about transformers is that they're obviously not the end of the line; there are some things they really can't do in their current form (though it's a smaller set than people tend to think, which is why the Gary Marcuses of the world keep backpedaling and retconning their previous statements as each new release does things they previously said were impossible).

But they are a proof of concept: just about the simplest architecture you could propose that might plausibly generate language (beyond N-gram sampling) turns out to do it really, really well if all you do is scale it up, and even the simplest next-token prediction objective leads to much higher-level abilities than you would expect. That was the hard core of the problem, building a flexible pattern mimic that can be easily trained, and it has gotten us much further along the line to AGI than I suspect anyone working on it expected it would without major additions and changes to the design.

Now it's probably time to start adding bits and bobs and addressing some of the shortcomings (e.g. the static nature of the network, the lack of online learning, the fact that chains of thought shouldn't be constrained to token sequences, tokenization itself, etc.), but IMO the engine at the heart of the current systems is so impressively capable that the remaining work is going to be less of an Einstein moment and more of an elbow-grease engineering grind.
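(As an aside, "N-gram sampling" above means roughly this level of sophistication: count which words follow which, then sample. A toy word-level bigram sampler, with a made-up corpus:)

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the dog sat on the rug".split()

    # Count which words follow which.
    follows = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)

    # Generate by repeatedly sampling a seen successor (output is random).
    word, output = "the", ["the"]
    for _ in range(8):
        word = random.choice(follows.get(word, corpus))
        output.append(word)
    print(" ".join(output))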
We may not be close in the "2 years of known work" sense, but we're certainly not far in the "we have no idea how to prove the Riemann Hypothesis" sense anymore, where major unknown breakthroughs are still required which might be 50+ years away, or the problem might even be unsolvable.