For example, there's nothing inherent to token prediction to make it capable of doing arithmetic, and you wouldn't expect a small or briefly trained model to manage it. But if training a very large model on a huge dataset leads to it "learning" the decimal number system and arithmetic via token prediction, then that is an emergent capability.
Perhaps a bad example although I agree with the general argument.
There are large datasets of calculations specifically for training large language models. It’s not just picking it up from reading books. And even then these models suck at calculating; half the time they just make an answer up.
Calculation appears to be something that is actually not an emergent property of constant-time token prediction: a model that does a fixed amount of computation per output token can't implement algorithms whose number of steps grows with the input, which we already knew from Turing anyway.
In fact, their example of https://github.com/google/BIG-bench/tree/main/bigbench/bench... showed zero emergent behavior in understanding modified arithmetic, even in large LLMs. The accuracy (and lack of it) on unmodified arithmetic is a simple token-replacement heuristic that comes naturally from the transformer's "attention" mechanism.
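For concreteness, here's a rough sketch of what a modified-arithmetic probe could look like; the "plus one" rule and the prompt format are my assumptions rather than the benchmark's exact harness, and complete() is a hypothetical stand-in for whatever model API you call:

```python
import random

def modified_arithmetic_prompt(n_shots=3):
    """Build a few-shot prompt where 'a + b ->' secretly means a + b + 1.

    A model that only pattern-matches the surface form of '+' will tend to
    answer a + b and miss the shifted rule implied by the examples.
    """
    lines = []
    for _ in range(n_shots):
        a, b = random.randint(100, 999), random.randint(100, 999)
        lines.append(f"{a} + {b} -> {a + b + 1}")
    a, b = random.randint(100, 999), random.randint(100, 999)
    lines.append(f"{a} + {b} ->")
    return "\n".join(lines), a + b + 1  # prompt text and the shifted target

# complete() is hypothetical; plug in whichever model you want to test.
# prompt, target = modified_arithmetic_prompt()
# print(int(complete(prompt)) == target)
```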
This is wrong.
I asked GPT-4 right now to perform this task and it got 3/3 3-digit calculations correct, and on a 4-digit calculation was off by 10 (not 1!).
And note that arithmetic is seen as a weak spot of LLMs; it is not a good example to attack the claim that they have emergent properties.
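If anyone wants to reproduce that kind of spot check, here's a quick sketch; ask_model() is just a placeholder for whatever chat API you're calling, and the digit counts mirror the ones above:

```python
import random

def spot_check_addition(ask_model, digits=3, trials=3):
    """Ask random n-digit addition problems and report exactness and error size.

    ask_model is a placeholder: any function that takes a prompt string and
    returns the model's reply as a string.
    """
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = ask_model(f"What is {a} + {b}? Answer with just the number.")
        try:
            guess = int(reply.strip().rstrip("."))
        except ValueError:
            guess = None
        truth = a + b
        err = None if guess is None else abs(guess - truth)
        print(f"{a} + {b} = {truth}, model said {guess}, abs error = {err}")

# spot_check_addition(ask_model, digits=4)  # the 4-digit case is where the off-by-10 showed up
```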
> in this work, we identify and study four key stages for successfully teaching algorithmic reasoning to LLMs: (1) formulating algorithms as skills, (2) teaching multiple skills simultaneously (skill accumulation), (3) teaching how to combine skills (skill composition) and (4) teaching how to use skills as tools.
So it's not an emergent property of the LLM but four capabilities that were explicitly trained.
No one is saying you can't teach these things to an agent, just that they are not emergent abilities of LLM training. By default an LLM can only match token proximity; all LLM training improves the proximity matching (clustering) of tokens, but it does not teach algorithmic reasoning. That has to be bolted on as an add-on.
No it doesn't need to be bolted on. GPT-4 can add straight out of the box, no need for any education. Where the model hadn't implicitly figured out the algorithm of addition in 3.5, it has in 4.
Maybe, but since we can't, by definition, know what is present in the model, we cannot define any behavior as emergent as opposed to simply trained.
Suppose you don't know anything about our school system and you observe that 12th-graders know calculus whereas 3rd-graders do not. By their definition, calculus is an emergent ability in 12th-graders because it was not present in 3rd-graders. Of course we know that it is not an emergent ability but the result of an expanded mental model trained on a bigger corpus of knowledge.
So I guess they should be saying "trainable" instead of "emergent". Still a useful benchmark, of course.
To be truly emergent in your sense it seems an LLM would have to make a new discovery, i.e., have scientist-level intelligence. That bar keeps moving up.
Not necessarily a new discovery, just a new behaviour it was not trained for.
See https://en.wikipedia.org/wiki/Emergence
A classic example is the structure of a flock of starlings in flight or a school of fish: the flock and the school move with an emergent behaviour that is not observed in a single (or a few) starlings or fish.
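If you want to see that kind of emergence in a toy setting, here is a minimal boids-style sketch (the rule weights are arbitrary choices of mine): each agent follows only local rules about its neighbours, yet the headings of the whole group line up, which no individual rule asks for.

```python
import numpy as np

def flock_step(pos, vel, radius=1.0, align=0.05, cohere=0.01, separate=0.05, dt=0.1):
    """One synchronous update of a minimal boids model: purely local rules per agent."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        d = pos - pos[i]
        dist = np.linalg.norm(d, axis=1)
        neighbours = (dist > 0) & (dist < radius)
        if neighbours.any():
            # alignment: nudge toward the average heading of nearby agents
            new_vel[i] += align * (vel[neighbours].mean(axis=0) - vel[i])
            # cohesion: nudge toward the local centre of mass
            new_vel[i] += cohere * d[neighbours].mean(axis=0)
            # separation: push away from agents that are much too close
            close = neighbours & (dist < 0.3 * radius)
            if close.any():
                new_vel[i] -= separate * d[close].mean(axis=0)
    return pos + new_vel * dt, new_vel

rng = np.random.default_rng(0)
pos = rng.uniform(0, 5, size=(50, 2))   # 50 agents scattered in a 5x5 box
vel = rng.normal(0, 1, size=(50, 2))    # random initial headings
for _ in range(500):
    pos, vel = flock_step(pos, vel)

# Order parameter: 0 = headings point every which way, 1 = fully aligned flock.
order = np.linalg.norm(vel.mean(axis=0)) / np.linalg.norm(vel, axis=1).mean()
print(f"alignment order parameter: {order:.2f}")
```

The point is that "flocking" appears nowhere in the per-agent code; it only shows up when you look at the group.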
Something like this may well yet emerge if a new AI agent learns how to combine the properties of an LLM with an algorithmic approach, fact-checking, or a general reasoning engine. But for that we are still waiting for another breakthrough that combines these islands into one (without manually bolting them onto each other).
Yes, in some sense the recent LLM results show us how big the Chinese room has to be: enough for 10^11 parameters. If the index cards Penrose imagined could hold one column of a matrix, it's around 10^8 cards. It's not surprising that people had poor intuitions for what was possible with that level of complexity.
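Back-of-the-envelope version of that card count (the ~10^3 entries per column is my assumption):

```python
# Rough Chinese-room bookkeeping for a ~10^11-parameter model.
parameters = 1e11
entries_per_card = 1e3          # assume one card holds a column of ~1,000 weights
print(f"{parameters / entries_per_card:.0e} index cards")  # ~1e+08
```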
> I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that..
Turing wrote that in 1950, so I suppose his intuition was within a couple of orders of magnitude...
> For example there's nothing inherent to token prediction to make it capable of doing arithmetic
...aren't we just witnessing proof that this is wrong? Apparently those abilities are inherent to token prediction models with sufficient parameter size.
> For example, there's nothing inherent to token prediction to make it capable of doing arithmetic, and you wouldn't expect a small or briefly trained model to manage it. But if training a very large model on a huge dataset leads to it "learning" the decimal number system and arithmetic via token prediction, then that is an emergent capability.
It's just like the Chinese Room thought experiment - https://en.wikipedia.org/wiki/Chinese_room#Chinese_room_thou...