For example, there's nothing inherent to token prediction to make it capable of doing arithmetic, and you wouldn't expect a small or briefly trained model to manage it. But if training a very large model on a huge dataset leads to it "learning" the decimal number system and arithmetic via token prediction, then that is an emergent capability.
Perhaps a bad example although I agree with the general argument.
There are large datasets of calculations specifically for training large language models. It’s not just picking it up from reading books. And even then these models suck at calculating; half the time they just make an answer up.
Calculation appears to be something that is actually not an emergent property of constant-time token prediction: a model that does a fixed amount of computation per output token can't implement algorithms whose number of steps grows with the input, which we already knew from Turing anyway.
In fact, their example of https://github.com/google/BIG-bench/tree/main/bigbench/bench... showed zero emergent behavior in understanding modified arithmetic, even in large LLMs. The accuracy (and lack of it) on unmodified arithmetic is a simple token-replacement heuristic that comes naturally from the transformer's "attention" mechanism.
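For concreteness, here's a rough sketch of what a modified-arithmetic probe could look like; the "plus one" rule and the prompt format are my assumptions rather than the benchmark's exact harness, and complete() is a hypothetical stand-in for whatever model API you call:

```python
import random

def modified_arithmetic_prompt(n_shots=3):
    """Build a few-shot prompt where 'a + b ->' secretly means a + b + 1.

    A model that only pattern-matches the surface form of '+' will tend to
    answer a + b and miss the shifted rule implied by the examples.
    """
    lines = []
    for _ in range(n_shots):
        a, b = random.randint(100, 999), random.randint(100, 999)
        lines.append(f"{a} + {b} -> {a + b + 1}")
    a, b = random.randint(100, 999), random.randint(100, 999)
    lines.append(f"{a} + {b} ->")
    return "\n".join(lines), a + b + 1  # prompt text and the shifted target

# complete() is hypothetical; plug in whichever model you want to test.
# prompt, target = modified_arithmetic_prompt()
# print(int(complete(prompt)) == target)
```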
This is wrong.
I asked GPT-4 right now to perform this task and it got 3/3 3-digit calculations correct, and on a 4-digit calculation was off by 10 (not 1!).
And note that arithmetic is seen as a weak spot of LLMs; it is not a good example to attack the claim that they have emergent properties.
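If anyone wants to reproduce that kind of spot check, here's a quick sketch; ask_model() is just a placeholder for whatever chat API you're calling, and the digit counts mirror the ones above:

```python
import random

def spot_check_addition(ask_model, digits=3, trials=3):
    """Ask random n-digit addition problems and report exactness and error size.

    ask_model is a placeholder: any function that takes a prompt string and
    returns the model's reply as a string.
    """
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    for _ in range(trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        reply = ask_model(f"What is {a} + {b}? Answer with just the number.")
        try:
            guess = int(reply.strip().rstrip("."))
        except ValueError:
            guess = None
        truth = a + b
        err = None if guess is None else abs(guess - truth)
        print(f"{a} + {b} = {truth}, model said {guess}, abs error = {err}")

# spot_check_addition(ask_model, digits=4)  # the 4-digit case is where the off-by-10 showed up
```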
> in this work, we identify and study four key stages for successfully teaching algorithmic reasoning to LLMs: (1) formulating algorithms as skills, (2) teaching multiple skills simultaneously (skill accumulation), (3) teaching how to combine skills (skill composition) and (4) teaching how to use skills as tools.
So it's not an emergent property of the LLM but four capabilities that were explicitly trained.
No one is saying you can't teach these things to an agent, just that they are not emergent abilities of LLM training. By default an LLM can only match token proximity; all LLM training improves the proximity matching (clustering) of tokens, but it does not teach algorithmic reasoning. That has to be bolted on as an add-on.
No it doesn't need to be bolted on. GPT-4 can add straight out of the box, no need for any education. Where the model hadn't implicitly figured out the algorithm of addition in 3.5, it has in 4.
Maybe, but since we can't, by definition, know what is present in the model, we cannot define any behavior as emergent as opposed to simply trained.
Suppose you don't know anything about our school system and you observe that 12th-graders know calculus whereas 3rd-graders do not. By their definition, calculus is an emergent ability in 12th-graders because it was not present in 3rd-graders. Of course we know that it is not an emergent ability but the result of an expanded mental model trained on a bigger corpus of knowledge.
So I guess they should be saying "trainable" instead of "emergent". Still a useful benchmark, of course.
To be truly emergent in your sense it seems an LLM would have to make a new discovery, i.e., have scientist-level intelligence. That bar keeps moving up.
Not necessarily a new discovery, just a new behaviour it was not trained for.
See https://en.wikipedia.org/wiki/Emergence
A classic example is the structure of a flock of starlings in flight or a school of fish: the flock and the school move with an emergent behaviour that is not observed in a single (or a few) starlings or fish.
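If you want to see that kind of emergence in a toy setting, here is a minimal boids-style sketch (the rule weights are arbitrary choices of mine): each agent follows only local rules about its neighbours, yet the headings of the whole group line up, which no individual rule asks for.

```python
import numpy as np

def flock_step(pos, vel, radius=1.0, align=0.05, cohere=0.01, separate=0.05, dt=0.1):
    """One synchronous update of a minimal boids model: purely local rules per agent."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        d = pos - pos[i]
        dist = np.linalg.norm(d, axis=1)
        neighbours = (dist > 0) & (dist < radius)
        if neighbours.any():
            # alignment: nudge toward the average heading of nearby agents
            new_vel[i] += align * (vel[neighbours].mean(axis=0) - vel[i])
            # cohesion: nudge toward the local centre of mass
            new_vel[i] += cohere * d[neighbours].mean(axis=0)
            # separation: push away from agents that are much too close
            close = neighbours & (dist < 0.3 * radius)
            if close.any():
                new_vel[i] -= separate * d[close].mean(axis=0)
    return pos + new_vel * dt, new_vel

rng = np.random.default_rng(0)
pos = rng.uniform(0, 5, size=(50, 2))   # 50 agents scattered in a 5x5 box
vel = rng.normal(0, 1, size=(50, 2))    # random initial headings
for _ in range(500):
    pos, vel = flock_step(pos, vel)

# Order parameter: 0 = headings point every which way, 1 = fully aligned flock.
order = np.linalg.norm(vel.mean(axis=0)) / np.linalg.norm(vel, axis=1).mean()
print(f"alignment order parameter: {order:.2f}")
```

The point is that "flocking" appears nowhere in the per-agent code; it only shows up when you look at the group.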
Something like this may well yet emerge if a new AI agent learns how to combine the properties of an LLM with an algorithmic approach, fact-checking, or a general reasoning engine. But for that we are still waiting for another breakthrough that combines these islands into one (without manually bolting them onto each other).
Yes, in some sense the recent LLM results show us how big the Chinese room has to be: enough for 10^11 parameters. If the index cards Penrose imagined could hold one column of a matrix, it's around 10^8 cards. It's not surprising that people had poor intuitions for what was possible with that level of complexity.
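Back-of-the-envelope version of that card count (the ~10^3 entries per column is my assumption):

```python
# Rough Chinese-room bookkeeping for a ~10^11-parameter model.
parameters = 1e11
entries_per_card = 1e3          # assume one card holds a column of ~1,000 weights
print(f"{parameters / entries_per_card:.0e} index cards")  # ~1e+08
```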
> I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that..
Turing wrote that in 1950, so I suppose his intuition was within a couple of orders of magnitude...
> For example there's nothing inherent to token prediction to make it capable of doing arithmetic
...aren't we just witnessing proof that this is wrong? Apparently those abilities are inherent to token prediction models with sufficient parameter size.
> For example, there's nothing inherent to token prediction to make it capable of doing arithmetic, and you wouldn't expect a small or briefly trained model to manage it. But if training a very large model on a huge dataset leads to it "learning" the decimal number system and arithmetic via token prediction, then that is an emergent capability.
It's just like the Chinese Room thought experiment - https://en.wikipedia.org/wiki/Chinese_room#Chinese_room_thou...