Hacker News

Yup. That's exactly what language models represent internally; it's what the high-dimensional latent space is about: reifying meaning, defining concepts in terms of their relationships to other concepts.

LLMs are the idea you describe made incarnate, in the form of a computing artifact we can "hold in our hands", study and play with. IMHO people are still under-appreciating how big a thing this is fundamentally, beyond RAG and chatbots.
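A toy sketch of that relational picture (the 4-d vectors are hand-made for illustration; real models learn hundreds of dimensions from data): each "concept" is just a point, and its meaning is carried entirely by its similarities to the other points.

```python
# Toy relational-meaning demo: concepts as vectors, meaning as similarity.
# The dimensions and values are invented for the example.
import math

embeddings = {
    # made-up dims: [royalty, masculine, feminine, fruit]
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: how aligned two concept-vectors are.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "king" is close to "queen" and far from "apple" -- the concept is
# characterised purely by these relations, not by any internal essence.
print(cosine(embeddings["king"], embeddings["queen"]))
print(cosine(embeddings["king"], embeddings["apple"]))
```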



Sure, they're a reification of some aspect of meaning. The question is: which aspect(s), and which not.

It is also the case that animals do not "reliably and universally" implement all aspects of all meanings they are acquainted with, so we aren't looking for 100% of capacities, 100% of the time.

Nevertheless, LLMs are only implementing a limited aspect of meaning: mostly association and "some extension". And with this, plus everything ever written, they can narrowly appear to implement much more.

Let's be clear though, when we say "implement" we mean that an answer arises from a prompt for a very specific reason: because the answer is meant by the system in the relevant way. In this sense, LLMs can mean any association, perhaps they can mean a few extensions, but they cannot mean anything else.

Whenever an LLM appears to partake in more aspects of meaning, it is only cheating: it is using familiarity with families of associations to overcome its disabilities.

Like the idiot savant who appears to know all the Hollywood starlets, but is eventually discovered not to realise they are all film stars. We routinely discover these disabilities in LLMs when they attempt to engage in reasoning beyond these (formally) narrow contexts of use.

Agentic AI is a very good "on steroids" version of this. Just try to use Windsurf, and the brittle edges of this trick appear quickly. It's "reasoning" whenever it seems to work, and "hallucination" when not -- but of course, it just never was reasoning.


> LLMs are only implementing a limited aspect of meaning: mostly association and "some extension".

> Whenever an LLM appears to partake in more aspects of meaning it is only cheating: it is using familiarity with families of associations to overcome its disabilities.

I'm not convinced there's anything more to "meaning" - we seem to define concepts through relationships to other concepts, and ground them directly or indirectly in experiences. The richer that structure is, the more nuanced it gets.

> Like the idiot savant who appears to know all the Hollywood starlets, but is eventually discovered not to realise they are all film stars. We routinely discover these disabilities in LLMs when they attempt to engage in reasoning beyond these (formally) narrow contexts of use.

I see those as limitations of degree, not kind. Less idiot savant, more like someone being hurried to answer questions on the spot. Some associations are stronger and come to mind immediately, some are less "fresh in memory", and then associations can bring false positives and it takes extra time/effort to notice and correct those. It's a common human experience, too. "Yes, those reserved words are 'void', 'var', 'volatile',... wait, 'var' is JS stuff, it's not reserved in C..." etc.

Then, of course, humans are learning continuously, and - perhaps more importantly - even if they're not learning, they're reinforcing (or attenuating) existing associations by bringing them up and observing feedback. LLMs can't do that online, but that's an engineering limitation, not a theoretical one.

I'm not claiming that LLMs are equivalent to humans in the general sense. Just that they seem to be implementing the fundamental machinery behind "meaning" and "understanding" in the general sense, and the theoretical structure behind it is quite pretty, and looks to me like a solution to a host of philosophical problems around meaning and language.


Is this based on analogising LLMs to animal mental capacities, or based on a scientific study of these capacities? I.e., is this confirmation bias, or science?

One can always find a kind of confirmation-bias analysis here, one which "saves the appearances": one can always take a measurement set of people's mental capacities, given in their linguistic behaviour, and find such behaviours apparent in LLMs. This will always be possible for the obvious reason that LLMs are trained on human linguistic practice.

This makes "linguistic measurement" of LLMs especially deceptive. Consider the analogous case of measuring a video game by its pixels: does it really have a "3d space"? No. It only appears to. We know that pixel-space measurements of video games are necessarily deceptive, because we constructed them that way, so it is obvious that you cannot "walk into a TV".

Yet we did not construct the mechanism of deception in LLMs, making seeing through the failure of "linguistic measurement" apparently somewhat harder. But I imagine this is just a matter of time -- in particular, when LLMs' mechanisms are fully traced, it will be more obvious that their outputs are not generated for the reasons we suppose. That the "reason to linguistic output" mapping we use on people is deceptive as applied to LLMs. Just as a screenshot of a video game is a deceptive measure, whereas a photograph isn't. For a photograph, the reason the mountain is small is that it's far away; for a screenshot, it isn't: there is no mountain, it is not far away from the camera, there is no camera.

In the case of LLMs we know they cannot mean what they say. We know that if an LLM offers a report on New York it cannot mean what a person who has travelled to New York means. The LLM is drawing on an arrangement, in token space, of tokens placed there by people who have been to New York. This arrangement is like the "rasterization" of a video game: it places pixels as-if there were 3d. You could say, then, that an LLM's response is a kind of rasterization of meaning.

And just as with a video game, there are failures, e.g., clipping through "solid" objects. LLMs do not genuinely compose concepts, because they have no concepts -- they can only act as if they are composing them, so long as a token-space measurement of composition is available in the weight-space of the model. (And so on...)

The failures of LLMs to have these capacities will be apparent after a while; at the moment we're on the hype rollercoaster, and it's not yet peaked. At the moment, people are still using the "reason-linguistic" mapping they've learned from human communication on LLMs, to impute the mental states they would to people. The boundaries of the failure of this mapping aren't yet clear to everyone. Users don't yet avoid "clipping through" objects, because they can't understand what clipping is -- at the moment, many seem desperate to say that if a video game object is clipped through, it must be designed to be hollow.

In any case, as I've said in many places in this thread (which you can see from my recent comment history) -- there are a large variety of mental capacities associated with apprehending meaning that LLMs lack. But the process is anti-inductive, so it will take quite a while: for all those who are finding the fragile boundaries ("clipping through the terrain"), new models come out with invisible walls.


> Is this based on analogising LLMs to animal mental capacities, or based on a scientific study of these capacities? I.e., is this confirmation bias, or science?

- On how embeddings work;

- On the observation that in very high-dimensional space you can encode a lot of information in relative arrangement of things;

- On the observation that the end result (LLMs) is too good at talking and responding like people in a nuanced way for this to be uncorrelated;

- On noticing similarities between embeddings in high-dimensional spaces and what we arrive at when we try to express what we mean by "concept", "understanding" and "meaning", or even how we learn languages and acquire knowledge - there's a strong undertone of defining things in terms of similarity to other things, which themselves are defined the same way (recursively). Naively, it sounds like infinite regress, but it's exactly what embeddings are about.

- On the observation that the goal function for language model training is, effectively, "produce output that makes sense to humans", in the fully general meaning of that statement. Given constraints on size and compute, this pressures the model to develop structures that are at least functionally equivalent to our own thinking process; even if we're not there yet, we're definitely pushing the models in that direction.

- On the observation that most of the failure modes of LLMs also happen to humans, up to and including "hallucinations" - but they mostly happen at the "inner monologue" / "train of thought" level, and we do extra things (like explicit "system 2" reasoning, or tools) to fix them before we write, speak or act.

- And finally, on the fact that researchers have been dissecting and studying the inner workings of LLMs, and have managed to find direct evidence of them encoding concepts and using them in reasoning; see e.g. the couple of major Anthropic studies in which they demonstrated the ability to identify concrete concepts, follow their "activations" during the inference process, and even control the inference outcome by actively suppressing or amplifying those activations; the results are basically what you'd expect if you believed the "concepts" inside LLMs were indeed concepts as we understand them.

- Plus a bunch of other related observations and introspections, including but not limited to paying close attention to how my own kids (currently 6yo, 4yo and 1.5yo) develop their cognitive skills, and what their failure modes are. I used to joke that GPT-4 is effectively a 4yo that memorized half the Internet, after I noticed that stories produced by LLMs of that time and those of my own kid follow eerily similar patterns, up to and including what happens when the beginning falls out of the context window. I estimated that at 4yo, my eldest daughter had a context window about 30s long, and I could see it grow with each passing week :).
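The "defined in terms of similarity to other things, recursively" point above can be made concrete with a minimal distributional sketch (toy corpus and simple counting, not any real training setup): each word's vector is nothing but its pattern of co-occurrence with every other word, so every concept is characterised purely by its relations to the rest, and the apparent regress resolves into positions in a shared space.

```python
# Distributional toy: a word's "embedding" is just its relations to other words.
from collections import Counter, defaultdict

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "the king ruled the land",
    "the queen ruled the land",
]

# Count co-occurrences within a +/-1 word window.
cooc = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 1), min(len(words), i + 2)):
            if i != j:
                cooc[w][words[j]] += 1

vocab = sorted({w for line in corpus for w in line.split()})

def vec(w):
    # The vector for w is simply its row of relation counts over the vocab.
    return [cooc[w][v] for v in vocab]

def sim(a, b):
    va, vb = vec(a), vec(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(va) * norm(vb))

# "cat" and "dog" occur in interchangeable contexts, so they land close;
# "cat" and "king" do not. Real embeddings compress such relation patterns
# into a few hundred learned dimensions, but the principle is the same.
print(sim("cat", "dog"), sim("cat", "king"))
```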

That, in a gist, is what adds up to my current perspective on LLMs. It might not be hard science, but I find a lot of things pointing in the direction of us narrowing down on the core functionality that also exists in our brain (but not the whole thing, obviously) - and very little that points otherwise.

(I actively worry that my mental model might be too "wishy-washy", letting me interpret anything in a way that fits it. So far, I haven't noticed any warning signs, but I did notice that none of the quirks or failure modes feel surprising.)

--

I'm not sure if I got your videogame analogy the way you intended, but FWIW, we also learn and experience lots of stuff indirectly; the whole point of language and communication is to transfer understanding this way - and a lot of information is embodied in the larger patterns and structures of what we say (or don't say) and how we say it. LLM training data is not random, it's highly correlated with human experience, so the information for general understanding of how we think and perceive the world is encoded there, implicitly, and at least in theory the training process will pick up on it.

--

I don't have a firm opinion on some of the specifics you mention, just a couple of general heuristics/insights that tell me it could be possible we've narrowed down on the actual thing our own minds are doing:

1. We don't know what drives our own mental processes either. It might be we discover LLMs are "cheating", but we might also discover they're converging to the same mechanisms/structures our own minds use. I don't have any strong reason to assume the former over the latter, because we're not designing LLMs to cheat.

2. Human brains are evolved, not designed. They're also the dumbest possible design evolution could arrive at - we're the first to cross the threshold past which our knowledge-based technological evolution outpaced natural evolution by orders of magnitude. All we've achieved to date, we did with a brain that was nature's first prototype that worked.

3. Given the way evolution works - small, random, greedy increments that have to be incrementally useful at every step - it stands to reason that whatever the fundamental workings of a mind are, they must not be that complicated, and they can be built up incrementally through greedy optimization. Humans are a living proof of that.

4. (most speculative) It's unlikely there are multiple alternative implementations of thinking minds that are very different from each other, yet all equally easy to reach through a random walk, and that evolution just picked one of those and ran with it. It's more likely that, when we get to that point (we might already be there), we'll find the same computational design nature did. But even if not, diffing our solution against nature's will tell us much about ourselves.


> On the observation that most of the failure modes of LLMs also happen to humans

That assumes that LLMs operate according to how we read their text. What you're doing is reading LLM chain-of-thought as if said by a human, and imputing the capacities that would be implied if a human said it. But this is almost certainly not how LLMs work.

LLMs are replaying "linguistic behaviour" which we take, often accurately, to be dispositive of mental states in people. It is not evidence of mental capacities and states in LLMs, for seemingly obvious reasons. When a person says "I am hungry" it is, in veridical cases, caused by their hunger. When an LLM says it, the cause is something like: "responding appropriately, according to a history of appropriate use of such words, on the occasion of a prompt which would, in ordinary historical cases, give this response".
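That causal story can be made concrete with a deliberately crude sketch (a bigram chain over a tiny invented corpus -- nothing like a transformer's scale, but the same in terms of *why* the words appear): it can emit "i am hungry" purely because that continuation was frequent in its training text, with no hunger behind it.

```python
# Toy bigram generator: output is caused by corpus statistics, not by any
# state the words would report if a person said them.
import random
from collections import defaultdict

training_text = (
    "i am hungry . i am hungry . i am tired . "
    "you are hungry . i am hungry ."
).split()

# Record which word historically followed which.
follows = defaultdict(list)
for a, b in zip(training_text, training_text[1:]):
    follows[a].append(b)

def respond(prompt_word, length=3, seed=0):
    # Sample continuations in proportion to their historical frequency.
    rng = random.Random(seed)
    out, w = [], prompt_word
    for _ in range(length):
        w = rng.choice(follows[w])
        out.append(w)
    return " ".join(out)

# The only "reason" for this output is the history of use of these words.
print(respond("i"))
```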

The reason an LLM generates a text prima facie never involves any of the associated capacities which would have been required for that text to have been written in the first place. Overcoming this leap of logic requires vastly more than "it seems to me".

> On how embeddings work

The space of necessary capacities is not exhausted by "embedding", by which you mean a (weakly) continuous mapping of historical exemplars into a space. E.g., logical relationships, composition, recursion, etc. are not mental capacities which can be implemented this way.

> We don't know what drives our own mental processes either.

Sure we do. At the level of enumerating mental capacities, their operation and so on, we can give very exhaustive lists. We do not know how even the most basic of these is implemented biologically, save, I believe, that we can say quite a lot about how properties of complex biological systems generically enable this.

But we have a lot of extremely carefully designed experiments to show the existence of relevant capacities in other animals. None of these experiments can be used on an LLM, because by design, any experiment we would run would immediately reveal the facade: any measurement of the GPU running the LLM and its environmental behaviour shows a total empirical lack of anything which could be experimentally measured.

We are, by the charlatan's design, only supposed to use token-in/token-out as "measurement". But this isn't a valid measure, because LLMs are constructed from historical cases of linguistic behaviour in people. We know, prior to any experiment, that the one thing designed to be a false measure is the linguistic behaviour of the LLM.

It's as if we had constructed a digital thermometer to always replay historical temperature readings -- we know, by design, that these "readings" are therefore never indicative of any actual capacity of the device to measure temperature.
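That thermometer can be literalised in a few lines (a hypothetical toy class, obviously): its outputs look like measurements and may even often be roughly right, yet they are never caused by the current temperature, so they are no evidence of a capacity to measure it.

```python
# A "sensor" that only replays recorded history -- its readings are caused
# by the stored data, never by the environment it is supposedly measuring.
class ReplayThermometer:
    def __init__(self, history):
        self.history = list(history)
        self.i = 0

    def read(self):
        # Output depends solely on stored history, cycling forever.
        value = self.history[self.i % len(self.history)]
        self.i += 1
        return value

t = ReplayThermometer([21.5, 22.0, 21.8])
print([t.read() for _ in range(4)])  # -> [21.5, 22.0, 21.8, 21.5]
```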



