
The "mere token prediction" comment is wrong, but I don't think any of the other comments really explained why. Next-token prediction is not what the AI does, but its training goal. It's like calling soccer a boring sport after only ever having seen the final scores. The important thing about LLMs is that they can internally represent many different complex ideas efficiently and coherently! This makes them an incredible starting point for further training. Nowadays no LLM you interact with is a pure next-token predictor anymore; they have all gone through various stages of RL so that they actually do what we want them to do. I really feel the magic looking at the "circuit" work by Anthropic. It shows that these models have some internal processing / thinking that is complex and clever.
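To make the "goal, not behavior" distinction concrete, here's a toy sketch of generation-time next-token sampling. The bigram table is a hypothetical stand-in for a trained model's learned distribution; a real LLM replaces the table lookup with a forward pass, but the autoregressive loop is the same shape:

```python
import random

# Hypothetical bigram "model": P(next | current). A real LLM's forward
# pass produces a distribution like this over its whole vocabulary.
BIGRAM = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(start: str, max_tokens: int = 4, seed: int = 0) -> list[str]:
    """Autoregressive loop: repeatedly sample the next token from the
    model's distribution conditioned on the last token emitted."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(max_tokens):
        dist = BIGRAM.get(tokens[-1])
        if not dist:  # no known continuation: stop generating
            break
        words, probs = zip(*dist.items())
        tokens.append(rng.choices(words, weights=probs, k=1)[0])
    return tokens

print(" ".join(generate("the")))
```

The point of the analogy: the *objective* (predict the next token) says nothing about what internal representations the model must build to do it well, just as a final score says nothing about how the match was played.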


> that they can internally represent many different complex ideas efficiently and coherently

The Transformer circuits work[0] suggests that this representation is not coherent at all.

[0] https://transformer-circuits.pub


I guess that depends on what you count as coherent. A key finding is that the larger the network, the more coherent the representation becomes. One example: larger networks merge the same concept across different languages into a single internal representation (as humans do). The addition circuits are also fairly easy to interpret.


> merge the same concept

It's doing compression, which does not mean it's coherent.

> The addition circuits are also fairly easy to interpret.

The addition circuits make no sense whatsoever. The model is just very good at guessing, that's all.


I am curious: what would you count as coherent? I think it is absolutely insane that we can open up and understand what are essentially alien intelligences at all!



