Hacker News

I've been paying for GPT-4 since it came out and have used it extensively. It's clearly an iteration on the same thing and behaves in qualitatively the same way. The differences are just differences of degree.

It's not hard to get a feel for the "edges" of an LLM. You just need to come up with a sequence of related tasks of increasing complexity. A good one is to give it a simple program and ask what it outputs, then progressively add complications until it starts to fail to predict the output. You'll reliably find a point where it transitions from reliably getting the output right to frequently getting it wrong, and it fails in a distinctly non-humanlike way: one consistent with the space of possible programs and outputs becoming too large for its approach of predicting tokens, rather than forming and mentally "executing" a model of the code. The improvement between 3.5 and 4 here is incremental: the boundary has moved a bit, but it's still there.
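To make the idea concrete, here's a minimal sketch of what such a "probe ladder" might look like. The probe functions below are hypothetical examples I'm inventing for illustration, not anything from the thread: the same kind of task at increasing complexity, each with a ground-truth answer you get by actually running the code, so a model's predictions can be checked.

```python
# Hypothetical "probe ladder": programs of increasing complexity whose
# output a model is asked to predict. Ground truth comes from running them.

def probe_1():
    # Trivial: straight-line arithmetic.
    x = 3
    y = x * 2 + 1
    return y  # 7

def probe_2():
    # One loop with accumulating state.
    total = 0
    for i in range(1, 5):
        total += i * i  # 1 + 4 + 9 + 16
    return total  # 30

def probe_3():
    # Mutual recursion plus a shared cache: the space of intermediate
    # states grows quickly, which is where models tend to start failing.
    cache = {}
    def f(n):
        if n < 2:
            return n
        if n not in cache:
            cache[n] = f(n - 1) + g(n - 2)
        return cache[n]
    def g(n):
        return f(n) + 1
    return f(6)  # 20

# Ground truth by execution; compare these against the model's predictions.
expected = [probe_1(), probe_2(), probe_3()]
```

The interesting signal isn't any single answer but where on the ladder the failures begin, and whether they degrade gradually (as a human's would) or fall off a cliff.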



Most developers I've met -- let alone most humans -- can't successfully run trivial programs in their head, never mind complex ones.

I've thrown crazy complicated problems at GPT-4 and had mixed results, but then again, I get mixed results from people too.

I've had it explain a multi-page SQL query I couldn't understand myself. I asked it to write doc-comments for spaghetti code I wrote for a programming competition, and it spat out a correct comment for every function. One particular function was nothing but unintelligible numeric operations on single-letter identifiers, and its true purpose could only be understood by tracing through seven levels of indirection. It figured it out.

The fact that we're debating the finer points of what it can and can't do is by itself staggering.

Imagine if next week you could buy a $20K Tesla bipedal home robot. I guarantee you then people would start arguing that it "can't really cook" because it couldn't cook them a Michelin star quality meal with nothing but stale ingredients, one pot, and a broken spatula.


"In a distinctly non-humanlike way". You can learn a lot about how a system works from how it fails, and in this case it fails in a way consistent with the token-prediction approach we know it is using, rather than with the model-forming approach some are claiming has "emerged" from it. On a marginally more complex example it doesn't show the performance you would expect from a human who did equally well on the slightly simpler one, which is precisely the point Rodney Brooks is making. It applies equally to GPT-3.5 and GPT-4.

But I didn't respond to debate the nature or merits of LLMs. It's been done to death and I wouldn't expect to change your mind. I'm just offering myself as a counterexample to your assertion that everyone (emphasis yours) that is unconvinced by some of the claims being made about LLM capabilities (I dislike your "sides" characterisation) is using GPT-3.5.


>"In a distinctly non-humanlike way".

Over the long term this is going to be a primary alignment problem of AI as it becomes more capable.

What is my reasoning behind that?

Because humans suck, or at least the constraints we operate under do. Every input system to your brain runs constantly behind "now", and the vast majority of data you could take in gets dropped on the floor. For example, if I'm building a robotic visual input system, it makes nearly zero sense for it to behave like human vision: your area of 20/20 visual acuity is tiny, and only by your eyes darting around rapidly, and your brain then lying to you, do you get what feels like a high-resolution view of the world.

And that is just an example of one of those weird human behaviors we know about. It's likely we'll find more of these shortcuts over time because AI won't take them.


What other system could possibly work? Even a faster system would still be slightly behind “reality”. Things must happen before they can be perceived.


My take-away is that your interaction with the OP has not changed your opinion about "everyone", expressed above:

>> What I and many others have noticed about the "Are LLMs really smart?" debate is that everyone on the "Nay" side is using 3.5 and everyone on the "Yay" side is using 4.0.

Sometimes there really is no point in trying to make curious conversation. Curiosity has left the building.



