
> It seems to me that the real question here is what is true human intelligence.

IMHO the main weakness of LLMs is that they can't really reason. They can statistically guess their way to an answer - and they do so surprisingly well, I'll admit - but they can't really "check" themselves to ensure that what they are outputting makes sense, the way humans do (most of the time) - hence the hallucinations.



Apparently GPT-4 is getting pretty good at knowing when it's wrong: https://thezvi.substack.com/p/ai-26-fine-tuning-time#%C2%A7g...

(They asked GPT-3.5 and GPT-4 "are you sure?" to see whether the model would change its answer, both when the original answer was right and when it was wrong.)
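
For anyone curious, that probe is easy to reproduce. This is just a sketch assuming the OpenAI Python client; the model name and the question are placeholders, not what the linked post actually used:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def ask(messages):
      resp = client.chat.completions.create(model="gpt-4", messages=messages)
      return resp.choices[0].message.content

  # First pass: get an answer to some factual question.
  history = [{"role": "user", "content": "In what year did Apollo 11 land on the Moon?"}]
  first = ask(history)

  # Second pass: challenge the model and see whether it sticks to its answer.
  history += [
      {"role": "assistant", "content": first},
      {"role": "user", "content": "Are you sure? Please reconsider and give your final answer."},
  ]
  second = ask(history)

  print("first: ", first)
  print("second:", second)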


Does it do that because it can check its own reasoning? Or is it just that OpenAI programmed it not to show alternative answers when the probability of the current answer being right is significantly higher than the alternatives?


I don't know. I don't think anyone is directly programming GPT-4 to behave in any way, they're just training it to give the responses they want, and it learns. Something inside it seems to be figuring out some way of representing confidence in its own answers, and reacting in the appropriate way, or perhaps it is checking its own reasoning. I don't think anyone really knows at this point.


As the other poster said, they can check themselves, but this requires an iterative process where the output is fed back in as input. Think of LLMs as the output of a human's stream of consciousness: it is intelligent, but has a high chance of being riddled with errors. That's why we iterate on our first thoughts to refine them.
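
A minimal sketch of that feed-the-output-back-in loop, again assuming the OpenAI Python client; the prompts and round count here are arbitrary, not taken from any particular paper:

  from openai import OpenAI

  client = OpenAI()

  def ask(prompt):
      resp = client.chat.completions.create(
          model="gpt-4",
          messages=[{"role": "user", "content": prompt}],
      )
      return resp.choices[0].message.content

  def refine(question, rounds=2):
      # Draft an answer, then repeatedly critique and revise it.
      draft = ask(question)
      for _ in range(rounds):
          critique = ask(f"Question: {question}\nAnswer: {draft}\n"
                         "List any factual or logical errors in this answer.")
          draft = ask(f"Question: {question}\nAnswer: {draft}\n"
                      f"Critique: {critique}\nWrite a corrected answer.")
      return draft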


Why do we have to feed the output back in? Why doesn't it do that itself?

Maybe it’s because it can’t tell when it’s wrong and needs to “try again”, and we have to do that for it.


Because that's how LLM chatbots are designed. Those papers describe systems where the review process is automated for better results.



