Hacker News

Are you using GPT-4? If not, it's understandable.

If you don't pay for ChatGPT, you get GPT-3.5. You can also get access to GPT-4 if you use the playground.



I'm using GPT-4 and it makes trivial errors all the time.

I asked it (actual names changed):

"I run the Linux command line program "foo". When I use the flags -xyz, I get results, but when I use -txyz I get nothing. What could this mean?"

And it told me: "The lack of results is because you didn't use the -t flag".

Or I ask it some very basic music theory questions and it gets stuff wrong all the time, giving impossible answers.


They definitely nerfed the hell out of GPT-4, via the web UI at least.

Do you have API access? The old model there still gives me very good results.


I also have API access, but this was from the web.


The web interface seems to be the one they notoriously nerf the most. There are far fewer complaints about the API, though the web UI is faster and more convenient.

I'd kill to have an easy, won't-get-me-banned way to submit a query to both the UI and API at the same time and show the results in, say, meld or so.
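The UI half of that can't be scripted without ban risk, but the API half is easy enough. A minimal sketch: send one prompt to two models via the public chat-completions endpoint and dump each answer to a file for meld. The endpoint and JSON shape are the documented API; the model names here are assumptions.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def chat_payload(model, prompt):
    # Build the JSON body for POST /v1/chat/completions
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(model, prompt):
    # One blocking request; returns the first choice's message text.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__" and "OPENAI_API_KEY" in os.environ:
    prompt = "Explain what a missing flag could mean in one paragraph."
    for model in ("gpt-4", "gpt-3.5-turbo"):  # assumed model names
        with open(f"{model}.txt", "w") as f:
            f.write(ask(model, prompt))
    # then: meld gpt-4.txt gpt-3.5-turbo.txt
```

You'd still have to paste the web-UI answer into a third file by hand, but at least the API side is reproducible.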


It really does. It came up with multiple wrong Dockerfiles for me yesterday, but it seems to correct them when you tell it.


GPT-4 gets things wrong as well, especially as soon as you're off the well-beaten path. I tried writing code with my brain off and GPT-4 on: the Terraform code was mostly right but didn't work; the Python imports for recent libraries (llama-cpp-smth) were a complete fabrication, even when I gave the AI the documentation beforehand; and we went in circles around a problem for which it kept giving me the same solution with the same resulting error (around Python multiprocessing, which is very picky about nested parallelism and method imports).
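For anyone who hasn't hit the multiprocessing gotchas mentioned above, a minimal sketch (not the poster's actual code): worker functions must be importable/picklable, and a Pool worker is a daemonic process, so it can't spawn its own Pool.

```python
from multiprocessing import Pool

def square(x):
    # Top-level function: picklable, so Pool can ship it to workers.
    return x * x

def nested(x):
    # This is the nested-parallelism trap: Pool workers are daemonic
    # processes, which are not allowed to have children, so creating a
    # Pool inside a worker raises an error ("daemonic processes are not
    # allowed to have children").
    with Pool(2) as inner:
        return sum(inner.map(square, range(x)))

if __name__ == "__main__":
    with Pool(2) as p:
        print(p.map(square, [1, 2, 3]))  # works: [1, 4, 9]
        # p.map(nested, [3])  # fails: nested Pool inside a worker
```

Lambdas and functions defined inside other functions fail the pickling step too, which is the "method import" pickiness.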


Why is it always the same reply? Yes, GPT-4 is as useless as GPT-3.5 on any non-trivial task.


> Why is it always the same reply?

When there's one definitive answer to something that people keep repeating there's a slight chance that it's actually true. Shocking, I know.


Well, but it's not. Also, your reasoning would imply that e.g. QAnon is true based on the number of people repeating the same thing, so that's pretty weak.


Well. You would definitely have to very carefully select a very, VERY narrow slice of society to get a piece where QAnon supporters make up a significant percentage of people.

But hey, if you are really looking to convince yourself of something, I have no doubt that it can be done.


>Why is it always the same reply?

I keep going around telling people that 1+1=3, why do they always give me the same nonsense about the number '2'?

I blame Sam Altman.


?


The point is that people keep repeating it because it's true. Why are you being so obnoxious about it? Anyone who has used GPT-3.5 vs GPT-4 knows this and it's just ridiculous to claim otherwise.


1. I've used it, and I don't think so.

2. Obnoxious? This is my opinion, I don't get what Sam Altman or 1+1=3 is supposed to mean in this context.


The web UI was definitely nerfed, but the API still seems OK.


My guess is people feel the need to self-justify their $20/month subscription.


My guess is most people into AI don't even remember that they are paying $20/month for this.

We do a lot of experiments involving GPT-3.5, GPT-4, claude-v2, titan-large, and PaLM 2, and for what it's worth, GPT-4 shines on our real production workloads. We can make PaLM 2 produce decent results with a lot of extra effort, and claude-v2 is passable, but GPT-4 does not disappoint. This is low-grade knowledge-management stuff, and we are not using it as an information-retrieval system, but for basic 'cognitive' tasks where all the information needed is provided in the prompt. I wouldn't rely on it for info-retrieval tasks such as the examples quoted above; its knowledge base is highly compressed, after all.
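The "all the information needed is provided in the prompt" pattern the parent describes is just context stuffing. A minimal sketch (function and wording are my own, not the poster's setup):

```python
def build_prompt(context: str, question: str) -> str:
    # Closed-book prompt: the model is told to answer only from the
    # supplied context, never from its own (compressed) knowledge base.
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

if __name__ == "__main__":
    print(build_prompt(
        "The foo tool's -t flag enables tracing.",
        "What does -t do in foo?",
    ))
```

This is why the retrieval-style complaints upthread don't transfer: the model is graded on reading, not recalling.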



