
Anytime I ask these things something (Bard, GPT, etc.), 33% of the answer is genius, 33% misleading garbage, and 33% filler that's neither here nor there.

The problem is that distinguishing between these parts requires me to be an expert in the area I’m inquiring about - and then why the heck do I need to ask some idiot bot for answers to questions that I already know the answer to?

I don’t know who finds these things useful, and more importantly, who is blowing smoke up everyone’s collective rear, especially the media.



Bard is a brain-damaged-but-literate idiot compared to GPT 3, which is still dumber than the typical human.

Try GPT 4 for a week.

I've found it to be more like 50% immediately useful, 25% very impressive, and 25% where it's not wrong but I have to poke it a few times with different prompts to coax out the specific answer I'm looking for.

That's better than most humans that I collaborate with at work.

Literally half of the humans -- in a professional IT setting -- can't understand simplified, clear English in emails. Similarly, in my experience, about half can't follow simple A -> B logic. Many are perpetually perplexed that prerequisites need to precede the work, not be a footnote in the post-mortem of the predictable failure. Etc...

PS: That last sentence is too hard for several English-native speakers I work with to parse. Seriously. I'm not even exaggerating the tiniest bit. I've had coworkers fail to understand words like "orthogonal" or "vanilla" in a sentence. Vanilla!

In my estimation, Chat GPT 4 is already smarter than many people, certainly the bottom 25% of the human population.

LLMs are a real existential threat to those people in their current state. A few more years of improvement, and they'll be displacing the bottom 50% in workplaces, easily.


> I’ve had coworkers fail to understand words like “orthogonal” or “vanilla” in a sentence. Vanilla!

Presumably, you are referring to the idiomatic use of vanilla, which is probably a less universal idiom than you think: it came into wide use fairly recently and derives from a specific American cultural loading of the literal vanilla flavor. Even when the general idiom is understood, grasping its contextual meaning can rely on a deeply shared understanding of what counts as the basic default in the referenced context.


>I've found it to be more like 50% immediately useful, 25% very impressive, and 25% where it's not wrong but I have to poke it a few times with different prompts to coax out the specific answer I'm looking for.

That could tell us more about your questions than GPT's capabilities.


> Literally half of humans -- in a professional IT setting -- can't understand simplified, clear english in emails. Similarly, in my experience about half can't follow simple A -> B logic.

There are alternate hypotheses.

People have preferences. When it appears that someone does not understand something, they may be pretending they don't understand it, or they may simply be ignoring it. Maybe they are trying to avoid an unpleasant task, or maybe they find dealing with a specific person unpleasant and not worth the effort.

In my experience, people are far more capable and competent when they feel comfortable and are interested in the task.


> In my experience, people are far more capable and competent when they feel comfortable and are interested in the task.

That's definitely true, but in my experience people have limits: simple biological ones. Repetitive tasks make practically all humans bored, for example.

The fact that AIs never get sleepy, distracted, or bored already makes them super-human in at least that one aspect. That they have essentially perfect English comprehension, and hence aren't fazed by the use of jargon or technical language, puts them head-and-shoulders above most humans.

The frustrations I'm venting aren't some rare thing. I'm working on a technical team where the project manager doesn't understand what the team members are saying. This is not just a matter of syntax, or jargon. They just don't understand the concepts. This is so common in the wider industry that I'm pleasantly surprised, shocked even, when I come across a PM that can ask useful questions instead of needing endless corrections along the lines of: "It's spelled SQL, not Sequel." I've never met a PM that could do simple arithmetic, like "10 TB at 100 MB/s will take over a day to copy, we should plan for that!". Never.
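For what it's worth, the PM arithmetic in question really is one line. A quick sanity check (using decimal units, as storage vendors do):

```python
# 10 TB copied at 100 MB/s: how long does it take?
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes

seconds = 10 * TB / (100 * MB)   # 100,000 seconds
hours = seconds / 3600

print(f"{seconds:,.0f} s = {hours:.1f} h")   # 100,000 s = 27.8 h
```

About 27.8 hours, so "over a day" checks out.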

I've tested Chat GPT 4 on both language and concepts that I've seen trip up PMs, and it understood "well enough" every time.

For example, GPT 4: The sentence "We deployed sequel server successfully last night" seems incorrect due to the incorrect naming of a product. "Sequel server" should actually be "SQL Server", a popular relational database management system (RDBMS) developed by Microsoft. Therefore, the corrected sentence should be: "We deployed SQL Server successfully last night."

PS: If you tell GPT 4 to pretend it is a technical project manager and instruct it to ask followup questions, it is noticeably better at this than any PM I have worked with in the last few years.


The particular problem with work is that people are commonly promoted into tasks that they are neither interested in nor comfortable with.

You've instead moved the task from general human capability to one of management alignment with worker capability and human statistical probability. This is something that human management has been failing at for about forever, especially as team size gets large. Maybe we'll see AI 'management' align humans to tasks better, or more likely as time and LLM capability progresses, we'll just see the average AI capability increase over the average worker capability and companies will just depend on unreliable meat less.


Not to be rude, but are you paid by OpenAI to say this?

The amount of comments here telling people to “upgrade to ChatGPT 4” is absolutely unprecedented.

I know it might be good, but people will find value in it and upgrade if they see the need to do so?


I am in no shape, way, or form affiliated with OpenAI or any other AI company.

What I and many others have noticed about the "Are LLMs really smart?" debate is that everyone on the "Nay" side is using 3.5 and everyone on the "Yay" side is using 4.0.

The naming and the versioning imply that GPT 4 is somehow slightly better than 3.5, like not even a "full +1" better, just "+0.5" better. (This goes to show how trivial it is to trick "mere" humans and their primitive meat brains.)

Similarly, all pre-4 LLMs including not just the older ChatGPT variants, but Bard, Vicuna, etc... are all very clearly and obviously sub-par, making glaring mistakes regularly. Hence, people generalise and assume GPT 4 must be more of the same.

For the last few weeks, across many forums, every time someone has said "AIs can't do X" I have put X into ChatGPT 4 and it could do it, with only a very few exceptions.

The unfortunate thing is that there is no free trial for GPT 4, and the version on Bing doesn't seem to be quite the same. (It's probably too restricted by a very long system prompt.)

So no, people won't form their own opinions, at least not yet, because they can't do so without paying for access.


I've been paying for GPT-4 since it came out and have used it extensively. It's clearly an iteration on the same thing and behaves in qualitatively the same way. The differences are just differences of degree.

It's not hard to get a feel for the "edges" of an LLM. You just need to come up with a sequence of related tasks of increasing complexity. A good one is to give it a simple program and ask what it outputs. Then progressively add complications to the program until it starts to fail to predict the output. You'll reliably find a point where it transitions from reliably getting it right to frequently getting it wrong, and doing so in a distinctly non-humanlike way that is consistent with the space of possible programs and outputs becoming too large for its approach of predicting tokens instead of forming and mentally "executing" a model of the code to work. The improvement between 3.5 and 4 in this is incremental: the boundary has moved a bit, but it's still there.
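A toy version of that probe ladder, as a sketch (the specific programs here are my own illustrative choices, not from any actual evaluation): hand the model each program and ask it to predict the printed output, adding complexity until predictions fail.

```python
# Rung 1: trivial arithmetic over a list. Easy to predict by inspection.
easy = sum(x * x for x in [1, 2, 3])   # 1 + 4 + 9 = 14

# Rung 2: predicting this requires mentally "executing" ~100 loop
# iterations, which is where token prediction tends to diverge from
# actually tracing the code.
def collatz_steps(n: int) -> int:
    """Count iterations for n to reach 1 under the Collatz rule."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

hard = collatz_steps(27)   # 111 steps

print(easy, hard)   # 14 111
```

The interesting signal isn't whether the model gets rung 2 right, but how its failure mode differs from a human's at the same rung.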


Most developers -- let alone humans -- I've met can't run trivial programs in their head successfully, let alone complex ones.

I've thrown crazy complicated problems at GPT 4 and had mixed results, but then again, I get mixed results from people too.

I've had it explain a multi-page SQL query I couldn't understand myself. I asked it to write doc-comments for spaghetti code that I wrote for a programming competition, and it spat out a comment for every function correctly. One particular function was unintelligible numeric operations on single-letter identifiers, and its true purpose could only be understood through seven levels of indirection! It figured it out.

The fact that we're debating the finer points of what it can and can't do is by itself staggering.

Imagine if next week you could buy a $20K Tesla bipedal home robot. I guarantee you then people would start arguing that it "can't really cook" because it couldn't cook them a Michelin star quality meal with nothing but stale ingredients, one pot, and a broken spatula.


"In a distinctly non-humanlike way". You can learn a lot about how a system works from how it fails and in this case it fails in a way consistent with the token-prediction approach we know it is using rather than the model-forming approach some are claiming has "emerged" from that. It doesn't show the performance on a marginally more complex example that you would expect from a human with the same performance on the slightly simpler one, which is precisely the point Rodney Brooks is making. It applies equally to GPT-3.5 and GPT-4.

But I didn't respond to debate the nature or merits of LLMs. It's been done to death and I wouldn't expect to change your mind. I'm just offering myself as a counterexample to your assertion that everyone (emphasis yours) that is unconvinced by some of the claims being made about LLM capabilities (I dislike your "sides" characterisation) is using GPT-3.5.


>"In a distinctly non-humanlike way".

Over the long term this is going to be a primary alignment problem of AI as it becomes more capable.

What is my reasoning behind that?

Because humans suck, or at least the constraints we operate under do. All your brain's input systems are constantly behind 'now', and the vast majority of data you could take in is getting dropped on the ground. For example, if I'm making a robotic visual input system, it makes nearly zero sense for it to behave like human vision. Your 20/20 visual acuity area is tiny, and only by moving your eyes around rapidly, and then by your brain lying to you, do we have a high-resolution view of the world.

And that is just an example of one of those weird human behaviors we know about. It's likely we'll find more of these shortcuts over time because AI won't take them.


What other system could possibly work? Even a faster system would still be slightly behind “reality”. Things must happen before they can be perceived.


My take-away is that your interaction with the OP has not changed your opinion about "everyone", expressed above:

>> What I and many others have noticed about the "Are LLMs really smart?" debate is that everyone on the "Nay" side is using 3.5 and everyone on the "Yay" side is using 4.0.

Sometimes there really is no point in trying to make curious conversation. Curiosity has left the building.


> So no, people won't form their own opinions, at least not yet, because they can't do so without paying for access.

People will pay for access if they find it valuable enough.

I work with people who use it, and I've not seen anything impressive enough come from them to make me want to pay for it, so I don't. I've also screen shared with them because I was curious what all the fuss was about. What I saw, and what pissed me off, was that they've stopped contributing to our internal libraries and just generate everything now. I found that kind of disturbing. It's not the product's fault, but it's the kind of thing I imagined would start happening.

I'm glad you like it, I just don't know why people feel the need to sell it so hard.


If you had used GPT-4 enough, you would know that at this point OpenAI does not need to pay any human to engage in online conversation, aside from legal reasons, if any.

I personally created some content-creating bots with GPT-4, and they succeeded to the point that I don't trust anything I see online anymore. It does a better job than me, which doesn't say much because I am an engineer, not a content creator. But still, I could get the same results as one with a script that I made GPT-4 write itself.

...Yes, I am losing sleep over GPT-4's performance. If you are not losing sleep over it yet, you haven't really given it a genuine try yet.


Yeah, I think the use cases for these things are far narrower than the boosters and the hype cycle think.

If you could have unlimited interns for $0 (let's pretend it doesn't cost tons and tons of compute) who don't shut up, who hallucinate and lie, and who also do good work, in varying degrees... how many would you want?

These things are probably going to be great for lots of black-hat uses: propaganda, political marketing, flooding the zone with BS through unlimited iterations of messaging. Basically, things that can be A/B tested to death, where veracity is of zero importance and you have near-limitless shots on goal to keep iterating.


> The problem is distinguishing between these parts requires me to be an expert in the area I’m inquiring about - and then why the heck do I need to ask some idiot bot for answers to questions that I already know an answer to?

Because it can be significantly faster to check something for correctness than to produce it?

More so when the correctness check can itself be automated to some extent.
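A sketch of what "automating the correctness check" can look like in practice (`llm_sort` here is a hypothetical stand-in for model-generated code, not anything from the thread): compare the generated function against a trusted oracle on random inputs.

```python
import random

def llm_sort(xs):
    # Stand-in for a function an LLM produced; in real use you would
    # paste the generated implementation here.
    return sorted(xs)

# Property check: the generated code must agree with a trusted oracle
# (Python's built-in sorted) across many random inputs.
for _ in range(100):
    xs = [random.randint(-50, 50) for _ in range(20)]
    assert llm_sort(xs) == sorted(xs), f"failed on {xs}"

print("all checks passed")
```

Running a hundred randomized comparisons takes seconds; writing and debugging the function yourself may not.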


> Because it can be significantly faster to check something for correctness than to produce it

Erm, no. There are many things that can't be easily checked, especially if you don't know the topic well.


You're right, but I think the point still stands. Many times it really is easier to verify than to produce, compilable code being a good example.


I dunno about you, but I spend a LOT of my time fixing code that was broken in some subtle way the compiler didn't catch

If we're talking about generating and integrating sample code, it's great at that

Anything more advanced and it's a footgun
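A classic example of the kind of subtle bug no compiler or interpreter will flag, the sort that generated code can quietly smuggle in: a mutable default argument in Python. The code runs without errors, but state leaks across calls.

```python
def append_item(item, bucket=[]):
    # Bug: the default list is created once, at function definition
    # time, and is shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket

print(append_item(1))   # [1]
print(append_item(2))   # [1, 2]  <- surprising: a fresh call, stale state
```

Nothing here fails to compile or raise; the only way to catch it is to understand what the code actually does.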


I have very quickly picked up the habit of pasting snippets of my code into GPT-4 and simply asking "Why is this not working?" Almost every time, it succinctly explains the apparent purpose of the code, and how it is subtly wrong.

It's so good that I often do this preemptively to avoid a compile/deploy/test cycle.


Indeed, but I'm not claiming it's always faster to check than to produce. Simply that when it is the case, using GPT-4 can be worth it. It is to me - I use it daily.


My experience with GitHub Copilot is that the time it saves me typing out boilerplate has been more than lost when I have to spend time carefully debugging bugs in the code it produces. And those are the bugs I catch right away.

I expect this will improve but it's certainly not always the case that checking something is cheaper or easier than generating it in the first place.


I was torn on Copilot - it seemed like it was saving me time, but I found myself getting way more value out of just copy / pasting code into GPT4.

So I decided to stop my copilot subscription and just see how I go without it.

I've been off copilot for a few days now and other than having to do more code lookups it's not a terrible experience not having it. It does feel like something that should be baked into the IDE for free though.


Multiple choice tests are easier than fill in the blank precisely because it's easier to recognize when something is correct than it is to regurgitate a fact from thin air.

You don't have to be an expert to recognize when ChatGPT is providing useful information. There's a middle ground between expert and novice where ChatGPT provides real value. It's the times where you would know the answer if you saw it, but can't quite remember it off the top of your head.


Remember when the Internet was new and no-one believed anything on it?

Then, learning what to believe became a marketable skill for many people?

Then society fundamentally changed because not everyone learned that skill?

This is just that again. Gen Z will joke about their millennial/Gen X bosses believing anything the AI tells them and it will probably lead to some sort of mainstream conspiracy that Jackie O herself is running it or something (to those reading: please don't take this idea)


> Remember when the Internet was new and no-one believed anything on it?

Is this true?


Yes. "Anyone can write anything on there! There's no way to trust it." About halfway through high school they started letting us cite websites, and it was a big deal and considered very forward thinking.


I remember my parents and teachers indicating this, and me and my friends thinking “then why do they trust TV, books, etc.?”


This is true; it was almost a meme before memes existed, as in "I saw it on the internet so it must be true!", a way to teasingly highlight how non-credible it was seen as at the time.


"Kilroy was here" was a meme before "meme" was coined. I don't think the phrase "before memes existed" has meaning.


Sure. Maybe better if I qualified it with "as they exist as we know them today"


On the internet, no one knew you were a dog, according to the New Yorker cartoon.


yes. I remember boomers shitting on Wikipedia around 2004, but fast forward to present day and the same people would cite it.


Yeah, I don’t think a machine that generates novel genius ideas 1 out of 3 times is useful either. Creating a new idea is exactly as hard as curating them.


What novel genius ideas has the machine created so far?


From today, it lets a hobbyist create better home automation than trillion dollar FAANG companies can provide:

https://www.atomic14.com/2023/05/14/is-this-the-future-of-ho...

The novelty is asking the machine to use its own genius to do the right thing.


Sarcasm?


I personally find value because it saves me time. I’ll ask ChatGPT to write something slightly more complex than boilerplate code for me based on some requirements. Because I’m an “expert”, I can read/run the code and message back improvements and tweaks until it arrives at something satisfactory. It certainly doesn’t always produce correct code from the start, and I often encounter syntax errors or code which doesn’t work. However, I’ve found it pretty good at remedying those issues when I describe what’s wrong.

It produces something which is a good enough starting place. Sure, I could have written the code myself because I already know how. But I’ve found it saves me time and requires minimal effort.



