
> They were predicted to end the software engineering profession for almost four years already

ChatGPT was launched on November 30, 2022, which is two years and two months ago. That you're talking about missed predictions over such a short timeframe is absurd, but telling of the accelerated era we're living in. The fact is that AI and LLMs are currently going through a phase of explosive improvement, to the point that we can expect enormous gains in capability every six months or so.



I use LLMs daily, so I'm no skeptic. We are not seeing enormous improvements every 6 months; that's hyperbolic. There has been a significant improvement since GPT-3.5, I'll give you that, but even over those ~2 years I wouldn't describe the improvement as "enormous". The capabilities are similar, with output quality improving by a noticeable degree.


The OpenAI API for GPT-3 was launched on June 11, 2020; that's four years and seven months ago:

https://news.ycombinator.com/item?id=23489653


I used that API. It was literally autocomplete: if you wanted it to answer a question, you had to start with, say:

    Translate into German:

    Q: "Good Morning" A: "Guten Morgen"

    Q: "<the thing you actually wanted to translate>"

And even then it might answer with "Good Morning" in five different languages.
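
For anyone who never used it, here's a minimal sketch of what driving that endpoint looked like, assuming the legacy pre-1.0 openai Python SDK (the key and inputs are placeholders); the stop sequence was the main defense against it rambling off into extra Q/A pairs:

    import openai  # legacy SDK (<1.0), which exposed openai.Completion

    openai.api_key = "sk-..."  # placeholder key

    # Few-shot "prompt programming": demonstrate the pattern, then leave
    # a blank for the model to autocomplete.
    prompt = (
        "Translate into German:\n\n"
        'Q: "Good morning" A: "Guten Morgen"\n\n'
        'Q: "Where is the train station?" A:'
    )

    resp = openai.Completion.create(
        engine="davinci",  # the original base GPT-3 model
        prompt=prompt,
        max_tokens=30,
        temperature=0,
        stop=["\n"],  # stop before it invents further Q/A pairs
    )
    print(resp["choices"][0]["text"].strip())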

InstructGPT is what turned GPT-3 into the ChatGPT-3.5 model.


Note that GP said "since GPT 3" but the parent responded with "ChatGPT". My response was to clarify the timeframe that has elapsed since GPT-3.


GPT-2 was released in November 2019, so that's five years ago. But GPT-3 isn't the advancement to look at; ChatGPT is, and it didn't use GPT-3 itself but a newer RLHF'd model based on GPT-3, with the 2022 launch date.

https://news.ycombinator.com/item?id=21454273


And what has enormously improved since ChatGPT's launch? Maybe you should ask it what it "thinks" about the hype surrounding it.


If you don’t see the difference in quality of responses between GPT-3.5 as it launched in 2022 and o1/o3 then I don’t know what to tell you. I am using these models daily and the difference is night and day.


In addition to passing the bar exam[0], improved performance on medical questions[1], and economics questions that experts thought it was years away from[2], all the other things marked in green on page 6 were just the changes from 3.5 to 4: https://arxiv.org/pdf/2303.08774

4o added image analysis.

The o-series starting at o1 improves on 4o as per the margins in these charts: https://openai.com/index/learning-to-reason-with-llms/

I'll have to wait and see about o3, because only the mini model is out so far.

[0] https://law.stanford.edu/2023/04/19/gpt-4-passes-the-bar-exa...

[1] https://ai.nejm.org/doi/full/10.1056/AIdbp2300192

[2] https://www.betonit.ai/p/gpt-4-takes-a-new-midterm-and-gets


At this point just paste my comments into ChatGPT and ask it to explain to you what I mean by them. Then paste your response and ask it why it's not addressing the point made. At least use the tool for what it's good for.


So you're saying that it understands you better than I do?

I get that feeling too (in both directions), but this vague and hard-to-quantify sensation is not what I'd suggest in response to your clearly stated question:

> And what has enormously improved since ChatGPT's launch?

Which is, I think, answered by the things I listed.


It doesn't understand me, but it could help you understand. What you listed aren't major unexpected leaps but incremental improvements on things that were already known to be possible.

But you insist on being obstinate. ChatGPT advised me to disengage from this conversation.


This is highly misleading.

ChatGPT did not ace the bar exam; it was essentially percentile-graded against a cohort of people who mostly failed. When compared to real lawyers, it was in the 15th percentile on the essay portion.

[0] https://law-ai.org/re-evaluating-gpt-4s-bar-exam-performance...


I said pass, not ace.

15th percentile of passes, on the weakest aspect, is still a big improvement over "not passing". That improvement is what I wish to highlight.

(The observation that the 48th percentile of passers (the lowest overall figure from your link, let alone the 15th for essays) corresponds to the 90th percentile of all exam takers suggests that perhaps too many humans are taking the exam before they're ready.)


ChatGPT launched with GPT-3.5. We're now at o1, o3-mini, and DeepSeek-R1, but even in the last year, with GPT-4 and GPT-4o, it became better than the average human at almost everything involving text. It writes better than the average person, faster and cheaper. It parses unstructured data better than the average person. There's a large number of everyday tasks for which it's perfectly reasonable today to just throw them at ChatGPT. That's all from the last 1 to 1.5 years.
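
To make "throw an everyday task at ChatGPT" concrete, here's a minimal sketch using the current openai Python SDK; the model choice, prompt, and email are purely illustrative:

    from openai import OpenAI  # openai SDK >= 1.0

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A typical "parse unstructured data" chore: pull structured fields
    # out of a free-form message.
    email = "hi, can you move my 3pm with Dr. Patel on Friday to 10am? -- Sam"

    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Extract JSON with keys: who, current_time, requested_time."},
            {"role": "user", "content": email},
        ],
    )
    print(resp.choices[0].message.content)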


Oh, we have new letter-number combinations now. That is amazing. I stand corrected.


If you haven't tried using them then I am not sure your opinion on them is any good.


I dare say I'm more familiar with the capabilities of the leading models than certain big tech CEOs are, at least judging by their publicly communicated opinions.


I use 4o very often in my work and it mostly sucks. Sometimes it's very good; sometimes it surfaces knowledge faster than a search engine would have. Mostly it spouts unhelpful noise (for my problems).

I'm sure if you need to make a to-do list in React it's like magic (until the app gets complicated). In real-world use, not so much.

(Also, I have often code-reviewed PRs from people who are heavy users and, surprise surprise, their output is trash: very prone to bugs or being out of spec.)


I also think 4o sucks, but have you tried DeepSeek R1 (free on their website)? I found it night and day compared to 4o and o3-mini on the following topics:

- reverse engineering: when fed assembly (or decomp or a mock impl), it's consistently been able to figure out what the function actually does and why it's there from a high-level perspective, whereas ChatGPT merely states the obvious

- very technical C++ questions: DSR1 gives much more detailed answers, with bullet points and examples. Much better writing style. Slightly prone to hallucinations, but not that much

- any controversial topic: ChatGPT models are trained to avoid these because of their "safety" training

ChatGPT is a bit better (and faster) at writing simple code and doing some math, but that's it.

(obviously, common sense about what to share and not to share with these chatbots still applies, etc.)


You can access DeepSeek R1? For me, both chat and API have been down for over a week now (it shut down minutes after I topped up my account and generated an API key - I never got to use it :/).

There's lots of fiddling with these models. I found Claude 3.5 Sonnet to be superior to both GPT-4o and o1-preview in around 99% of the things I do; I've only started comparing it against o3-mini, and right now it's a mixed bag. Then again, I tend to develop and refine specific prompts for Sonnet, which I haven't done for o1-preview and o3-mini, so that could be a factor. Etc.
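
By "specific prompts" I mean keeping a task-specific system prompt that gets versioned and tweaked over time. A minimal sketch with the anthropic Python SDK (the model alias, system prompt, and input are just illustrative):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The reusable, task-specific part lives in the system slot so it can
    # be refined independently of any single request.
    SYSTEM = "You are a strict code reviewer. Reply with numbered findings only."

    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model alias
        max_tokens=1024,
        system=SYSTEM,
        messages=[{"role": "user", "content": "Review this function: ..."}],
    )
    print(msg.content[0].text)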


> You can access DeepSeek R1?

Yes, well, I live in the EU and thus can avoid US work hours and Chinese peak hours. I think availability has been a bit better since they disabled websearch (also I noticed DSR1 half a week before it made the mainstream news).

> There's lots of fiddling with these models.

Agreed


I live in the EU too. For me, the status page[0] shows a continuous API outage for the past 8 days, and it is still ongoing. Since it started, my API requests bounce back with an error that changes seemingly at random between "unauthorized" and "insufficient balance". Neither reason is valid, since I'm using a valid API key I made after creating an account, which I topped up with $20 (and have an invoice from them to prove it). I must have had mighty bad luck that the service went down soon after I generated the key; I'm guessing my user/key is currently stuck in the middle of some migration, or possibly wasn't captured in a backup and got subsequently wiped. For now, I'll just patiently wait for them to fix their service.
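
For what it's worth, the request itself is nothing exotic. DeepSeek's API is OpenAI-compatible, so the probe that keeps bouncing looks roughly like this (a sketch with the openai Python SDK; "deepseek-reasoner" is their documented name for R1):

    from openai import OpenAI

    # DeepSeek exposes an OpenAI-compatible endpoint.
    client = OpenAI(
        api_key="sk-...",  # the key that keeps coming back "unauthorized"
        base_url="https://api.deepseek.com",
    )

    resp = client.chat.completions.create(
        model="deepseek-reasoner",  # R1
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)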

--

[0] - https://status.deepseek.com/


AFAIK it's hosted on Chutes for free too (though limited to between 2k and 10k output tokens). It's on Azure as well, though it might be rate-limited there (or at least it is through OpenRouter).


This is a good question. According to some, the growth is exponential. Others think ChatGPT is basically still the same as it was at the end of 2022, minor differences aside. Why are the perspectives so different?


One of these archetypes is drowning in a hype-fueled news cycle: they mistake speculation for inevitability, dismiss skepticism as ignorance, and construct a reality where the technology's success is unquestionable.

The other is simply using the technology for what it's good for, observing that it has been slowly, incrementally improving at tasks it was already capable of since the major breakthrough, and acknowledging its limitations.

Incremental improvements don't give us any assurance that another major breakthrough is waiting around the corner.


> to the point we can expect enormous improvements in capabilities every six months or so

Not really; we can only see that we've had improvements. Past improvement is not evidence of upcoming improvement.



