Hacker News | raincole's comments

Probably zero. At the end of the day people pay for LLMs that write better code or summarize PDFs of hundreds of pages faster, not the ones that can count the letter r's better.

When LLMs can't count r's: see? LLMs can't think. Hoax!

When LLMs count r's: see? They patched and benchmark-maxxed. Hoax!

You just can't reason with the anti-LLM group.


Whenever an "LLM fail" goes viral, like the car wash question, you can observe the exact same wording of the question get "fixed" within a week or so, while slight variations in phrasing can still reproduce the problem.

Followed by lots of "works perfectly for me, why are people even talking about this?"

I can't say what exactly they're doing behind the scenes but it's a consistent pattern among the big SOTA model providers. With obvious incentive to "fix" the problem so users will then organically "debunk" the meme as they try it themselves and share their experiences.


You are misremembering. There’s no patch. All these examples used the instant model.

The same non-argument could be made about all kinds of benchmark cheating by tech companies, and yet we have tons of documented examples of them being caught with their pants down.

>You just can't reason with the anti-LLM group.

On the contrary, the reasoning is simple and consistent:

That LLMs can't count r's shows that LLMs don't actually think in the way we understand thought (since no one with their level of skill in other areas would fail at that). And because of that, there are likely patches for commonly reported cases, since it's a race to IPO and benchmark-maxxing is entirely plausible.


Ofcom is a disgrace to humanity. I guess this is an early signal that they plan to control how people use YouTube.

UK voters get what they vote for.

Literally untrue as we don't have proportional representation.

So Ofcom publishes a factual report with the positively dull actual title of "Passive social media use, AI companionship, and online side hustles: UK adults’ media and online lives revealed"

A report that makes zero comment on controlling how people use YouTube or any other social media.

And yet somehow they are "a disgrace to humanity"

Really?


I think your disgrace-level calibration needs adjusting given everything else that's going on rn buddy.

A valuable lesson AI taught me is how bad articles on Bloomberg and Forbes are. They have probably always been this bad, but I was unaware of it until they started writing about AI (because, admittedly, I subconsciously equated well-known with good).

There’s something called the Gell-Mann amnesia effect, where people notice the errors in coverage of things they know firsthand but then go back to assuming the other stories are all reliable.

I used to love Private Eye and they have done great journalism that’s highly acclaimed, but the only thing they wrote that I really knew about (literally the office I was in) was outrageously wrong and would have been so easy to verify (ask literally anyone in the BBC building we were in to go to that floor, or take a tour or write an email). Can’t read it any more.


Here's Wikipedia's entry on the Gell-Mann Amnesia Effect, because I've found it a very useful concept to know. Despite my media experiences, I still keep falling for it. And I love that we're still referring to it as Gell-Mann Amnesia here:

https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amn...

In a speech in 2002, Crichton coined the term "Gell-Mann amnesia effect" to describe the phenomenon of experts reading articles within their fields of expertise and finding them to be error-ridden and full of misunderstanding, but seemingly forgetting those experiences when reading articles in the same publications written on topics outside of their fields of expertise, which they believe to be credible. He explained that he had chosen the name ironically, because he had once discussed the effect with physicist Murray Gell-Mann, "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".


> "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".

Ahh, yes, the SyneRyder effect.


Everything I've known anything about first hand has been utterly garbled - or was completely made up - when written up in Private Eye.

Odd take, as this was actually a pretty good article. The GP appears to be mostly bemoaning the fact that it's targeted at a lay audience.

Claude Code (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

In less than four years the AI coding workflow has been overhauled at least twice: from Chat interface (ChatGPT) to editor integration (Cursor), then to CLI agent harnesses (CC/Codex). It would be crazy to assume that harnesses are the end of evolution.


> Everyone knows that.

Except, apparently, Anthropic - who are doing their darndest to get everyone on board with their tools as a moat. Apparently that's the only strategy for AI stickiness.


And their strategy kind of worked, right? CC is the most popular agentic coding tool. Anthropic faces competition from OpenAI (potentially better model, weaker TUI tool) and from the rest (potentially worse models, weaker TUIs). So their strategy is to develop both: make their closed model and closed tool better than the competition, so that when people want to vibe-code they will choose their ecosystem.

OpenAI Codex is a much higher quality harness than Claude Code or OpenCode, and available as open source.

Claude Code 2.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

Claude Code 3.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

And so on and on and on.

One promise of AI was mature software.


If there is a market for a mature harness, surely it will be built, right?

I don't know what your point is. What you said is exactly what I expect to happen, except they might have a more creative name than "Claude Code 2.0".

My point is that the mature version will always be the next future version.

What changed is you, the reader. In 2026 we treat the smallest signs as evidence of LLM writing. Too long? LLM. Too short? LLM. Too grammatically correct? Must be LLM.

For me it was the "it's not x"/"it's y" stuff and some other structures Claude is very fond of using all the time. Perhaps humans are starting to write like LLMs!

Perhaps, just perhaps, LLMs are just statistical models that literally can't create novel things, therefore any structure LLMs write was learnt from human writing?

But who knows!


What kind of human writing has "it's not X—it's Y" in every single paragraph?

The answer is none. LLMs haven't accurately modeled human writing for years; current models have been smacked on the head with the coding RLHF bat so much that they all write distinctly inhuman text.


The thing is, people are screaming “AI” when they see a single “it's not X—it's Y" pattern in a post, despite this being a fairly common construct.

People are nitpicking every tiny thing in their search for proof of AI. It’s not useful and ends up dominating the conversation. AI panic is degrading the value of forums at least as much as actual AI at this point.


> GPT-4o

Why is this on the list? Like... what? How about including GPT 3.5 and GPT 2 here too?


In TFA it is put on the list because some users of this GPT version were unhappy with its cancellation, which caused even OpenAI to waver in its decision: they first cancelled it, then resurrected it, and then cancelled it permanently, probably because continuing to run it would have cost more than the revenue it generated.

Nothing similar happened when the earlier, presumably worse versions were discontinued.


It's Forbes, lads.

Even in the last month, Gemini models have been adding this "4o" to text. I can bet that it was added by Gemini :D

> The large gap between OpenAI’s $852 billion valuation and Anthropic’s $380 billion has investors rushing to grab equity in the latter before it rises, according to Augment co-founder Adam Crawley.

That's the only thing you need to take from this article. They're doing mostly the same thing, aiming at the same market. But Anthropic's shares are at a 50% discount.


Due-diligence once again proving to be mostly cargo-culting.

> The general public does not care about anything other than the capabilities and limitations of your product.

Developers don't care about that either. If developers cared, the whole npm ecosystem wouldn't exist.


This genre has always been very prevalent on HN. Move from cloud to on-premise. Move away from US-based services. Move away from Gmail. Move away from Github.

Was there one for memory-managed languages like C# vs. self-managed ones like C/C++?

Rust.

bit of an edge case because the support was not for the incumbent

> just TUIs

For starters, CC's TUI is React-based.


Somebody somewhere is bragging to someone about using React to render a grid of ASCII characters.

https://x.com/trq212/status/2014051501786931427

> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
>
> For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
>
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
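
The last two stages quoted above (diff against the previous screen, then emit ANSI) can be sketched roughly as follows. This is a minimal illustration, not Claude Code's actual implementation: it diffs two text screens row by row and emits ANSI cursor-position (CUP) sequences to redraw only the runs of cells that changed.

```python
def diff_to_ansi(prev, curr):
    """Return an ANSI string that updates a terminal currently showing
    `prev` so that it shows `curr`. Both are lists of equal-length rows
    of equal-width strings (a fixed 2D character grid)."""
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col = 0
        while col < len(new):
            if old[col] != new[col]:
                # Find the end of this contiguous run of changed cells.
                end = col
                while end < len(new) and old[end] != new[end]:
                    end += 1
                # CUP sequence: move cursor to (row, col); ANSI is 1-indexed.
                out.append(f"\x1b[{row + 1};{col + 1}H{new[col:end]}")
                col = end
            else:
                col += 1
    return "".join(out)

prev = ["hello world", "foo bar    "]
curr = ["hello there", "foo bar    "]
# Only the changed run ("there") is redrawn; the second row emits nothing.
print(repr(diff_to_ansi(prev, curr)))
```

A real implementation also has to handle colors and attributes per cell, resizes, and scrolling, but the core idea is the same: pay only for the cells that changed, which is what makes a 16ms frame budget feasible in a terminal.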


You can argue that any UI is like a game engine in that sense. Some make sensible choices and don't need to pretend they have to render at 60fps.

60fps is pathetic for a TUI when most terminals worth their salt are GPU-accelerated and displays can run at 240fps or even more. But let's be real: if I can play Quake at >500 fps, they have no excuse.

Do they reconstruct the scene graph for each frame?! Maybe I'm overinterpreting the phrasing. Someone take a peek at the source?

Not reliably.
