Hacker News | raincole's comments

Probably zero. At the end of the day people pay for LLMs that write better code or summarize PDFs of hundreds of pages faster, not the ones that can count the letter r's better.

When LLMs can't count r's: see? LLMs can't think. Hoax!

When LLMs count r's: see? They patched and benchmark-maxxed. Hoax!

You just can't reason with the anti-LLM group.


Whenever an "LLM fail" goes viral, like the car wash question, you can observe the exact same wording of the question get "fixed" within a week or so, while slight variations in phrasing can still reproduce the problem.

Followed by lots of "works perfectly for me, why are people even talking about this?"

I can't say what exactly they're doing behind the scenes but it's a consistent pattern among the big SOTA model providers. With obvious incentive to "fix" the problem so users will then organically "debunk" the meme as they try it themselves and share their experiences.


You are misremembering. There’s no patch. All these examples used the instant model.

The same non-argument could be made about all kinds of benchmark cheating by tech companies, and yet we have tons of documented examples of them being caught with their pants down.

>You just can't reason with the anti-LLM group.

On the contrary, the reasoning is simple and consistent:

That LLMs can't count r's shows that LLMs don't actually think in the way we understand thought (since no one with their level of skill in other areas would fail at that). And because of that, there are likely patches for commonly reported cases, since it's a race to IPO and benchmark-maxxing is entirely plausible.


Ofcom is a disgrace to humanity. I guess this is an early signal that they plan to control how people use YouTube.

UK voters get what they vote for.

Literally untrue as we don't have proportional representation.

So Ofcom publishes a factual report with the positively dull actual title of "Passive social media use, AI companionship, and online side hustles: UK adults’ media and online lives revealed"

A report that makes zero comment on controlling how people use YouTube or any other social media.

And yet somehow they are "a disgrace to humanity"

Really?


I think your disgrace-level calibration needs adjusting given everything else that's going on rn buddy.

A valuable lesson AI taught me is how bad articles on Bloomberg and Forbes are. They have probably always been this bad, but I was unaware of it until they started writing about AI (because, admittedly, I subconsciously equated well-known with good).

There’s something called the Gell-Mann amnesia effect, where people notice the errors in coverage of things they know firsthand but then go back to assuming the other stories are all reliable.

I used to love Private Eye and they have done great journalism that’s highly acclaimed, but the only thing they wrote that I really knew about (literally the office I was in) was outrageously wrong and would have been so easy to verify (ask literally anyone in the BBC building we were in to go to that floor, or take a tour or write an email). Can’t read it any more.


Here's Wikipedia's entry on the Gell-Mann Amnesia Effect, because I've found it a very useful concept to know. Despite my media experiences, I still keep falling for it. And I love that we're still referring to it as Gell-Mann Amnesia here:

https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amn...

In a speech in 2002, Crichton coined the term "Gell-Mann amnesia effect" to describe the phenomenon of experts reading articles within their fields of expertise and finding them to be error-ridden and full of misunderstanding, but seemingly forgetting those experiences when reading articles in the same publications written on topics outside of their fields of expertise, which they believe to be credible. He explained that he had chosen the name ironically, because he had once discussed the effect with physicist Murray Gell-Mann, "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".


> "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".

Ahh, yes, the SyneRyder effect.


Everything I've known anything about first hand has been utterly garbled - or was completely made up - when written up in Private Eye.

Odd take, as this was actually a pretty good article. The GP appears to be mostly bemoaning the fact that it's targeted at a lay audience.

Claude Code (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

In less than four years the AI coding workflow has been overhauled at least twice: from Chat interface (ChatGPT) to editor integration (Cursor), then to CLI agent harnesses (CC/Codex). It would be crazy to assume that harnesses are the end of evolution.


> Everyone knows that.

Except, apparently, Anthropic - who are doing their darndest to get everyone on board with their tools as a moat. Apparently that's the only strategy for AI stickiness.


And their strategy kind of worked, right? CC is the most popular agentic coding tool. Anthropic faces competition from OpenAI (potentially better model, weaker TUI tool) and from the rest (potentially worse models, weaker TUIs). So their strategy is to develop both: make their closed model and closed tool better than the competition, so that when people want to vibe-code they will choose their ecosystem.

OpenAI Codex is a much higher quality harness than Claude Code or OpenCode, and available as open source.

Claude Code 2.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

Claude Code 3.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.

And so on and on and on.

One promise of AI was mature software.


If there is a market for a mature harness, surely it will be built, right?

I don't know what your point is. What you said is exactly what I expect to happen, except they might have a more creative name than "Claude Code 2.0".

My point is that the mature version will always be the next future version.

What changed is you, the reader. In 2026 we treat the smallest signs as evidence of LLM writing. Too long? LLM. Too short? LLM. Too grammatically correct? Must be LLM.

For me it was the "it's not x"/"it's y" stuff and some other structures Claude is very fond of using all the time. Perhaps humans are starting to write like LLMs!

Perhaps, just perhaps, LLMs are just statistical models that literally can't create novel things, therefore any structure LLMs write was learnt from human writing?

But who knows!


What kind of human writing has "it's not X—it's Y" in every single paragraph?

The answer is none. LLMs haven't accurately modeled human writing for years; current models have been smacked on the head with the coding RLHF bat so much that they all write distinctly inhuman text.


The thing is, people are screaming “AI” when they see a single “it's not X—it's Y" pattern in a post, despite this being a fairly common construct.

People are nitpicking every tiny thing in their search for proof of AI. It’s not useful and ends up dominating the conversation. AI panic is degrading the value of forums at least as much as actual AI at this point.


> GPT-4o

Why is this on the list? Like... what? How about including GPT 3.5 and GPT 2 here too?


In TFA it is put on the list because some users of this GPT version were unhappy with its cancellation, which caused even OpenAI to waver in its decision: they first cancelled it, then resurrected it, and then cancelled it permanently, probably because continuing to run it would have cost more than the revenue it generated.

Nothing similar happened when the earlier, presumably worse versions were discontinued.


It's Forbes, lads.

Even in the last month, Gemini models have been adding this "4o" to text. I can bet that it was added by Gemini :D

> The large gap between OpenAI’s $852 billion valuation and Anthropic’s $380 billion has investors rushing to grab equity in the latter before it rises, according to Augment co-founder Adam Crawley.

That's the only thing you need to take from this article. They're doing mostly the same thing, aiming at the same market. But Anthropic's shares are at a 50% discount.


Due-diligence once again proving to be mostly cargo-culting.

> The general public does not care about anything other than the capabilities and limitations of your product.

Developers don't care about that either. If developers cared, the whole npm ecosystem wouldn't exist.


This genre has always been very prevalent on HN. Move from cloud to on-premise. Move away from US-based services. Move away from Gmail. Move away from Github.

Was there one for memory-managed languages like C# vs. self-managed ones like C/C++?

Rust.

bit of an edge case because the support was not for the incumbent

> just TUIs

For starters, CC's TUI is React-based.


Somebody somewhere is bragging to someone about using React to render a grid of ASCII characters.

https://x.com/trq212/status/2014051501786931427

> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
>
> For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
>
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
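
The last two stages quoted above (diff against the previous screen, then emit ANSI) can be sketched roughly as follows. This is a minimal illustration, not Claude Code's actual implementation: it diffs two text screens row by row and emits ANSI cursor-position (CUP) sequences to redraw only the runs of cells that changed.

```python
def diff_to_ansi(prev, curr):
    """Return an ANSI string that updates a terminal currently showing
    `prev` so that it shows `curr`. Both are lists of equal-length rows
    of equal-width strings (a fixed 2D character grid)."""
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col = 0
        while col < len(new):
            if old[col] != new[col]:
                # Find the end of this contiguous run of changed cells.
                end = col
                while end < len(new) and old[end] != new[end]:
                    end += 1
                # CUP sequence: move cursor to (row, col); ANSI is 1-indexed.
                out.append(f"\x1b[{row + 1};{col + 1}H{new[col:end]}")
                col = end
            else:
                col += 1
    return "".join(out)

prev = ["hello world", "foo bar    "]
curr = ["hello there", "foo bar    "]
# Only the changed run ("there") is redrawn; the second row emits nothing.
print(repr(diff_to_ansi(prev, curr)))
```

A real implementation also has to handle colors and attributes per cell, resizes, and scrolling, but the core idea is the same: pay only for the cells that changed, which is what makes a 16ms frame budget feasible in a terminal.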


You can argue that any UI is like a game engine in that sense. Some make sensible choices and don't need to pretend they have to render at 60fps.

60fps is pathetic for a TUI when most terminals worth their salt are GPU-accelerated and displays can run at 240fps or even more. But let's be real: if I can play Quake at >500 fps, they have no excuse.

Do they reconstruct the scene graph for each frame?! Maybe I'm overinterpreting the phrasing. Someone take a peek at the source?

Not reliably.
