Hacker News | jqpabc123's comments

Nothing an LLM produces should be surprising. The basic algorithm is mainly just probability and statistics.

All this means is that Anthropic's marketing rhetoric is slightly more dominant in its database.

But kudos to ChatGPT for not taking the time/effort to rig things in their favor.


Any time I see *unlimited* anything offered for a fixed rate, my immediate thought is either *mistake* or blatantly *false*.

Wireless service providers are a good example. They offer *unlimited* data plans --- which (surprise, surprise) are actually anything but *unlimited*.


Good point. “Unlimited” just means Emit does not impose a per-contact or campaign cap. The actual sending still follows Gmail API limits.

LLMs don't do reasoning --- they do probability.

With a large enough training database, this can produce surprisingly similar results in a lot of simple cases.

The real problem is that the only foolproof way to detect when they get it wrong is by painstakingly duplicating and verifying any results from an LLM.

But this verification negates much of the advantage LLMs are supposed to offer. So the natural human response is to simply skip the due diligence. And this is a liability issue waiting to happen.

The cost of AI liability has yet to be priced into the market. I expect that some companies and AI service providers will start to restrict or even prohibit the use of AI in some cases because the cost of liability outweighs any real benefit.

Liability disclaimers don't legally apply to a lot of professional services. Selling fake "intelligence" to doctors and lawyers is a risky proposition.




You can't predict --- and therein lies the problem. All you can do is verify. And this negates a lot of the value proposition of AI.

The best use of current AI is what it was originally designed for --- things that don't matter much and are highly tolerant of errors --- like web search.


Coding agents work well enough to be useful because they can check their own work. Nevertheless it is too generous to call that reasoning. If they're right 80% of the time and they rerun a prompt if the project won't build, they might be right 95% of the time, or even 99% of the time on the third try. And if you know a bit about coding you're probably able to recognize when to intervene.
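The retry arithmetic above can be sketched directly. This is just an illustration of the claim, assuming each attempt succeeds independently with some probability and that failures are always detected (e.g. the project won't build):

```python
# Probability of at least one success in `attempts` independent tries,
# assuming failures are always detectable so a retry can be triggered.
def success_after_retries(p: float, attempts: int) -> float:
    return 1 - (1 - p) ** attempts

# With p = 0.8 per attempt: one try ~0.8, two tries ~0.96, three ~0.992,
# roughly matching the 80% / 95% / 99% figures above.
for n in (1, 2, 3):
    print(n, round(success_after_retries(0.8, n), 3))
```

The catch, of course, is that "the project builds" is a much weaker check than "the code is correct", so the real success rates are lower than this model suggests.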

That's not to denigrate a real productivity booster. It is however a warning to anyone applying LLM-based AI to use cases where you don't have the kind of training corpus and formal framework around correctness that characterizes coding.

Coding surely matters. But it also might be a truly unique use case.


To an extent… that will get you a project that builds, and passes any other objective detectable test, but it can’t tell you if it’s reasonable or good.

Which means we’re getting a lot of shitty work that passes tests.




That’s a massive oversimplification of the field's trajectory.

Google introduced the Transformer model in 2017. They built interactive voice response (Google Assistant) and applied it to web search and language translation (all have a large error tolerance) but didn't do much more because they considered reliability to be an issue.

ChatGPT was introduced in 2022. It was based on the Transformer model as are all the current AI chatbots.

ChatGPT's big innovation was scale. They spent billions to ingest everything they could find on the web and beyond and marketed it as a general purpose AI.

But scale has hit a wall. Even with a world of data and an energy budget larger than a small country's, reasoning and reliability remain largely unresolved issues.

Computing has traditionally been about reliable answers at low cost. AI offers the opposite --- unreliable answers at high cost.

https://research.google/blog/transformer-a-novel-neural-netw...


Why do you sound like an LLM?

> While LLMs are probabilistic, their accuracy in specific domains—like Tool Calling—is already hitting near-100% reliability. That is where industrialization happens.

Is this just an AI bot replying to comments on its own AI post?


I have already cracked down on MS Authenticator --- I wrote my own --- and it's not very difficult to do.

For basic 6-digit codes sure. But MS authenticator uses push notice based approvals for many corporate SSO accounts including mine.

"push notice based approvals"

My simple workaround --- put the code in the clipboard. Use Ctrl-V to paste.

My authenticator app can also be configured to automatically launch a sign-in page.
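Writing a basic 6-digit authenticator really is straightforward, since the algorithm (TOTP, RFC 6238) is an open standard built on HMAC. A minimal sketch, using only the Python standard library; the secret below is a made-up example, not a real account key:

```python
import base64, hashlib, hmac, struct, time

def totp(secret_b32: str, period: int = 30, digits: int = 6) -> str:
    """Generate a time-based one-time password (RFC 6238, SHA-1 default)."""
    key = base64.b32decode(secret_b32)
    # Counter = number of `period`-second intervals since the Unix epoch.
    counter = struct.pack(">Q", int(time.time()) // period)
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    # Dynamic truncation: take 4 bytes at an offset given by the last nibble.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # example base32 secret
```

Push-based approvals are a different protocol tied to the vendor's backend, which is why they can't be replicated this easily.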


You hardly ever have to use that system - despite the way they present it during enrollment (like zoom tries to get you to install their electron app every time you video conference).

Yes, they have an ethics problem.

They have no idea what ethics is --- nor do they care.

If you ask them, they will gladly regurgitate a philosophical definition from their training data.

But regurgitation is not the same as understanding.


People who pretend the machine will do ethics for them have an ethics problem.

Funny thing about renewable energy --- the cost doesn't jump dramatically just because a war broke out in the Middle East.

Capital costs will when there is a war over Taiwan though.

yes, llms can produce bad code, they can also produce good code, just like people

Over time, you develop a feel for which human coders tend to be consistently "good" or "bad". And you can eliminate the "bad".

With an LLM, output quality is like a box of chocolates: you never know what you're going to get. It varies based on what you ask and what is in its training data --- which you have no way to examine in advance.

You can't fire an LLM for producing bad code. If you could, you would have to fire them all because they all do it in an unpredictable manner.


no but you're a human and you're responsible for it, so it's on you

you can make horrible images with photoshop that doesn't make photoshop bad


The key word here is *you*.

Photoshop doesn't make anything --- *you* make the image horrible --- or not. Any results relate directly to *your* skill.

A direct comparison to agentic AI is less than apt. AI is supposedly able to provide the skill --- which it often fails to do.


you talk to the llm bro, you are responsible for the outcome

Trump is working to create a global economic crash.

He is indistinguishable from a deliberate enemy attack.

20 years from now, everybody will have always been against this.


LLMs have no idea what "correct" means.

Anything they happen to get "correct" is the result of probability applied to their large training database.

Being wrong will always be not only possible but also likely any time you ask for something that is not well represented in its training data. The user has no way to know if this is the case, so they are basically flying blind and hoping for the best.

Relying on an LLM for anything "serious" is a liability issue waiting to happen.


Yes, Transformer models are non-deterministic, but it is absolutely not true that they can't generalise (the equivalent of interpolation and extrapolation in linear regression, just with a lot more parameters and training).

For example, let's try a simple experiment. I'll generate a random UUID:

> uuidgen

> 44cac250-2a76-41d2-bbed-f0513f2cbece

Now it is extremely unlikely that such a UUID is in the training set.

Now I'll use OpenCode with "Qwen3 Coder 480B A35B Instruct" with this prompt: "Generate a single Python file that prints out the following UUID: "44cac250-2a76-41d2-bbed-f0513f2cbece". Just generate one file."

It generates a Python file containing 'print("44cac250-2a76-41d2-bbed-f0513f2cbece")'. Now this is a very simple task (with a 480B model), but it solves a problem that is not in the training data, because it is a generalisation over similar but different problems in the training data.

Almost every programming task is, at some level of abstraction, and with different levels of complexity, an instance of solving a more general type of problem, where there will be multiple examples of different solutions to that same general type of problem in the training set. So you can get a very long way with Transformer model generalisations.


It’s a shame the bulk of that training data is likely 2010s blogspam that was poor quality to begin with.

But isn't that a reflection of reality?

If you've made a significant investment in human capital, you're even more likely to protect it now and prevent posting valuable stuff on the web.



Yes it is. There’s a reason why university knowledge is gated. And was gated for centuries.

Can’t believe I have to explain simple stuff.


Aye. I wish more conversations were of this nature - in that we should start with basic propositions - e.g. the thing does not 'know' or 'understand' what correct is.

This is about to change very soon. Unlike many other domains (such as greenfield scientific discovery), most coding problems for which we can write tests and benchmarks are "verifiable domains".

This means an LLM can autogenerate millions of code problem prompts, attempt millions of solutions (both working and non-working), and, from the working solutions, penalize answers that have poor performance. The resulting synthetic dataset can then be used as a finetuning dataset.

There are now reinforcement finetuning techniques that have not been incorporated into the existing slate of LLMs that will enable finetuning them for both plausibility AND performance with a lot of gray area (like readability, conciseness, etc) in between.
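The "verifiable domain" idea above can be sketched as a filter-and-rank loop: keep only candidate solutions that pass objective tests, then rank the survivors by a performance score. This is an illustrative toy, not code from any particular paper; the candidate functions and tests are made up:

```python
import time

def passes_tests(fn) -> bool:
    """Objective verification: does the candidate satisfy the test cases?"""
    try:
        return fn(2, 3) == 5 and fn(-1, 1) == 0
    except Exception:
        return False

def runtime_score(fn) -> float:
    """Crude performance signal: wall-clock time over repeated calls (lower is better)."""
    start = time.perf_counter()
    for _ in range(1000):
        fn(2, 3)
    return time.perf_counter() - start

candidates = [
    lambda a, b: a + b,        # correct
    lambda a, b: a - b,        # wrong: filtered out by the tests
    lambda a, b: sum([a, b]),  # correct, but does more work
]

working = [f for f in candidates if passes_tests(f)]
best = min(working, key=runtime_score)
print(len(working), best(2, 3))
```

In the finetuning setting, "passes the tests" and the performance score become reward signals rather than a hard filter, but the underlying verifiability is the same.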

What we are observing now is just the tip of a very large iceberg.


Let's suppose whatever you say is true.

If I'm the govt, I'd be foaming at the mouth - those projects that used to require enormous funding will now supposedly require much less.

Hmmm, what to do? Oh I know. Let's invest in Digital ID-like projects. Fun.


It is true. Here is the publication going over how to generate this type of dataset and finetune: https://arxiv.org/pdf/2506.14245

I don't think you grasp my statement. LLMs will exceed humans greatly in any domain that is easy to computationally verify, such as math and code. For areas not amenable to deterministic computation, such as human biology or experimental particle physics, progress will be slower.


lol did you even read my post, dude?

This is easily proven incorrect. Just go to ChatGPT and say something incorrect and ask it to verify. Why do people still believe this type of thing?

I did this yesterday and it was happy to provide me with an incorrect explanation. Not just that, but incorrect thermodynamic data supporting its claims, despite readily available published values to the contrary.

And yet models get things wrong all the time, too.

That’s what I would expect even if it can have the concept of truth. Like humans.

The Pentagon officially notifies the world that it is petty, vindictive and lacks credibility.

America did that to itself years ago.
