With the rate of progress (and, in the opposite direction, the physical limitations Intel/AMD/TSMC/etc. are bumping into), there are no guarantees about what a machine will look like a decade from now. But simple logic applies: if the user's machine scales to X amount of RAM, the hyperscaler's rack scales to X*Y RAM, and assuming the performance/scaling relationship we've seen so far holds, the rack's AI will be correspondingly smarter/better/more powerful than the user's.
Maybe that won't matter when the user is asking it a 5th-grade question, but for any application of AI more complex than "what's the weather" or "turn on a light", shouldn't users want the better AI, particularly when they don't have to pay for all that silicon sitting around unused in their machine for most of the day?
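To put rough numbers on that X vs X*Y gap, here's a back-of-envelope sketch in Python (the memory figures are illustrative assumptions, not the specs of any real machine or rack):

    # Back-of-envelope sketch of the memory gap (all numbers are
    # assumptions for illustration, not real hardware specs).

    def params_that_fit(memory_gb: float, bytes_per_param: float = 2.0) -> float:
        """Rough parameter count that fits in memory_gb, ignoring
        activations, KV cache, and OS overhead."""
        return memory_gb * 1e9 / bytes_per_param

    consumer_gb = 128        # assumed high-end desktop with unified memory
    rack_gb = 8 * 8 * 192    # assumed rack: 8 nodes x 8 accelerators x 192 GB each

    for label, gb in [("consumer machine", consumer_gb), ("hyperscaler rack", rack_gb)]:
        print(f"{label}: ~{params_that_fit(gb) / 1e9:,.0f}B parameters at 16-bit")

The exact figures don't matter; the point is the ratio, and that ratio stays in the rack's favor however much both sides grow.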
This argument would sound nearly identical if you made it in the 70s or early 80s about mainframes and personal computers.
It's not that mainframes (or supercomputers, or servers, or the cloud) stopped existing; it's that there was a "good enough" point where the personal computer was powerful enough to do all the things that people care about. Why would this be different?*
And aren't we all paying for a bunch of silicon that sits mostly unused? I have a full modern GPU in my Apple SoC capable of throwing a ridiculous number of polygons per second at the screen and I'm using it to display two terminal emulator windows.
* (I can think of a number of reasons why it would in fact turn out different, but none of them have to do with the limits of technology -- they are all about control or economic incentives)
It’s different because of the ubiquity of the internet and the financial incentives of the companies involved.
Right now you can get 20TB hard drives for cheap and set up your own NAS, but way more people spend money every month on Dropbox/iCloud/OneDrive - people value convenience and accessibility over “owning” the product.
Companies also lean into this. Just consider Photoshop: it used to be a one-time purchase, then it became a cloud subscription, and now virtually every new AI feature uses paid credits. Even with that fast SoC in your machine, Photoshop will still throw your request to Adobe's cloud and charge you for it.
The big point still remains: by the time you can run that trillion parameter model at home, it’s old news. If the personal computer of the 80s was good enough, why’s nobody still using one? AI on edge devices will exist, but will forever remain behind data center AI.
Right now you can get 20TB hard drives for cheap and set up your own NAS, but way more people spend money every month on Dropbox/iCloud/OneDrive - people value convenience and accessibility over “owning” the product.
Yes, this is a convenience argument, not a technical one. It's not that your PC doesn't (or couldn't) have more than enough storage -- it likely does -- it's that other factors make you use Dropbox anyway.
So now the question becomes: do we believe that personal devices will never become good enough to run a "good enough" LLM (a technical barrier), or that other factors will make running one locally seem less desirable (a social/financial/legal barrier)?
I think there's a very decent chance that the latter will be true, but the original argument was a technical one -- that good-enough LLMs will always require so much compute that you wouldn't want to run one locally even if you could.
If the personal computer of the 80s was good enough, why’s nobody still using one?
What people want to do changes with time, so your PC XT will no longer hack it in the modern workplace, but the point is that once a personal computer of any kind was good enough, people kept using personal computers. The parallel argument here would be: if there is a plateau where LLM improvement slows and converges with the ability to run something good enough on consumer hardware, why would people not just keep running those good-enough models on their own hardware? The models would get better with time, sure, but so would the hardware running them.
The original point that I was making was never purely a technical one. Performance, economics, convenience, and business trends all play a part in what I think will happen.
Even if LLM improvement slows, it’ll probably result in the same treadmill effect we see in other software.
Consider MS Office, Adobe Creative (Cloud), or just about any pro-level software. The older versions aren’t really used, for various reasons: performance, features, compatibility, etc. Why would LLMs, which seem to be on an even faster trajectory than conventional software, be any different? Users will want to keep upgrading, and in the case of AI, that’ll mean continuing to access the latest cloud model.
No doubt someone will be able to run gpt-oss-120b on device five years from now, but outside of privacy, why would they when they can get a faster, smarter answer (likely for free) from a service?