
Numbers like these really don't bode well for the long-term prospects of open-source models; I doubt the current strategy of waiting expectantly for a corporation to spoonfeed us yet another $100,000 model for free is going to work forever.

That $100k is conservative, too: it doesn't include the cost of buying or renting the hardware, the compute spent on experimental training runs, the cost of acquiring, labeling, and cleaning data, or the cost of RLHF fine-tuning.



> Numbers like these really don't bode well for the long-term prospects of open-source models; I doubt the current strategy of waiting expectantly for a corporation to spoonfeed us yet another $100,000 model for free is going to work forever.

I would add “in their current form” and agree. There are three things that can change here:

1. Moore’s law: the worldwide economy is built around the steady progression of cheaper compute. Give it 36 months and your problem becomes a $25,000 problem.

2. Quantization and smaller models: there will likely be specializations of the various models (is this the beginning of the “monolith vs. microservices” debate?).

3. E2E training isn’t for everyone: fine-tunes and alignment are more important than an end-to-end training run, IF we can coerce the behaviors we want out of the models by fine-tuning them. That, along with quantized models, is (imho) what unlocked vision models, which are now in the “plateau of productivity” of the Gartner hype cycle compared to a few years ago.

So, as an example, today I can grab a backbone and pretrained weights for an object detector and, with relatively little data (50 to 500 images), relatively little code (from a few lines to a few tens of lines), and relatively little wall-clock time and energy (say 5 to 15 minutes on a PC), create a customized object detector that detects -my- specific objects pretty well. I might need to revise it a few times, but it’ll work.
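As a cartoon of that recipe (everything here is illustrative: the 1-D “backbone” and toy labels stand in for a real pretrained CNN and real images), fine-tuning only a small head on top of frozen features can be sketched in plain Python:

```python
import math
import random

# Toy sketch of transfer learning: a frozen "backbone" turns raw inputs
# into features, and only a small linear head is trained on a handful of
# labels. Illustrative only -- a real detector would use a pretrained CNN
# backbone and real images.

def backbone(x):
    """Frozen pretrained feature extractor (never updated)."""
    return [x, x * x, math.sin(x)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, backbone(x))) + b)

# A tiny labeled dataset: pretend "my objects" are inputs with |x| > 1.
random.seed(0)
data = [(x, 1.0 if abs(x) > 1 else 0.0)
        for x in (random.uniform(-3, 3) for _ in range(50))]

# Train only the head: plain SGD on the log-loss; the backbone is frozen.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(500):
    for x, y in data:
        g = predict(w, b, x) - y          # d(log-loss)/d(logit)
        feats = backbone(x)
        w = [wi - lr * g * fi for wi, fi in zip(w, feats)]
        b -= lr * g

acc = sum((predict(w, b, x) > 0.5) == (y == 1.0) for x, y in data) / len(data)
print(f"training accuracy: {acc:.2f}")
```

The backbone never changes; only the head’s few parameters are fit, which is why so little data and compute suffice.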

Why would we not see the same sort of progression with transformer architectures? It hinges on someone creating the model weights “for the greater good,” or on us figuring out how to do distributed training for open source, SETI@home style (long live the blockchain, anyone?).


Yeah, there's no accounting for breakthroughs in training efficiency. I wouldn't count on Moore's Law, though: the amount of compute you can put into these problems is effectively unbounded, so more efficient silicon just means those with money can train even bigger models. 3D rendering is a decent analogy: Moore's Law has made it easy to render something comparable to the first Toy Story movie, but Pixar poured those gains back into more compute and is using it to do things you definitely can't afford to.


I wonder if a kind of SETI@home approach could work - although I'm guessing the limited VRAM in most consumer cards compared to an H100, as well as the much slower "virtual WAN interconnect" versus the Mellanox goodies that Nvidia clusters enjoy, would be too big an obstacle?


Even if you could get that to work, how many people would be willing to run their >300W GPUs at full tilt 24/7 in order to contribute to the training cause? You would basically be asking people to deal with the logistics of running a cryptocurrency mining operation but without the prospect of getting paid for it.


Depends on the logistics. If I were confident about the security, I wouldn't mind letting my GPU participate in a distributed effort to significantly improve an open source model. This should be a few dollars a month on my power bill, not dozens or hundreds of dollars, especially if I undervolt.

Now, I don't know of any distributed training technique that will make a significant impact on improving a model, and that security component is a big "if". But if something promising comes along, I'd bet lots of people would be willing to donate some GPU time, especially if it were easy to set up.
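The power-bill math is easy to sketch (all figures here are assumptions, not from the thread - a $0.15/kWh residential rate and a couple of draw/duty-cycle scenarios):

```python
# Rough monthly electricity cost of donating GPU time.
# Assumed numbers: $0.15/kWh residential rate; wattage and duty cycle
# per scenario below.
RATE = 0.15        # $/kWh, assumed
HOURS = 24 * 30    # hours in a month

def monthly_cost(watts, duty_cycle=1.0):
    kwh = watts / 1000 * HOURS * duty_cycle
    return kwh * RATE

full  = monthly_cost(300)        # 300 W at full tilt, 24/7
part  = monthly_cost(200, 0.25)  # undervolted to 200 W, ~6 h/day
print(f"300 W, 24/7:     ${full:.2f}/month")
print(f"200 W, 6 h/day:  ${part:.2f}/month")
```

Under these assumptions, full tilt 24/7 lands in the dozens-of-dollars range, while an undervolted card running part-time does come out to a few dollars a month.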



