Hacker News | hurrdurr57's comments

The Phi models always seem to do really well on benchmarks, but in real-world performance they fall way behind competing models.


Will it generate racially diverse Nazis at higher resolutions than their old model?


>the biggest indication of flawed leadership is a company or agency leadership photo where the majority percentage of the people in it are all the same skin tone.

Does this opinion come from your actual experience or just from your ideological indoctrination?

Virtually all non-western businesses have zero concern about fostering racial diversity; are they all failures in your opinion?


Realistically, I think it would have to be from the release date.


Given how quickly AI is progressing from the software side, and how poorly AI scales from just throwing raw compute time at a model, I don't see a company holding onto the lead for very long with that strategy.

If I can come out with a model a year later, and it can provide 95% of the performance while costing 10% as much to run, I think I would end up stealing a lot of customers before they had a chance to break even.

Take Llama 3 8B, for example: an 8-billion-parameter model from 2024 that performs about as well as the original ChatGPT, a 175-billion-parameter model from 2022. It only took 2 years before a model that can run on a desktop could compete with a model that required a data center.
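To make the "desktop vs. data center" point concrete, here's a rough back-of-the-envelope sketch of the memory needed just to hold the weights, assuming 2 bytes per parameter (fp16/bf16); activations and KV cache would add more on top:

```python
# Rough VRAM needed just to hold model weights at 2 bytes per
# parameter (fp16/bf16). Activations and KV cache are extra.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(8))    # 8B params  -> 16 GB, fits one consumer GPU
print(weight_memory_gb(175))  # 175B params -> 350 GB, needs a multi-GPU server
```

The ~20x gap in weight memory is why the 175B-class model needed a data center while the 8B one runs locally.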


LLMs actually scale extremely well just by throwing compute at them. That's the whole reason they took off. Training a bigger model, training it longer, or increasing the dataset all work more or less equally well. Now that we've pretty much saturated the dataset component (at least for human-written text), everyone throws their compute at bigger models or more epochs.
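That "model size and data both help" claim can be illustrated with a Chinchilla-style loss fit, L(N, D) = E + A/N^α + B/D^β. The constants below are the published fit from the Hoffmann et al. scaling-laws work; this is purely to show the trend, not a claim about any specific model:

```python
# Illustrative Chinchilla-style scaling fit: predicted loss falls
# as either parameter count N or training tokens D grows.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

print(loss(8e9, 2e12))   # 8B params, 2T tokens
print(loss(70e9, 2e12))  # 70B params, same data -> lower predicted loss
print(loss(8e9, 15e12))  # same 8B model, more data -> also lower loss
```

Both knobs reduce the fitted loss, which is exactly why labs with fixed-size datasets redirect compute into bigger models or more epochs.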


It's totally reasonable to take both bets. It's unclear that the company betting 100B wouldn't also be the company making the 1 MM bet.

If you're MSFT - you don't care who wins as long as you have cost competitive rights to embed the AI in all of your products - earlier than others.


Well, I guess the question I have is, what exactly does he mean by the "cost to train"? As in, just the cost of the electricity used to train that one model? That seems really excessive.

Or is it the total overall cost of buying TPUs / GPUs, developing infrastructure, constructing data centers, putting together quality data sets, doing R&D, paying salaries, etc. as well as training the model itself? I could see that overall investment into AI scaling into the tens of billions over the next few years.


>Cruz cannot give any interviews without his permission

That's a really weird settlement.


Hammers don't kill people, people kill people

... with hammers


Well, the statement that GPT-4 is 1.8T parameters is a little misleading since it's really an 8 x 220B MoE (according to the rumors at least).

Also, the size of the model itself isn't the only factor that determines performance: Llama 3 70B outperforms Llama 2 70B even though they're the same size.
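Quick arithmetic on those rumored numbers shows why the headline total is misleading for a mixture-of-experts model (expert count and size here are rumor, not fact, and the active-expert count per token is an assumption for illustration):

```python
# Rumored GPT-4 MoE configuration: total parameters vs. parameters
# actually active for a given token.
experts, per_expert = 8, 220e9

total = experts * per_expert
print(total / 1e12)   # ~1.76T "total" parameters, rounds to the 1.8T headline

# If, say, only 2 of the 8 experts run per token, the active count
# per forward pass is far smaller than the headline number:
active = 2 * per_expert
print(active / 1e9)   # ~440B parameters active per token
```

So the cost of a forward pass tracks the active parameters, not the 1.8T total, which is the sense in which the headline figure overstates the model.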


This would be doable, but expensive. You'd need to pay people to go through 10,000 pornographic images and manually blur out all of the fun bits.

