The Phi models always seem to do really well on benchmarks, but in real-world use they fall way behind competing models.
>the biggest indication of flawed leadership is a company or agency leadership photo where the majority percentage of the people in it are all the same skin tone.
Does this opinion come from your actual experience or just from your ideological indoctrination?
Virtually all non-Western businesses have zero concern about fostering racial diversity; are they all failures, in your opinion?
Given how quickly AI is progressing from the software side, and how poorly AI scales from just throwing raw compute time at a model, I don't see a company holding onto the lead for very long with that strategy.
If I can come out with a model a year later that provides 95% of the performance while costing 10% as much to run, I think I'd steal a lot of customers before they had a chance to break even.
Take Llama3-8B for example: an 8 billion parameter model from 2024 that performs about as well as the original ChatGPT, a 175 billion parameter model from 2022. It only took 2 years before a model that can run on a desktop could compete with a model that required a data center.
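To put that gap in rough hardware terms, here's a back-of-the-envelope sketch of the memory needed just to hold the weights at fp16 (2 bytes per parameter) — ignoring activations, KV cache, and quantization, which all shift the numbers:

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    """Rough memory (GB) just to hold the weights; fp16 = 2 bytes/param."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# An 8B model fits in 16 GB at fp16 -- single consumer GPU / desktop territory
# (even less with 4-bit quantization).
print(weight_memory_gb(8))    # 16.0

# A 175B model needs ~350 GB just for weights -- multi-GPU server territory.
print(weight_memory_gb(175))  # 350.0
```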
LLMs actually scale extremely well just by throwing compute at them. That's the whole reason they took off. Training a bigger model, training it longer, or increasing the dataset all work more or less equally well. Now that we've pretty much saturated the dataset component (at least for human-written text), everyone throws their compute at bigger models or more epochs.
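The model-size-vs-data tradeoff here is roughly what the Chinchilla scaling work formalized. A toy sketch of splitting a compute budget, using two common approximations (both rules of thumb, not exact): training cost C ≈ 6·N·D FLOPs, and a compute-optimal ratio of ~20 training tokens per parameter:

```python
def chinchilla_optimal(compute_budget_flops, tokens_per_param=20):
    """Split a FLOP budget between model size N and dataset size D.

    Uses C = 6 * N * D (rough transformer training cost) and the
    Chinchilla rule of thumb D ~= 20 * N, so N = sqrt(C / 120).
    Both are approximations, not exact laws.
    """
    n_params = (compute_budget_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e24 FLOP training budget
n, d = chinchilla_optimal(1e24)
print(f"{n / 1e9:.0f}B params, {d / 1e12:.1f}T tokens")  # 91B params, 1.8T tokens
```

Once the ~1.8T tokens above start to exceed the usable text you can actually collect, the only levers left are the ones the comment names: more parameters or more passes over the same data.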
Well, I guess the question I have is, what exactly does he mean by the "cost to train"? As in, just the cost of the electricity used to train that one model? That seems really excessive.
Or is it the total overall cost of buying TPUs / GPUs, developing infrastructure, constructing data centers, putting together quality data sets, doing R&D, paying salaries, etc. as well as training the model itself? I could see that overall investment into AI scaling into the tens of billions over the next few years.
Well, the statement that GPT-4 is 1.8T parameters is a little misleading, since it's really an 8 x 220B MoE (according to the rumors, at least).
Also, the size of the model itself isn't the only factor that determines performance: Llama 3 70B outperforms Llama 2 70B even though they're the same size.
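The "misleading" part of the MoE headline number is easy to see with some arithmetic. This sketch uses the rumored figures from the comment above (8 experts of ~220B, reportedly 2 routed per token); it's a naive count that ignores how attention weights are shared across experts, so treat it as illustration, not a real parameter audit:

```python
# Rumored GPT-4 architecture: 8 experts x ~220B params, 2 experts per token.
n_experts = 8
params_per_expert = 220e9
experts_per_token = 2

total_params = n_experts * params_per_expert           # the headline number
active_params = experts_per_token * params_per_expert  # used per token

print(f"total:  {total_params / 1e12:.2f}T params")  # total:  1.76T params
print(f"active: {active_params / 1e9:.0f}B params")  # active: 440B params
```

So the "1.8T" figure describes storage, while the compute per token looks more like a ~440B dense model — which is why total parameter count alone says little about either speed or quality.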