_0ffh | 62 days ago | on: Nvidia Stock Crash Prediction
You'd be surprised how quickly improvement of autoregressive language models levels off with epoch count (though, admittedly, one epoch is a LOT). Diffusion language models, otoh, do keep benefiting from extra epochs for much longer, fwiw.
zozbot234 | 62 days ago
Does this also apply to LLM training at scale? I would be a bit surprised if it does, fwiw.
_0ffh | 62 days ago
Yup, as soon as data is the bottleneck rather than compute, diffusion wins. Tested following the Chinchilla scaling strategy at model sizes from 7M to 2.5B parameters.
https://arxiv.org/abs/2507.15857
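
For intuition, here's a back-of-the-envelope sketch (mine, not from the linked paper) of why a fixed corpus turns into an epoch-count question at scale. It assumes the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the 10B-unique-token corpus is an illustrative assumption, not a figure from the thread:

    # Illustrative only: Chinchilla-style token budgets vs. a fixed corpus.
    # The 20:1 tokens-per-parameter ratio follows Hoffmann et al. (2022);
    # the 10B-token unique corpus is a hypothetical assumption.

    TOKENS_PER_PARAM = 20        # Chinchilla rule of thumb
    UNIQUE_TOKENS = 10e9         # hypothetical fixed corpus size

    for params in [7e6, 160e6, 2.5e9]:   # sweep roughly spanning 7M..2.5B
        optimal_tokens = TOKENS_PER_PARAM * params
        epochs = optimal_tokens / UNIQUE_TOKENS
        print(f"{params/1e6:>7.0f}M params: "
              f"{optimal_tokens/1e9:6.2f}B compute-optimal tokens "
              f"~= {epochs:5.3f} epochs over the corpus")

At 2.5B parameters the compute-optimal budget under these assumptions already implies ~5 passes over that corpus, i.e. exactly the repeated-data regime where autoregressive gains reportedly flatten out and diffusion keeps improving.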