
> It exhibits Transformer-like scaling laws: we find empirically that BDH rivals GPT2-architecture Transformer performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data.

I'm assuming they say "rivals" rather than "surpasses" or "exceeds" because they got close but didn't manage to create something better. Is that a fair assessment?
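(For context on the quote's "Transformer-like scaling laws": the claim is that eval loss falls roughly as a power law in parameter count, L(N) ≈ a · N^(-alpha), and that BDH's curve tracks the Transformer's over 10M to 1B params. A minimal sketch of fitting such a law, using placeholder numbers rather than the paper's data:

    import numpy as np

    # Hypothetical (params, loss) points -- NOT figures from the paper
    params = np.array([1e7, 1e8, 1e9])   # 10M to 1B, the range in the quote
    loss = np.array([4.2, 3.4, 2.8])     # placeholder eval losses

    # A power law L(N) = a * N**(-alpha) is a straight line in log-log space
    slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
    alpha, a = -slope, np.exp(intercept)
    print(f"L(N) ~= {a:.2f} * N^-{alpha:.3f}")

"Rivals" would then mean the two architectures' fitted curves roughly coincide, rather than BDH's sitting strictly below.)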



Pretty much.

Everyone and their dog says "transformer LLMs are flawed", but words are cheap - and in practice, no one seems to have come up with something that's radically better.

Sidegrades yes, domain-specific improvements yes, better performance across the board? Haha no. For how simple autoregressive transformers seem, they sure set a high bar.



