It explicitly says "Results on AIME and GPQA are really strong". So I would assume that means it scores better (presumably by a statistically significant margin) on the AIME and GPQA benchmarks than 4o.