Which models will this be able to run at an acceptable tokens/s rate?

simlevesque · 2025-11-10T16:49:35 1762793375

gpt-oss:120b

hamdingers · 2025-11-10T16:54:33 1762793673

Am I missing it, or is there no information about performance? Looking for a tokens/sec figure.

aseipp · 2025-11-10T21:22:46 1762809766

Right now I get 59 tok/sec on GPT-OSS 120B using Unsloth's dynamic 4-bit quants, via llama.cpp https://news.ycombinator.com/item?id=45881049

simlevesque · 2025-11-10T16:56:45 1762793805

He didn't give that info, but the transcript linked at the end shows how much time was spent on each query.