Hacker News

GMKtec, maker of the EVO-X2 mini-PC that uses a Ryzen AI Max+ 395, published a blog post comparing the DGX Spark with their EVO-X2.

from https://www.gmktec.com/blog/evo-x2-vs-nvidia-dgx-spark-redef... (text taken from https://wccftech.com/forget-nvidia-dgx-spark-amd-strix-halo-... since the GMKtec table was an image, but wccftech converted it to an HTML table. EDIT: reformatted to make the table look nicer in a monospace font without tabs)

  Test Model     Metric                         EVO-X2   NVIDIA GB10   Winner
  Llama 3.3 70B  Generation Speed (tok/sec)       4.90          4.67   AMD
                 First Token Response Time (s)    0.86          0.53   NVIDIA
  Qwen3 Coder    Generation Speed (tok/sec)      35.13         38.03   NVIDIA
                 First Token Response Time (s)    0.13          0.42   AMD
  GPT-OSS 20B    Generation Speed (tok/sec)      64.69         60.33   AMD
                 First Token Response Time (s)    0.19          0.44   AMD
  Qwen3 0.6B     Generation Speed (tok/sec)     163.78        174.29   NVIDIA
                 First Token Response Time (s)    0.02          0.03   AMD
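The two metrics in the table trade off against each other: total response time is roughly first-token latency plus output length divided by generation speed, so which machine "wins" depends on how long the response is. A quick sketch using only the Llama 3.3 70B row from the table (purely illustrative arithmetic, not a benchmark):

```python
# Approximate end-to-end latency for a response of n tokens:
#   total ≈ TTFT + n / generation_speed
# Numbers are the Llama 3.3 70B row from the table above.
def total_latency(ttft_s, tok_per_sec, n_tokens):
    return ttft_s + n_tokens / tok_per_sec

evo_x2 = {"ttft": 0.86, "tps": 4.90}  # AMD EVO-X2
gb10   = {"ttft": 0.53, "tps": 4.67}  # NVIDIA GB10

for n in (10, 100, 1000):
    t_amd = total_latency(evo_x2["ttft"], evo_x2["tps"], n)
    t_nv  = total_latency(gb10["ttft"], gb10["tps"], n)
    print(f"{n:5d} tokens: EVO-X2 {t_amd:7.2f}s  GB10 {t_nv:7.2f}s")
```

On these particular numbers the GB10's lower first-token latency only wins for very short responses (roughly 30 tokens or fewer); past that, the EVO-X2's higher generation speed dominates.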


Additionally, Framework apparently benchmarked GPT-OSS 120B (!) on the maxed-out Ryzen AI Max+ 395 Desktop and reached a generation speed of 38.0 tok/sec. Given that Nvidia can't even keep up on a 20B model, I assume it can't keep up on the 120B model either.

https://frame.work/nl/en/desktop?tab=machine-learning

So to me, the only thing that seems interesting about the Spark at the moment is the ability to daisy-chain several units together, so you can create an InfiniBand-ish network of Sparks at InfiniBand speeds.

But overall, for plain development and experimentation, and since I don't work at Big AI, I'm pretty sure I would not purchase Nvidia at the moment.


Unfortunately, comparing tok/sec in a vacuum right now, and especially across weeks of time, is kind of pointless. Everything is still evolving; there were patches within days that bumped GB10 performance by double-digit percentages in some frameworks. You just kind of have to accept that things are a moving target.

For comparison, as of right now I can run GPT-OSS 120B at 59 tok/sec, using llama.cpp (revision 395e286bc) and Unsloth dynamic 4-bit quantized models.[1] GPT-OSS 20B runs at 88 tok/sec.[2] The MXFP4 variant comes in about the same, at ~89 tok/sec.[3] It's probably faster on other frameworks; llama.cpp is known not to be the fastest, and I don't know what LM Studio backend they used. All of these numbers put the GB10 well ahead of Strix Halo, if only going by the numbers we see here.
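To put "well ahead" in rough numbers: comparing the GB10 figures here against the Strix Halo figures quoted upthread (the Framework 120B result and the GMKtec 20B result) gives a sizeable gap. A sketch of that arithmetic, using only the numbers quoted in this thread:

```python
# Relative difference between tok/sec figures quoted in this thread:
# GB10 llama.cpp runs vs. the Strix Halo figures upthread.
# Purely illustrative arithmetic on quoted numbers, not a benchmark.
def pct_faster(a, b):
    """How much faster a is than b, in percent."""
    return (a / b - 1) * 100

print(f"GPT-OSS 120B: GB10 {pct_faster(59, 38.0):.0f}% faster")   # 59 vs Framework's 38.0
print(f"GPT-OSS 20B:  GB10 {pct_faster(88, 64.69):.0f}% faster")  # 88 vs GMKtec's 64.69
```

Of course, per the point above, these are numbers from different points in time and different software stacks, so treat the gap as indicative at best.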

If the AMD software wasn't optimized by a comparable amount in the same timeframe, then the GB10 would be faster now. Maybe it was optimized just as much; I don't have a Strix Halo part to compare. But my point is: don't just compare numbers from two different points in time; it's going to be very misleading.

[1]: https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/U...
[2]: https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main...
[3]: https://huggingface.co/unsloth/gpt-oss-20b-GGUF/resolve/main...


These are valid points, but the numbers are still useful as a floor on performance.

Given that Strix Halo is so much cheaper, I'd expect more people to work on improving it, but the NVIDIA tools are better, so it's unclear which has more headroom.


Yeah, that's fair. Knowing it does 60 tok/sec on gpt-oss-120b is certainly useful when deciding whether to even think about it at all. I'm quite happy with it anyway.

The pricing is definitely by far the worst part of all this. I suspect the GB10 still has more perf left on the table; Blackwell has been a rough launch. But I'm not sure it's $2,000 better if you're just looking for a fun little AI machine to run embeddings/vision/LLMs on.


This is nonsense. NVIDIA will slightly win on all generation speeds and be _much_ faster on first-token response time.



