Test Model     Metric                           EVO-X2    NVIDIA GB10   Winner
Llama 3.3 70B  Generation Speed (tok/sec)         4.90           4.67   AMD
               First Token Response Time (s)      0.86           0.53   NVIDIA
Qwen3 Coder    Generation Speed (tok/sec)        35.13          38.03   NVIDIA
               First Token Response Time (s)      0.13           0.42   AMD
GPT-OSS 20B    Generation Speed (tok/sec)        64.69          60.33   AMD
               First Token Response Time (s)      0.19           0.44   AMD
Qwen3 0.6B     Generation Speed (tok/sec)       163.78         174.29   NVIDIA
               First Token Response Time (s)      0.02           0.03   AMD
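For reading the table: generation speed is higher-is-better, first-token response time is lower-is-better. A minimal Python sketch of how the Winner column follows, with the values copied from above:

    # Reproduces the Winner column: higher generation speed wins,
    # lower first-token response time wins. Values copied from the table.
    rows = [
        # (model, metric, evo_x2, gb10, lower_is_better)
        ("Llama 3.3 70B", "Generation Speed (tok/sec)",      4.90,   4.67, False),
        ("Llama 3.3 70B", "First Token Response Time (s)",   0.86,   0.53, True),
        ("Qwen3 Coder",   "Generation Speed (tok/sec)",     35.13,  38.03, False),
        ("Qwen3 Coder",   "First Token Response Time (s)",   0.13,   0.42, True),
        ("GPT-OSS 20B",   "Generation Speed (tok/sec)",     64.69,  60.33, False),
        ("GPT-OSS 20B",   "First Token Response Time (s)",   0.19,   0.44, True),
        ("Qwen3 0.6B",    "Generation Speed (tok/sec)",    163.78, 174.29, False),
        ("Qwen3 0.6B",    "First Token Response Time (s)",   0.02,   0.03, True),
    ]

    for model, metric, amd, nvidia, lower_is_better in rows:
        amd_wins = amd < nvidia if lower_is_better else amd > nvidia
        print(f"{model:14} {metric:30} -> {'AMD' if amd_wins else 'NVIDIA'}")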
Additionally, Framework apparently benchmarked GPT-OSS 120B (!) on the maxed-out Ryzen AI Max+ 395 Desktop and reached a generation speed of 38.0 tok/sec. Given that Nvidia can't even keep up on a 20B model, I assume it can't keep up on the 120B model either.
So to me, the only interesting thing about the Spark at the moment is the ability to daisy-chain several units together, letting you build an InfiniBand-style network of Sparks at InfiniBand-class speeds.
But overall, for plain development and experimentation, and since I don't work at Big AI, I'm pretty sure I wouldn't buy the Nvidia at the moment.
Unfortunately, comparing tok/sec in a vacuum right now, and especially across weeks of time, is kind of pointless. Everything is still evolving; there were patches within days that bumped GB10 performance by double-digit percentages in some frameworks. You just have to accept that things are a moving target.
For comparison, as of right now, I can run GPT-OSS 120b @ 59 tok/sec using llama.cpp (revision 395e286bc) and Unsloth dynamic 4-bit quantized models,[1] and GPT-OSS 20b @ 88 tok/sec.[2] The MXFP4 variant comes in about the same, at ~89 tok/sec.[3] It's probably faster on other frameworks; llama.cpp is known not to be the fastest. I don't know which LM Studio backend they used. All of these numbers put the GB10 well ahead of Strix Halo, if only going by the numbers we see here.
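For anyone who wants to reproduce this kind of measurement, here's a rough sketch of how generation speed and first-token response time can be timed. It assumes llama-cpp-python as the backend (not the exact setup above), and the GGUF filename is a placeholder, not a real quant name:

    import time
    from llama_cpp import Llama

    # Hypothetical GGUF path -- substitute whatever quant you're testing
    # (e.g. an Unsloth dynamic 4-bit quant of gpt-oss-120b).
    llm = Llama(
        model_path="gpt-oss-120b-q4.gguf",
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=4096,
        verbose=False,
    )

    start = time.perf_counter()
    ttft = None
    n_tokens = 0

    # Stream the completion so the arrival of the first token is observable.
    for chunk in llm("Explain unified memory in one paragraph.",
                     max_tokens=256, stream=True):
        if ttft is None:
            ttft = time.perf_counter() - start  # first token response time
        n_tokens += 1

    elapsed = time.perf_counter() - start
    print(f"first token: {ttft:.2f} s")
    # tok/sec for the decode phase, excluding prompt processing time.
    print(f"generation:  {n_tokens / (elapsed - ttft):.1f} tok/sec")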
If the AMD software wasn't optimized by a comparable amount over the same timeframe, then the GB10 would be the faster one now. Maybe it was optimized just as much; I don't have a Strix Halo part to compare against. My point is, don't compare numbers from two different points in time; it's going to be very misleading.
These are valid points, but the numbers are still useful as a floor on performance.
Given that Strix Halo is so much cheaper, I'd expect more people to work on improving it, but the NVIDIA tooling is better, so it's unclear which has more headroom.
Yeah, that's fair. 60 tok/sec on gpt-oss-120b is certainly useful for deciding whether you should even consider it at all. I'm quite happy with it anyway.
The pricing is definitely by far the worst part of all this. I suspect the GB10 still has more performance left on the table; Blackwell has been a rough launch. But I'm not sure it's $2000 better if you're just looking for a fun little AI machine to run embeddings/vision/LLMs on.
From https://www.gmktec.com/blog/evo-x2-vs-nvidia-dgx-spark-redef... (text taken from https://wccftech.com/forget-nvidia-dgx-spark-amd-strix-halo-..., since the GMKtec table was an image that wccftech converted to an HTML table). EDIT: reformatted to make the table look nicer in a monospace font without tabs.