> Doing inference with a Mac Mini to save money is more or less holding it wrong.
No one's running these large models on a Mac Mini.
> Of course if you buy some overpriced Apple hardware it’s going to take years to break even.
Great, so where can I find cheaper hardware that can run GLM 5's 745B or Kimi K2.5's 1T models? Right now it takes 2x M3 Ultras (1TB of unified memory) to run Kimi K2.5 at 24 tok/s [1]. What are the better-value alternatives?
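As a rough sanity check on that 24 tok/s figure (a sketch only, assuming Kimi K2.5 keeps K2's ~32B active parameters and runs with ~4-bit weights; neither number is from the linked post):

    # Back-of-envelope decode ceiling for a bandwidth-bound MoE model.
    active_params = 32e9        # active params per token (Kimi K2's figure)
    bytes_per_param = 0.5       # assumed 4-bit quantization
    bandwidth = 819e9           # M3 Ultra memory bandwidth, bytes/s

    weights_per_token = active_params * bytes_per_param  # ~16 GB per token
    ceiling = bandwidth / weights_per_token              # ~51 tok/s, one box

    print(f"single-box ceiling: {ceiling:.0f} tok/s")

Two boxes pipeline the layers over a network link and real decode also reads KV cache, so landing at roughly half that ceiling is plausible.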
Six months ago I'd have said EPYC Turin: you could do a heck of a build with 12-channel DDR5-6400 and a GPU or two for the dense parts of the model. $20k would have been a huge budget for a homelab CPU/GPU inference rig at the time. Now $20k won't even buy you the memory.
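Same envelope math for why that Turin build looked attractive (a sketch; the ~32B-active figure and 8-bit weights are assumptions carried over from above):

    # Theoretical bandwidth of 12-channel DDR5-6400 and the decode ceiling
    # it implies before the GPUs pick up the dense layers.
    channels = 12
    transfers_per_sec = 6400e6  # MT/s
    bytes_per_transfer = 8      # 64-bit DDR5 channel

    bandwidth = channels * transfers_per_sec * bytes_per_transfer  # ~614 GB/s

    active_params = 32e9
    bytes_per_param = 1.0       # assumed 8-bit quantization
    ceiling = bandwidth / (active_params * bytes_per_param)        # ~19 tok/s

    print(f"theoretical: {bandwidth/1e9:.0f} GB/s, ceiling: {ceiling:.0f} tok/s")

Achievable bandwidth is typically 70-80% of theoretical, so call it low-teens tok/s from the CPU side.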
[1] https://x.com/alexocheema/status/2016404573917683754