This is still half the speed of a consumer NVidia card, but the large amount of memory is great, if you don't mind running things more slowly and with fewer libraries.
Was this example intended to describe any particular device? Because I'm not aware of anything that operates at 8800 MT/s, especially not with 64-bit channels.
That seems unlikely given the mismatched memory speed (see the parent comment) and the fact that Apple uses LPDDR, which is typically 16 bits per channel. 8800 MT/s seems to be a number pulled out of thin air or bad arithmetic.
Heh, ok, maybe slightly different. But Apple's spec claims 546 GB/sec, which works out to 512 bits (64 bytes) * 8533 MT/s. I didn't think the point was 8533 vs 8800.
I believe I saw somewhere that the actual chips used are LPDDR5X-8533.
Effectively the parent's formula describes the M4 Max, give or take 5%.
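The "give or take 5%" claim is easy to check. A minimal sketch comparing the parent's numbers (8800 MT/s, 8x64-bit channels, an assumed configuration) against Apple's published 546 GB/s figure (LPDDR5X-8533 on a 512-bit bus):

```python
# Parent's formula: 8800 MT/s over 8 channels of 64 bits each
parent_estimate = 8800 * 64 * 8 / 8 / 1000   # -> 563.2 GB/s

# Apple's spec: LPDDR5X-8533 on a 512-bit bus
apple_spec = 8533 * 512 / 8 / 1000           # -> 546.112 GB/s

# The two differ by roughly 3%, within the "give or take 5%" claim
difference_pct = (parent_estimate / apple_spec - 1) * 100
print(f"{parent_estimate} vs {apple_spec:.1f} GB/s ({difference_pct:.1f}% apart)")
```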
Fewer libraries? Any that a normal LLM user would care about? PyTorch, Ollama, and others seem to have the normal use cases covered. Whenever I hear about a new LLM, it seems like the next post is some Mac user reporting the tokens/sec. Often about 5 tokens/sec for 70B models, which seems reasonable for a single user.
Is there a normal LLM user yet? Most people would want their options to be as wide as possible. The big ones usually get covered (eventually), and there are distinct good libraries emerging for Mac only (sigh), but last I checked, the experience of running every kit (Stable Diffusion, server-class, etc.) involved extra overhead in the Mac world.
A 24GB model is fast and ranks a 3.
A 70B model is slow and an 8.
A top-tier hosted model is fast and a 100.
Past what specialized models can do, it's about a mixture/agentic approach and, at the next level, nuclear-power scale. Having a computer with lots of relatively fast RAM is not magic.
Thanks, but just to put things into perspective: this calculation counts 8 channels, which is 4 DIMMs, and that's mostly desktops (not dismissing desktops, just highlighting that it's a different beast).
Desktops are two channels of 64 bits, or with DDR5 now four (sub)channels of 32 bits; either way, mainstream desktop platforms have had a total bus width of 128 bits for decades. 8x64-bit channels are only available on server platforms. (Some high-end GPUs have used 512-bit bus widths, as have Apple's Max-level processors, but those use memory types where the individual channels are typically 16 bits.)
The vast majority of x86 laptops and desktops are 128 bits wide: often 2x64-bit channels until about a year ago, now 4x32-bit DDR5 channels. There are some benefits to 4 channels over 2, but generally you are still limited to 128 bits unless you buy a Xeon, Epyc, or Threadripper (or Intel equivalent), which are expensive, hot, and don't fit in SFFs or laptops.
So basically the PC world is crazy behind the 256-, 512-, and 1024-bit-wide memory buses Apple has offered since the M1 arrived.
> This is still half the speed of a consumer NVidia card, but the large amounts of memory is great, if you don't mind running things more slowly and with fewer libraries.
But it has more than 2x longer battery life and a better keyboard than a GPU card ;)
Bandwidth (GB/s) = (Data Rate (MT/s) * Channel Width (bits) * Number of Channels) / 8 / 1000
(8800 MT/s * 64 bits * 8 channels) / 8 / 1000 = 563.2 GB/s
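The formula above is straightforward to express as a small helper; a minimal sketch (the function name and parameters are my own, not from a library):

```python
def bandwidth_gbs(data_rate_mts, channel_width_bits, num_channels):
    """Peak memory bandwidth in GB/s.

    data_rate_mts:      transfers per second, in MT/s
    channel_width_bits: width of one channel, in bits
    num_channels:       number of channels
    """
    # Divide by 8 to convert bits to bytes, by 1000 to go from MB/s to GB/s
    return data_rate_mts * channel_width_bits * num_channels / 8 / 1000

print(bandwidth_gbs(8800, 64, 8))  # -> 563.2
```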