This is still half the speed of a consumer NVidia card, but the large amount of memory is great, if you don't mind running things more slowly and with fewer libraries.
Was this example intended to describe any particular device? Because I'm not aware of anything that operates at 8800 MT/s, especially not with 64-bit channels.
That seems unlikely given the mismatched memory speed (see the parent comment) and the fact that Apple uses LPDDR, which is typically 16 bits per channel. 8800 MT/s seems to be a number pulled out of thin air or bad arithmetic.
Heh, ok, maybe slightly different. But Apple's spec claims 546 GB/sec, which works out to 512 bits (64 bytes) * 8533 MT/s. I didn't think the point was 8533 vs 8800.
I believe I saw somewhere that the actual chips used are LPDDR5X-8533.
Effectively the parent's formula describes the M4 Max, give or take 5%.
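The "give or take 5%" claim is easy to check. A minimal sketch comparing the parent's numbers (8800 MT/s, 8x64-bit channels, an assumed configuration) against Apple's published 546 GB/s figure (LPDDR5X-8533 on a 512-bit bus):

```python
# Parent's formula: 8800 MT/s over 8 channels of 64 bits each
parent_estimate = 8800 * 64 * 8 / 8 / 1000   # -> 563.2 GB/s

# Apple's spec: LPDDR5X-8533 on a 512-bit bus
apple_spec = 8533 * 512 / 8 / 1000           # -> 546.112 GB/s

# The two differ by roughly 3%, within the "give or take 5%" claim
difference_pct = (parent_estimate / apple_spec - 1) * 100
print(f"{parent_estimate} vs {apple_spec:.1f} GB/s ({difference_pct:.1f}% apart)")
```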
Fewer libraries? Any that a normal LLM user would care about? PyTorch, Ollama, and others seem to have the normal use cases covered. Whenever I hear about a new LLM, it seems like the next post is some Mac user reporting the tokens/sec. Often about 5 tokens/sec for 70B models, which seems reasonable for a single user.
Is there a normal LLM user yet? Most people would want their options to be as wide as possible. The big ones usually get covered (eventually), and there are distinct good libraries emerging for Mac only (sigh), but last I checked, the experience of running every kit (Stable Diffusion, server-class, etc.) involved extra overhead in the Mac world.
A 24GB model is fast and ranks a 3.
A 70B model is slow and an 8.
A top-tier hosted model is fast and a 100.
Past what specialized models can do, it's about a mixture/agentic approach and, at the next level, nuclear-power scale. Having a computer with lots of relatively fast RAM is not magic.
Thanks, but just to put things into perspective: this calculation counts 8 channels, which is 4 DIMMs, and that's mostly desktops (not dismissing desktops, just highlighting that it's a different beast).
Desktops are two channels of 64 bits, or with DDR5 now four (sub)channels of 32 bits; either way, mainstream desktop platforms have had a total bus width of 128 bits for decades. 8x64-bit channels are only available on server platforms. (Some high-end GPUs have used 512-bit bus widths, as have Apple's Max-level processors, but those use memory types where the individual channels are typically 16 bits.)
The vast majority of x86 laptops and desktops are 128 bits wide: often 2x64-bit channels until about a year ago, now 4x32-bit DDR5 channels. There are some benefits to 4 channels over 2, but generally you are still limited to 128 bits unless you buy a Xeon, Epyc, or Threadripper (or Intel equivalent), which are expensive, hot, and don't fit in SFFs or laptops.
So basically the PC world is crazy behind the 256-, 512-, and 1024-bit-wide memory buses Apple has offered since the M1 arrived.
> This is still half the speed of a consumer NVidia card, but the large amounts of memory is great, if you don't mind running things more slowly and with fewer libraries.
But it has more than 2x longer battery life and a better keyboard than a GPU card ;)
Bandwidth (GB/s) = (Data Rate (MT/s) * Channel Width (bits) * Number of Channels) / 8 / 1000
(8800 MT/s * 64 bits * 8 channels) / 8 / 1000 = 563.2 GB/s
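The formula above is straightforward to express as a small helper; a minimal sketch (the function name and parameters are my own, not from a library):

```python
def bandwidth_gbs(data_rate_mts, channel_width_bits, num_channels):
    """Peak memory bandwidth in GB/s.

    data_rate_mts:      transfers per second, in MT/s
    channel_width_bits: width of one channel, in bits
    num_channels:       number of channels
    """
    # Divide by 8 to convert bits to bytes, by 1000 to go from MB/s to GB/s
    return data_rate_mts * channel_width_bits * num_channels / 8 / 1000

print(bandwidth_gbs(8800, 64, 8))  # -> 563.2
```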