Q4 comes out to ~26GB, but Apple doesn't let you load it on a 32GB Mac: the max usable unified memory for the GPU is capped at ~21GB (`device.recommendedMaxWorkingSetSize`) [1]. So for Q4 Mixtral MoE you'd unfortunately need a 64GB Mac.
Unless you use this hack [2].

There’s also a brand-new hybrid quantization for Mixtral that uses 4 bits for the shared neurons and 2 bits for the experts; it doesn’t bleed much perplexity and fits into a 32GB machine. Haven’t had it in hand yet and no link here on mobile, but can’t wait to try.
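For a sense of why the hybrid quant fits: here's a back-of-envelope size estimate. The parameter breakdown is an assumption derived from Mixtral 8x7B's published config (32 layers, d_model 4096, FFN 14336, 8 experts, ~46.7B total params), and real quants add per-block scale overhead on top, so treat this as a rough sketch, not the actual file size.

```python
# Rough size estimate for a 4-bit-shared / 2-bit-expert hybrid quant of
# Mixtral 8x7B. Shapes are assumptions from the published config; real
# quantization formats carry extra per-block scale/metadata overhead.

LAYERS, D_MODEL, D_FFN, EXPERTS = 32, 4096, 14336, 8

# Each expert is a SwiGLU FFN: three d_model x d_ffn matrices per expert.
expert_params = LAYERS * EXPERTS * 3 * D_MODEL * D_FFN   # ~45.1B
total_params = 46.7e9                                    # published total
shared_params = total_params - expert_params             # attention, router, embeddings, norms

GB = 1024 ** 3
size_gb = (expert_params * 2 / 8 + shared_params * 4 / 8) / GB

print(f"~{size_gb:.1f} GB")  # well under the ~21GB working-set cap
```

The experts dominate the parameter count, which is why dropping just them to 2 bits buys so much: the 4-bit shared weights are a rounding error by comparison.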
[1] https://developer.apple.com/forums/thread/732035
[2] https://github.com/ggerganov/llama.cpp/discussions/2182#disc...