In a MoE model with experts_per_token = 2 and each expert having 7B params, afte... | Hacker News

rishabhjain1198 on Dec 8, 2023 | on: Mistral "Mixtral" 8x7B 32k model [magnet]

In a MoE model with experts_per_token = 2 and each expert having 7B params, once the experts are picked it should run as fast as the slower of the two 7B experts, rather than at the speed of a comparable 14B model.
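A minimal sketch of the top-2 routing described above, in plain Python. The router logits, the choice to softmax over only the selected logits, and the toy "experts" are illustrative assumptions, not Mixtral's actual code. The point is only that each token invokes two of the eight experts, so the other six do no work for that token.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and their mixing weights.

    Assumption: weights are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the k selected experts' outputs.

    Only the chosen experts are called; the rest do no work for this token.
    """
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy setup: 8 "experts" that each scale the input by a constant,
# plus a call log recording which experts actually ran.
calls = []


def make_expert(i):
    def expert(v):
        calls.append(i)
        return [(i + 1) * vi for vi in v]
    return expert


experts = [make_expert(i) for i in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
out = moe_forward([1.0, 2.0], router_logits, experts, k=2)
print(sorted(calls))  # [1, 4]: only 2 of the 8 experts ran
```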

nullc on Dec 9, 2023

Only if it can hide the faster expert's work in otherwise free parallelism.
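The reply's caveat can be made concrete with a toy cost model: the two selected experts cost max(t1, t2) only if their work genuinely overlaps, and t1 + t2 if they run back to back. The function name and timings below are hypothetical, and the model deliberately ignores routing cost, memory bandwidth, and synchronization overhead.

```python
def moe_layer_latency(expert_times, parallel):
    """Toy per-token latency of one MoE layer.

    parallel=True  -> selected experts overlap fully; wait for the slowest (max).
    parallel=False -> selected experts run one after another (sum).
    Ignores routing cost, memory bandwidth, and synchronization overhead.
    """
    if not expert_times:
        return 0.0
    return max(expert_times) if parallel else sum(expert_times)


# Hypothetical timings (ms) for the two selected experts on one token.
times = [10.0, 12.0]
print(moe_layer_latency(times, parallel=True))   # 12.0
print(moe_layer_latency(times, parallel=False))  # 22.0
```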

moffkalast on Dec 9, 2023

My CPU trying its best to run inference: parallelwhat?