rishabhjain1198 on Dec 8, 2023 | on: Mistral "Mixtral" 8x7B 32k model [magnet]
In a MoE model with experts_per_token = 2 and 7B params per expert, once the two experts are picked, a token should run about as fast as the slower of the two 7B experts (they can run in parallel), not as slow as a comparable dense 14B model.
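For illustration, a minimal top-2 routing sketch in PyTorch (made-up layer sizes, plain Linear layers as experts; not Mixtral's actual code). The point is that the gate scores all experts, but only 2 of the 8 ever run for a given token:

    import torch
    import torch.nn as nn

    class Top2MoE(nn.Module):
        # Toy top-2 mixture-of-experts layer: the gate scores every
        # expert, but each token is routed to only its top 2.
        def __init__(self, dim, n_experts=8):
            super().__init__()
            self.gate = nn.Linear(dim, n_experts)
            self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

        def forward(self, x):                  # x: (n_tokens, dim)
            weights, idx = self.gate(x).topk(2, dim=-1)
            weights = weights.softmax(dim=-1)  # (n_tokens, 2)
            out = torch.zeros_like(x)
            for k in range(2):                 # per-token compute = 2 experts, not 8
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k, None] * expert(x[mask])
            return out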
nullc on Dec 9, 2023
Only assuming it's able to hide the faster one in free parallelism.
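A timing sketch of that point, with time.sleep standing in for expert compute (the 30/50 ms figures are arbitrary): with spare parallelism, wall-clock time is roughly the slower expert; without it, the two run back to back and you pay the sum:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def expert(ms):                       # stand-in for one expert's forward pass
        time.sleep(ms / 1000.0)

    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(expert, [30, 50])) # both experts at once
    print("parallel:   %.0f ms" % ((time.perf_counter() - t0) * 1000))  # ~50: the slower one

    t0 = time.perf_counter()
    for ms in (30, 50):                  # no spare parallelism: one after the other
        expert(ms)
    print("sequential: %.0f ms" % ((time.perf_counter() - t0) * 1000))  # ~80: the sum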
moffkalast on Dec 9, 2023
My CPU trying its best to run inference: parallelwhat?