
Slurm is definitely still dominant, but OpenAI has been using k8s for training for many years now¹, and there are various ways to run Slurm on top of Kubernetes, including the recent SUNK from CoreWeave².

At my company we use Slurm "directly" for static compute we rent or own (i.e. not in a public cloud), but we are considering Kubernetes because that's how we run the rest of the company, and we'd rather invest more effort into getting better at k8s than into becoming good Slurm admins.

¹: https://openai.com/index/scaling-kubernetes-to-2500-nodes/

²: https://www.coreweave.com/blog/sunk-slurm-on-kubernetes-impl...



Very cool! Thanks for this, claytonjy!!



