
Virtual machines perform extremely poorly, so you must use bare-metal instances. After about three months of usage, these will cost you the same as buying the hardware outright.

And you're still stuck on a non-deterministic high-latency network you can't get rid of, and with very limited hardware configurations.

It's more like a grid than an HPC cluster.

There are only two possible advantages:

- you want a lot of hardware very quickly rather than wait for it to be delivered.

- you lack the desire or ability to be, or to hire, a network engineer.



When you say "virtual machines perform extremely poorly", on what do you base that?

(Note: I've worked in supercomputing and HPC for over two decades.)

The network I was talking about is called UltraCluster, which has extremely high bandwidth and low latency, designed to get great scaling on MPI jobs (as well as ML). Typical instances used with UltraCluster are p5, which have 8 NVIDIA H100 GPUs, 192 vCPUs, 2 TB RAM, 3.2 Tbps network bandwidth PER MACHINE, 900 GB/s between GPU peers, and 8 × 3.84 TB SSDs. They are not marketed as metal instances.

No, it's not like a grid. Your thinking is dated and not representative of how people do HPC on AWS, Azure, or Google.


Azure has RDMA, though with slightly high quoted latency (I don't know the message rate), and tightly-coupled stuff appears to scale: https://techcommunity.microsoft.com/t5/azure-compute-blog/hp...

It seems how people do HPC on AWS is limited by what AWS can do (and maybe costs). Our experience was that even the "elastic" feature wasn't elastic in practice, and we often couldn't get resources anyway.

Maybe dated, but for context, we had 2TB and 128 real cores a decade ago, and I currently work with Summit-type hardware; I'd rather not admit after how long.


> 3.2Tbps bandwidth PER MACHINE

Looking into the UltraClusters page you linked to in a sibling comment, it seems like the host machines pretty much fill out their PCIe connections with EFA network adapters to reach that figure:

    EFA is also coupled with NVIDIA GPUDirect RDMA (P5, P4d) and
    NeuronLink (Trn1) to enable low-latency accelerator-to-accelerator
    communication between servers with operating system bypass.
https://aws.amazon.com/ec2/ultraclusters/
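A back-of-envelope check of the "fills out the PCIe connections" claim (my own arithmetic, not an AWS-published figure; the PCIe Gen5 assumption is mine):

```python
# Rough sanity check: how many PCIe Gen5 x16 slots would 3.2 Tb/s of NIC
# bandwidth consume? All constants are approximate assumptions.
PCIE_GEN5_LANE_GBPS = 32              # ~32 Gb/s raw per Gen5 lane
LANES_PER_SLOT = 16

nic_total_gbps = 3200                 # 3.2 Tb/s quoted per machine
slot_gbps = PCIE_GEN5_LANE_GBPS * LANES_PER_SLOT   # 512 Gb/s per x16 slot
slots_needed = nic_total_gbps / slot_gbps

print(f"~{slots_needed:.2f} x16 Gen5 slots just for the NICs")  # ~6.25
```

Over six full x16 slots of raw lane bandwidth for networking alone (before accounting for protocol overhead), which is consistent with the NICs dominating the host's I/O budget.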


I base that on running code on m5-type instances.

If you care about correct NUMA and HyperThreading usage, and even more so if you care about CPU latency (for example, for real-time trading), the only things that perform well are either metal or full-machine-but-with-hypervisor instances.
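To make "correct HyperThreading usage" concrete: a common trick on Linux is to parse each CPU's `thread_siblings_list` from sysfs and pin one worker per physical core. A minimal sketch (the helper names are mine, not from any library; on a real box you'd read `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` and then call `os.sched_setaffinity`):

```python
def parse_cpu_list(s):
    """Parse a Linux cpulist string like '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in s.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def one_thread_per_core(sibling_lists):
    """Given each CPU's thread_siblings_list contents, keep only the
    lowest-numbered hyperthread of every physical core -- the usual
    'disable SMT in software' trick for latency-sensitive pinning."""
    return {min(parse_cpu_list(s)) for s in sibling_lists}

# Example: two physical cores, each with SMT siblings (0,64) and (1,65)
print(one_thread_per_core(["0,64", "1,65", "0,64", "1,65"]))  # {0, 1}
```

On a full VM you can at least see a topology; the problem the comment describes is that on ordinary shared instances the topology the guest sees may not match the physical placement, so this kind of pinning stops helping.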


Obviously nobody is going to run workloads that need to exploit these kinds of things on ECS instances. But these workloads are niche, not normal. Most code that's written and deployed to some notion of "production" is not CPU bound, it is I/O bound.



