Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Does it mainly happen at global corps that run essential internet infrastructure?

Being on-call happens at scale because machines are not to be trusted, and the limits of our skulls meet their match when machines vastly outnumber people. When I worked at Amazon S3, we had rotations where people had physical pagers. On-call was a duty, and everyone shared a rotation.

Honestly, I thought it was an awesome responsibility that there I was at 2 AM and some stupid shit happened in dublin and I got to fix it. (Or at least try to fix it as my first on-call shift was rough)

I think Amazon is one of the few companies that get this right with their cultures because you can own efforts that people use. It's not going to be something discarded after a year unless a massive rewrite is planed. The massive rewrites can be super fun (but hard) as well because you have to do it in-flight.

The reason Amazon gets this right is that you can get a bit of the stake on the outcome, and this is awesome. For instance, I designed how Amazon S3 does URL rewrites on the website endpoint. It's a small feature, but I lead the effort from start to finish. For all practical purposes, that feature is most likely going to last until the end of time. However, its that relationship between myself as an engineer with the customer that makes it worth it (at least emotionally).



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: