
I monitor the periods between naps. The longer the naps, the happier I am :)

Seriously though, the server itself is not the part that matters; what matters is the application(s) running on it. So what to monitor depends heavily on what those applications care about.

If I'm doing CPU-heavy calculations on one server and streaming HTTPS off a different server, I'm going to care about different things on each. Sure, there are some common denominators, but for streaming static content I barely care about CPU metrics and care a lot about I/O.

I'm mostly agnostic on push vs. pull; they both have their weaknesses. Ideally I would get to decide based on my particular use case.

The lazy metrics, as you mentioned, are not that useful. As another commenter pointed out, "free" RAM is mostly a pointless number, since these days most OSes wisely use otherwise idle memory for caching. Information about OS-level caching, on the other hand, can be very useful, depending on the workloads I'm running on the system.
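
To make the "free" vs. cache point concrete, here is a rough sketch, assuming Linux, where /proc/meminfo reports MemAvailable and Cached alongside MemFree. The function names and the sample numbers are made up for illustration; this is not taken from any particular agent.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(info):
    """Contrast the 'free' number with what the kernel could actually hand out."""
    total = info["MemTotal"]
    # Fall back to MemFree if MemAvailable is missing (assumption: older kernel).
    available = info.get("MemAvailable", info["MemFree"])
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "available_pct": 100.0 * available / total,
        "page_cache_pct": 100.0 * info.get("Cached", 0) / total,
    }


# Made-up sample; on a real Linux box you would read /proc/meminfo instead.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:          800000 kB
MemAvailable:   12000000 kB
Buffers:          200000 kB
Cached:         10400000 kB
"""

if __name__ == "__main__":
    print(memory_summary(parse_meminfo(SAMPLE)))
```

In the made-up sample only a small slice of RAM is "free" while most of it is still available, because the bulk is page cache the kernel can reclaim. Alerting on the free number alone would page someone for a healthy box.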

As for agents, what I care about is how stable, reliable and resource-hungry they are. I want an agent to be rock solid, reliable and as close to zero overhead as possible. Many agents fail spectacularly at all three. CrowdStrike is the most recent example of agent-based monitoring failing here.

The point of a monitoring system, to me, is two-fold:

  *  Trying to spot problems before they become problems (e.g. we have X days before the disk is full, given current usage patterns).

  *  Trying to track down a problem as it is happening (e.g. App Y is suddenly slow in scenario X; why?).
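
The "X days before disk is full" case above can be sketched as a simple linear extrapolation. This is only one way to do it, assuming usage grows roughly linearly; `days_until_full` and the sample numbers are hypothetical.

```python
def days_until_full(samples, capacity):
    """Estimate days until a disk fills, using a least-squares growth rate.

    samples:  list of (day, used) pairs, e.g. one reading per day
    capacity: total size, in the same unit as `used`
    Returns None when usage is flat or shrinking (no fill date to predict).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    # Least-squares slope: growth in `used` per day.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    # Extrapolate from the most recent reading, not the fitted intercept.
    _, latest_used = max(samples)
    return max(0.0, (capacity - latest_used) / slope)


if __name__ == "__main__":
    # Made-up readings in GB: 10 GB/day growth on a 200 GB disk.
    readings = [(0, 100), (1, 110), (2, 120), (3, 130)]
    print(days_until_full(readings, 200))
```

For real readings, Python's `shutil.disk_usage(path)` returns total, used and free bytes. Real usage is rarely this linear (log rotation, batch jobs), so treat the result as a rough early warning rather than a deadline.
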

Focus on the point of monitoring and keep your agent as simple, solid and idiot-proof as possible. CrowdStrike's recent failure mode was completely preventable had the agent been written differently. Architect your agent, as much as possible, to never be another CrowdStrike.

Yes, I know CrowdStrike was user machines, not servers, but server agent failures happen all the time too, in roughly the same ways; they just don't make the news quite as often.
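
One small piece of "simple and hard to break", as a sketch: isolate each collector so that a bad one degrades to a missing metric instead of taking the whole agent down. `SafeCollector` and `run_once` are names I made up; this says nothing about how any real agent is built, and it does not cover the resource side (timeouts, memory caps).

```python
class SafeCollector:
    """Wrap a metric collector so its failures never reach the agent loop."""

    def __init__(self, name, func, max_failures=3):
        self.name = name
        self.func = func
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures so far

    @property
    def disabled(self):
        # After too many consecutive failures, stop calling the collector.
        return self.failures >= self.max_failures

    def collect(self):
        """Return the metric value, or None if the collector failed or is disabled."""
        if self.disabled:
            return None
        try:
            value = self.func()
        except Exception:
            self.failures += 1
            return None
        self.failures = 0  # a success resets the streak
        return value


def run_once(collectors):
    """Run every collector once; one bad collector cannot stop the others."""
    return {c.name: c.collect() for c in collectors}


if __name__ == "__main__":
    def broken():
        raise RuntimeError("malformed input")

    collectors = [SafeCollector("answer", lambda: 42),
                  SafeCollector("broken", broken, max_failures=2)]
    for _ in range(3):
        print(run_once(collectors))
```

The healthy collector keeps reporting on every pass while the broken one returns None and is switched off after two consecutive failures. A missing metric is a much better failure mode than a dead host.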


