Hacker News

I wrote an extensive reply to this but unfortunately the HN servers restarted and lost it.

The TL;DR was that from where I stand, you’re doing nothing wrong.

At a previous client we ran Prometheus for months, then Thanos, and eventually moved to Victoria Metrics, and everyone was happy. It was an order of magnitude cheaper because it could use spinning rust for storage, and it still performed better. It also scaled out very easily, mostly automatically.

The “non-compliant” bits of the query language turned out to be fixes to the UX and other issues. Lots of new functions and features.

Support was always excellent.

I’m not affiliated with them in any way. Was always just a very happy freeloading user.



I have deployed lots of metrics systems, starting with Cacti and moving through Graphite, KairosDB (which used Cassandra under the hood), Prometheus, Thanos and now Mimir.

What I've realised is that they're all painful to scale 'really big'. One Prometheus server is easy. And you can scale vertically and go pretty big. But you need to think about redundancy, and you want to avoid ending up accidentally running 50 Prometheus instances, because that becomes a pain for the Grafana people. Unless you use an aggregating proxy like Promxy. But even then you have issues running aggregating functions across all of the instances. You need to think about expiring old data and possibly aggregating it down into a smaller set so you can still look at certain charts over long periods. What's the Prometheus solution here? MOAR INSTANCES. And reads need to be performant or you'll have very angry engineers during the next SEV1, because their dashboards aren't loading. So you throw in an additional caching solution like Trickster (which rocks!) between Grafana and the metrics. Back in the KairosDB days you had to know a fair bit about running Cassandra clusters, but these days it's all baked into Mimir.
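To make the "aggregating old data down into a smaller set" point concrete, here is a minimal Python sketch of downsampling: raw samples are averaged into fixed-width time buckets so a long-range chart needs far fewer points. The function name `downsample`, the 15-second scrape interval and the sample values are all made up for illustration; this is not how any of the tools above implement it internally.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value), sorted by bucket_start.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw data: one sample every 15 seconds for two hours.
raw = [(t, float(t % 60)) for t in range(0, 7200, 15)]

# Collapse to one point per hour for long-range charts.
hourly = downsample(raw, 3600)
print(len(raw), "samples ->", len(hourly), "samples")
```

The trade-off is that an average hides short spikes, so in practice you would probably keep min, max and count per bucket as well, and only drop the raw samples once they are older than whatever window people actually debug with.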

I'm lucky enough to be working for a smaller company right now, so I don't have to spend a lot of time tending to the monitoring systems. I love that Mimir is basically Prometheus backed by S3, with all of the scalability and redundancy features built in (though you still have to configure them in large deployments). As long as you're small enough to run their monolithic mode you don't have to worry about individually scaling half a dozen separate components. The actual challenge is getting the $CLOUD side of it deployed, and then passing roles and buckets to the nasty helm charts while still making it easy to configure the ~10 things that you actually care about. Oh, and the helm charts and underlying configs are not rock solid yet, so upgrades can be hairy.

Ditto all of that for logging via Loki.

It's very possible that Mimir is no better than Victoria Metrics, but unless it burns me really badly I think I'll stick with it for now.



