
You can limit both the CPU shares and memory available to the app. (And the network rate, if it matters for the app.) The result for the app server would be the same: a throttle on the maximum number of requests it can serve. Your server will not overload and crash if your cgroup config prevents applications from taking all available resources.
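As a concrete sketch, resource caps like these are commonly set through systemd, which writes the corresponding cgroup limits for you (the unit name and values below are illustrative assumptions, not from the thread):

```ini
# Hypothetical systemd unit for the app server; systemd maps these
# directives onto cgroup controls (cgroup v2 names in comments).
[Service]
ExecStart=/usr/local/bin/app-server
CPUWeight=100      # relative CPU share vs. sibling units (cpu.weight)
CPUQuota=200%      # hard cap: at most two cores' worth of CPU (cpu.max)
MemoryMax=1G       # hard memory limit; exceeding it triggers the OOM killer (memory.max)
TasksMax=512       # cap on processes/threads (pids.max)
```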


Serving up HTTP errors is more polite to the end clients. It also probably results in fewer people reloading to try to get it working.


That part already exists in nginx. Set up a backend timeout and a custom 502 page. It will kick in when the app is too busy.
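A minimal sketch of that nginx-only setup, assuming a single local backend (addresses and paths are illustrative):

```nginx
upstream app {
    server 127.0.0.1:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://app;
        # If the backend is too busy to respond in time, give up quickly...
        proxy_connect_timeout 2s;
        proxy_read_timeout 5s;
        # ...and serve a friendly static page instead of the raw error.
        proxy_intercept_errors on;
        error_page 502 504 /overloaded.html;
    }

    location = /overloaded.html {
        root /var/www/errors;
        internal;
    }
}
```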


There's an avalanche problem though. Serving them up immediately when you know you're unhealthy shortens the recovery time.

I don't know that this module is perfect for every situation, but I do know that timeouts don't work well during a general overload. Timeouts leave a higher number of concurrent connections open.

I'd prefer a sort of priority-queue setup where HTTP clients that already "got in" stay in, and newer connections are pushed away. Rate limiting per client first might be better as well.
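The per-client rate limiting part can be done in stock nginx with limit_req keyed on the client address; a sketch, with zone name and rates picked arbitrarily:

```nginx
# One state entry per client IP; 10m of shared memory, 10 requests/sec each.
limit_req_zone $binary_remote_addr zone=perclient:10m rate=10r/s;

server {
    listen 80;

    location / {
        # Each client gets its rate plus a small burst allowance;
        # excess requests are rejected immediately rather than queued.
        limit_req zone=perclient burst=20 nodelay;
        limit_req_status 429;
        proxy_pass http://127.0.0.1:8000;
    }
}
```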


I used timeout as a general term. You can have a timeout of 0 and N app workers (see fail_timeout). If they're all busy, you would serve a 502 immediately. Nginx can serve the static file likely orders of magnitude faster than your app could respond.
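A sketch of that idea: cap in-flight requests at the number of app workers with max_conns, and fall back to a static page the moment the cap is hit (worker count, ports, and paths are illustrative):

```nginx
upstream app {
    # max_conns caps concurrent connections to this backend;
    # fail_timeout=0 means a failed backend is retried on the next request.
    server 127.0.0.1:8000 max_conns=4 fail_timeout=0;
}

server {
    listen 80;

    location / {
        proxy_pass http://app;
        # When all N workers are busy, the proxied request fails fast
        # and the static page is returned instead.
        error_page 502 /busy.html;
    }

    location = /busy.html {
        root /var/www/errors;   # static file, far cheaper than an app request
        internal;
    }
}
```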


You could do that if there were a strong correlation between "N app workers busy" and the edge of overload. Many apps, though, don't work that way: some requests are heavyweight, others aren't.

I take your point that this module isn't a panacea. For several setups I've seen, neither your approach nor this one would work well; an API gateway that throttled per client would make more sense. This sysguard module, though, came from Alibaba, and I imagine they are pretty sharp and not prone to building things they don't need.


That's a specific architecture I'm describing, sure. We could go much deeper with workers per route, green-threaded apps, and many other things. But I still don't think that affects my main question: given a choice between reacting after crossing some threshold and enforcing a threshold you cannot cross, what's the benefit of the first one?


I don't think they're mutually exclusive. This module can be applied per path too, so I could set it up to turn off something heavyweight but optional, like a search function, when CPU is high.

It also appears to be able to kick off any action, not just a 503, so you could have it conditionally enact limit_req on heavyweight requests. That seems to match your desired approach.
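A rough sketch of combining the two, based on my reading of Tengine's sysguard directives (the exact syntax may differ by version; paths, zone name, and thresholds are illustrative assumptions):

```nginx
limit_req_zone $binary_remote_addr zone=perclient:10m rate=1r/s;

server {
    listen 80;

    # Heavyweight but optional: shed /search first when system load is high.
    location /search {
        sysguard on;
        # Above this 1-minute load average, redirect internally to the
        # throttled variant instead of serving the request directly.
        sysguard_load load=10.5 action=/search_limited;
        proxy_pass http://127.0.0.1:8000;
    }

    # Internal target: same backend, but behind limit_req, so heavyweight
    # requests are throttled rather than refused outright.
    location /search_limited {
        internal;
        limit_req zone=perclient burst=5;
        proxy_pass http://127.0.0.1:8000;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
```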




