with regard to the epoll-and-accept method - why is the LIFO queue bad? sure, you're seeing more load on one core, but it must be finishing each request before it starts working on the next one - i.e. not causing requests to stall as a result of the imbalance.
put that in the context of a multi-socket system, then core-caching and NUMA issues (even turbo clocking on an individual core) would seem to imply that LIFO is in fact exactly what you want, no?
Imagine an HTTP server supporting HTTP keep-alives. Say 10 new connections come in, and say on average they all go to a single worker. Then say all of the connections request a heavy asset.
You will end up with one worker handling these 10 connections/requests and spinning at 100% CPU while the other workers idle. I'm not saying this bad load balancing is a problem affecting everyone; it very much depends on your load pattern.
If the first worker isn't actually idle, then the scheduler will assign the next incoming request to an idle worker. Am I mistaken? If not, what's the problem here?
Load assignment, with this architecture, happens only when a connection is first opened. HTTP keep-alive means there's a disconnect between when it's opened, and when it becomes expensive.
I.e. it's possible for one worker to first serve ten tiny requests (e.g. index.html), then wait while the clients chew on it, then have all ten clients simultaneously request a large asset.
I am not sure it's possible to solve this problem generally. Doing so would require the kernel be able to predict the future.
Also, IIRC this is a proxy. If you run out of CPU copying data between file descriptors before you run out of bandwidth, I'd be very surprised. I think sendfile(2) makes it especially cheap.
Sure there is. Pass around the connection socket as needed, or have a layer in front of the processing layer to only hand out the work when it is good to go.
Can you point to a working example of this that actually solves the problem? i.e., that is demonstrably more efficient for any given request than the FIFO wakeup method?
That's why I said "it's not possible to solve this problem generally." That is, there's no general solution to the problem, one that is optimal for all workloads.
As already pointed out, I imagine there could be some subtle benefit to delivering a majority of connections to a single process while it isn't overloaded, so even perfectly even load balancing can have some disadvantages.
majke, the benchmark you did with SO_REUSEPORT - was that against a NUMA system / recent kernel? I've completely lost track of it, but I think I remember reading that there has been some work on mapping NIC queues, cores, and workers together, allowing a flow to be processed on the same core as its worker. I'm just curious if the benchmark shown was able to take advantage of that or not (and, as an aside, how much of a benefit mapping these together really has).
http isn't particularly my thing, so if i'm wrong, please correct me! however, i was of the impression that a keep-alive session wouldn't return to the accept stage - the socket is still open, surely?
Correct. The point is the worker may be relatively idle when it accept()s plenty of connections; the load only happens after that, when the requests on those connections start to flow.
You can imagine a situation where a single worker gets most of the traffic and runs out of CPU (while other workers idle).
One solution would be for the workers to pass each other the socket descriptor (using sendmsg()) once they notice that they have a lot of work in the queue and there are idle brethren.
So I think the magic piece of information missing from the article is that workers can have multiple open per-client sockets "on the go"; so if one worker gets all the client sockets, then you're not getting parallelism.