I've read all this and I saw no description of failure modes and operationally h...

mrkeen · on March 13, 2024

To be fair, I think it's fine to ignore some of the implementation details about restarting a failed node. You can probably assume some kind of replicated log, the way most distributed systems of this kind use one.
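For anyone who wants the restart case spelled out, here is a minimal sketch of catching up from a replicated log. The `Node` class and `catch_up` function are hypothetical names made up for illustration, and the sketch assumes the restarted node's log is a clean prefix of the healthy node's log (no divergent entries to reconcile), which real protocols do not get for free.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node: just an append-only log of entries.
    log: list = field(default_factory=list)

    def append(self, entry):
        self.log.append(entry)


def catch_up(restarted: Node, healthy: Node) -> int:
    """Copy the entries the restarted node is missing.

    Assumes restarted.log is a prefix of healthy.log.
    Returns the number of entries copied.
    """
    missing = healthy.log[len(restarted.log):]
    restarted.log.extend(missing)
    return len(missing)
```

A node that crashed after the first entry would call `catch_up(restarted, healthy)` on restart and end up with the same log as its peer.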

And you can also give the benefit of the doubt when allowing a minority of nodes to fail (few enough that a quorum stays up), letting them restart and catch up, etc., while the system still makes progress ("CA mode"). After all, that's the point of distributing a system in the first place - there's no single master which can die and bring down the system.
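The arithmetic behind that can be sketched roughly as follows, assuming a simple majority quorum (other quorum schemes exist). The function names `quorum_size` and `can_make_progress` are made up for illustration.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of n nodes (assumes a simple majority quorum)."""
    return n // 2 + 1


def can_make_progress(n: int, failed: int) -> bool:
    """True if the nodes still alive can form a majority quorum."""
    return n - failed >= quorum_size(n)
```

With 5 nodes the quorum is 3, so `can_make_progress(5, 2)` holds and `can_make_progress(5, 3)` does not. With 4 nodes the quorum is also 3, so losing 2 nodes already stops progress.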

But yeah, at this point I think OP is just going to keep implying that partitions don't happen or something...