I remember Jim Woodcock as really inspirational - he was working with my PhD supervisor in 1987. We were working on a variant of Z for specifying what, today, we would call CRDTs. I was also lucky enough to meet Tony Hoare the same year and discuss those concepts.
Jim is an amazing guy. One of those rare people who are absolutely brilliant in their field and equally good at teaching it. He's also just a really kind, nice person who is delightful to chat with, though that's true of pretty much anyone in York [1].
I also think his book "Software Engineering Mathematics" [2] is an extremely approachable book for any engineer who wants to learn a bit more theory.
As I said, dropping my PhD wasn't in any way a failure on the part of my advisors or the school; it was mostly just life-juggling stuff.
[1] I don't know why exactly, but of all the places I've been, York has the highest percentage of "genuinely nice" people. It's one of my favorite spots in the UK as a result.
Don't remember that, but I do remember the one for Gupta SQLWindows. I had a discussion about it with its designer on Reddit some years back, and he admitted it was a horrible mistake. Programs are text, and folding is not a great way to represent text.
Yes, it's true. At Ably we support websockets, SSE, and comet fallbacks (simple long-polling and streamed long-polling). It's less and less common, but there are firewalls that fail to handle websockets correctly, or that simply block them. I can't name specific companies/examples, but call centers are one case: the network and desktop environments are fully locked down.
In these cases we also see that streamed HTTP can be broken by the firewall: for example, a chunked response can be held back by the firewall and only forwarded to the client when the request ends, as a fixed-length response. Obviously that breaks SSE, and it means you can't just use streamed comet as a fallback when websockets don't work.
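To make that failure mode concrete: one way to cope is to probe the path before choosing a transport, using an endpoint that writes a chunk immediately and then holds the response open. A rough sketch in Python (the function name and threshold are mine, not Ably's):

```python
def pick_transport(first_chunk_at, prompt_threshold=0.5):
    """Choose between streamed and plain long-polling comet after a probe.

    The probe hits an endpoint that emits one chunk immediately, then holds
    the response open. `first_chunk_at` is seconds from request start to the
    first byte arriving (None if nothing arrived). If the chunk arrives
    promptly, streaming survives the path; if not, an intermediary is
    buffering the chunked response and will only forward it when the request
    ends, so fall back to plain long-polling, where each response completes
    before it is delivered. The threshold is illustrative, not a standard.
    """
    if first_chunk_at is not None and first_chunk_at <= prompt_threshold:
        return "streamed"
    return "long-polling"
```

If the probe's first chunk only shows up when the response completes, you know the intermediary is re-framing the stream, and plain long-polling is the safe fallback.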
Disclaimer: I work for Ably.
I agree that it's not valid to claim that websockets are the best fit in every situation. If the requirement is simply to stream server to client events, in an application where that doesn't depend at all on having any per-client state, then you can serve those things statelessly with SSE or HTTP/2. However, much of the time, the thing that makes websocket implementations complex isn't anything to do with websockets per se, but the fact that the server has per-client state; this state needs to exist somewhere in the backend, and it needs to survive reconnections (and so can't simply be some ephemeral session state in the specific server handling a connection). In this case, you could be using HTTP/2, or comet over HTTP1, or websockets, and the main technical issues you will face scaling to millions of users are going to be essentially the same. Websockets (at least, until web transport is here) just give you a more convenient and efficient primitive.
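As a sketch of the kind of per-client state I mean (illustrative names only, nothing Ably-specific): a replay buffer keyed by client ID, held outside any single connection, so a reconnecting client can resume from its last-seen serial regardless of which server it lands on.

```python
from collections import deque

class ClientSession:
    """Per-client state that outlives any single connection."""
    def __init__(self, retention=100):
        self.buffer = deque(maxlen=retention)  # (serial, message) pairs
        self.next_serial = 0

    def publish(self, message):
        self.buffer.append((self.next_serial, message))
        self.next_serial += 1

    def resume(self, last_seen_serial):
        # Replay whatever the client missed while disconnected.
        return [m for (s, m) in self.buffer if s > last_seen_serial]

# In a real deployment this would live in a shared store (Redis, etc.),
# not process memory, so that any server can handle the reconnect.
SESSIONS = {}

def on_connect(client_id, last_seen_serial=-1):
    session = SESSIONS.setdefault(client_id, ClientSession())
    return session.resume(last_seen_serial)
```

The hard part at scale is exactly what the sketch glosses over: where SESSIONS lives, how long the buffer is retained, and what happens when a client resumes past the retention window; those problems are identical whether the transport is websockets, SSE, or comet.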
This is a perfect example of my point. With HTTP/2 and SSE, you use the exact same feature of your framework for server-side state between requests as in any other situation; there is no new problem to solve! Apart from that, server-side per-connection state is usually an anti-pattern, not a common situation as you put it. As far as I can think right now, the only situations where it's required and can't be solved with better caching are the same bidirectional, high-frequency streaming situations already mentioned.
So then how would you solve it without state? For someone like me who is new to these comments, I hear a lot of bickering/arguments but no proposed solution.
Disclaimer: I work for Ably. I agree in principle, so the libraries that handle websockets with comet fallback transports (e.g. Socket.IO) are still widely used for that reason, and the commercial pub/sub service providers generally also support comet fallbacks. However, we now find it is really very rare that clients are unable to use wss.
WSS breaking is surprisingly common for clusters of our enterprise users, on the order of 5-10%. Specifically, when business users connect through security networks like Zscaler, their employers often MITM all connections (similar to how AdGuard works) but in a way that breaks WSS. We have rigorous monitoring for both frontend and backend, and can trace these failures with accuracy: consistently, both the frontend and the ingress firewall think the other one cancelled the connection attempt, so some network hop in between did it. Every other network connection during that time works as expected (long polling/SSE over H2), but WSS gets interrupted and sometimes can't reconnect. We still love the liveness of websockets, but have to architect data fetching in a way that doesn't depend on them working.
China and LATAM are definitely overrepresented. We deal with conglomerates and manufacturers, so it's vaguely industrial + heavy industry, but all these users are corporate folks using standard-issue BigCo laptops (almost certainly loaded with standard "security" layers from a dozen different vendors, making detailed diagnosis nearly impossible for us as a SaaS provider).
This ^ is my favourite writeup on the question of how you implement SOC2. I wish I had read that before we started - after going through the Type 1 and Type 2 process, we've ended up with the same conclusions. I've lost count of the number of times I've recommended that.
Our experience (global b2b customers, heavily skewed to NA) is that SOC2 Type 2 is the most frequently requested/expected standard, and if you have that, not having ISO is very rarely a dealbreaker. Neither makes the security questionnaires go away; they continue to be mandatory, require expert input, and are a significant drain on time. However, having SOC2 and/or ISO does mean that you've already thought of the answers to the questions and you'll have a defensible position, backed up by a track record of independent audits, when your particular approach doesn't meet the "gold" standard implied by the questionnaire. (Edit: typo)
A ton of engineers have already transitioned to kube, years ago even, so you have just made ramping new engineers up on your custom setup a pain point for scaling. But you probably already knew that. =)
We've been experimenting with Liftbridge at a range of scales and workloads, and clusters that are distributed across a single region perform very well. However, it's not really suitable for operating a cluster distributed across multiple regions because the latency adversely impacts the bandwidth of the Raft protocol that's used for cluster-wide coordination; this impacts the rate at which streams can be created, and the rate at which changes to In-Sync Replica sets (ISRs) can be propagated. As suggested, a federation of clusters, each covering a single region, would be required.
I can't speak for NATS Streaming or Akka, but it does look like Liftbridge can fit that niche very well, depending on what features and semantics you need. Your publishing services can be assured that a message has been persisted to a quorum of replicas (strictly, the in-sync replicas) before receiving an acknowledgement, so you can ensure that all onward processing to your downstream services occurs at least once for all acknowledged messages.
If you require fanout to your stream consumers, that's supported simply by having multiple consumers subscribe to each stream. If you require a competing-consumers model, that isn't directly supported, but load-balance groups let you achieve something similar with a static configuration: your consumer population can't resize dynamically, but you can distribute messages across a fixed set of streams, each of which has a consumer.
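A static competing-consumers arrangement like that usually boils down to deterministic partitioning: hash each message key onto one of a fixed set of streams, each with its own consumer. A minimal sketch (stream names are hypothetical, not Liftbridge API calls):

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a message key to one of a fixed set of partitions.

    A stable hash (not Python's randomized hash()) keeps the mapping
    consistent across processes and restarts.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# A fixed set of streams, one consumer each (names are illustrative).
streams = [f"orders-partition-{i}" for i in range(4)]

def stream_for(key):
    # Publish the message to the stream its key hashes to.
    return streams[partition_for(key, len(streams))]
```

Because the mapping is deterministic, all messages for a key land on the same stream (preserving per-key ordering), but resizing the set of streams means re-mapping keys, which is why the consumer population is effectively static.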
Liftbridge doesn't yet support durable consumer-offset checkpointing, although this is advertised as being on the roadmap. In the meantime, if you want the consumer offset to survive consumer restarts, the consumer itself needs to take care of persisting its current offset.
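A minimal sketch of what that consumer-side persistence can look like, assuming a file as the durable store (any database works the same way; nothing here is Liftbridge API):

```python
import json
import os

class CheckpointingConsumer:
    """The consumer persists its own offset and resumes from it on restart."""
    def __init__(self, path):
        self.path = path  # wherever the checkpoint should live

    def load_offset(self):
        # No checkpoint yet: start from the beginning of the stream.
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            return json.load(f)["offset"]

    def commit_offset(self, offset):
        # Write-then-rename so a crash mid-write can't corrupt the checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": offset}, f)
        os.replace(tmp, self.path)
```

Committing the offset only after processing a message gives you at-least-once delivery across restarts, and the atomic write-then-rename keeps a crash from leaving a torn checkpoint that would cost you that guarantee.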
This is correct. Also, I'm planning to introduce a mode that runs NATS "embedded" within the Liftbridge server, such that NATS basically becomes an implementation detail of Liftbridge.
Re: consumer-offset checkpointing, this will be implemented in the form of consumer groups, similar to how Kafka works, where a consumer group checkpoints its position in the log by writing it to the log itself.
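For anyone unfamiliar with the Kafka pattern being referenced: commits are just appended records, and a group's current position is the latest record for its (group, stream) key, which log compaction keeps small. An in-memory sketch of the semantics (not Liftbridge code):

```python
class OffsetLog:
    """Consumer-group checkpointing by writing offsets to a log.

    The log is append-only; the effective committed offset for a group on a
    stream is simply the most recent record for that (group, stream) key,
    exactly what a compacted topic retains.
    """
    def __init__(self):
        self.records = []  # append-only: (group, stream, offset)

    def commit(self, group, stream, offset):
        self.records.append((group, stream, offset))

    def fetch(self, group, stream):
        # Scan from the tail: the latest commit for this key wins.
        for g, s, o in reversed(self.records):
            if (g, s) == (group, stream):
                return o
        return None  # no committed position yet
```

A nice property of this design is that offset storage inherits the log's own durability and replication guarantees for free.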