I think the two projects focus on different things.
Consul provides features like health checking and failure detection in addition to its consistent key-value store. It aims to be an all-in-one solution[0].
etcd focuses on the consistent key-value store. Its key-value store has more advanced features, like multi-version keys and reliable watches, and it provides better performance. People build additional features on top of etcd's key-value store/Raft.
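If you haven't seen the v3 API, here's a rough Go sketch of the multi-version keys bit (the endpoint and key names are my own placeholders, not anything etcd-specific):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Endpoint is a placeholder; point it at your own cluster.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Every write bumps the store-wide revision.
	first, err := cli.Put(ctx, "/config/feature-x", "off")
	if err != nil {
		panic(err)
	}
	if _, err := cli.Put(ctx, "/config/feature-x", "on"); err != nil {
		panic(err)
	}

	// Multi-version keys: read the key as it was at the earlier revision.
	old, err := cli.Get(ctx, "/config/feature-x", clientv3.WithRev(first.Header.Revision))
	if err != nil {
		panic(err)
	}
	fmt.Printf("at rev %d the value was %s\n", first.Header.Revision, old.Kvs[0].Value)
}
```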
Note that the testing environments in the docs I listed are not exactly the same, though they are comparable. Also, Consul's performance has improved over the last few releases.
So we ran the benchmark internally on the same environment. The result is still comparable to what I listed in the two official docs.
The best way to compare performance is still probably to run the benchmark on your own environment.
I've used Consul, but I haven't used etcd directly, beyond using it via Kubernetes. One benefit IMO is the number of high-profile eyeballs on the project, thanks to the success of K8s and the pedigree of the engineers working on it.
Does anyone have any benchmarks or comparisons with ZK? We have run it for many years without a problem and are very happy with it. I'd be interested in hearing from anyone who switched from ZK to etcd for distributed locking, presence, or leader-election type idioms, ran it in prod for a few months or years, and can share their takeaways.
etcd3 has similar performance to ZK at small scale. For large datasets, etcd3 does better since it takes incremental snapshots and has a smaller memory footprint, whereas ZK takes full snapshots, which consume a lot of resources. For watches, etcd3 does streaming + TCP multiplexing to save memory.
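To make the watch point concrete, here's a rough sketch with the Go v3 client (the key prefixes and the lastSeenRev parameter are my own placeholders). Every watch below is a stream multiplexed over the client's single gRPC (HTTP/2) connection, and each can resume from a known revision:

```go
package example

import (
	"context"
	"fmt"

	"github.com/coreos/etcd/clientv3"
)

// watchMany assumes cli is an already-connected v3 client. There is no
// per-watcher TCP connection or session to keep alive; the watches share
// cli's single connection to the server.
func watchMany(cli *clientv3.Client, lastSeenRev int64) {
	ctx := context.Background()

	conf := cli.Watch(ctx, "/config/", clientv3.WithPrefix())
	// Resumable: replay every event since a revision we already processed.
	jobs := cli.Watch(ctx, "/jobs/", clientv3.WithPrefix(), clientv3.WithRev(lastSeenRev))

	go func() {
		for resp := range conf {
			for _, ev := range resp.Events {
				fmt.Printf("config: %s %q\n", ev.Type, ev.Kv.Key)
			}
		}
	}()
	for resp := range jobs {
		for _, ev := range resp.Events {
			fmt.Printf("jobs: %s %q\n", ev.Type, ev.Kv.Key)
		}
	}
}
```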
I too have been using ZK for many years now, and it's pretty great.
Etcd can provide faster election notifications but it comes at the cost of etcd still being pretty new, so be prepared to get cut by the bleeding edge :)
Apples all the way. etcd is pretty much a clone of ZooKeeper in Go; they both support hierarchical keys, atomic autoincrement, and watches, though etcd uses the Raft consensus algorithm, whereas ZooKeeper uses its own homegrown algorithm, and there are other minor differences. Both are intended for configuration management and coordination. (Of course, both are ultimately clones of Google's internal tool, Chubby.)
As I understand it, both Raft and Zab will result in similar performance characteristics, since writes need, by design, to be coordinated with peers and serialized in a strict manner. In this sense, etcd and ZK are very much alike, irrespective of how they are implemented internally. I wouldn't be surprised if etcd were found to be faster and more scalable than ZK, however.
It really depends on how you view the problem. Yes, the latency of agreeing on a proposal is similar, since it is limited by physics (network latency + disk I/O). However, there are ways to put more work into one proposal (batching) and to submit proposals continuously (pipelining). These optimizations depend heavily on the implementation.
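To make "batching" and "pipelining" concrete, here's a toy Go sketch of a proposer loop (my own illustration, not code from etcd or ZK): it drains whatever requests are queued into a single proposal, and allows a bounded number of proposals to be awaiting acknowledgement at once.

```go
package main

import "fmt"

// proposal is a batch of client requests agreed on as one consensus entry.
type proposal struct {
	requests []string
}

// run batches queued requests into single proposals and allows up to
// maxInflight unacknowledged proposals at once (pipelining). Toy sketch of
// the idea only; real implementations differ in detail.
func run(requests <-chan string, send func(proposal), acks <-chan struct{}, maxInflight int) {
	inflight := 0
	for {
		// Block for the first request, then drain whatever else is queued
		// so the whole backlog rides in one proposal (batching).
		req, ok := <-requests
		if !ok {
			return
		}
		batch := proposal{requests: []string{req}}
	drain:
		for {
			select {
			case more, ok := <-requests:
				if !ok {
					break drain
				}
				batch.requests = append(batch.requests, more)
			default:
				break drain
			}
		}

		// Pipelining: only stop to wait for acknowledgements once too many
		// proposals are already in flight.
		for inflight >= maxInflight {
			<-acks
			inflight--
		}
		send(batch)
		inflight++
	}
}

func main() {
	requests := make(chan string, 16)
	acks := make(chan struct{}, 16)
	for i := 0; i < 6; i++ {
		requests <- fmt.Sprintf("put key%d", i)
	}
	close(requests)

	run(requests, func(p proposal) {
		fmt.Println("proposing a batch of", len(p.requests), "requests")
		acks <- struct{}{} // pretend the quorum acknowledged immediately
	}, acks, 2)
}
```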
Your point is well taken. This question comes up often enough for similar use cases, hence I asked whether someone has data to support an argument. Considering how critical ZK is in the larger stack, and our success and operational expertise with it, it would take significant convincing to switch to something like etcd.
The following two paragraphs from the etcd project site seem to hint that they're trying to target overlapping use cases:
"Your applications can read and write data into etcd. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. These values can be watched, allowing your app to reconfigure itself when they change.
Advanced uses take advantage of the consistency guarantees to implement database leader elections or do distributed locking across a cluster of workers."
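For what it's worth, those "advanced uses" map onto helpers in the v3 Go client's concurrency package. A rough sketch (the endpoint, key prefixes, and candidate name are my own placeholders):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// A session is a lease kept alive in the background; if this process
	// dies, its locks and leadership are released when the lease expires.
	sess, err := concurrency.NewSession(cli)
	if err != nil {
		panic(err)
	}
	defer sess.Close()

	ctx := context.Background()

	// Leader election: Campaign blocks until this candidate becomes leader.
	election := concurrency.NewElection(sess, "/my-service/leader")
	if err := election.Campaign(ctx, "worker-1"); err != nil {
		panic(err)
	}
	fmt.Println("worker-1 is now the leader")

	// Distributed lock across a cluster of workers.
	mu := concurrency.NewMutex(sess, "/my-service/locks/migration")
	if err := mu.Lock(ctx); err != nil {
		panic(err)
	}
	fmt.Println("holding the migration lock")
	// ... do protected work ...
	if err := mu.Unlock(ctx); err != nil {
		panic(err)
	}
}
```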
Great question. The upgrade path is a rolling upgrade from v2.3.y series to v3.0.0 series. This is how all of the etcd upgrades have worked since the start of v2.x.y.
Seems worth noting as well that this only upgrades the version of the cluster. Data populated via the v2 API will not magically be available via the v3 API as they have separate data stores/keyspaces. https://github.com/coreos/etcd/blob/v3.0.0/Documentation/op-... talks about how to migrate data that was stored with v2 to v3's data store.
etcd looks more and more promising as its usage and development activity increase. Is anyone using it internally as a standalone part of their system (i.e. not just for k8s or CoreOS)?
Using e.g. gRPC shows great promise, but systems like ZooKeeper still play nicer in more traditional Java shops, or do they? How hard is it to use etcd from the JVM?
There's a project called zetcd[0] that acts as a translation proxy in front of etcd to let you use the ZooKeeper API. I don't know if it's production ready, but I do know it works and is a pretty cool idea.
I would tend to agree. Knowing what ZooKeeper has been doing, and having actually used ZooKeeper (and etcd), I'd say the API and the primitives offered by ZooKeeper are IMHO better (although this multi-version concurrency control model is interesting) and more mature.
It feels like etcd is 'still discovering itself', for lack of a better phrase.
Are there high level APIs for etcd like there are for ZooKeeper? I'm the main author of Apache Curator and I know that writing "recipes" is not trivial.
We are working on it (https://github.com/coreos/etcd/issues/5067). Perhaps we could work together on the Java client first? It should not be hard, given that gRPC supports Java.
It would be great if you could provide opinions, comments, or help on these high-level APIs. We also might move these to an internal proxy layer, so that clients in other languages can use them more easily.
etcd is based on Raft. Raft has a TLA+ spec. But do note that the implementation usually diverges from its algorithm [0].
For etcd, we try to keep the core algorithm as self-contained and deterministic (no I/O, no timers) as possible, so it can stay very close to the pure algorithm. We are very confident about it since we have tested it thoroughly, and the implementation is shared with other consistent, large-scale database systems (CockroachDB, TiKV).
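For the curious, here's roughly what driving that core looks like with the etcd raft library (storage, transport, and state-machine details are elided; the point is that the application owns the clock and all the I/O):

```go
package main

import (
	"time"

	"github.com/coreos/etcd/raft"
)

func main() {
	// The raft core does no I/O and owns no timers; the application
	// supplies both. MemoryStorage is the package's in-memory log storage.
	storage := raft.NewMemoryStorage()
	c := &raft.Config{
		ID:              1,
		ElectionTick:    10,
		HeartbeatTick:   1,
		Storage:         storage,
		MaxSizePerMsg:   4096,
		MaxInflightMsgs: 256,
	}
	n := raft.StartNode(c, []raft.Peer{{ID: 1}})

	// The application drives the logical clock...
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			n.Tick()
		case rd := <-n.Ready():
			// ...and performs every side effect the core asks for.
			storage.Append(rd.Entries) // persist entries (durable storage in real code)
			_ = rd.HardState            // persist alongside the entries
			_ = rd.Messages             // hand to your transport layer
			_ = rd.CommittedEntries     // apply to your state machine
			n.Advance()                 // tell the core this batch is done
		}
	}
}
```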
ZooKeeper uses ZAB [1] under the hood. I do not think there is a TLA+ spec for ZAB.
Do you (or people you know?) plan on publishing peer reviewed papers for how the lease algorithm works, how the multi-version concurrency control model works (and so on)?
None of these are, AFAIK, built into Raft itself; they are add-ons that etcd has created (and they may or may not even use Raft).
It'd be great to instill confidence in these add-on features by having them peer-reviewed in a similar manner as raft is/was/has been.
I'm aware that implementations diverge from their specifications, or are not directly synthesized from them. At least not yet.
I'll be poking around more to see if the etcd team has published their own specs for the novel parts of the system. I'm particularly interested in seeing if/how OSS projects are adopting more rigorous "engineering" practices.
I'd be very interested in hearing more about this.
I tried using both the Consul and etcd Raft implementations as libraries. I found the Consul library much easier to interface with, but my impression was that the etcd library was much more battle-tested in the real world, being used by big projects like Kubernetes and embedded into projects like CockroachDB. I also wasn't sure whether the details the Consul implementation was hiding were actually important.
This doesn't look like feature creep, but coreos bought into systemd big time, and with etcd being used in more places, the temptation of feature creep grows... and that's worrying.
Last time I looked into etcd, I got the impression that it was great at handling a high volume of read operations (theoretically read-scalable) but bad at handling a high volume of write operations (since every write has to be propagated to every node in the cluster via Raft). Is this still the case in v3?
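For context on the read side of that question: in the v3 Go client, the default Get is linearizable, while a serializable Get can be answered from the local member's store. A rough sketch, assuming cli is an already-connected client and the key name is a placeholder:

```go
package example

import (
	"context"
	"fmt"

	"github.com/coreos/etcd/clientv3"
)

// readBothWays contrasts the two v3 read modes.
func readBothWays(cli *clientv3.Client) error {
	ctx := context.Background()

	// Writes always go through Raft and must reach a quorum; there is no
	// option to relax that.
	if _, err := cli.Put(ctx, "/config/feature-x", "on"); err != nil {
		return err
	}

	// Default reads are linearizable (validated against the quorum).
	strong, err := cli.Get(ctx, "/config/feature-x")
	if err != nil {
		return err
	}

	// Serializable reads are answered from the local member's store:
	// possibly stale, but cheap, which is where read scalability comes from.
	local, err := cli.Get(ctx, "/config/feature-x", clientv3.WithSerializable())
	if err != nil {
		return err
	}

	fmt.Printf("linearizable=%s serializable=%s\n", strong.Kvs[0].Value, local.Kvs[0].Value)
	return nil
}
```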