Etcd v3: increased scale and new APIs (coreos.com)
189 points by philips on June 30, 2016 | 53 comments


Sounds interesting! What are the benefits of using Etcd over Consul? (https://www.consul.io/)


I think the two projects focus on different things.

Consul provides features like health checking and failure detection in addition to its consistent key-value store. It aims to be an all-in-one solution[0].

etcd focuses on the consistent key-value store. Its key-value store has more advanced features, such as multi-version keys and reliable watches, and provides better performance. People build additional features on top of etcd's key-value store and Raft implementation.
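
To give a flavor of the v3 API, here is a minimal sketch using the Go clientv3 package; the endpoint and key are placeholders:

    package main

    import (
        "fmt"
        "time"

        "github.com/coreos/etcd/clientv3"
        "golang.org/x/net/context"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Every write bumps a cluster-wide revision, so keys are
        // multi-version and reads/watches can be pinned to a revision.
        resp, err := cli.Put(context.Background(), "/config/feature-x", "on")
        if err != nil {
            panic(err)
        }

        // A reliable watch: resume from the revision of the write
        // above, so no intervening events are missed.
        wch := cli.Watch(context.Background(), "/config/feature-x",
            clientv3.WithRev(resp.Header.Revision))
        for wresp := range wch {
            for _, ev := range wresp.Events {
                fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
            }
        }
    }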

(I work on etcd)

[0] https://www.consul.io/intro/vs/index.html


Do you have proof of the "better performance" claim?


Consul: https://github.com/hashicorp/consul/blob/master/bench/result...

etcd: https://github.com/coreos/etcd/blob/master/Documentation/op-...

Note that the testing environments in the docs I listed are not exactly the same, but they are comparable. Also, Consul's performance has improved over the last few releases.

So we ran the benchmark internally, with both systems in the same environment. The results are still comparable to what I listed in the two official docs.

The best way to compare performance is still probably to run the benchmark in your own environment.


I've used Consul, but I haven't used Etcd directly, only via Kubernetes. One benefit IMO is the number of high-profile eyeballs on the project, due to the success of K8s and the pedigree of the engineers working on it.


k8s works with etcd only. Consul is better in terms of features.


Does anyone have benchmarks or comparisons with ZK? We have run it for many years without a problem and are very happy with it. I would be interested in hearing from anyone who switched from ZK to etcd for distributed locking, presence, or leader-election-type idioms, ran it in prod for a few months or years, and can share their takeaways.


etcd3 has performance similar to ZK's at small scale. For large datasets, etcd3 does better, since it takes incremental snapshots and has a smaller memory footprint, whereas ZK takes full snapshots, which consume a lot of resources. For watches, etcd3 uses streaming plus TCP multiplexing to save memory.
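
Roughly, the multiplexing means many watches from a single client share one gRPC HTTP/2 connection, instead of one TCP connection per watch. A minimal sketch (prefixes and endpoint are placeholders; imports elided):

    // import "github.com/coreos/etcd/clientv3"
    cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx := context.Background()
    // Both watch streams ride on the same underlying connection.
    services := cli.Watch(ctx, "/services/", clientv3.WithPrefix())
    config := cli.Watch(ctx, "/config/", clientv3.WithPrefix())

    for {
        select {
        case resp := <-services:
            for _, ev := range resp.Events {
                fmt.Printf("service event: %s %q\n", ev.Type, ev.Kv.Key)
            }
        case resp := <-config:
            for _, ev := range resp.Events {
                fmt.Printf("config event: %s %q\n", ev.Type, ev.Kv.Key)
            }
        }
    }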


I too have been using ZK for many years now, and it's pretty great.

Etcd can provide faster election notifications, but that comes at the cost of etcd still being pretty new, so be prepared to get cut by the bleeding edge :)


etcd is a key value store that has features of ZK. ZooKeeper is strictly distributed coordination. Apples and oranges, no?


Apples all the way. Etcd is pretty much a clone of ZooKeeper in Go; they both support hierarchical keys, atomic autoincrement, and watches, though Etcd uses the Raft consensus algorithm, whereas ZooKeeper uses its own homegrown algorithm, and there are other minor differences. Both are intended for configuration management and coordination. (Of course, both are ultimately clones of Google's internal tool, Chubby.)


I agree that they are similar in some ways, but under the covers they are fundamentally different beasts in almost all ways!

Etcd uses Raft, and ZooKeeper uses its own protocol called Zab [0]. Zab shares some characteristics with Paxos but certainly IS NOT Paxos.

[0] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs...


As I understand it, both Paxos and Zab will result in similar performance characteristics, since writes need, by design, to be coordinated with peers and serialized in a strict manner. In this sense, Etcd and ZK are very much alike, irrespective of how they are implemented internally. I wouldn't be surprised if Etcd was found to be faster and more scalable than ZK, however.


It really depends on how you view the problem. Yes, the latency of agreeing on a proposal is similar, since it is limited by physical factors (network latency + disk I/O). However, there are ways to put more entries into one proposal (batching) and to submit proposals continuously (pipelining). These optimizations depend heavily on the implementation.
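
For what it's worth, etcd's raft library exposes knobs for exactly these two optimizations. A minimal sketch (the numbers are arbitrary examples, not recommendations):

    // import "github.com/coreos/etcd/raft"
    cfg := &raft.Config{
        ID:            0x01,
        ElectionTick:  10,
        HeartbeatTick: 1,
        Storage:       raft.NewMemoryStorage(),

        // Batching: upper bound on the byte size of entries packed
        // into a single append message.
        MaxSizePerMsg: 1024 * 1024,

        // Pipelining: how many append messages may be in flight to a
        // follower before waiting for acknowledgements.
        MaxInflightMsgs: 256,
    }
    _ = cfg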


I believe ZK implements Paxos. Maybe someone not on mobile can correct me / provide a reference.


ZK uses Zab, which is similar to Paxos but not Paxos.


Your point is well taken. This question comes up often enough for similar use cases, hence my asking whether someone has data to support an argument. Considering how critical ZK is in the larger stack, and our success and operational expertise with it, it would take significant convincing to switch to something like etcd.

The following two paragraphs from the etcd project site seem to hint that they're trying to target overlapping use cases: "Your applications can read and write data into etcd. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. These values can be watched, allowing your app to reconfigure itself when they change.

Advanced uses take advantage of the consistency guarantees to implement database leader elections or do distributed locking across a cluster of workers."
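
For reference, that second paragraph corresponds to the recipes in etcd's clientv3 concurrency package. A rough sketch of leader election with the Go client (endpoint, election prefix, and candidate value are placeholders):

    // import "github.com/coreos/etcd/clientv3/concurrency"
    cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    // A session keeps a lease alive; if this process dies, the lease
    // expires and leadership is released automatically.
    sess, err := concurrency.NewSession(cli)
    if err != nil {
        panic(err)
    }
    defer sess.Close()

    e := concurrency.NewElection(sess, "/my-service/leader")
    // Campaign blocks until this candidate becomes the leader.
    if err := e.Campaign(context.Background(), "worker-1"); err != nil {
        panic(err)
    }
    // ... do leader-only work, then resign when finished.
    _ = e.Resign(context.Background())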


It's not mentioned in the blog post—is there a document that explains the migration plan for the etcd that ships on the host in CoreOS?

Edit: They haven't been published to the docs on the CoreOS website yet, but there are two documents listed under "upgrading and compatibility" at the bottom of https://github.com/coreos/etcd/blob/v3.0.0/Documentation/doc...


Great question. The upgrade path is a rolling upgrade from the v2.3.y series to the v3.0.0 series. This is how all etcd upgrades have worked since the start of the v2.x.y series.

Doc is here: https://github.com/coreos/etcd/blob/master/Documentation/upg...


Seems worth noting as well that this only upgrades the version of the cluster. Data populated via the v2 API will not magically be available via the v3 API, as the two APIs have separate data stores/keyspaces. https://github.com/coreos/etcd/blob/v3.0.0/Documentation/op-... talks about how to migrate data that was stored with v2 to v3's data store.
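
For illustration only, a rough sketch of what manually copying a single key across the two keyspaces could look like with the Go clients (endpoints and key are placeholders; the linked doc describes the supported migration path):

    // import (
    //     etcdv2 "github.com/coreos/etcd/client"
    //     "github.com/coreos/etcd/clientv3"
    //     "golang.org/x/net/context"
    // )
    func copyKey(key string) error {
        ctx := context.Background()

        // Read the value through the v2 API.
        c2, err := etcdv2.New(etcdv2.Config{Endpoints: []string{"http://localhost:2379"}})
        if err != nil {
            return err
        }
        resp, err := etcdv2.NewKeysAPI(c2).Get(ctx, key, nil)
        if err != nil {
            return err
        }

        // Write it into the separate v3 keyspace.
        c3, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
        if err != nil {
            return err
        }
        defer c3.Close()
        _, err = c3.Put(ctx, key, resp.Node.Value)
        return err
    }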


Etcd looks more and more promising as its usage and development activity increase. Is anyone using it internally as a standalone part of a system (i.e. not just for k8s or CoreOS)?

The use of gRPC shows great promise, but systems like ZooKeeper still play nicer in more traditional Java shops, or do they? How hard is it to use etcd from the JVM?


There's a project called zetcd[0] that acts as a translation proxy in front of etcd to let you use the ZooKeeper API. I don't know if it's production ready, but I do know it works and is a pretty cool idea.

[0]: https://github.com/chzchzchz/zetcd


It should not be hard. For the v2 API, there are several Java bindings (https://github.com/jurmous/etcd4j).

For the gRPC API, it is easy to generate a Java gRPC client based on the defined service. We have plans to make that experience better.


Etcd looks more promising?

ZooKeeper has been around for over 5 years now, with an extremely large install base.

Kafka, Hadoop, Solr, Mesos, and HBase are all projects that leverage ZooKeeper for distributed coordination.

ZooKeeper has already delivered.


I would tend to agree. Knowing what ZooKeeper has been doing, and having actually used ZooKeeper (and etcd), I can say that the API and the primitives offered by ZooKeeper are IMHO better (although the multi-version concurrency control model is interesting) and more mature.

It feels like etcd is 'still discovering itself', for lack of a better phrase.

Btw: https://issues.apache.org/jira/browse/ZOOKEEPER-2169 (this is the equivalent of TTLs for zookeeper).


Seems pretty easy to use etcd from a JVM, particularly now that it is gRPC based.


I'm not sure if I can disclose our product details, but we're using etcd2 for consensus in a data storage cluster.


Are there high level APIs for etcd like there are for ZooKeeper? I'm the main author of Apache Curator and I know that writing "recipes" is not trivial.


Yes. Here: https://godoc.org/github.com/coreos/etcd/clientv3/concurrenc...

It would be great if you could provide opinions, comments, or help on these high-level APIs. We may also move these into an internal proxy layer, so that clients in other languages can use them more easily.

(some more here: https://github.com/coreos/etcd/tree/master/contrib/recipes)
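
For example, a minimal sketch of the Mutex recipe (the lock prefix is a placeholder, and cli is an existing clientv3 client):

    // import "github.com/coreos/etcd/clientv3/concurrency"
    sess, err := concurrency.NewSession(cli)
    if err != nil {
        panic(err)
    }
    defer sess.Close()

    m := concurrency.NewMutex(sess, "/locks/job-queue")
    if err := m.Lock(context.Background()); err != nil {
        panic(err)
    }
    // ... critical section: only one client holds the lock at a time.
    if err := m.Unlock(context.Background()); err != nil {
        panic(err)
    }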


I assume there's a Java API for etcd? If so, it would be interesting to try to port Curator.


We are working on it (https://github.com/coreos/etcd/issues/5067). Perhaps we could work together on the Java client first? It should not be hard, given that gRPC supports Java.


Do they publish formal specifications of these distributed algorithms?

Has anyone verified the implementation?

Zookeeper is at least based on Paxos, which has a TLA+ model one can check.


etcd is based on Raft, and Raft has a TLA+ spec. But do note that an implementation usually diverges from its algorithm [0].

For etcd, we try to keep the core algorithm as self-contained and deterministic (no I/O, no timers) as possible, so it stays very close to the pure algorithm. We are very confident in it since we have thoroughly tested it, and the implementation is shared with other consistent large-scale database systems too (CockroachDB, TiKV).
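
Concretely, the application drives the raft node: ticks and all I/O happen outside the core state machine. A minimal sketch of that loop, assuming node and storage were created via raft.StartNode and raft.NewMemoryStorage (HardState/snapshot handling elided; send and apply are hypothetical helpers):

    ticker := time.NewTicker(100 * time.Millisecond)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            node.Tick() // the caller owns the clock
        case rd := <-node.Ready():
            storage.Append(rd.Entries) // the caller owns persistence
            send(rd.Messages)          // hypothetical transport helper
            apply(rd.CommittedEntries) // hypothetical apply helper
            node.Advance()
        }
    }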

ZooKeeper uses Zab [1] under the hood. I do not think there is a TLA+ spec for Zab.

[0] https://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/...

[1] http://www-users.cselabs.umn.edu/classes/Spring-2016/csci821...


Do you (or people you know) plan on publishing peer-reviewed papers on how the lease algorithm works, how the multi-version concurrency control model works, and so on?

None of these are, AFAIK, built into Raft; they are add-ons that etcd has created (and they may or may not even use Raft).

It would be great to instill confidence in these add-on features by having them peer-reviewed, in the same manner as Raft has been.


Raft - cool!

I'm aware that implementations diverge from, or are not directly synthesized from, their specifications. At least not yet.

I'll be poking around more to see if the etcd team has published their own specs for the novel parts of the system. I'm particularly interested in seeing if/how OSS projects are adopting more rigorous "engineering" practices.


I took a look at the etcd v3 raft.go code; it is indeed nicely written! Definitely better than any other one I've seen.

(Though I'm pretty excited about the Raft implementation my summer intern and I are currently hacking on :) )


In Haskell, I presume? Will it be open sourced? :)


Actually doing it in Agda first. Coinductive programming is a delight with Agda 2.5.

But yes that's the plan.


Consul's Raft implementation is leagues better than etcd's, unfortunately.


I'd be very interested in hearing more about this.

I tried using both the Consul and etcd Raft implementations as libraries. I found the Consul library much easier to interface with, but my impression was that the etcd library was much more battle-tested in the real world, through big projects like Kubernetes and through being embedded in projects like CockroachDB. I also wasn't sure whether the details the Consul implementation was hiding were actually important.


Is that still the case with Etcd v3? I remember reading somewhere that they were rewriting the Raft implementation in Etcd a good while ago.



We also do a ton of functional testing as code is merged to test lots of different partitions and faults. You can read more on this post: https://coreos.com/blog/new-functional-testing-in-etcd/


Zookeeper does not use Paxos. It uses its own protocol, Zab.

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs...


This doesn't look like feature creep, but CoreOS bought into systemd big time, and with etcd being used in more places, the temptation of feature creep grows... and that's worrying.


Will CORS ever be enabled for the https://discovery.etcd.io/new endpoint?


This is the GitHub issue for that service. Please +1 the thing on GitHub and we will try to get it fixed in prod: https://github.com/coreos/discovery.etcd.io/issues/12


Wait... there are maintainers that encourage +1s all over their GitHub issues?


Probably referring to the new "reactions" feature. https://github.com/blog/2119-add-reactions-to-pull-requests-...


It's now possible to +1 comments without adding a new comment (and notifying maintainers).


Last time I looked into Etcd, I got the impression that it was great at handling a high volume of read operations (theoretically read-scalable) but bad at handling a high volume of write operations (since every write has to be propagated to every node in the cluster via Raft). Is this still the case in v3?


[flagged]


We detached this subthread from https://news.ycombinator.com/item?id=12011247 and marked it off-topic.


Cracks you up?

This is a feature that has been missing since etcd's introduction, and this article on a major new version makes no mention of it.

Are we really so speech-restricted here that we can only discuss a press release in the company's own terms?

The question is highly relevant to the etcd product and its future.



