
I know people running ES at PB scale. You definitely need to know what you are doing, of course, but it is entirely possible, and there are companies doing this. Any operation at this scale needs planning. So I don't think you are being entirely fair here.

I'm not saying upgrading will solve all your problems, but I sure know of a lot of problems I've had with 1.7 that are much less of a problem, or not a problem at all, with 6.x.

Reindexing is indeed time consuming. However, if you engineer your cluster properly, you should be able to do it without downtime. For example, I've used index aliases in the past to manage this: reindex in the background, then swap the alias over to the new index atomically when ready. Maybe don't reindex all your indices at once. Also, ES 6 has a reindex API (which 1.7 lacks) to support this. At PB scale this is of course going to take time. I've also considered reindexing on a separate cluster and using e.g. the snapshot API to restore the end result.
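
To make the alias approach concrete, here is a minimal sketch of the two request bodies involved. It only builds the JSON and sends nothing. The index and alias names ("articles", "articles_v1", "articles_v2") are made up for illustration, and it assumes your clients query the alias rather than the concrete index name.

```python
import json


def build_reindex(source_index, dest_index):
    """Body for POST /_reindex: copy documents from source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases: move `alias` from old_index to new_index.

    Both actions travel in a single request, so the swap is applied
    atomically and the alias never points at nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(json.dumps(build_reindex("articles_v1", "articles_v2"), indent=2))
    print(json.dumps(build_alias_swap("articles", "articles_v1", "articles_v2"), indent=2))
```

You would send the first body, wait for the reindex to finish, then send the second. Writes that arrive while the copy is running are not covered by this sketch and need separate handling.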



In our case at Meltwater, we have new documents as well as updates to old documents coming in, meaning older indexes are actively being modified, with low-latency requirements on the visibility and consistency of those modifications. This makes it trickier both to reindex and to do a seamless, no-downtime-whatsoever upgrade to a new major version. It's not infeasible at all though, and we're working on it. Reindexing a PB of data should be possible in a matter of weeks based on our estimates, but we shall see!
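
As a rough sanity check on "a matter of weeks", here is a back-of-envelope sketch of the sustained throughput that implies. The week counts are assumptions picked for illustration, not figures from the estimate above, and 1 PB is taken as 10^15 bytes.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def required_throughput_mb_s(total_bytes, weeks):
    """Average MB/s (10^6 bytes) needed to move total_bytes in the given number of weeks."""
    return total_bytes / (weeks * SECONDS_PER_WEEK) / 1e6


if __name__ == "__main__":
    petabyte = 1e15  # assumption: decimal petabyte
    for weeks in (1, 2, 4):  # assumed durations, for illustration only
        rate = required_throughput_mb_s(petabyte, weeks)
        print(f"{weeks} week(s): {rate:.0f} MB/s sustained across the cluster")
```

This ignores live updates, replication and indexing overhead, so treat it as a lower bound on the aggregate rate rather than a plan.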



