This looks really interesting; I wonder how they will monetize it, though. As an ...

fulmicoton · on May 7, 2021

We really need to make this clear in our next blog post: this is not grep. We use the same data structures as Elasticsearch or Google.

We just adapted them to be object-storage friendly. I would not call object storage dumb by any means; it is a very powerful bottom-up abstraction.
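
At their core, the data structures in question form an inverted index: a map from each term to the sorted list of documents (a posting list) that contain it. Below is a minimal in-memory sketch, assuming naive whitespace tokenization and AND-only queries. It shows only the idea of posting-list intersection, not how Quickwit lays the index out on object storage, and every function name in it is hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer (an assumption): lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each term to the sorted ids of the documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect posting lists instead of scanning every document.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = [
    "error connecting to database",
    "user login succeeded",
    "database timeout error",
]
index = build_inverted_index(docs)
print(search_and(index, "database error"))  # [0, 2]
```

A query only touches the posting lists of its own terms, which is what makes this structure so different from grep.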

We do manage to get SSD-like throughput from object storage. Latency is the big issue: we had to redesign our search to reduce the number of random reads on the critical path to the bare minimum.
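
A back-of-the-envelope model shows why the number of dependent random reads matters more than raw throughput. Each sequential round of reads pays the full per-request latency, while reads issued concurrently within a round only add transfer time. The 50 ms request latency and 100 MB/s throughput below are illustrative assumptions, not measurements of S3 or Quickwit, and `estimated_latency_ms` is a hypothetical helper.

```python
def estimated_latency_ms(sequential_rounds: int, bytes_read: int,
                         request_latency_ms: float, throughput_mb_s: float) -> float:
    """Rough query latency on object storage.

    Each round of dependent reads waits for one full request latency;
    reads within a round are assumed to run concurrently, so only the
    time to transfer the bytes is added on top.
    """
    transfer_ms = bytes_read * 1000 / (throughput_mb_s * 1_000_000)
    return sequential_rounds * request_latency_ms + transfer_ms


# Same 10 MB read, organized as 10 dependent rounds versus 2 rounds
# (assumed 50 ms per request, 100 MB/s throughput).
print(estimated_latency_ms(10, 10_000_000, 50.0, 100.0))  # 600.0
print(estimated_latency_ms(2, 10_000_000, 50.0, 100.0))   # 200.0
```

With these assumed numbers, latency is dominated by the round count, so cutting dependent reads helps far more than reading fewer bytes.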

heipei · on May 7, 2021

I appreciate the response. I wasn't trying to say this is grep; I fully understand that this is an inverted index, which is far more interesting to build on top of S3.

I merely wanted to say that, by using S3 within AWS, you always have the fallback option of a brute-force "grep" across your semi-structured "data lake" (or whatever it's called), thanks to the aggregate bandwidth and Athena.
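
The brute-force fallback needs no index at all: read every object and test every line, spreading the objects across many parallel readers so the aggregate bandwidth does the work. A minimal sketch, assuming an in-memory dict stands in for the objects in a bucket (the keys and the `brute_force_grep` name are hypothetical; this is not Athena's or any AWS API):

```python
from concurrent.futures import ThreadPoolExecutor


def scan_object(body: str, needle: str) -> list[str]:
    # Brute force: look at every line of the object.
    return [line for line in body.splitlines() if needle in line]


def brute_force_grep(objects: dict[str, str], needle: str,
                     workers: int = 8) -> dict[str, list[str]]:
    """Scan all objects concurrently and keep only those with matches.

    Cost grows with the total amount of data scanned; parallel readers
    reduce wall-clock time but not the bytes read.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(scan_object, body, needle)
                   for key, body in objects.items()}
        results = {key: future.result() for key, future in futures.items()}
    return {key: hits for key, hits in results.items() if hits}


objects = {
    "logs/2021-05-06.log": "GET /a 200\nGET /b 500\n",
    "logs/2021-05-07.log": "GET /c 200\n",
}
print(brute_force_grep(objects, " 500"))
# {'logs/2021-05-06.log': ['GET /b 500']}
```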

fulmicoton · on May 7, 2021

Ah, my bad! Yes, Humio (and Loki) take this approach.

It decouples compute and storage trivially. There is indeed a regime in which this brute-force approach is the best one.

We could probably draw a 4D chart with QPS, data size, latency, and retention period as axes, and define the regions where the Elasticsearch/Solr approach, Humio, and Quickwit are most relevant.
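
A toy cost model hints at where those region boundaries come from. Brute force pays nothing up front but scans all retained data on every query; an index pays once per GB ingested and then very little per query. Query volume (QPS times retention) therefore decides which side wins, and data size divided by aggregate bandwidth bounds brute-force latency. All unit costs below are arbitrary, hypothetical values chosen only to make the crossover visible, and the model ignores data growth over the retention period.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    qps: float             # average queries per second
    data_gb: float         # searchable data kept during retention
    retention_days: float  # how long that data is kept


def total_queries(w: Workload) -> float:
    return w.qps * w.retention_days * SECONDS_PER_DAY


def brute_force_cost(w: Workload, scan_cost_per_gb: float) -> float:
    # No indexing work; every query scans all retained data.
    return total_queries(w) * w.data_gb * scan_cost_per_gb


def indexed_cost(w: Workload, index_cost_per_gb: float, cost_per_query: float) -> float:
    # Pay once to index the data; each query then touches only a small part of it.
    return w.data_gb * index_cost_per_gb + total_queries(w) * cost_per_query


def brute_force_latency_s(w: Workload, aggregate_gb_per_s: float) -> float:
    # A full scan cannot finish faster than data size / aggregate read bandwidth.
    return w.data_gb / aggregate_gb_per_s


def cheaper_approach(w: Workload) -> str:
    # Hypothetical unit costs, for illustration only.
    bf = brute_force_cost(w, scan_cost_per_gb=0.005)
    ix = indexed_cost(w, index_cost_per_gb=1.0, cost_per_query=0.001)
    return "brute force" if bf < ix else "index"


rare = Workload(qps=0.001, data_gb=100, retention_days=1)
busy = Workload(qps=1.0, data_gb=100, retention_days=1)
print(cheaper_approach(rare))  # brute force
print(cheaper_approach(busy))  # index
```

Under these made-up costs, rarely queried data favors scanning and frequently queried data favors indexing; a real chart would need measured costs and latency targets.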