Samza's author really opened my mind to how useful stream processing is for high-performance data processing [1], and I think Samza's only bummer (for me, personally, today) is its lack of support for non-JVM languages [2].
I totally agree! I can't wait for Martin Kleppmann's "Designing Data-Intensive Applications" to be complete. I've read the chapters available through the early release and highly recommend the book based on what I've seen so far!
I think this is the first post I've seen about Samza from a non-LinkedIn team. I'd love to hear any details about the Samza experience - it seems like it should be the logical choice for Kafka users, but there's not much out there about it.
I don't think Confluent really counts as "non-LinkedIn team" :)
"Jay is co-founder and CEO at Confluent. Prior to Confluent, Jay Kreps was the initial developer on several open source projects, including Apache Kafka, Apache Samza, Voldemort. He was the lead architect for data infrastructure at LinkedIn."
This is long long long overdue in SOLR. The percolator feature is what has made me stick with ElasticSearch for the past 3 years, and has contributed to its increasing popularity over SOLR.
Bit off-topic, but does anyone have suggestions for simple full text search engines? I basically want something for names, just a couple of thousand, nothing fancy. Setting up something like ElasticSearch seems like overkill (and is quite hungry for specs as well). I was thinking about simply hacking something together with Redis and Python, but I suppose someone might have a better solution.
Sorry if I come off as naive (as I don't really understand what a 'full text search' is), but for a couple thousand names wouldn't grep with regex wildcards suffice?
That doesn't sound like a bad suggestion at all. I think it might get bigger over time, or I might have aliases for the names, and in the end, using just regex, I'll probably hit a ceiling sooner rather than later.
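For what it's worth, the regex approach with aliases can be sketched in a few lines of Python. This is only a rough illustration, not a recommendation: the `NAMES` data and the `search` helper are made up for the example, and it does a linear scan, which should be fine for a couple of thousand entries but is exactly the ceiling mentioned above once the list grows.

```python
import re

# Hypothetical sample data: canonical name -> list of aliases.
NAMES = {
    "Jane Doe": ["jdoe", "Janie"],
    "John Smith": ["jsmith", "Johnny"],
    "Maria Garcia": ["mgarcia"],
}

def search(pattern, names=NAMES):
    """Return canonical names whose name or any alias matches the regex.

    Matching is case-insensitive and scans every entry (no index).
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, aliases in names.items():
        # Check the canonical name first, then each alias.
        if any(rx.search(candidate) for candidate in [name, *aliases]):
            hits.append(name)
    return sorted(hits)

if __name__ == "__main__":
    print(search("janie"))
    print(search("^jo"))
```

Searching for an alias returns the canonical name, so callers never need to know which spelling matched.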
If you're using PostgreSQL you can take advantage of its full text search support. When doing so, make sure to save the text search vectors in a physical column as otherwise queries will be quite slow.
I would definitely take that option if we used PostgreSQL, but we don't. PostgreSQL sounds like a wonderful piece of software, but they need to work on making it more accessible and user friendly. I tried getting something up and running, but given that there isn't a proper native client for Mac, I gave up :(
ElasticSearch isn't that difficult, and the new guide they published a few months ago walks you through getting it set up without going too in-depth [1]. As a beginner I had it running on a $5 DigitalOcean droplet within a couple of hours and indexed way over "a couple thousand" documents before the end of the day.
But if your needs are really that simple, MySQL does support full-text search [2].
Yes, I've looked into ES before, and if my use case were anything more complicated I would definitely go that route. However, it's still quite some work to set up, and they recommend at least 8GB of RAM for a working instance [1].
I didn't know MySQL had fulltext search as well. I'll look into that. Thanks.
I've worked a bit with a nice search library in Go called bleve[1]. That said, it is a library and you would have to implement a server component yourself.
Off-topic: I've seen a number of blog posts using this hand-drawn diagram style recently. Does anyone happen to know how it's done? (Answers other than "by hand" appreciated.)
Pencil (by FiftyThree) works great, of course. And unlocks the additional features in the Paper app. I'm actually quite impressed by how well it works.
[1] https://www.youtube.com/watch?v=fU9hR3kiOK0
[2] http://samza.apache.org/learn/documentation/0.7.0/comparison...