Real-Time Full-Text Search with Luwak and Samza (confluent.io)
85 points by nehanarkhede on April 13, 2015 | 30 comments


Samza's author really opened my mind to how useful stream processing is for high-performance data processing [1]. I think Samza's only bummer (for me, personally, today) is its lack of support for non-JVM languages [2].

[1] https://www.youtube.com/watch?v=fU9hR3kiOK0

[2] http://samza.apache.org/learn/documentation/0.7.0/comparison...


I totally agree! I can't wait for Martin Kleppmann's "Designing Data-Intensive Applications" to be complete. I've read the chapters available through the early release and highly recommend the book based on what I've seen so far!

http://dataintensive.net/


Apache Storm supports non-JVM languages: https://storm.apache.org/


Yep! Also, Spark added a Python API for Spark Streaming as of v1.2 [1]

[1] https://spark.apache.org/docs/1.2.0/streaming-programming-gu...


I think this is the first post I've seen about Samza from a non-LinkedIn team. I'd love to hear any details about the Samza experience - it seems like it should be the logical choice for Kafka users, but there's not much out there about it.


I don't think Confluent really counts as "non-LinkedIn team" :)

"Jay is co-founder and CEO at Confluent. Prior to Confluent, Jay Kreps was the initial developer on several open source projects, including Apache Kafka, Apache Samza, Voldemort. He was the lead architect for data infrastructure at LinkedIn."

(Martin Kleppmann also has a LinkedIn background)


Ahh, thanks - I checked that the speakers weren't currently working for LinkedIn, but I didn't look further into their backgrounds.

Oh, well. I still hope to one day read about someone else using Samza in production.


Here are a few production users: https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

The Metamarkets team wrote a nice post on their use of Samza a few days ago: https://metamarkets.com/2015/simplicity-stability-and-transp...


Interesting, I would be very interested in learning more about Metamarkets' transition to Samza, as 1 yr ago they were using Storm instead [1] [2]

Or maybe they did not transition and are actually using both, I don't know

[1] https://youtu.be/3Qb_2GGRz24?t=20m24s

[2] https://storm.apache.org/documentation/Powered-By.html


This is long long long overdue in SOLR. The percolator feature is what has made me stick with ElasticSearch for the past 3 years, and has contributed to its increasing popularity over SOLR.
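For readers unfamiliar with the term: percolation inverts the usual search flow, so queries are stored up front and each incoming document is matched against them. Below is a toy sketch of that idea in Python, assuming queries are plain keyword sets where every term must appear in the document. The `Percolator` class, its methods, and the query ids are hypothetical illustrations, not the Elasticsearch or Luwak API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Percolator:
    """Toy reverse search: store queries, then match documents against them."""

    def __init__(self):
        self.queries = {}                # query id -> set of required terms
        self.by_term = defaultdict(set)  # term -> ids of queries containing it

    def register(self, query_id, text):
        terms = set(tokenize(text))
        self.queries[query_id] = terms
        for term in terms:
            self.by_term[term].add(query_id)

    def percolate(self, document):
        doc_terms = set(tokenize(document))
        # Only queries sharing at least one term with the document can match,
        # so collect those candidates first instead of scanning every query.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # A query matches when all of its terms occur in the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


p = Percolator()
p.register("q1", "samza kafka")  # hypothetical stored queries
p.register("q2", "storm")
print(p.percolate("Samza reads from Kafka topics"))  # ['q1']
```

The candidate-selection step is the part that matters at scale: with many stored queries, checking only those that share a term with the document avoids running every query against every document.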


Bit offtopic, but does anyone have suggestions for simple full text search engines? I basically want something for names, just a couple of thousand, nothing fancy. Setting up something like ElasticSearch seems like overkill (and is quite hungry for specs as well). I was thinking about simply hacking something together with Redis and Python, but I suppose someone might have a better solution.


Sorry if I come off as naive (as I don't really understand what a 'full text search' is), but for a couple thousand names wouldn't grep with regex wildcards suffice?
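For a list that small, the grep approach can be sketched in a few lines of Python. This is a minimal illustration, assuming the names fit in memory as a plain list; the `search_names` helper and the sample data are made up for the example.

```python
import re


def search_names(names, pattern):
    """Return the names matching a case-insensitive regex, like `grep -i -E`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if regex.search(name)]


# Illustrative data only; a real list would be loaded from a file or database.
names = ["Martin Kleppmann", "Jay Kreps", "Neha Narkhede"]

# \b anchors the match to the start of a word, giving a simple prefix search.
print(search_names(names, r"\bkre"))  # ['Jay Kreps']
```

A linear scan like this touches every name on every query, which is fine for a few thousand entries but gives no ranking, stemming, or alias handling.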


That doesn't sound like a bad suggestion at all. I think it might get bigger over time, or I might have aliases for the names, and in the end using just regex I'll probably hit a ceiling sooner rather than later.


If you're using PostgreSQL you can take advantage of its full text search support. When doing so, make sure to save the text search vectors in a physical column as otherwise queries will be quite slow.


I would definitely take that option if we used PostgreSQL, but we don't. PostgreSQL sounds like a wonderful piece of software, but they need to work on making it more accessible and user friendly. I tried getting something up and running, but given that there isn't a proper native client for Mac I gave up :(


It is really easy to install using http://brew.sh/


The command-line tools and executables are indeed easy to install, but it's lacking in visual tools, say, the equivalents of Sequel Pro and phpMyAdmin.



What exactly are you trying to do?

ElasticSearch isn't that difficult and the new guide they published a few months ago walks you through getting it set up without going too in-depth [1]. As a beginner I had it running on a $5 DigitalOcean droplet within a couple hours and indexed way over "a couple thousand" documents before the end of the day.

But if your needs are really that simple, MySQL does support full-text search [2].

[1] http://www.elastic.co/guide/en/elasticsearch/guide/current/i...

[2] https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html


Yes, I've looked into ES before, and if my use case were anything more complicated I would definitely go that route. However, it's still quite some work to set up, and they recommend at least 8GB of RAM for a working instance [1].

I didn't know MySQL had fulltext search as well. I'll look into that. Thanks.

[1]: http://www.elastic.co/guide/en/elasticsearch/guide/master/ha...


I've worked a bit with a nice search library in Go called bleve[1]. That said, it is a library and you would have to implement a server component yourself.

Bleve comes with a few command line utils: https://github.com/blevesearch/bleve/tree/master/utils

[1] http://www.blevesearch.com/


Thanks!


Python has a search engine library: Whoosh


Whoosh sounds like pretty much what I'm looking for. Thanks!


Also, if you are using Python, sqlite3 has some pretty advanced text searching capabilities that will scale past your "few thousand" with no problem.
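One way this might look, as a sketch only: the example below uses SQLite's FTS5 extension through Python's built-in `sqlite3` module. It assumes FTS5 is compiled into the underlying SQLite library (common, but not guaranteed), and the `people` table, the `search` helper, and the sample names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: the linked SQLite build includes FTS5; otherwise this statement
# raises sqlite3.OperationalError ("no such module: fts5").
conn.execute("CREATE VIRTUAL TABLE people USING fts5(name)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Martin Kleppmann",), ("Jay Kreps",), ("Neha Narkhede",)],
)


def search(term):
    """Prefix-search names; `term` is assumed to be a single plain word."""
    rows = conn.execute(
        # A trailing * turns the token into a prefix query in FTS5, and
        # ORDER BY rank sorts the best matches first.
        "SELECT name FROM people WHERE people MATCH ? ORDER BY rank",
        (term + "*",),
    )
    return [name for (name,) in rows]


print(search("klep"))  # ['Martin Kleppmann']
```

Input containing FTS5 query syntax (quotes, operators, punctuation) would need escaping before being passed to `MATCH`; the sketch skips that for brevity.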


Great write-up.

If you're interested in this and you live in (or would like to live in) London, we're hiring. Email's in my profile.


offtopic - I've seen a number of blog posts using this hand drawn diagram style recently, does anyone happen to know how it's done? (answers other than "by hand" appreciated)


Looks like Paper, by FiftyThree:

https://appsto.re/us/KfqkE.i


Correct. If you're going to try it, I recommend getting a stylus for your iPad, since handwriting with your fingertip doesn't work very well.


Pencil (by FiftyThree) works great, of course. And unlocks the additional features in the Paper app. I'm actually quite impressed by how well it works.

http://www.fiftythree.com/pencil



