Sure, I'm going to be writing a longer blog post on how I made it, but for now here's a short summary:
I made a script that scrapes all the links from Hacker News every 15 minutes. It then opens each link and processes the text with Python's nltk package, deciding which words are important and useful. The important words go into a suffix tree stored in a MongoDB backend, arranged so that looking up a word yields the set of documents pertaining to it. This makes the search linear in the length of the query rather than the number of documents. The rest was just some jQuery AJAX calls and parsing of the search query.
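To make the "linear in the length of the query" claim concrete, here's a minimal sketch of the idea (not the author's actual code): a character trie over word suffixes, where each node records the set of document ids reachable through it. Inserting every suffix of each keyword lets substring queries work, and a lookup only walks as many nodes as the query has characters.

```python
class SuffixTrieIndex:
    """Character trie over word suffixes; each node keeps the set of
    document ids whose keywords pass through it. A simplified,
    in-memory stand-in for the suffix tree described above."""

    def __init__(self):
        self.root = {}

    def add(self, word, doc_id):
        # Index every suffix of the word so substring queries match too.
        for i in range(len(word)):
            node = self.root
            for ch in word[i:]:
                node = node.setdefault(ch, {"docs": set()})
                node["docs"].add(doc_id)

    def search(self, query):
        # Walk the trie once: cost is O(len(query)), independent of
        # how many documents have been indexed.
        node = self.root
        for ch in query:
            if ch not in node:
                return set()
            node = node[ch]
        return node.get("docs", set())
```

In the real system each trie node would live as a MongoDB document rather than an in-memory dict, but the lookup logic is the same.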
I'll look into a new design, maybe make the oranges white and the whites orange.
Great and snappy! I would really love the longer blog post...! Two questions if you have a minute: 1) Why suffix trees and not suffix arrays? 2) How are you implementing them? Did you do the tree building yourself or is there a good library that you recommend? Thanks.
I used a suffix tree over a suffix array because I hadn't heard of suffix arrays, but after glancing at the Wikipedia page it seems they might have been a good choice too. I'll look more into it. I did all the tree building myself, and I'll explain that in my post. The post should be ready by tomorrow.
For this type of thing I've had much better results with LDA in the Python package gensim. It is less prone to mismatches based on similar keywords, since it is context-based. The problem with LDA is that for it to be most effective you need a taxonomy available for the documents, but you might be able to build a corpus or two out of sites like Stack Overflow.
BTW, the "full orange" strains my eye. Maybe it's just me, but it would be nice if you could have softer colours!