
I'm working on a text indexing/retrieval program, like locate (http://www.openbsd.org/cgi-bin/man.cgi?query=locate) but for content and not just filenames, and with an index <=2% the size of the indexed data.

It's very nearly together (integrating individually working parts now), and is currently ~1,500 lines (according to sloccount).

Adding support for indexing Unicode text, more configuration, composite search queries (A and B near C and not D), etc. will no doubt make the source expand a bit, but it's still pretty small.
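For readers wondering what a composite query such as "A and B and not D" might involve, here is a minimal sketch in C. It assumes each term maps to an ascending array of integer document IDs (a posting list). That representation, and the function names `postings_and` and `postings_and_not`, are hypothetical illustrations and not taken from the program described above. The "near" operator is left out, since it would also need word positions.

```c
#include <assert.h>
#include <stddef.h>

/* "A and B": intersect two ascending arrays of document IDs.
 * Writes the common IDs to out (room for min(na, nb) entries needed)
 * and returns how many were written. */
size_t postings_and(const int *a, size_t na,
                    const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;
    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                /* a[i] cannot appear in b */
        } else if (a[i] > b[j]) {
            j++;                /* b[j] cannot appear in a */
        } else {
            out[n++] = a[i];    /* present in both lists */
            i++;
            j++;
        }
    }
    return n;
}

/* "A and not D": keep the IDs of a that do not occur in d.
 * Both inputs must be ascending; out needs room for na entries. */
size_t postings_and_not(const int *a, size_t na,
                        const int *d, size_t nd, int *out)
{
    size_t i, j = 0, n = 0;
    assert(out != NULL);
    for (i = 0; i < na; i++) {
        while (j < nd && d[j] < a[i])
            j++;                /* skip excluded IDs below a[i] */
        if (j >= nd || d[j] != a[i])
            out[n++] = a[i];    /* a[i] is not excluded */
    }
    return n;
}
```

Both functions make a single pass over sorted input, so a query like "A and B and not D" can be answered by chaining them: intersect A with B, then subtract D from the result.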

If you're interested in trying it out once it's ready, contact info is in my profile. I'm shooting for within a week or two for a beta vulgaris. (Requires Unix. ANSI C, strung together with sh and/or awk to avoid dependencies.)



Does your size calculation for the index include the dictionary too? I'd be interested in seeing this indexing/retrieval program of yours. I hope you release it soon!


Yes.



