I'm working on a text indexing/retrieval program, like locate (http://www.openbsd.org/cgi-bin/man.cgi?query=locate) but for content and not just filenames, and with an index <=2% the size of the indexed data.
It's very nearly together (integrating individually working parts now), and is currently ~1,500 lines (according to sloccount).
Adding support for indexing Unicode text, more configuration, composite search queries (A and B near C and not D), etc. will no doubt make the source expand a bit, but it's still pretty small.
If you're interested in trying it out once it's ready, contact info is in my profile. I'm aiming to have a beta vulgaris out within a week or two. (Requires Unix. It's written in ANSI C, strung together with sh and/or awk to avoid dependencies.)
Does your calculation of the index size include the dictionary too? I'd be interested in seeing this indexing/retrieval program of yours. I hope you release it soon!