
The problem with this project is that it doesn't solve the valuable problem, which is ranking/relevance.

It's effectively just a very simple "indexer", if you can even really call it that because it's only chunking audio for semantic search rather than actually indexing it.

Search engines are hard because of ranking and scale. This project does not solve either of those problems.

Personally, if I were going to build this, I would focus exclusively on the data side and use a pre-built traditional search engine like Meilisearch, Typesense, Elasticsearch, etc. to handle the indexing and search side. Adding semantic search into the mix upfront makes your life so much harder.



Yeah, I agree to an extent. Using a traditional search engine would be simpler and easier to implement, but it wouldn't be able to accurately contextualise the actual content of the video based on the user's question, which is the focus of the tool. However, I do agree that there is a lot of room for growth. Adding a traditional form of full-text fuzzy search, which will help with some of the ranking problems, is part of the plan to mix the best of both worlds :)

Ranking is a huge topic by itself: beyond similarity/text matching, other topics like SEO, popularity, etc. play a huge part. Those are aspects that I'm looking forward to understanding better, and to seeing how the community can contribute as well!


> but it wouldn't be able to accurately contextualise the actual content of the video based on the user's question, which is the focus of the tool

Are you sure this is what users want? Semantic search is not appropriate for all situations.

> adding a traditional form of full text fuzzy search which will help with some of the ranking problems

Full text fuzzy search is NOT a performant search engine and is NOT related to ranking or relevance. Ranking is an independent process after finding matching results.

Semantic search would make more sense in the context of ranking rather than pure search. E.g. you use traditional search to find matching documents then semantic search on matches to rank them.
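
To make that concrete, here is a rough sketch of the two-stage idea: a cheap keyword filter first, then a similarity-based rerank of only the matches. Everything in it is made up for illustration: the `keyword_match` and `search` helpers are hypothetical, and the two-dimensional vectors are hand-written stand-ins for what a real embedding model would produce.

```python
import math

def keyword_match(query, docs):
    """Stage 1: keep documents that share at least one term with the query."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d["text"].lower().split())]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, query_vec, docs):
    """Stage 2: rerank only the keyword matches by vector similarity."""
    candidates = keyword_match(query, docs)
    return sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)

# Toy corpus. The "vec" values are invented, not output from a real model.
docs = [
    {"id": "a", "text": "python snake care guide", "vec": [0.1, 0.9]},
    {"id": "b", "text": "python tutorial for beginners", "vec": [0.9, 0.1]},
    {"id": "c", "text": "rust tutorial", "vec": [0.8, 0.2]},
    {"id": "d", "text": "gardening tips", "vec": [0.0, 1.0]},
]

results = search("python programming", [1.0, 0.0], docs)
print([d["id"] for d in results])
```

The point of the split is that the expensive similarity step only ever sees the handful of documents that already matched, instead of the whole corpus.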

> Ranking is a huge topic by itself: beyond similarity/text matching, other topics like SEO, popularity, etc. play a huge part

Based on my mediocre understanding of ranking, basic ranking is generally not about any of these factors, presumably because they are too slow and computationally intensive. Maybe there are multiple layers of ranking for these different features, though.

My understanding is that basic ranking is/was more about metrics like TF-IDF. I'm sure there are more advanced modern techniques, but they are likely more complicated too.

Search is a ridiculously big and complex topic. If you want this to be more than a toy project, I think it would be wise to focus on a much smaller sliver and have a much clearer value prop.
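
For anyone who hasn't seen it, a minimal sketch of TF-IDF scoring follows. It assumes the plain textbook variant (term frequency times log(N / document frequency)) with whitespace tokenization. Real engines typically use smoothed or normalized variants, so treat the `tf_idf_scores` helper as a hypothetical illustration rather than how any particular engine does it.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against the query with unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere in the corpus
            tf = counts[term] / len(tokens)   # how often the term occurs here
            idf = math.log(n / df[term])      # how rare the term is overall
            score += tf * idf
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the podcast covers search ranking",
]
scores = tf_idf_scores("the ranking", docs)
print(scores)
```

Here "the" appears in every document, so its IDF is log(3/3) = 0 and it contributes nothing. Only the third document contains "ranking", so it is the only one with a non-zero score.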

You are currently trying to tackle multiple big problems simultaneously.



