One of Logstash's main draws is as a data transformation pipeline. You can do lookups via DNS or a JSON or CSV file, for example. From what I can tell, Vector is just a simple log shipper.
Vector is a source -> transform -> sink pipeline as well. No transforms that do lookups or joins are available yet, but the functionality is supported if someone writes a custom transformer middleware.
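Purely as an illustration of what such a custom transform stage boils down to (this is not Vector's actual API, and the names are made up): an event goes in, gets enriched from a lookup table, and comes out the other side.

    use std::collections::HashMap;

    // Hypothetical event type: a flat map of string fields.
    struct Event {
        fields: HashMap<String, String>,
    }

    // Hypothetical lookup/enrichment transform: tag each event with the
    // datacenter of its host, looked up from an in-memory table.
    fn enrich(mut event: Event, lookup: &HashMap<String, String>) -> Event {
        if let Some(host) = event.fields.get("host").cloned() {
            if let Some(dc) = lookup.get(&host) {
                event.fields.insert("datacenter".to_string(), dc.clone());
            }
        }
        event
    }

    fn main() {
        let lookup: HashMap<String, String> =
            vec![("web-1".to_string(), "us-east".to_string())].into_iter().collect();
        let mut event = Event { fields: HashMap::new() };
        event.fields.insert("host".to_string(), "web-1".to_string());
        let event = enrich(event, &lookup);
        assert_eq!(event.fields.get("datacenter").map(String::as_str), Some("us-east"));
    }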
There is toshi-search https://github.com/toshi-search/Toshi which is trying to be a drop-in replacement for Elasticsearch. My understanding is that Bayard is trying to address the same use cases as Elasticsearch but with a different API.
Elasticsearch is notoriously hard to roll out and develop against (for smaller companies especially), so I am happy to see smaller projects in this space.
It's been around for a couple of years now, and it has a few happy customers who have had great success replacing a popular $X0,000/year hosted search with Typesense!
Agreed. We've been using this successfully for a while now.
I'm curious though at what point something like this or ES itself would make sense for primarily text search. Is speed the biggest thing, or is it more flexibility to tweak and get better search results?
Postgres is fine if your search problem is mostly a recall problem. If N is large enough, or you have a small N with enough overlapping keywords (long documents), then precision becomes important. That is when you need things like BM25, PageRank, machine learning, etc., and Postgres just doesn't cut it anymore. Additionally, spell check, high-quality autocomplete, and multiple languages are better supported and much easier to implement in ES/Solr.
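For reference, the standard BM25 score of a document D for a query Q looks like this (f is the term frequency in D, avgdl the average document length in the corpus, and k_1 ≈ 1.2–2.0, b ≈ 0.75 the usual tuning constants):

    score(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i) \cdot
        \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

The length normalization in the denominator is exactly the part that helps with the long-document/overlapping-keyword case.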
Postgres FTS is very basic and requires separate methods and extensions to do things like fuzzy matching. It also doesn't support modern relevance and ranking algorithms.
Search is somewhat embarrassingly parallel, right? So Postgres is great until you want to shard all of your queries. Which is (of course) possible, but then you're using attributes of a tool that aren't specifically tailored to your problem space?
Postgres full text search is fast and easy, but it doesn't seem much more scalable/reliable/resilient than a clustered search solution that is shaped to the problem at hand.
I see the author built the same search engine in Go a while ago. So I suppose the project is a side project to learn a new language? Or is there a different reason?
That is a good observation. The author might also need flexible search options at work. In any case, I have some interest in Rust but don’t actively use it. I found reading through the main server.rs file interesting as example code.
Why would Go and C, which have static typing, suffer from type errors? Why would Go, which has a data race checker and quite good concurrency primitives, be more prone to them than Rust?
Rust seems to have its merits, but I find the parent post more level-headed in that it tries to characterize language runtimes, admittedly subjectively, but not in a rust==good, rest==bad way.
You can easily cause a C/Go program to crash/abort due to runtime type errors. If you've written C, you've miscast something at some point. Go in particular relies extremely heavily on runtime reflection. Both languages have poor type systems.
Go's race detector is useful and we've caught bugs using it, but it's nicer when the compiler prevents you from having those bugs in the first place.
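As a small sketch of what "the compiler prevents you" means in practice: handing a bare mutable Vec to several threads simply doesn't compile in Rust, while the synchronized version below does.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Shared mutable state has to be wrapped; trying to push to an
        // unsynchronized Vec from multiple threads is a compile error,
        // not something you need a race detector to catch at runtime.
        let data = Arc::new(Mutex::new(Vec::new()));
        let mut handles = Vec::new();
        for i in 0..4 {
            let data = Arc::clone(&data);
            handles.push(thread::spawn(move || {
                data.lock().unwrap().push(i);
            }));
        }
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(data.lock().unwrap().len(), 4);
    }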
I'm not a Rust fanboy, I simply like languages where it's more difficult to represent invalid state (Haskell, Ocaml, Rust, etc.) than in the mainstream languages I suffer with in my job on a daily basis. Rust happens to have great tooling and is the most likely to make inroads on these issues I care about (runtime stability + code correctness via good type system).
Go doesn’t rely heavily on runtime reflection. Putting it in the same category as C or JS is disingenuous. Rust has a great static type system, but I’ve never seen a Go program fail on a runtime/reflection type error before.
I'm not sure what to say. Go code (stdlib, popular libraries, etc.) uses runtime reflection _everywhere_. I've been using Go since 1.2 and I've seen too many runtime crashes due to type errors to count.
It is used for things like printf and JSON marshaling. I’ve been using Go regularly since 2012 and this isn’t a real problem. The reflection in the standard lib and popular third party libs (and most unpopular third party libs for that matter) is rock solid. Moreover, reflection probably accounts for less than 1% of Go code—not sure where you’re getting “everywhere” from. So I guess I’m calling your bluff.
Even if your claim of "probably ... less than 1% of Go code" is accurate, appearing somewhat less often than 1 out of every 100 lines of code is _everywhere_ to me...
To be clear, I didn't say "less than 1 out of every 100 lines", but in any case I don't consider yours to be a reasonable definition of "everywhere". Especially since such a low frequency doesn't support your claim that "you see runtime type errors too many times to count". Perhaps you're using libraries that are far, far below the ecosystem's average quality?
Gosh, Rust evangelists did a really good job brainwashing the community. Take a look at big Rust projects that do anything useful, like Tokio. You'll find dozens of unsafe blocks everywhere; so much for static analysis. True, in the most simple cases linear types will slap you on the wrist, but most bugs happen in the complex parts of code anyways.
I'm not part of the Rust "community" and don't participate in any of its fora, I just like a lot of the ideas it's pushing forward (see sibling comment). "unsafe" doesn't dIsAbLe AlL pRoTeCtIoNs like the anti-Rust crowd likes to say. I've written a few small (5,000+ LOC) programs in Rust over the last five years or so and I've only personally had to dip into unsafe once in my own code. I didn't claim that Rust prevents all bugs, I pointed out that Rust programs are more stable in the case of _runtime type errors and data races_ (which they are).
Red is a property of a car totally unrelated to its performance characteristics. The language a piece of software is implemented in, on the other hand, absolutely has an impact on the performance and behavior of that software.
This is more like "Carbon is faster than aluminum." Yeah, it's a generalization, but it's a useful one.
As someone writing software in Rust myself, I am always interested in knowing about projects using Rust, for multiple reasons.
1) In the case of libraries (crates), it might be something I can make use of in the future.
2) I can look at how they solved the problem they are solving and compare that with how I'd do it and maybe learn something new that can be useful to me in my future projects.
3) I want Rust to thrive and I want people to be aware of projects using Rust, because the more people that are aware of Rust, the bigger the probability that I can work for more companies in the future writing software for them in Rust.
Your 2) is by far the most important to me. Not only does it let me learn about solutions I could repurpose and the patterns they use; it can also give me that last missing piece I was looking for that blocked me from building something.
That said, I'm especially looking for software "written in Rust" because I know the build process is standardized. I may need some dependencies, but I know how the build will work (cargo build; if there are any extra build steps at all, they live in build.rs). I compile all Rust-based open projects myself, and I have yet to stumble over a non-binding-specific package that won't compile.
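(For anyone who hasn't used cargo: build.rs is just a small Rust program that cargo compiles and runs before building the crate itself. A minimal, purely illustrative one:)

    // build.rs: executed by cargo before the crate is compiled.
    fn main() {
        // Re-run this script only when the (illustrative) bundled C file changes.
        println!("cargo:rerun-if-changed=src/native/shim.c");
    }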
Rewind/fast-forward n years, and you could replace "Rust" in your post with the name of the language /du jour/.
My original comment was because, as an old guy who saw hype around so many languages come and go, I am getting tired of projects that try to sell themselves only because they are written in Rust.
I've seen many languages with mix and match syntax differences and feature sets that didn't bring anything significant to the table. Rust is not that. It has memory safety while having memory efficiency, high performance, good package management, easy interface with C FFI, and excellent support for parallelism. The list of things it gets "right" from my point of view dwarfs other languages-du-jour.
The biggest downside IMO is you have to get people past some conceptual hurdles (e.g. the borrow checker) before they can be productive with it. Despite how often you see it come up on HN, it doesn't have a powerful marketing force behind it. It also doesn't fit the trend of turning developers into commodities by being as easy as possible.
The excitement around Java existed for two reasons: many people were new to the ideas in Java, and many people were learning it for the first time. Now, every programmer knows about Java's ideas (with the possible exception of interfaces, which aren't in Python) and the number of people learning Java is capped at the number of kids being born. If Rust is successful then eventually every programmer who wants to learn it will already know it, and every programmer will know how to use a borrow checker. Then nobody will post "made in Rust" stuff on HN, but that will be an indication of Rust's victory.
Rust is probably here to stay seeing as it was picked up by Firefox. I still think it could have been done better, but perfect is the enemy of good enough.
Rust wasn't just "picked up" by Firefox - it was developed by Mozilla for the very purpose of using it in the "next generation" of Firefox technologies.
(The original Rust project as designed by Graydon Hoare was even quite different from the Rust of today; e.g. the C/C++-like focus, with little or no runtime, is something that only came about around the Rust 1.0 release.)
Note that Rust intentionally lacks some features that would be useful for Firefox development (e.g. developing for the DOM involves object inheritance which isn’t present in Rust), so it’s not like it’s designed to solve exactly their problems and nobody else’s.
Rust has trait inheritance, but not object inheritance. Servo uses code generation and macros to get around this. (I’m literally writing an operating system in Rust, I know what I’m talking about.)
"Trait" inheritance is just interface inheritance, which is indistinguishable from composition+delegation. It doesn't have the pitfalls of actual implementation inheritance, which essentially involves an extra "trick" of dispatching on the actual type of your object even when calling base-class methods. It's not that Rust cannot do this - heck, people do it all the time in C. But it has to be done by writing things out explicitly, it's not automatic in any sense. And for good reason - letting base-class code access methods that have been redefined in a derived class can create all sorts of pitfalls when you're not very careful about what that implies.
And the same was probably said about Fortran and C. Our tools are getting better. I can only hope Rust becomes so widely used that I can recommend big companies use it, since they'll have a trillion-developer hiring pool.
I like knowing what language something is coded in. It makes me more likely to look into the project. If it's written in something I'm not interested in, I may click through but not be as thorough, and for some languages I just save the link for later because I have no interest in them professionally or in my time off. I like looking at all projects eventually, because some people come up with amazing pieces of software in all types of languages, but others might not care to look at a Ruby, PHP, NodeJS, Python, C, C++, Rust, etc. project.
So per your rationale, why not list other pertinent information about the project in the title too?
I am saying that because the programming language is not what defines a project. It could be a pile of junk even if it's written in the greatest language ever made.
Rust is an interesting language for its technical characteristics, which is a direct appeal as other commenters have noted, but it can also be worth noting because Rust interoperates almost as well as C. If I announce a cool Python module, someone who primarily uses Ruby is probably going to ignore it because the level of effort to use it would be more than it's worth. If I announce a cool Rust module, they might think “you know, it's pretty easy to build a wrapper…”.
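For instance, exposing a Rust function over the C ABI is roughly this much ceremony (function name and crate setup are illustrative; the crate is built with crate-type = ["cdylib"] in Cargo.toml):

    // lib.rs: `#[no_mangle]` keeps the symbol name, and `extern "C"` gives
    // the function the C calling convention, so the resulting shared library
    // can be loaded from Python's ctypes, Ruby's FFI, and so on.
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }

From there, a wrapper in most scripting languages is a one-liner around their standard C FFI mechanism.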
What codetrotter and ccccc0 said; also, Rust as a language and a community has a strong focus on correctness, which makes me more interested in actually using the project.
In general I think the title of a project on a news aggregator should basically be an 80-character sales pitch, and " in Rust" is 8 characters that signal a lot more than most 8 characters could (to me).
Rust is young enough that you can read this as an ad for the language, not the project. "Rust is a language in which people write full-text search and indexing"
I think it's relevant for open source projects because people might want to contribute to them or just read through the code to see how things work. And of course people will be more interested in doing those things with languages they have experience using.
Vespa is great, but is much harder to use than Elasticsearch. It's also very much geared towards ranking and not filtering.
For example, you can't do exact string matches (!). All string matches are case insensitive. You also cannot index nested fields (e.g. a map or array of maps) for search. In the end, you have to munge your data considerably to make it fit Vespa's data model.
It also feels odd and antiquated in many ways, with XML configurations all over the place.
Is this the opening up of a mature project that has been coded in private somewhere? Or is it just a code drop on the community?
Note: this comes from a developer in Japan. Tantivy's main developer is also based in Japan. @fulmicoton, is there any interaction between the projects?
Not all projects are birthed in public. It may have been extracted from a larger private project which may have issues sharing its exact history of development.
Lots of downvotes but no replies. Rust is the language du jour on HN at the moment. Lots of comments about it being a young language, so you'll just have to wait.
Libraries went unmaintained, or sat in a half-finished state of development for months without regular commits. Statements like “sometimes this doesn’t work”, or commits that totally redesigned an API, meaning you have to rewrite your code.
I need stability and some maturity in personal projects due to time constraints. Not a jab at Rust, I’m still in wait and watch mode. Looking to learn a low level language in the next year and Rust is my top choice.
I'm looking for an easy-to-use typeahead/autocomplete search solution: a JavaScript lib for the frontend paired with an easy-to-manage, lightweight server. Something modern.
The dataset isn't huge, e.g. 1 million strings of no more than 512 UTF-8 chars each, not reindexed more than once a day or week. Clusters, sharding, etc. are unnecessary.
I keep hoping to stumble on a fully baked solution...any ideas?
Interesting. Since the underlying engine (Tantivy) is faster than Lucene - at least in their benchmarks - it should be faster than Solr. Seems like the author is exploring a faster alternative to Solr. I never got around to exploring Elasticsearch since our Solr instances are running so smoothly.
- it's interesting for other developers (who are HN's main audience) to see.
They don't sell some shrink-wrapped software, where the language doesn't matter. Nor some already established package you just download and use as-is, like Postgres or Bash, or whatever.
- it matters for those looking for compatible stuff for their own projects (for libraries, reusable packages, etc.)
- it offers certain guarantees other languages do not (e.g. memory safety, native binaries), which can be an important criterion for those looking for a project
- it's important for possible collaborators to know the language (the project being Open Source and everything).
- in a field where a Java-based project (Lucene/Elasticsearch) dominates, it is important to advertise that you offer a non-Java alternative for people who want to avoid Java/Oracle/etc.
- Rust is also currently on the rise (!= meme), and thus gets new programmers, and new greenfield projects. And since those people are trying the language, they want to advertise their involvement to the community, talk about how they found the experience, etc.
Does it really, though? Unsafe rust exists, and while the language is certainly built to strongly encourage certain safer programming practices, I don't really see it as offering any guarantees at all. If a project is open source I can go and investigate for myself, but who has the time for that?
The guarantees that safe rust provides are very good for me as a developer, because it kills a large class of potential errors and will therefore theoretically make the dev process easier. But I don't really feel any trust in these 'guarantees' when I switch roles to a user of someone else's libraries or products.
Most of the rest of your points I agree with, but I also agree with the original comment that it starts to make me feel just a little bit eye-rolly every time I see "...written in Rust." (And I do like the language.)
Everything needs unsafe code to run, since it has to interact with the OS/CPU/outside world. Rust's main contribution is that it provides ways for you to clearly section off code, declare it "unsafe", and flag that it needs extra examination to uphold its invariants.
As an example, std::vec::Vec is implemented with quite a bit of unsafe code, but all Rust consumers can be confident that it is vetted and the abstraction presented around it is safe.
Of course, this isn't a perfect solution, but it's much better than e.g. C/C++, where you basically treat every line as "unsafe".
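A tiny made-up example of that "section it off" style: the unsafe block lives behind a safe function, and the bounds check right above it is the invariant it relies on.

    /// First element of the slice, or None if it is empty.
    /// Callers only ever see a safe API; the unsafe detail stays inside.
    fn first_or_none<T: Copy>(xs: &[T]) -> Option<T> {
        if xs.is_empty() {
            None
        } else {
            // SAFETY: the emptiness check above guarantees index 0 is in bounds.
            Some(unsafe { *xs.get_unchecked(0) })
        }
    }

    fn main() {
        assert_eq!(first_or_none(&[10, 20]), Some(10));
        assert_eq!(first_or_none::<i32>(&[]), None);
    }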
1. Toshi[1] - alternative to Elasticsearch
2. Sonic[2] - alternative to Elasticsearch
3. Vector[3][4] - alternative to Logstash
4. native_spark[5] - alternative to Apache Spark
[1] https://github.com/toshi-search/Toshi
[2] https://github.com/valeriansaliou/sonic
[3] https://vector.dev/
[4] https://github.com/timberio/vector
[5] https://github.com/rajasekarv/native_spark