This really is the fastest growing technology of all time. Do you feel the curve?
I remember Mixtral 8x7B dominating for months; I expected Databricks to do the same! But it was washed out of existence in days by 8x22B, Llama 3, Gemini 1.5...
WOW.
I must be missing something, because the output from two years ago feels exactly the same as the output now. Any comment saying the output is significantly better can be equally paired with a comment saying the output is terrible/censored/"nerfed".
How do you see "fastest growing technology of all time" and I don't? I know that I keep very up to date with this stuff, so it's not that I'm unaware of things.
I do massive amounts of zero-shot document classification tasks, and the performance keeps getting better. It’s also a domain with less of a hallucination issue, since the requests aren’t open-ended.
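To make the workflow concrete, here is a minimal sketch of prompt-based zero-shot classification. The label set, prompt wording, and the model call are all illustrative assumptions, not anything from the comment above; the point is that constraining the answer to a fixed label set is what keeps the task closed-ended.

```python
# Hypothetical sketch: zero-shot document classification via an LLM prompt.
# The model call itself is left abstract; what's shown is the prompt shape
# and strict label parsing that keeps the task closed-ended.

LABELS = ["invoice", "contract", "resume", "other"]  # example label set

def build_prompt(document: str, labels=LABELS) -> str:
    """Ask the model for exactly one label from a fixed set."""
    return (
        "Classify the document into exactly one of these categories: "
        + ", ".join(labels)
        + ".\nAnswer with the category name only.\n\nDocument:\n"
        + document
    )

def parse_label(model_output: str, labels=LABELS) -> str:
    """Accept only a known label; fall back to 'other' rather than trusting free text."""
    answer = model_output.strip().lower()
    return answer if answer in labels else "other"

# Usage with any chat-completion API (the llm() call is the assumption here):
#   reply = llm(build_prompt("Invoice #1042, total due: $300 ..."))
#   label = parse_label(reply)
print(parse_label("Invoice"))                                   # -> invoice
print(parse_label("I think this might be a contract, but..."))  # -> other
```

Because the parser rejects anything outside the label set, a rambling or hallucinated reply degrades to a known fallback instead of a bogus category.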
It strikes me as unprecedented that a technology which takes arbitrary language-based commands can actually surface and synthesize useful information, and it gets better at doing it (even according to extensive impartial benchmarking) at a fairly rapid pace. It’s technology we haven’t really seen before recently, improving quite quickly. It’s also being adopted very rapidly.
I’m not saying it’s certainly the fastest growth of all time, but I think there’s a decent case for it being a contender. If we see this growth proceeding at a similar rate for years, it seems like it would be a clear winner.
> unprecedented that a technology [...] It’s technology we haven’t really seen before recently
This is what frustrates me: First that it's not unprecedented, but second that you follow up with "haven't really" and "recently".
> fairly rapid pace ... decent case for it being a contender
Any evidence for this?
> extensive impartial benchmarking
Or this? The last two "benchmarks" I've seen heralded both contained an incredible gap between what was claimed and what was actually proven (4 more required you to run the benchmarks yourself to even get the results!)
What is the precedent for this? The examples I’m aware of were fairly bad at what GPTs are now quite good at. To me that signals growth of the technology.
By “haven’t really seen until recently” I mean that similar technologies have existed, so we’ve seen something like it, but they haven’t actually functioned well enough to be comparable. So we can say there’s a precedent, but arguably there isn’t in terms of LLMs that can reliably do useful things for us. If I’m mistaken, I’m open to being corrected.
In terms of benchmarks, I agree that there are gaps but I also see a clear progression in capability as well.
Then in terms of evidence for there being a decent case here, I don’t need to provide it. I clearly indicated that’s my opinion, not a fact. I also said conditionally it would seem like a clear winner, and that condition is years of a similar growth trajectory. I don’t claim to know which technology has advanced the fastest, I only claim to believe LLMs seem like they have the potential to fit that description. The first ones I used were novel toys. A couple years later, I can use them reliably for a broad array of tasks and evidence suggests this will only improve in the near future.
I put my hands out, count to the third finger from the left, and put that finger down. I then count the fingers to the left (2) and the fingers to the right (2 plus the other hand's 5, so 7) and conclude 27.
I have memorised the technique, but I definitely never memorised my nine times table. If you’d said ‘6’, then the answer would be different, as I’d actually have to sing a song to get to the answer.
100% of the time when I post a critique someone replies with this. I tell them I've used literally every LLM under the sun quite a bit to find any use I can think of and then it's immediately crickets.
RT-2 is a vision-language model fine-tuned with the current vision input as the input and actuator positions as the output. Google uses a bunch of TPUs to produce a full response at a cycle rate of 3 Hz, and the VLM has learned the kinematics of the robot and knows how to pick up objects according to given instructions.
Given the current rate of progress, we will have robots that can learn simple manual labor from human demonstrations (e.g. Youtube as a dataset, no I do not mean bimanual teleoperation) by the end of the decade.
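As a rough illustration of the loop described above (all names here are made up; RT-2's real interface differs), the setup amounts to calling a VLM on each camera frame at a fixed rate and feeding its output to the actuators:

```python
# Hypothetical sketch: a vision-language model maps
# (camera frame, instruction) -> actuator positions at ~3 Hz.
import time

CYCLE_HZ = 3.0  # the cycle rate mentioned above

def run_control_loop(vlm, get_frame, apply_action, instruction, steps):
    period = 1.0 / CYCLE_HZ
    for _ in range(steps):
        t0 = time.monotonic()
        action = vlm(get_frame(), instruction)  # model emits joint targets
        apply_action(action)
        # sleep off the remainder of the cycle to hold the rate
        time.sleep(max(0.0, period - (time.monotonic() - t0)))

# toy stand-ins so the loop runs end to end
applied = []
run_control_loop(
    vlm=lambda frame, instr: [0.1, 0.2, 0.3],  # fake joint targets
    get_frame=lambda: "frame",
    apply_action=applied.append,
    instruction="pick up the apple",
    steps=3,
)
print(len(applied))  # one action issued per cycle
```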
Usually when I encounter sentiment like this it is because they only have used 3.5 (evidently not the case here) or that their prompting is terrible/misguided.
When I show a lot of people GPT4 or Claude, some percentage of them jump right to "What year did Nixon get elected?" or "How tall is Barack Obama?" and then kind of shrug with a "Yeah, Siri could do that ten years ago" take.
Beyond that you have people who prompt things like "Make a stock market program that has tabs for stocks, and shows prices" or "How do you make web cookies". Prompts that even a human would struggle greatly with.
For the record, I use GPT4 and Claude, and both have dramatically boosted my output at work. They are powerful tools, you just have to get used to massaging good output from them.
That is not the reality today. If you want good results from an LLM, then you do need to speak LLM. Just because they appear to speak English doesn't mean they act like a human would.
People don’t even know how to use traditional web search properly.
Here’s a real scenario: A Citrix virtual desktop crashed because a recent critical security fix forced an upgrade of a shared DLL. The output is a really specific set of errors in a stack trace. I watched with my own two eyes as an IT professional typed the following phrase into Google: “Why did my PC crash?”
Then he sat there and started reading through each result… including blog posts by random kids complaining about Windows XP.
I wish I could say this kind of thing is an isolated incident.
I mean, you need to speak German to talk to a German. It’s not really much different for LLMs; just because the language they speak has roots in English doesn’t mean it actually is English.
And even if it was, there’s plenty of people completely unintelligible in English too…
You see no difference between non-RLHFed GPT3 from early 2022 and GPT-4 in 2024? It's a very broad consensus that there is a huge difference so that's why I wanted to clarify and make sure you were comparing the right things.
What types of usage are you testing? For general knowledge it hallucinates way less often, and for reasoning, coding, and modifying its past code based on English instructions it is way, way better than GPT-3 in my experience.
It's fine, you don't have a use for it so you don't care. I personally don't spend any effort getting to know things that I don't care about and have no use for; but I also don't tell people who use tools for their job or hobby that I don't need how much those tools are useless and how their experience using them is distorted or wrong.
Usually people who post such claims haven’t used anything beyond gpt3. That’s why you get questions.
Also, the difference is so big and so plainly visible that I guess people don’t know how to even answer someone saying they don’t see it. That’s why you get crickets.
The difference matters: in my experience, Llama 3, by virtue of its giant vocabulary, generally tokenizes text with 20-25% fewer tokens than something like Mistral. So even if it's 18% slower in terms of tokens/second, it may, depending on the text content, actually output a given body of text faster.
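The arithmetic behind that claim can be checked on the back of an envelope. The figures below are the comment's rough numbers (22% fewer tokens from the quoted 20-25% range, 18% slower decoding), with an arbitrary baseline speed, not real benchmarks:

```python
# Fewer tokens can beat faster tokens/sec: total time = tokens / (tokens/sec).

mistral_tokens = 1000          # tokens Mistral's tokenizer needs for some text
llama3_tokens = 1000 * 0.78    # ~22% fewer tokens for the same text

mistral_tps = 100.0            # assumed Mistral decode speed, tokens/sec
llama3_tps = 100.0 * 0.82     # 18% slower decoding

mistral_time = mistral_tokens / mistral_tps   # 10.00 s
llama3_time = llama3_tokens / llama3_tps      # ~9.51 s

print(f"Mistral: {mistral_time:.2f}s, Llama 3: {llama3_time:.2f}s")
```

With these numbers the token savings slightly outweigh the slower decode rate, which is the comment's point; with only 10% fewer tokens the ordering would flip.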