This really is the fastest growing technology of all time. Do you feel the curve?
I remember Mixtral 8x7B dominating for months; I expected Databricks to do the same! But it was washed out of existence in days by 8x22B, Llama 3, Gemini 1.5...
WOW.
I must be missing something, because the output from two years ago feels exactly the same as the output now. Any comment saying the output is significantly better can be equally paired with a comment saying the output is terrible/censored/"nerfed".
How do you see "fastest growing technology of all time" and I don't? I know that I keep very up to date with this stuff, so it's not that I'm unaware of things.
I do massive amounts of zero-shot document classification tasks, and the performance keeps getting better. It’s also a domain with less of a hallucination issue, since the requests aren’t open-ended.
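To make the workflow concrete, here is a minimal sketch of prompt-based zero-shot classification. The label set, prompt wording, and the model call are all illustrative assumptions, not anything from the comment above; the point is that constraining the answer to a fixed label set is what keeps the task closed-ended.

```python
# Hypothetical sketch: zero-shot document classification via an LLM prompt.
# The model call itself is left abstract; what's shown is the prompt shape
# and strict label parsing that keeps the task closed-ended.

LABELS = ["invoice", "contract", "resume", "other"]  # example label set

def build_prompt(document: str, labels=LABELS) -> str:
    """Ask the model for exactly one label from a fixed set."""
    return (
        "Classify the document into exactly one of these categories: "
        + ", ".join(labels)
        + ".\nAnswer with the category name only.\n\nDocument:\n"
        + document
    )

def parse_label(model_output: str, labels=LABELS) -> str:
    """Accept only a known label; fall back to 'other' rather than trusting free text."""
    answer = model_output.strip().lower()
    return answer if answer in labels else "other"

# Usage with any chat-completion API (the llm() call is the assumption here):
#   reply = llm(build_prompt("Invoice #1042, total due: $300 ..."))
#   label = parse_label(reply)
print(parse_label("Invoice"))                                   # -> invoice
print(parse_label("I think this might be a contract, but..."))  # -> other
```

Because the parser rejects anything outside the label set, a rambling or hallucinated reply degrades to a known fallback instead of a bogus category.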
It strikes me as unprecedented that a technology which takes arbitrary language-based commands can actually surface and synthesize useful information, and it gets better at doing it (even according to extensive impartial benchmarking) at a fairly rapid pace. It’s technology we haven’t really seen before recently, improving quite quickly. It’s also being adopted very rapidly.
I’m not saying it’s certainly the fastest growth of all time, but I think there’s a decent case for it being a contender. If we see this growth proceeding at a similar rate for years, it seems like it would be a clear winner.
> unprecedented that a technology [...] It’s technology we haven’t really seen before recently
This is what frustrates me: First that it's not unprecedented, but second that you follow up with "haven't really" and "recently".
> fairly rapid pace ... decent case for it being a contender
Any evidence for this?
> extensive impartial benchmarking
Or this? The last two "benchmarks" I've seen heralded both contained an incredible gap between what was claimed and what was actually proven (4 more required you to run the benchmarks yourself to even get the results!)
What is the precedent for this? The examples I’m aware of were fairly bad at what GPTs are now quite good at. To me that signals growth of the technology.
By “haven’t really seen until recently” I mean that similar technologies have existed, so we’ve seen something like it, but they haven’t actually functioned well enough to be comparable. So we can say there’s a precedent, but arguably there isn’t in terms of LLMs that can reliably do useful things for us. If I’m mistaken, I’m open to being corrected.
In terms of benchmarks, I agree that there are gaps but I also see a clear progression in capability as well.
Then in terms of evidence for there being a decent case here, I don’t need to provide it. I clearly indicated that’s my opinion, not a fact. I also said conditionally it would seem like a clear winner, and that condition is years of a similar growth trajectory. I don’t claim to know which technology has advanced the fastest, I only claim to believe LLMs seem like they have the potential to fit that description. The first ones I used were novel toys. A couple years later, I can use them reliably for a broad array of tasks and evidence suggests this will only improve in the near future.
I put my hands out, count to the third finger from the left, and put that finger down. I then count the fingers to the left (2) and the fingers to the right (2 plus the other hand's 5, so 7) and conclude 27.
I have memorised the technique, but I definitely never memorised my nine times table. If you’d said ‘6’, then the answer would be different, as I’d actually have to sing a song to get to the answer.
100% of the time when I post a critique someone replies with this. I tell them I've used literally every LLM under the sun quite a bit to find any use I can think of and then it's immediately crickets.
RT-2 is a vision-language model fine-tuned with the current vision input as the input and actuator positions as the output. Google uses a bunch of TPUs to produce a full response at a cycle rate of 3 Hz, and the VLM has learned the kinematics of the robot and knows how to pick up objects according to given instructions.
Given the current rate of progress, we will have robots that can learn simple manual labor from human demonstrations (e.g. Youtube as a dataset, no I do not mean bimanual teleoperation) by the end of the decade.
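As a rough illustration of the loop described above (all names here are made up; RT-2's real interface differs), the setup amounts to calling a VLM on each camera frame at a fixed rate and feeding its output to the actuators:

```python
# Hypothetical sketch: a vision-language model maps
# (camera frame, instruction) -> actuator positions at ~3 Hz.
import time

CYCLE_HZ = 3.0  # the cycle rate mentioned above

def run_control_loop(vlm, get_frame, apply_action, instruction, steps):
    period = 1.0 / CYCLE_HZ
    for _ in range(steps):
        t0 = time.monotonic()
        action = vlm(get_frame(), instruction)  # model emits joint targets
        apply_action(action)
        # sleep off the remainder of the cycle to hold the rate
        time.sleep(max(0.0, period - (time.monotonic() - t0)))

# toy stand-ins so the loop runs end to end
applied = []
run_control_loop(
    vlm=lambda frame, instr: [0.1, 0.2, 0.3],  # fake joint targets
    get_frame=lambda: "frame",
    apply_action=applied.append,
    instruction="pick up the apple",
    steps=3,
)
print(len(applied))  # one action issued per cycle
```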
Usually when I encounter sentiment like this it is because they only have used 3.5 (evidently not the case here) or that their prompting is terrible/misguided.
When I show a lot of people GPT4 or Claude, some percentage of them jump right to "What year did Nixon get elected?" or "How tall is Barack Obama?" and then kind of shrug with a "Yeah, Siri could do that ten years ago" take.
Beyond that you have people who prompt things like "Make a stock market program that has tabs for stocks, and shows prices" or "How do you make web cookies". Prompts that even a human would struggle greatly with.
For the record, I use GPT4 and Claude, and both have dramatically boosted my output at work. They are powerful tools, you just have to get used to massaging good output from them.
That is not the reality today. If you want good results from an LLM, then you do need to speak LLM. Just because they appear to speak English doesn't mean they act like a human would.
People don’t even know how to use traditional web search properly.
Here’s a real scenario: A Citrix virtual desktop crashed because a recent critical security fix forced an upgrade of a shared DLL. The output is a really specific set of errors in a stack trace. I watched with my own two eyes as an IT professional typed the following phrase into Google: “Why did my PC crash?”
Then he sat there and started reading through each result… including blog posts by random kids complaining about Windows XP.
I wish I could say this kind of thing is an isolated incident.
I mean, you need to speak German to talk to a German. It’s not really much different for LLMs; just because the language they speak has roots in English doesn’t mean it actually is English.
And even if it was, there’s plenty of people completely unintelligible in English too…
You see no difference between non-RLHFed GPT3 from early 2022 and GPT-4 in 2024? It's a very broad consensus that there is a huge difference so that's why I wanted to clarify and make sure you were comparing the right things.
What types of usage are you testing? For general knowledge it hallucinates way less often, and for reasoning, coding, and modifying its past code based on English instructions it is way, way better than GPT-3 in my experience.
It's fine, you don't have a use for it so you don't care. I personally don't spend any effort getting to know things that I don't care about and have no use for; but I also don't tell people who use tools for their job or hobby that I don't need how much those tools are useless and how their experience using them is distorted or wrong.
Usually people who post such claims haven’t used anything beyond gpt3. That’s why you get questions.
Also, the difference is so big and so plainly visible that I guess people don’t know how to even answer someone saying they don’t see it. That’s why you get crickets.
The difference matters: in my experience, Llama 3, by virtue of its giant vocabulary, generally tokenizes text with 20-25% fewer tokens than something like Mistral. So even if it's 18% slower in terms of tokens/second, it may, depending on the text content, actually output a given body of text faster.
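The arithmetic behind that claim can be checked on the back of an envelope. The figures below are the comment's rough numbers (22% fewer tokens from the quoted 20-25% range, 18% slower decoding), with an arbitrary baseline speed, not real benchmarks:

```python
# Fewer tokens can beat faster tokens/sec: total time = tokens / (tokens/sec).

mistral_tokens = 1000          # tokens Mistral's tokenizer needs for some text
llama3_tokens = 1000 * 0.78    # ~22% fewer tokens for the same text

mistral_tps = 100.0            # assumed Mistral decode speed, tokens/sec
llama3_tps = 100.0 * 0.82     # 18% slower decoding

mistral_time = mistral_tokens / mistral_tps   # 10.00 s
llama3_time = llama3_tokens / llama3_tps      # ~9.51 s

print(f"Mistral: {mistral_time:.2f}s, Llama 3: {llama3_time:.2f}s")
```

With these numbers the token savings slightly outweigh the slower decode rate, which is the comment's point; with only 10% fewer tokens the ordering would flip.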