To me, it seemed a bit better than GPT-4 at some coding tasks, or at least less inclined to just give the skeleton and leave out all the gnarly details, like GPT-4 likes to do these days. What frustrates me a bit is that I cannot really say whether GPT-4, as it was in the very beginning when it happily executed even complicated and/or large requests for code, wasn't actually on the same level as this model, maybe not in terms of raw knowledge, but at least in terms of usefulness/cooperativeness.
This aside, I agree with you that it does not feel like a leap, more like 4.x.
If you had used GPT-4 from the very beginning, you'd have seen responses of incredibly high quality. It also took 3 minutes to receive a full response.
And prompt engineering tricks could get you wildly different outputs to a prompt.
The 3xx ChatGPT-4 model from the API doesn't hold a candle to the responses from back then.
I hear this somewhat often from people (less so nowadays), but before-and-after prompt examples are never provided. Do you have some example responses saved from the olden days, by chance? If you did, it would be quite easy to demonstrate your point.