When's the last time you "went through the loop"? I feel like with this stuff I have to update my priors every three or four months.
I've been using AI regularly since GPT-4 first came out a couple of years ago. Over that time, various models from Sonnet to Gemini to 4o have generally been good rubber ducks: good for talking through approaches and tradeoffs, and better in general than Google + Stack Overflow + poring over verbose documentation.
But I couldn't really "hand the models the wheel." They weren't trustworthy enough, easily lost the plot, failed to leverage important context right in front of them in the codebase, etc. You could see that there was potential there, but it felt pretty far away.
Something changed this spring with Gemini 2.5 Pro, the Claude 4 models, o3, and o4-mini: I'm starting to give the models the wheel now. They're good. They understand context. They understand the style of the codebase. And of course they bring the immense knowledge they've always had.
It's eerie to see, and to think about what the next wave of models, due very soon, will bring. If the last time you really gave model-driven programming a go was 6 months or more ago, you probably have no idea what's about to happen.
Just one person's opinion: I can't get into the mode of programming where you "chat" with something and have it build the code. By the time I have visualized in my head, and articulated in English, what I want to build and the data structures and algorithms I need, I might as well just type the code in myself. That's the only value I've found in AI: it's a great autocomplete as you're typing.
To me, programming is a solo activity. "Chatting" with someone or something as I do it is just a distraction.
A good part of my career has been spent pair programming in XP-style systems, so chatting away with someone about constraints, what we're trying to do, what we need to implement, etc., might come a bit more naturally to me. I understand your perspective, though.
That may be one of the reasons for the conflicting opinions. I usually build the thing mentally first, then code it, and then verify it. With tools like linters and tests, the shorter feedback loop makes the process faster, and editor fluency is a good boost.
By the time I'm about to prompt, I usually have enough information to just code it myself. Coding is like riding a bicycle downhill: you only pay enough attention to keep riding. It's like how you don't think about the characters and words when you're typing; you're mostly thinking about what you want to say.
When there's an issue, I switch from coding to reading and thinking. The latter is mentally taxing, but it's fast because I don't have to spell anything out. A good aid for that is a repository of information: bookmarked docs, a documentation browser, code samples, and so on. By the time the LLM replies with a good-enough paragraph, I'm already on the Array page on MDN.
Having it write unit tests is one place it's reliably useful for me. It's easy to verify that the tests cover everything I'd thought of, but there's enough typing involved that it's faster than doing it myself, and sometimes it includes a case I hadn't thought of.
I've read somewhere, IIRC, that you mostly need to test three things: correct input, incorrect input, and input that sits on the boundary between the two. Once you intersect those with the set of parameters (and the behavior that depends on libraries), you're usually left with only a few things to test. And the process of deciding which cases to test is itself important, because that's how you surface edge cases and incorrect assumptions.
Writing test cases is also how you feel the pain of coupling things that shouldn't be coupled, so you can go refactor instead of having to initialize most of your software just to run a test.
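A minimal sketch of that three-way split, assuming a hypothetical `parse_percentage` function that should accept integer strings from 0 to 100 (the function, its range, and the test names are made up for illustration, not taken from anyone's codebase). Run it with `python -m unittest`.

```python
import unittest


def parse_percentage(text: str) -> int:
    """Hypothetical function under test: parse an integer percentage in [0, 100]."""
    value = int(text)  # int() raises ValueError for non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


class ParsePercentageTests(unittest.TestCase):
    def test_correct_input(self):
        # Clearly valid input.
        self.assertEqual(parse_percentage("42"), 42)

    def test_incorrect_input(self):
        # Clearly invalid input: not an integer at all.
        for bad in ["abc", "", "4.5"]:
            with self.assertRaises(ValueError):
                parse_percentage(bad)

    def test_boundary_input(self):
        # Values exactly on the edges are accepted...
        self.assertEqual(parse_percentage("0"), 0)
        self.assertEqual(parse_percentage("100"), 100)
        # ...and values just past them are rejected.
        for bad in ["-1", "101"]:
            with self.assertRaises(ValueError):
                parse_percentage(bad)
```

The boundary test is where the off-by-one mistakes (`<` vs `<=`) tend to show up, which is the "deciding which cases to test" part doing its job.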
I'm not chatting with the LLM – I'm giving one LLM in "orchestrator mode" a detailed description of my required change, plus a ton of "memory bank" context about the architecture of the app, the APIs it calls, our coding standards, &c. Then it uses other LLMs in "architect mode" or "ask mode" to break out the task into subtasks and assigns them to still other LLMs in "code mode" and "debug mode".
When they're all done I review the output and either clean it up a little and open a PR, or throw it away and tune my initial prompt and the memory bank and start over. They're just code-generating machines, not real programmers that it's worth iterating with – for one thing, they won't learn anything that way.
That's my basic doubt, too. When developing software, I'm translating the details, nuances, and complexities of the requirements into executable code. Adding another stage to this process is just one more opportunity for things to get lost in translation. Also, getting an LLM to reliably generate the right code would require something other than the programming languages we know.
Interesting point. I agree that things change so fast that experience from a few months ago is out of date. I'm sceptical there has been a real step change (especially based on the snippets I see Claude 4 writing in answer to questions), but it never hurts to try again.
My most recent stab at this was Claude Code with 3.7, circa March this year.
To be fair though, a big part of the issue for me is that having not done the work or properly thought through how a project is structured and how the code works, it comes back to bite later. A better model doesn't change this.
There has been a big change with Claude 4.0, in my opinion. It probably depends on your environment, but it's the first time I've been able to get hundreds of lines of Python that just work when vibe coding a new project.
It still slows down as the codebase grows, and that's my hypothesis for the huge variance in people's experiences. I was giddy at how fast I blew through what would have been the first 5 hours of a small project (in perhaps 30 minutes with Claude), but I quickly lost velocity when I started implementing tests and editing existing code.
If you give it another try, my go-to right now is Sonnet 4 Thinking. There's a pretty massive difference in intelligence between plain 4 and 4 Thinking. It's still pretty fast, and I think it hits the right balance between speed and useful intelligence.
However, at least in my experience, nothing beats o3 for raw intelligence. It's a little too slow to use as the daily driver though.
It's kind of fun seeing the models all have their various pros and cons.
> To be fair though, a big part of the issue for me is that having not done the work or properly thought through how a project is structured and how the code works, it comes back to bite later. A better model doesn't change this.
Yes, even as I start to leverage the tools more, I try to double down on my own understanding of the problem being solved, at least at a high level. Need to make sure you don't lose the plot yourself.
Do you get very far at work with contrarian takes that have no evidence or detail behind them? How does dismissing other people's experience so condescendingly work out for you in real life?
I’m curious if/how you’ve built a career this way. Or is this account just a stress relief “shit-posting” persona?