Somehow this article explains perfectly, visually, how AI generated code differs from human generated code as well.
You see the exact same patterns. AI uses more code to accomplish the same thing, less efficiently.
I'm not even an AI hater. It's just a fact.
The human then has to go through and clean up that code if you want to deliver a high-quality product.
Similarly, you can slap that AI-generated 3D model right into your game engine, with its terrible topology, and have it perform "ok". As you add more of these terrible models, you end up with crap performance, but who cares, you delivered the game on time, right? A human can then slave away fixing the terrible topology and textures, taking longer than they would have if the object had been modeled correctly to begin with.
The comparison of edge-loops to "high quality code" is also one that I mentally draw. High quality code can be a joy to extend and build upon.
Low quality code is like the dense mesh pictured. You have a million cross interactions and side-effects. Half the time it's easier to gut the whole thing and build a better system.
Again, I use AI models daily but AI for tools is different from AI for large products. The large products will demand the bulk of your time constantly refactoring and cleaning the code (with AI as well) -- such that you lose nearly all of the perceived speed enhancements.
That is, if you care about a high quality codebase and product...
"High-quality code can be a joy to extend and build upon." I love the analogy here. It is a perfect parallel to how a good 3D model is a delight to extend. Some of the better modelers we've worked with return a model that is so incredibly lightweight, easily modifiable, and looks like the real thing that I am amazed each time.
The good thing about 3D slop vs. code slop is that it is so much easier to spot at first glance. A sloppy model immediately looks sloppy to nearly any untrained eye. But on closer look at the mesh, UVs, and texture, a trained eye is able to spot just how sloppy it truly is. Whereas with code, the untrained eye will have no idea how bad that code truly is. And as we all know now, this is creating an insane amount of security vulnerabilities in production.
We will get an interesting effect if AI plateaus around where it is now, which is that AI code generation will bring "the long run" right down to "the medium run", if not the longer side of the short run. AI can pay down technical debt an order of magnitude faster than human developers, easily, and I'm still waiting for it to recognize that an abstraction is necessary and invest in putting one in the code rather than just spending the ones already present.
Of course if AI continues to proceed forward and we get to the point where the AIs can do that then they really will be able to craft vast code bases at speeds we could never keep up with on our own. However, I'm not particularly convinced LLMs are going to advance past this particular point, to a large degree because their training data contains so much of this slop approach to coding. Someone's going to have to come up with the next iteration of AI tech, I think.
I wonder about heavy curation of data sets, and then using only senior-level developers in the alignment/RLHF phases, such that the training captured the expertise of a senior-level developer. The psychology of those senior developers would be interesting, because they would knowingly be putting huge numbers of their peers, globally, out of work. I wonder if they would do it; then again, of course they will, and then I question whether we're really that desperate.
Debt doesn't harm you until the carrying costs become too high versus profits. You just have to hit that point (if it exists; maybe growth accelerates forever, if you're optimistic).
If you only knew how the enterprise space does stuff you'd realize how little a priority maintainability is.
I'm grateful we had Java when this stuff was taking off; if those enterprise applications had been written in anything else available at the time (like C/C++), we'd all be suffering even more memory leaks, security vulnerabilities, and data breaches than we do now.
Now that's interesting, because I come from a world where enterprise-level stuff was all done in C/C++ until quite recently, and with the shift to "web technologies" the quality of virtually everything has dropped through the floor, including the knowledge and skill level of the developers working on the tech. It is rare that I see people who have been working in excess of 10 years post-graduation, if they went to college at all. The college grads have been pushed out by lower-quality, lower-skilled React developers who really do not belong in the industry at all. It's really a crime how low things have gotten in such a short time: 10 to 15 years ago there were people with 2-3 decades of experience all over the place. Not anymore.
After 2 days of giving it a go, I find that Gemini CLI is still considerably worse than both Codex and Claude Code.
The model itself also shows strange behaviors that make it seem like it gets randomly replaced with Gemini-3-Flash or something else. I'll explain.
Once agentic coding was a bust, I gave it a run as a daily-driver AI assistant. It performed fairly well but then began behaving strangely. It would lose context mid-conversation. For instance, I said "In San Francisco I'm looking for XYZ". Two turns later I'm asking about food and it gives me suggestions all over the world.
Another time, I asked it about the likelihood of the pending East Coast winter storm affecting my flight. I gave it all the details (flight, stops, times, cities).
Both GPT-5.2 and Claude crunched and came back with high-quality estimations and rationale. Gemini 3.1 Pro... five times, it returned a weather forecast widget for either the layover or the final destination. This was on "Pro" reasoning, the highest exposed in the Gemini app/web app. I've always suspected Google swaps out models randomly, so this... wasn't surprising.
I then asked Gemini 3.1 Pro via the API and it returned a response similar to Claude and GPT-5.2 -- carefully considering all factors.
This tells me that a Google AI Ultra subscription gives me a sub-par coding agent which often swaps in Flash models, a sub-par web/app AI experience that also isn't using the advertised SOTA models, and a bunch of preview apps for video gen, audio gen (which crashed every time I tried it), and world gen (Genie was interesting, but a toy).
This will be a quick cancel as soon as the intro rate is done.
It's like Google doesn't ACTUALLY want to be the leader in AI or serve people their best models. They want to generate hype around benchmarks and then nerf the model and go silent.
Gemini 3 Pro Preview went from exceptional in the first month to mediocre and then out of my rotation within a month.
Gemini 3.1 (and Gemini 3) are a lot smarter than Claude Opus 4.6
But...
The Gemini 3 series models are both mediocre at best at agentic coding.
The distinction: single-shot questions about a code problem vs. "build this feature autonomously".
Gemini's CLI harness is just not very good, and Gemini's approach to agentic coding leaves a lot to be desired. It doesn't perform the double-checking that Codex does, it's slower than Claude, and it runs off and does things without asking and without clearly explaining why.
My experience is that on large codebases that get tricky problems, you eventually get an answer quicker if you can send _all_ the context to a relevant large model to crunch on it for a long period of time.
Last night I was happily coding away with Codex after writing off Gemini CLI yet again due to weirdness in the CLI tooling.
I ran into a very tedious problem that all of the agents failed to diagnose and were confidently patching random things as solutions back and forth (Claude Code - Opus 4.6, GPT-5.3 Codex, Gemini 3 Pro CLI).
I took a step back, used a Python script to extract all of the relevant codebase, popped open the browser, and had Gemini-3-Pro set to Pro (highest) reasoning and GPT-5.2 Pro crunch on it.
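For reference, the extraction step is nothing fancy; a minimal sketch of that kind of script (the function name, extension list, and path handling here are illustrative assumptions, not the exact script):

```python
import pathlib

# Illustrative "dump the relevant code into one blob" helper.
# EXTS is whatever counts as "relevant" for the bug at hand.
EXTS = {".py", ".ts", ".glsl"}

def collect_context(root: str) -> str:
    """Concatenate matching source files under root, each tagged with its path."""
    parts = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTS:
            parts.append(f"--- {path} ---\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)
```

The output goes straight into the web UI's prompt box; the whole point is that the browser-side models accept enough context to hold the entire relevant surface at once.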
They took a good while thinking.
But, they narrowed the problem down to a complex interaction between texture origins, polygon rotations, and a mirroring implementation that was causing issues for one single "player model" running through a scene and not every other model in the scene. You'd think the "spot the difference" would make the problem easier. It did not.
I then took Gemini's proposal and passed it to GPT-5.3-Codex to implement. It actually pushed back and said "I want to do some research because I think there's a better code solution to this". Wait a bit. It solved the problem in the most elegant and compatible way possible.
So, that's a long-winded way to say that there _is_ a use for a very smart model that only works in the browser or via API tooling, as long as it has a large context and can think for ages.
My hopes are on harness engineering allowing cheaper (but still large) models to shine. I'm evaluating DeepSeek because it would allow insane agent armies. One thing that's easy to overlook, though: DeepSeek charges for thinking tokens.
DeepSeek has a tendency to think... a lot! Without a good harness I can't evaluate it well; time will tell.
OpenAI doesn't; it's embedded into the price, I think.
Cheap = we can run 10x the workloads, bigger imagination = innovation. Maybe 10 dumb agents in a loop can beat 1 Opus? Haha.
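Half-joking, but the shape of that loop is trivial to write down. A toy majority-vote sketch (everything here is hypothetical; `ask_cheap_model` is a stand-in for whatever API you'd actually call):

```python
from collections import Counter

def majority_answer(task, ask_cheap_model, n=10):
    """Fan the same task out to n cheap model calls; majority-vote the answers.

    Returns the winning answer, or None if no strict majority emerges
    (in which case you'd escalate to the expensive model).
    """
    votes = Counter(ask_cheap_model(task) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer if count > n // 2 else None
```

Whether 10 dumb agents actually beat 1 Opus is the empirical question; the harness itself is the easy part.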
This is something that I was thinking about today. We're at the point where anyone can vibe code a product that "appears" to work. There's going to be a glut of garbage.
It used to be that getting to that point required a lot of effort. So, in producing something large, there were quality indicators, and you could calibrate your expectations based on this.
Nowadays, you can get the large thing done - meanwhile the internal codebase is a mess and held together with AI duct-tape.
In the past, this codebase wouldn't scale, the devs would quit, the project would stall, and most of the time the things written poorly would die off. Not every time, but most of the time -- or at least until someone wrote the thing better/faster/more efficiently.
How can you differentiate between 10 identical products, 9 of which were vibecoded and 1 of which wasn't? The one which wasn't might actually recover your backups when it fails. The other 9? Whoops, never tested that codepath. Customers won't know until the edge cases happen.
It's the app store effect, but magnified and applied to everything. Search for a product, find 200 near-identical apps, all somehow "official" -- 90% of which are scams or low-effort trash.
To play devil's advocate, if you were serious about building a product, whether it was hand-coded or vibe-coded, you would iterate through the work and implement functionalities step-by-step.
But with vibe-coding, you might not give enough thought to the product to think of use cases. I think you can still build good software with varying degrees of AI assistance, but it takes the same effort of testing and user feedback to make it great.
The other element here is that the vibecoder hasn't done the interesting thing; they've pulled together other people's interesting things.
Let's see, how to say this in a less inflammatory way...
(Just did this.) I'm sitting in a hotel and wondered if I could do some fancy video processing on the video feed from my laptop to turn it into a wildlife cam and capture the birds that keep flying by.
I ask Codex to whip something up. I iterate a few times, I ask why processing is slow, it suggests a DNN. I tell it to go ahead and add GPU support while it's at it.
In a short period of time, I have an app that is processing video, doing all of the detection, applying the correct models, and works.
It's impressive _to me_ but it's not lost on me that all of the hard parts were done by someone else. Someone wrote the video library, someone wrote the easy Python video parsers, someone trained and supplied the neural networks, someone did the hard work of writing a CUDA/GPU support library that 'just works'.
I get to slap this all together.
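To give a flavor of how little of the "hard part" is mine: before any of the clever detection, the core loop is basically frame differencing. A toy sketch with plain NumPy (the function name and thresholds are illustrative; the real app leaned on a proper video library and a pretrained detector):

```python
import numpy as np

def motion_frames(frames, threshold=25, min_changed=0.01):
    """Yield indices of frames that differ enough from the previous frame.

    frames: iterable of HxW (grayscale) or HxWx3 (color) arrays.
    A frame counts as "motion" when more than min_changed of its pixels
    changed by more than threshold intensity levels.
    """
    prev = None
    for i, frame in enumerate(frames):
        gray = frame.mean(axis=-1) if frame.ndim == 3 else frame
        if prev is not None:
            changed = np.abs(gray.astype(int) - prev.astype(int)) > threshold
            if changed.mean() > min_changed:
                yield i
        prev = gray
```

Everything past this point -- decoding the camera feed, running the DNN on the flagged frames, pushing it to the GPU -- is exactly the part someone else already built.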
In some ways, that's the essence of software engineering. Building on the infinite layers of abstractions built by others.
In other ways, it doesn't feel earned. It feels hollow in some way and demoing or sharing that code feels equally hollow. "Look at this thing that I had AI copy-paste together!"
To me, part of what makes it feel hollow is that if we were to ask you about any of those layers, and why they were chosen or how they worked, you probably would stumble through an answer.
And for something that is, as you said, impressive to you, that's fine! But the spirit of Show HN is that there was some friction involved, some learning process that you went through, that resulted in the GitHub link at the top.
I knew I could do better, so I made a version that is about 15 KB and solves a fundamental issue with WebGL context limits while being significantly faster.
AI helped do a lot of the code, especially around the compute shaders. However, I had the idea of how to solve the context limits. I also pushed past several perf bottlenecks that came from my fundamental lack of WebGPU knowledge, and in the process deepened my understanding of it. Pushing the bundle size down also stretched my understanding of JS build ecosystems and why web workers still aren't more common (the special bundler settings for workers break often).
Btw, my version is on npm/GitHub as chartai. You tell me if that is AI slop. I don't think it is, but I could be wrong.
If you visualize it as AI agents throwing a rope to wrangle a problem, and then visualize a dozen of these agents throwing their ropes around a room, and at each other -- very quickly you'll also visualize the mess of code that a collection of agents creates without oversight. It might even run, and some might say that's the only point that matters, but... at what cost in code complexity, performance waste, cascading bugs, etc.?
Is it possible? Yes, I've had success with having a model output a 100 step plan that tried to deconflict among multiple agents. Without re-creating 'Gas town', I could not get the agents to operate without stepping on toes. With _me_ as the grand coordinator, I was able to execute and replicate a SaaS product (at a surface level) in about 24hrs. Output was around 100k lines of code (without counting css/js).
Who can prove that it works correctly, though? An AI enthusiast will say "as long as you've got test coverage blah blah blah". Those who have worked on large-scale products know that tests passing is the bare minimum. So you smoke test it, hope you've covered all the paths, toss it up, and try to collect money from people? I don't know. If _this_ is the future, it will collapse under the weight of garbage code, security and privacy breaches, and who knows what else.
Right but, do you or the founder have actual responses to the story posted? It seemed to give RentAhuman the benefit of the doubt every step of the way. The site doesn't work as advertised, appears to be begging for hype, got a reporter to check it out, and it didn't work.
That's life. Can't win them all. The lesson here is that the product wasn't ready for primetime, and you got a massive freebie of press both via Wired _and_ this crosspost.
A better strategy is to actually lay out what works and what the roadmap is, so anyone partially interested might see it when they stumble into this post.
Or jot it down as a failed experiment and move on.