The New York Times ran a headline this week: "A.I. Has Created a Code Overload." Meanwhile, the App Store saw an 84% surge in new apps. Everyone is celebrating. I have been shipping production software for 14 years and I have never been more worried about where this industry is headed.
What Problem Were AI Coding Tools Supposed to Solve?
The pitch was simple. Developers spend too much time writing boilerplate. AI can generate code faster than humans can type. Give every engineer a copilot and watch productivity soar.
It sounded reasonable. It was also wrong.
Every study on how software engineers actually spend their time tells the same story. A Microsoft Research study found that developers spend only 5% of their time writing new code. A University of Victoria study put the number at under 10%. The rest? Reading existing code, understanding requirements, debugging, reviewing pull requests, attending meetings, documenting decisions, and staring at architecture diagrams trying to figure out why the system does what it does.
AI coding tools took that 5% and made it 10x faster. Congratulations. You just optimized five percent of the job.
Why Did Nobody Question the Premise?
Because generating code feels productive. You type a comment, Cursor fills in forty lines, you hit Tab, and dopamine fires. You feel like you are shipping. The feedback loop is instant and addictive.
But feeling productive and being productive are different things. A developer who generates 500 lines of AI code in an hour has not produced 500 lines of value. They have produced 500 lines of liability that someone — probably them, probably at 2 AM during an incident — will need to understand, debug, and maintain.
“Generating code is the easiest part of software engineering. It always was. The hard part is knowing what code to write, where to put it, and what happens when it breaks at scale.”
The GitClear 2025 code quality report showed that AI-assisted codebases have 39% more code churn — code that gets written, then rewritten or deleted within two weeks. That is not productivity. That is a team running in circles while the velocity dashboard shows green.
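Churn in this sense is measurable, not just a vibe: a line counts as churned if it is deleted or rewritten within some window of being added. A minimal sketch of the calculation, assuming you have already extracted per-line add/remove dates from your version-control history (the `LineRecord` shape here is hypothetical, not GitClear's actual methodology):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CHURN_WINDOW_DAYS = 14  # the two-week window used in the report

@dataclass
class LineRecord:
    added: date               # when the line was committed
    removed: Optional[date]   # when it was deleted/rewritten, or None if still alive

def churn_rate(lines: list[LineRecord]) -> float:
    """Fraction of lines rewritten or deleted within the churn window."""
    if not lines:
        return 0.0
    churned = sum(
        1 for line in lines
        if line.removed is not None
        and (line.removed - line.added).days <= CHURN_WINDOW_DAYS
    )
    return churned / len(lines)
```

Extracting the records themselves (e.g., by parsing `git log --numstat` or blame output) is the tedious part and is left out here.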
What Does the Code Overload Actually Look Like?
I run an engineering consultancy. We audit codebases. Here is what we are seeing in 2026:
- Codebases growing 3-5x faster than the team’s ability to comprehend them
- Pull requests with 800+ lines where the author cannot explain what half the code does
- Duplicate logic scattered across files because the AI did not know about existing utilities
- Test suites that pass but test the wrong things — high coverage, low confidence
- Architectural decisions made by the model, not the engineer, with no documentation of why
Every one of these is a maintenance time bomb. And maintenance is where the actual engineering hours go. You do not spend 14 years in this industry without learning that the code you write today is the code someone curses at in 18 months.
The METR study on AI coding tool productivity found that experienced open-source developers using AI tools were actually 19% slower on real tasks than without AI assistance. Not because the tools generated bad code, but because the developers spent more time reviewing, debugging, and re-prompting than they would have spent just writing the code themselves.
Read that again. The tools made experienced developers slower. The people who should benefit the most from AI code generation — those who can evaluate output quality — ended up worse off because the overhead of managing the AI exceeded the time saved by not typing.
Is the App Store Surge Evidence That AI Tools Work?
No. It is evidence that AI tools generate apps. Generating an app and shipping a product are not the same thing.
Apple’s App Store saw an 84% surge in new app submissions in Q1 2026. The AI coding tools crowd is waving this around like a trophy. But look at the retention data: the average new app in 2026 has a 30-day retention rate of 4.1%, down from 6.8% in 2024. More apps, worse apps.
This is the Cambrian explosion of software slop. Replit Agent, Lovable, Bolt, and a dozen other tools can scaffold an app in minutes. But none of them can tell you whether the app should exist, who it serves, or how it handles the edge case where a user submits the same form twice on a flaky connection.
The bottleneck was never "I cannot build this app fast enough." The bottleneck was always "I do not understand the problem well enough to build the right thing."
What Are AI Coding Tools Actually Good At?
I am not an AI doomer. I use Claude Code every day. I use Cursor. I pay for both. They are genuinely useful for a narrow set of tasks:
- Boilerplate generation for well-understood patterns (API routes, CRUD, config files)
- Language translation — converting working code from one language to another
- Exploration — quickly prototyping an approach to validate feasibility
- Documentation — generating inline docs and README content from existing code
- Regex and one-off scripts — tasks where correctness is immediately verifiable
Notice the pattern. These are all tasks where the human already knows what the output should look like and can verify correctness instantly. The AI is acting as a fast typist, not a thinking partner.
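The "immediately verifiable" point is concrete: for a one-off regex, the review is a handful of assertions you run before trusting the output. A hypothetical example, where the pattern is AI-suggested and the checks are the whole verification step:

```python
import re

# Suppose the AI suggested this pattern for US-style ZIP codes.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

# The verification IS the review: known-good and known-bad cases.
assert ZIP_RE.match("94103")
assert ZIP_RE.match("94103-1234")
assert not ZIP_RE.match("9410")
assert not ZIP_RE.match("94103-12")
```

Five seconds of checking, total confidence. Contrast that with an AI-generated architectural decision, where verification takes months of production traffic.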
The moment you need the AI to make a judgment call — which database pattern to use, how to handle partial failures, whether to denormalize for read performance — you are in trouble. Not because the AI gives wrong answers, but because it gives confident answers to questions that require context it does not have.
“An AI that confidently picks the wrong architecture is worse than no AI at all, because now you have conviction without understanding.”
What About Autonomous AI Agents That Code for Hours?
This week, Z.ai announced GLM-5.1, an AI coding agent that can "run autonomously for hours." Devin promises similar autonomous capability. The pitch: set it loose on a task and come back to a finished feature.
This is the logical conclusion of optimizing for code generation speed, and it terrifies me.
An AI agent that writes code for three hours unsupervised will generate a lot of code. It will also generate a lot of decisions. Each line is a micro-decision about naming, structure, error handling, and control flow. After three hours, you have thousands of decisions that no human reviewed, no human understood, and no human can efficiently audit.
You have not automated engineering. You have automated the creation of a legacy system.
The clean code movement spent two decades arguing that code is read 10x more than it is written. AI coding tools respond by making writing 10x faster while making reading 10x harder. That is not progress. That is an inversion of priorities.
What Should AI Coding Tools Actually Optimize For?
If writing code is 5% of the job, what is the other 95%? And why is nobody building tools for it?
- Understanding existing code — 40% (reading, tracing, comprehending)
- Requirements and design — 20% (meetings, specs, architecture decisions)
- Debugging and incident response — 15% (root cause analysis, firefighting)
- Code review and knowledge transfer — 10% (PRs, pairing, documentation)
- Testing and validation — 10% (writing tests, manual QA, staging verification)
- Writing new code — 5% (the part AI tools optimized)
The biggest slice — understanding existing code — is where AI could genuinely transform productivity. Not by generating more code, but by explaining the code that already exists. Show me the call graph. Tell me why this function was written this way. Surface the three places this change will break something. Map the data flow from user input to database write.
Some tools are starting to do this. Cursor’s codebase indexing is a step. Claude Code’s ability to read and reason about files is useful. But these are side features, not the product. The product is still "type less, generate more." The marketing is still "10x your code output." The metrics are still lines generated per hour.
We are measuring the wrong thing and optimizing for the measurement.
Is This the Software Industry’s Goodhart’s Law Moment?
Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
The industry decided that code output volume was a proxy for developer productivity. AI tools maximized that proxy. Now we have more code than ever and no evidence that we are building better software, shipping faster to users, or reducing the cost of maintenance.
The New York Times piece on code overload quotes a VP of Engineering at a Fortune 500 company: "We are generating code faster than we can understand it. Our velocity metrics have never looked better. Our incident rate has never been higher."
That quote should be on a billboard outside every AI coding tool company’s office.
Velocity metrics up. Incident rates up. These two facts cannot coexist in a world where AI tools are making engineering genuinely more productive. One of them is lying, and it is not the incident rate.
“If your velocity metrics went up and your incident rate also went up, your velocity metrics are lying to you. Incidents do not lie.”
What Is the Steel-Man Argument for AI Coding Tools?
The strongest counterargument goes like this: AI coding tools lower the barrier to entry. More people can build software. Democratization of creation is inherently good. Even if individual app quality drops, the aggregate value created by millions of new builders exceeds the cost of lower average quality.
This argument has merit. I have seen non-technical founders prototype ideas in Lovable that would have cost them $50K and three months with a dev shop. That is real value creation. Shortening the feedback loop between idea and working prototype is genuinely transformative for product discovery.
But this argument confuses prototyping with production engineering. A prototype that validates an idea in a weekend is wonderful. That same prototype running in production serving real users with real data and real money on the line is negligence. The AI coding tool ecosystem has no good answer for the gap between "it works on my machine" and "it works at scale, reliably, for years."
What Needs to Change?
Three things.
First: the metrics need to change.
Stop measuring lines of code generated. Start measuring time-to-comprehension: how long does it take a new engineer to understand this module? Measure code half-life: how long does code survive before it needs to be rewritten? Measure incident attribution: what percentage of production incidents trace back to AI-generated code that was merged without full human understanding?
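Code half-life, for instance, reduces to a survival statistic over line lifetimes. A minimal sketch under the assumption that you can extract each line's age at death from version-control history (a crude estimator: it ignores still-alive lines, so it understates true survival):

```python
import statistics
from typing import Optional

def code_half_life(lifetimes_days: list[Optional[int]]) -> Optional[float]:
    """Median age in days at which deleted/rewritten lines died.

    One entry per line ever committed: its age at death,
    or None if the line is still alive.
    """
    dead = [d for d in lifetimes_days if d is not None]
    if not dead:
        return None
    return statistics.median(dead)
```

Track this number per team, per quarter, split by AI-generated versus hand-written lines, and you have a far more honest productivity signal than lines generated per hour.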
Second: the tools need to change.
Build AI tools that help engineers understand, not just generate. I want a tool that reads a 10,000-line module and gives me a five-minute briefing on its architecture, its failure modes, and its implicit assumptions. I want a tool that watches a PR and says "this changes the retry behavior in the payment flow, here are the three downstream services that depend on the current behavior." I want comprehension amplification, not generation acceleration.
Third: the culture needs to change.
Engineering leaders need to stop rewarding output volume and start rewarding understanding. The engineer who reads the codebase for two hours before writing ten lines of surgical code is more productive than the engineer who generates 500 lines in the same time. We knew this before AI tools existed. We seem to have forgotten.
Where Does This Leave Us?
The AI coding tools industry built a $10 billion market around making the easy part of software engineering easier. Writing code was already the fastest, most well-understood, most automatable part of the job. And they automated it. Bravo.
Now we have more code than ever, more apps than ever, more repositories than ever, and roughly the same number of engineers who actually understand what any of it does. The comprehension debt is compounding. Every AI-generated line that ships without deep human understanding is a withdrawal from the maintenance budget of the future.
The correction is coming. Not because AI coding tools are bad — they are genuinely useful within their scope. But because the industry oversold what that scope is, and the bill for the overselling comes due when the code needs to be maintained, debugged, and evolved by humans who did not write it and cannot read it.
“We did not have a code generation problem. We had a code comprehension problem. AI tools made the first one trivially easy and the second one significantly harder. That is not progress. That is a trade we should not have made.”
Fourteen years in this industry taught me one thing that has never stopped being true: the hardest part of software engineering is understanding. Understanding the problem, understanding the system, understanding the code, understanding the humans who will use it and the humans who will maintain it.
AI coding tools optimized for everything except understanding. And we are about to find out what that costs.