AI makes both the engineer and the chaos faster
What the industry data on AI and developer productivity actually tells us
Somewhere around mid-2025, I started noticing something across our engineering org that felt off.
Like everyone else, we were enthusiastically adopting AI coding tools. Team leads were excited. Throughput was up. Sprint delivery and speed numbers looked great on dashboards.
And I was getting more and more nervous.
Not because anyone was doing anything wrong - the enthusiasm was genuine and well-intentioned. But because the signals I was seeing didn't match the celebration. PR sizes growing. More code shipping without meaningful review. Stability metrics dipping. Rework going up. The kind of early warnings that, if you've been in engineering leadership long enough, you know are worth paying attention to.
When I dug in, I realized the enthusiasm was simply outpacing the measurement. The focus was on throughput gains - and those were real - but the metrics for what those gains were doing to review quality, stability, and long-term architecture hadn't caught up yet.
It's a pattern I've since heard about across the entire industry. And honestly, in some cases things are quite bad and look a lot like cargo cult engineering - slapping an "AI in our SDLC" badge on the org and celebrating throughput numbers, without actually thinking through long-term implications or developing a real transformation strategy. A headless chicken with an "AI-powered" label stuck to its side. Running faster than ever. No idea where it's going.
In a domain where quality issues can bring severe reputational damage, that gap between enthusiasm and measurement is something you can't afford to leave open for long.
AI is an amplifier, not (yet) a silver bullet
The 2025 DORA Report - based on nearly 5,000 technology professionals - frames it better than I can: AI's primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.
That's the sentence I wish every CEO would read before forwarding the next "Microsoft writes 30% of code with AI" headline to their CTO.
And by the way - in this interview, a CTO of a developer productivity measurement company traced that "30% of code" claim back to accepted autocomplete suggestions. As she puts it: your linter has been running on 100% of your PRs for years. Can you imagine the headline? "Acme Corp only ships code to production that's been read by robots." We could have said our code was "machine generated" back when IDE autocomplete was filling in class names.
The data is more nuanced than the headlines suggest. Yes, 90% of developers now use AI as part of their work. Yes, AI adoption now improves software delivery throughput - a shift from 2024, where it didn't. But AI adoption still increases delivery instability. Teams are adapting for speed. Their underlying systems have not evolved to safely manage AI-accelerated development.
Faros AI's analysis of over 10,000 developers across 1,255 teams found the same pattern from a different angle: while team-level changes are measurable, there is no significant correlation between AI adoption and improvements at the company level. Throughput, lead time, incident resolution - all flat when you zoom out.
So here's the uncomfortable truth. If your org was already good at delivery - clean architecture, fast feedback loops, strong testing - AI probably makes you better. If your org was already struggling with those things, AI just makes you struggle faster.
The numbers most teams aren't watching
I want to share some specific numbers, because this is where it gets concrete and a bit alarming.
PR size is inflating. Faros AI found a 154% increase in average PR size associated with high AI adoption. Jellyfish, analyzing data from 500+ companies, found that going from low to high AI adoption corresponded to an 18% increase in lines added per PR. And here's why that matters: PRs over 1,000 lines have a 70% lower defect detection rate compared to smaller ones. Detection drops from 87% for small PRs (under 100 lines) to just 28% for large ones.
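If you want to see whether that size-to-detection relationship holds in your own data, the check is cheap to run. Here's a minimal sketch, assuming you can export each PR's size and whether a production defect was later traced back to it - the field names and bucket thresholds are illustrative, not tied to any specific tool:

```python
# Sketch: defect rate by PR size bucket.
# Assumes an export of PRs with lines changed and whether a production
# defect was later traced back to the PR; fields and thresholds are
# illustrative.
from collections import defaultdict

prs = [
    {"lines_changed": 80,   "caused_defect": False},
    {"lines_changed": 450,  "caused_defect": False},
    {"lines_changed": 1600, "caused_defect": True},
    # ... your exported PR history
]

def size_bucket(lines: int) -> str:
    if lines < 100:
        return "small (<100)"
    if lines < 1000:
        return "medium (100-999)"
    return "large (1000+)"

stats = defaultdict(lambda: {"prs": 0, "defects": 0})
for pr in prs:
    bucket = stats[size_bucket(pr["lines_changed"])]
    bucket["prs"] += 1
    bucket["defects"] += int(pr["caused_defect"])

for name, s in sorted(stats.items()):
    print(f"{name:18} {s['prs']:4} PRs, defect rate {s['defects'] / s['prs']:.0%}")
```

If large PRs are both growing in share and carrying most of the escaped defects, that 70%-lower-detection finding stops being an industry statistic and becomes your statistic.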
So we're generating bigger PRs that are harder to review. And predictably...
Review time is ballooning. PR review time increased by 91% in teams with high AI adoption, according to Faros AI. Developers are merging 98% more pull requests - but the human review process hasn't gotten any faster. It's become the bottleneck.
Bugs are going up. A 9% increase in bugs per developer. Not dramatic in isolation. But combine it with bigger PRs and overwhelmed reviewers and you have a compounding quality problem.
Code quality is eroding in ways that don't show up immediately. GitClear analyzed 211 million lines of code (2020-2024) and found that copy-pasted code surged from 8.3% to 12.3%, while code refactoring dropped from 24.1% to just 9.5%. Code churn - new code requiring revision within two weeks - nearly doubled, from 3.1% to 5.7%. Copy-paste exceeded refactored code for the first time in the dataset's history.
And a Stanford-backed study of 100,000+ developers found that AI productivity gains ranged from 30-40% for simple greenfield tasks down to 0-10% for complex brownfield work. For complex tasks in less popular languages, AI actually decreased productivity.
These aren't opinions. This is data from multiple independent research groups, across hundreds of companies and tens of thousands of developers, all pointing in the same direction: AI generates more code, faster. But more code is not the same as better outcomes.
One expert's take on this stuck with me: "Source code is a liability." We're now in a world where it's trivially easy to produce a tremendous amount of it. That should make us more careful, not less.
The real shift: from writing code to reviewing it
Here's the thing most people miss about AI in development, and it's what I've come to think of as the most important mental model for engineering leaders to internalize.
AI shifts the developer's cognitive focus from writing to reviewing.
Typing speed was never the bottleneck. Even on a good day, developers spend maybe 20-25% of their time actually writing code (an AWS study put it at 20% for their average engineer). AI makes that 20% faster - great - but it doesn't make the other 80% disappear.
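A back-of-the-envelope calculation makes the ceiling explicit. This is a sketch with assumed numbers - the 20% writing share from above, plus hypothetical speedups on that share:

```python
# Back-of-the-envelope (Amdahl-style) sketch: speeding up only the
# "writing code" share of a developer's time. The 20% share is the AWS
# figure cited above; the speedups are hypothetical.

def overall_speedup(writing_share: float, writing_speedup: float) -> float:
    """Overall speedup when only the writing portion gets faster."""
    return 1 / ((1 - writing_share) + writing_share / writing_speedup)

writing_share = 0.20
for speedup in (1.5, 2.0, 10.0):
    print(f"writing {speedup:>4}x faster -> "
          f"{overall_speedup(writing_share, speedup):.2f}x overall")
# Even an infinite speedup on 20% of the work tops out at 1.25x overall -
# and that's before review overhead grows.
```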
What actually happens: AI generates code at superhuman speed, but someone still has to review that code, understand it, verify it's correct, and make sure it fits the existing architecture. That "someone" is your developer, who now spends less time in the creative act of writing (which, by the way, is the part most developers enjoy) and more time in the cognitively demanding act of reviewing AI-generated output.
DORA's research found that many developers actually feel less satisfied after AI adoption because AI accelerates the parts they enjoy - and what's left is more toil, more meetings, more review work. Which, if you think about it, is kind of the opposite of the promise.
And this isn't just a feeling. The METR study found that experienced open-source developers took 19% longer to complete tasks with AI, despite believing they were 20% faster. The cognitive overhead of reviewing and correcting AI output ate the time savings and then some.
This is where I formulated another rule for myself, one I try to follow: AI makes everything faster - the chaos included. If your developers are generating 2x the code but your review process hasn't evolved, you're not being more productive. You're accumulating risk at 2x the rate.
What data-driven AI adoption actually looks like
So what should you actually do? I want to share a reasoning framework rather than a prescriptive checklist, because every org is different. But these are the questions I've learned (sometimes the hard way) to ask.
Look honestly at your codebase. Is your primary language popular and well-supported by AI models - Python, JavaScript, TypeScript, Java? Or are you working in something more niche? The Stanford data shows this matters enormously. Are you mostly greenfield or brownfield? If you're maintaining a large, mature codebase (which most enterprises are), set your expectations accordingly. AI won't deliver the 30-40% gain you saw in the demo.
Be especially careful with your critical path. That 5% of your code that can break everything, the last 1% of performance optimization on your mobile app, the core domain logic your business depends on - these are exactly the areas where AI's gains are smallest and the cost of errors is highest. Use AI freely for boilerplate. Be very deliberate about using it for the stuff that really matters.
Decompose your cycle time and watch the review phase. This is probably the single most actionable metric. If coding time is going down but review time is going up - and overall cycle time isn't improving - AI is just moving the bottleneck, not eliminating it. That's a signal to invest in review process improvements, not to celebrate faster code generation.
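Here's a minimal sketch of that decomposition, assuming you can pull per-PR timestamps (first commit, PR opened, merged) from your Git host; the record format is illustrative:

```python
# Sketch: split cycle time into a coding phase and a review phase per PR.
# Assumes exported PR records with first-commit, opened, and merged
# timestamps; the input format is illustrative.
from datetime import datetime
from statistics import median

prs = [
    {"first_commit": "2025-06-02T09:00", "opened": "2025-06-03T14:00", "merged": "2025-06-06T10:00"},
    # ... more PRs exported from your Git host
]

def hours(start: str, end: str) -> float:
    fmt = "%Y-%m-%dT%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600

coding = [hours(p["first_commit"], p["opened"]) for p in prs]
review = [hours(p["opened"], p["merged"]) for p in prs]

print(f"median coding phase: {median(coding):.1f}h")
print(f"median review phase: {median(review):.1f}h")
# Coding time falling while review time rises (and total cycle time stays
# flat) means the bottleneck moved - it didn't disappear.
```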
Watch your change failure rate and quality metrics. If CFR is climbing alongside AI adoption, that's your canary. The Faros AI data showed a 9% increase in bugs per developer - for some orgs, that's acceptable. For others (like mine), it's not.
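Change failure rate itself is a one-line calculation worth trending alongside adoption. A sketch with illustrative counts - pull the real ones from your deployment pipeline and incident tracker:

```python
# Sketch: change failure rate (CFR) per period, side by side with AI
# adoption. All numbers here are illustrative.
periods = [
    {"period": "2025-Q1", "deployments": 240, "failed_changes": 12, "ai_adoption": 0.35},
    {"period": "2025-Q2", "deployments": 310, "failed_changes": 22, "ai_adoption": 0.70},
]

for p in periods:
    cfr = p["failed_changes"] / p["deployments"]
    print(f"{p['period']}: CFR {cfr:.1%} at {p['ai_adoption']:.0%} AI adoption")
# A CFR that climbs in step with adoption is the canary described above.
```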
Track PR size and code review quality together. If PRs are growing and meaningful review comments per PR are shrinking, your review process is being overwhelmed. The data on reviewer fatigue with large PRs is pretty clear - extra-large PRs receive fewer meaningful comments, not more.
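The same PR export can drive a review-health check. A sketch, assuming you can count substantive comments per PR (bot comments and bare approvals filtered out upstream); the data is illustrative:

```python
# Sketch: is PR size growing while meaningful review shrinks?
# Assumes PR records with lines changed and a count of substantive
# (non-bot, non-"LGTM") review comments; the data is illustrative.
prs = [
    {"lines_changed": 120,  "substantive_comments": 6},
    {"lines_changed": 1400, "substantive_comments": 2},
    # ... exported PR data
]

total_lines = sum(p["lines_changed"] for p in prs)
total_comments = sum(p["substantive_comments"] for p in prs)

print(f"avg PR size: {total_lines / len(prs):.0f} lines")
print(f"avg substantive comments per PR: {total_comments / len(prs):.1f}")
print(f"substantive comments per 1,000 changed lines: {1000 * total_comments / total_lines:.1f}")
# Size trending up while comments per 1,000 lines trend down is the
# "review process being overwhelmed" signal.
```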
Validate that planned architecture actually shipped. This one is less about dashboards and more about discipline. AI is very good at generating code that works right now and very bad at maintaining long-term architectural coherence. If you have no way to verify that generated code actually follows your intended architecture, you'll discover the drift six months later when something breaks and nobody understands why.
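One way to make that verification continuous is an architectural fitness test in CI. A minimal sketch for a layered Python codebase - the directory layout and the rule ("domain must not import infrastructure") are examples, not a prescription:

```python
# Sketch: a CI test that fails when code - AI-generated or not - violates
# a declared dependency rule. Layer names and the rule are illustrative.
import ast
from pathlib import Path

DOMAIN_DIR = Path("src/domain")          # the layer being protected
FORBIDDEN_PREFIX = "infrastructure"      # the layer it must not import

def violations() -> list[str]:
    found = []
    for path in DOMAIN_DIR.rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                continue
            for name in names:
                if name.split(".")[0] == FORBIDDEN_PREFIX:
                    found.append(f"{path}: imports {name}")
    return found

def test_domain_does_not_import_infrastructure():
    broken = violations()
    assert not broken, "Architecture rule violated:\n" + "\n".join(broken)
```

Rules like this won't catch every kind of drift, but they turn "the generated code follows our architecture" from an assumption into something a pipeline checks on every PR.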
The organizations getting this right - and this is consistent across DORA's research, industry case studies, and the other reports I've seen - share one trait: they treat AI adoption as an experiment, not a mandate. They set a baseline, define hypotheses, and measure the results. They don't just hand out licenses and hope.
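To make "experiment, not mandate" concrete: it can be as lightweight as writing the baseline and hypotheses down as data and checking measurements against them. A sketch with made-up numbers and tolerances:

```python
# Sketch: an AI-adoption experiment written down as data.
# Baseline, tolerances, and measurements are all made-up numbers.
baseline = {"cycle_time_days": 5.2, "change_failure_rate": 0.08, "review_time_hours": 18.0}

# Hypotheses: the maximum acceptable increase over baseline per metric.
max_increase = {
    "cycle_time_days":     0.0,    # must not get slower
    "change_failure_rate": 0.02,   # tolerate at most +2 points
    "review_time_hours":   4.0,    # tolerate at most +4 hours
}

measured = {"cycle_time_days": 4.9, "change_failure_rate": 0.11, "review_time_hours": 27.0}

for metric, tolerance in max_increase.items():
    delta = measured[metric] - baseline[metric]
    verdict = "OK" if delta <= tolerance else "REVISIT"
    print(f"{metric:22} baseline={baseline[metric]:<5} measured={measured[metric]:<5} "
          f"delta={delta:+.2f} -> {verdict}")
```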
To conclude: I don't have all of this figured out for myself yet - the longer-term impact is still to be seen, and the industry keeps changing fast. But I think I've seen enough to know that the teams measuring are the ones making progress - and the teams riding the hype are the ones who'll be untangling the mess six months from now.