Six months ago, we added AI-powered code review to every pull request. We tried three tools (CodeRabbit, Sourcery, and a custom Claude-based solution), tracked every comment, and measured whether developers acted on them.
The summary: AI code review is genuinely useful for a narrow set of tasks and actively harmful for everything else.
What works well. Consistency violations: the AI catches naming-convention deviations, missing error-handling patterns, and deprecated patterns that human reviewers, dulled by tedium, overlook. Our Claude reviewer caught an average of 2.3 consistency issues per PR. Security issues: hardcoded secrets, SQL injection vulnerabilities, missing input validation. In six months it found four genuine security issues that had made it past human review. Documentation gaps: it flags complex code that lacks explanatory comments.
What fails. Architectural feedback: the AI cannot tell you your approach is fundamentally wrong; it operates at the line level, not the system level. Context-dependent logic: it does not understand your business domain or authorization model. Noise: in month one, 40 percent of AI comments were useless. We reduced that to 15 percent through heavy prompt customization, but that still means one to two useless comments per PR.
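Most of that prompt customization amounted to telling the reviewer what not to say. A hedged sketch of the kind of rules involved (the wording below is illustrative, not our actual prompt):

```python
# Illustrative excerpt of noise-reduction rules for an AI reviewer's
# system prompt. These are hypothetical examples of the kind of rules
# we added, not the exact prompt we run in production.
REVIEW_PROMPT_RULES = """
Only comment when you are confident the issue is real. Specifically:
- Do not comment on style choices already consistent with the surrounding file.
- Do not suggest refactors that change behavior; flag the bug instead.
- Do not restate what the code does; comment only on problems.
- If unsure whether something is intentional, ask one question rather
  than asserting it is wrong.
"""
```

Each rule targets a recurring class of noise we saw in month one, which is why the list grows out of comment logs rather than being written up front.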
Our workflow: high-confidence comments (security issues, definite bugs) block the PR. Medium-confidence comments (consistency, documentation) appear as dismissible suggestions. Low-confidence comments are collapsed by default. Monthly cost is roughly eighty dollars for sixty PRs per week.
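The routing policy above can be sketched as a small function. This is a minimal illustration of the tiering logic, not any tool's actual API; the category and confidence names are assumptions:

```python
# Hypothetical sketch of our comment-routing policy. Category and
# confidence labels are illustrative, not from a specific tool's API.

BLOCKING_CATEGORIES = {"security", "definite_bug"}
SUGGESTION_CATEGORIES = {"consistency", "documentation"}

def route_comment(category: str, confidence: str) -> str:
    """Map an AI review comment to a PR action.

    Returns one of: "block", "suggest", "collapse".
    """
    if confidence == "high" and category in BLOCKING_CATEGORIES:
        return "block"      # PR cannot merge until resolved
    if confidence == "medium" and category in SUGGESTION_CATEGORIES:
        return "suggest"    # shown as a dismissible suggestion
    return "collapse"       # hidden behind a fold by default
```

For example, `route_comment("security", "high")` returns `"block"`, while a low-confidence consistency nit falls through to `"collapse"`.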
One thing we did not expect: the AI review improved our own coding habits. Knowing that the AI would flag consistency issues made us more disciplined about following our own conventions. It is an accountability mechanism as much as a review tool.
The verdict: AI code review supplements human review; it does not replace it. It catches the boring stuff so humans can focus on the important stuff. If you use it to reduce human review time, code quality drops. We use it to make human review more focused, not shorter. The AI handles the checklist. The human handles the judgment. That split is where the real value lies.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.