
12 Things to Check Before You Merge AI-Generated Code

By one recent measure, AI coding tools produce incorrect code roughly 25% of the time. Here are the 12 specific checks every engineer should run before merging AI-generated code — from hallucinated imports to tautological tests.

Abhishek Sharma · Head of Engineering @ Fordel Studios

You have probably merged AI-generated code this week. Here is the uncomfortable question: did you review it the same way you review human code, or did you skim it because it looked clean?

How often does AI-generated code actually cause problems?

The data is not reassuring. A University of Waterloo study published in early 2026 found that AI coding assistants produce incorrect or subtly flawed code in roughly one out of every four outputs. Amazon recently acknowledged that AI-generated code has not delivered the productivity gains they expected, largely because of downstream quality issues. The pattern is consistent: AI code looks syntactically correct, passes a cursory glance, and then breaks in production under edge cases the model never considered.

The core problem is not that AI writes bad code. It is that AI writes plausible code — code that reads well, follows conventions, and hides its bugs behind familiar patterns. Human-written bugs tend to be obviously ugly. AI-written bugs tend to be quietly wrong.

25% — AI code error rate (University of Waterloo, 2026; measured across Copilot, Cursor, and Claude Code)
···

What should you check in every AI-generated pull request?

1. Verify every import actually exists in your dependency tree. AI models hallucinate package names and API methods with alarming confidence. Run your build before anything else. If it references a package you have never seen, check npm or PyPI — it may not exist, or worse, it may be a typosquatted package that does exist and contains malware.
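As a quick illustration, here is a sketch of the kind of pre-merge check you could script, assuming you have already collected the file's import specifiers and the dependencies declared in your package.json. The function and package names are illustrative, not a real tool:

```typescript
// Sketch: flag import specifiers that are not declared in package.json.
// `declaredDeps` would come from reading your package.json.
function findUnknownImports(
  importSpecifiers: string[],
  declaredDeps: Set<string>,
): string[] {
  return importSpecifiers.filter((spec) => {
    // Normalize "lodash/merge" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
    const parts = spec.split("/");
    const pkg = spec.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
    // "node:"-prefixed built-ins are never listed in package.json
    if (pkg.startsWith("node:")) return false;
    return !declaredDeps.has(pkg);
  });
}

const deps = new Set(["react", "@tanstack/query", "zod"]);
console.log(findUnknownImports(["react", "zodd", "@tanstack/query/core"], deps));
// flags "zodd" — a likely hallucinated (or typosquatted) package name
```

Anything a check like this flags deserves a manual look on the registry before you trust it.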

2. Trace every function call to its actual signature. AI frequently calls functions with the wrong number of arguments, incorrect parameter types, or deprecated overloads. Do not trust that the method signature is correct just because the model used it confidently. Open the source or the docs and verify.

3. Read the error handling paths, not just the happy path. AI-generated code almost always handles the success case correctly. The failure modes are where it falls apart — empty catch blocks, swallowed errors, missing null checks, or try-catch blocks that catch too broadly and mask real failures.
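The difference is easiest to see side by side. A minimal sketch — parseConfig is an invented example, not from any particular codebase:

```typescript
// The shape AI frequently emits: the catch block swallows the failure,
// so callers cannot distinguish "config missing" from "config corrupt".
function parseConfigBad(json: string): Record<string, unknown> | null {
  try {
    return JSON.parse(json);
  } catch {
    return null; // SyntaxError silently discarded
  }
}

// The fix: keep the failure visible and attach context before re-throwing.
function parseConfigGood(json: string): Record<string, unknown> {
  try {
    return JSON.parse(json);
  } catch (err) {
    throw new Error(`config is not valid JSON: ${(err as Error).message}`);
  }
}
```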

4. Check for hardcoded values that should be configuration. Models love to inline API URLs, timeout values, retry counts, and feature flags directly into the code. These work in development and become production incidents the moment your environment changes. Every magic number and string literal needs scrutiny.
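One way to make those values reviewable is to force them through a single config loader. A sketch with made-up variable names, assuming Node-style environment variables:

```typescript
// Sketch: every tunable goes through one loader with explicit defaults,
// instead of being inlined at the call site. Names are illustrative.
interface HttpClientConfig {
  baseUrl: string;
  timeoutMs: number;
  maxRetries: number;
}

function loadConfig(env: Record<string, string | undefined>): HttpClientConfig {
  return {
    baseUrl: env.API_BASE_URL ?? "http://localhost:3000",
    timeoutMs: Number(env.API_TIMEOUT_MS ?? 5000),
    maxRetries: Number(env.API_MAX_RETRIES ?? 3),
  };
}

// In application code: const config = loadConfig(process.env);
```

Now the review question is narrower: does every magic number in the diff either come from this loader or carry a comment explaining why it never changes?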

5. Validate that database queries use parameterized inputs. AI-generated SQL and ORM code sometimes interpolates user input directly into query strings. This is not a theoretical concern — it is the most common security vulnerability in AI-generated backend code. If the query touches user input, check for injection vectors.
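The two shapes to look for in review, sketched for a Postgres-style driver (the query and placeholder syntax are illustrative; MySQL drivers use `?` instead of `$1`):

```typescript
// Vulnerable shape: user input spliced straight into the SQL string.
function findUserUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`; // injection vector
}

// Safe shape: query text and values travel separately; the driver
// (pg, mysql2, etc.) handles escaping.
function findUserSafe(email: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}

// With the safe shape, "'; DROP TABLE users; --" arrives as a harmless
// parameter value, never as executable SQL.
```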

···

What about the less obvious failure modes?

6. Test the boundary conditions the AI did not mention. If the function handles a list, what happens with an empty list? A list of one? A list of ten million? AI rarely generates code that accounts for scale or degenerate inputs. Write at least three edge-case tests for any non-trivial function the AI produced.
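For example, take a hypothetical AI-generated helper and the tests it never came with (the function is invented for illustration):

```typescript
// Hypothetical AI output: works on the happy path, returns NaN on [].
function average(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// The hardened version an edge-case pass produces:
function averageSafe(xs: number[]): number {
  if (xs.length === 0) {
    throw new RangeError("average of an empty list is undefined");
  }
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// The three tests the AI did not write:
console.assert(Number.isNaN(average([])), "empty input is silently NaN");
console.assert(averageSafe([7]) === 7, "single element");
console.assert(averageSafe(new Array(1_000_000).fill(2)) === 2, "large input");
```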

7. Confirm the code does not duplicate logic that already exists in your codebase. AI has no memory of your existing utilities, helpers, or shared modules. It will happily rewrite your date formatting function, your error handler, or your API client from scratch — slightly differently each time, creating maintenance nightmares.

8. Audit the types for semantic correctness, not just compilation. TypeScript will tell you if the types compile. It will not tell you if a field typed as string should actually be a branded type, an enum, or a union. AI-generated types tend to be overly permissive — string where you need EmailAddress, number where you need PositiveInteger.
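Branded types are one lightweight way to tighten this up. A sketch — the EmailAddress brand and the regex are illustrative, and you would use a real validator in production:

```typescript
// A string that has been validated as an email. TypeScript now rejects
// plain strings anywhere an EmailAddress is required.
type EmailAddress = string & { readonly __brand: "EmailAddress" };

function toEmailAddress(raw: string): EmailAddress {
  // Minimal check for illustration only
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(raw)) {
    throw new Error(`not a valid email address: ${raw}`);
  }
  return raw as EmailAddress;
}

function sendWelcome(to: EmailAddress): string {
  return `sending welcome mail to ${to}`;
}

// sendWelcome("not-an-email");          // compile error: plain string
sendWelcome(toEmailAddress("a@b.com"));  // fine: validated at the boundary
```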

9. Check that async operations have proper cancellation and cleanup. AI generates async/await code that starts operations but rarely cleans them up. Look for missing AbortController signals, unclosed database connections, event listeners that are never removed, and intervals that are never cleared.
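A sketch of what proper cleanup looks like for a timeout wrapper: every outcome path clears the timer, so nothing leaks (withTimeout is a made-up helper name):

```typescript
// Sketch: race a promise against a timeout, and clean up the timer on
// every path. Without the finally, a resolved promise leaves a live
// timer that can keep a Node process alive.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // runs on success, failure, and timeout alike
  }
}

// For fetch calls, the same idea uses AbortController:
//   const ac = new AbortController();
//   const t = setTimeout(() => ac.abort(), ms);
//   try { return await fetch(url, { signal: ac.signal }); }
//   finally { clearTimeout(t); }
```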

10. Verify the code respects your existing error and logging conventions. Every codebase has patterns — structured logging, specific error classes, error codes, monitoring integrations. AI will use console.log where you need your structured logger. It will throw generic Errors where you need your domain-specific AppError. Consistency matters more than correctness here.
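For instance, if your codebase defines a domain error class, the review check is mechanical. The AppError shape below is a common convention sketched for illustration, not a specific library:

```typescript
// Hypothetical domain error: a stable, grep-able code plus context your
// monitoring can route on, instead of a bare Error with prose only.
class AppError extends Error {
  constructor(
    readonly code: string,
    message: string,
    readonly context: Record<string, unknown> = {},
  ) {
    super(message);
    this.name = "AppError";
  }
}

// What AI tends to emit:      throw new Error("payment failed");
// What your conventions need:
function chargeCard(amountCents: number): never {
  throw new AppError("PAYMENT_DECLINED", "card was declined", { amountCents });
}
```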

AI writes plausible code. Your job is to verify it is actually correct, not just convincing.
Abhishek Sharma
···

What process changes make AI code reviews sustainable?

11. Run your full test suite, not just the tests the AI wrote. AI-generated tests have a well-documented tendency to test what the code does rather than what it should do — they are tautological. A function that returns the wrong result will have an AI-generated test that asserts the wrong result. Your existing tests are the safety net. Run them all.
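A concrete (invented) example of the tautology: suppose the spec caps discounts at 40%, but the AI-generated function caps them at 50%:

```typescript
// Hypothetical AI output with a spec violation: the cap should be 40.
function applyDiscount(price: number, percent: number): number {
  const capped = Math.min(percent, 50);
  return price * (1 - capped / 100);
}

// The tautological AI-written test: it asserts what the code does,
// so it passes and tells you nothing.
console.assert(applyDiscount(100, 50) === 50);

// The requirement-driven test: it asserts what the spec says, so it
// fails and surfaces the bug.
console.assert(
  applyDiscount(100, 50) === 60,
  "spec: discounts are capped at 40%",
);
```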

12. Diff against the actual requirement, not just the previous code. The most insidious AI code review failure is accepting code that is well-written but solves the wrong problem. Before approving, re-read the ticket or spec and confirm the code actually addresses what was asked for — not what the AI interpreted from your prompt.


Is reviewing AI code really that different from reviewing human code?

Yes, and the difference is directional. When you review human code, you are looking for mistakes a tired person might make — typos, off-by-one errors, forgotten edge cases. When you review AI code, you are looking for mistakes a confident pattern-matcher makes — plausible but wrong API usage, subtly incorrect logic that reads beautifully, and solutions that ignore the context of your specific codebase.

The engineers who catch these bugs are not the ones who read AI output faster. They are the ones who maintain healthy skepticism about code that looks too clean. Treat every AI pull request as code from a brilliant but unreliable contractor who has never seen your codebase before — because that is exactly what it is.

AI Code Review Checklist — Summary
  • Verify every import exists in your actual dependency tree
  • Trace function calls to their real signatures and docs
  • Read error handling paths — AI hides bugs in catch blocks
  • Flag hardcoded values that should be environment config
  • Check all database queries for parameterized inputs
  • Write edge-case tests the AI did not generate
  • Search for duplicated logic that already exists in your codebase
  • Audit types for semantic correctness, not just compilation
  • Confirm async operations have cleanup and cancellation
  • Verify logging and error conventions match your patterns
  • Run your existing test suite, not just AI-generated tests
  • Diff against the requirement, not just the previous code