Eddie Wang
Mar 25, 2026
Code Review · AI Code Generation · GitHub Copilot · AI Coding Tools

Reviewing AI-Generated Code: Different PRs Need Different Patterns


TL;DR

AI coding assistants produce structurally different code than humans. Your review process needs to adapt or you'll miss the bugs that matter most.

You've probably noticed it already. The PRs hitting your review queue look different than they did two years ago. The code is syntactically clean, well-structured, follows naming conventions, and includes reasonable comments. It also sometimes calls APIs that don't exist.

According to Stack Overflow's 2024 Developer Survey, 63% of professional developers now use AI tools in their development workflow. That number is only going up. The code your team submits for review increasingly originates from Copilot, Cursor, or similar assistants. And the failure modes of that code are fundamentally different from what humans produce on their own.

Traditional code review heuristics don't catch these failure modes. If you're still reviewing AI-assisted PRs the same way you review human-written code, you're likely approving bugs that would have been obvious with a different lens.

How AI-generated code fails differently

Human developers make messy mistakes. They forget semicolons, misspell variable names, leave debug logging in production code. These are easy to spot. A quick scan catches them.

AI-generated code has the opposite problem. The surface is polished. The bugs hide underneath. Here are the patterns that show up repeatedly in AI-assisted PRs:

Hallucinated APIs and methods. This is the most AI-specific failure mode. The assistant generates a call to response.json().getField("name") or fs.readFileAsync() — methods that look plausible but don't exist in the library being used. The code reads well. It just won't run. Worse, some hallucinated package names actually exist on npm or PyPI, sometimes as typosquatted malware packages.

Outdated API usage. LLMs are trained on historical code. They'll generate Express.js middleware using patterns from v3 when your project runs v5. They'll use deprecated AWS SDK v2 calls when you've migrated to v3. The code compiles, the tests might even pass if they're shallow enough, and six months later you hit a deprecation wall.

Plausible but wrong logic. The AI writes a sorting function that handles the happy path perfectly but silently drops null values. It generates a retry loop that looks correct but never actually backs off. It implements pagination that works for pages 1 through 10 but breaks when the total exceeds the offset type's max value. These aren't syntax errors. They're logic errors wrapped in clean, confident-looking code.

Security anti-patterns presented with confidence. A Stanford/CCS study by Perry et al. found that developers using AI assistants wrote significantly less secure code than those working without them. The kicker: the AI-assisted group was more likely to believe their code was secure. AI-generated code will happily use MD5 for password hashing, build SQL queries with string concatenation, or store secrets in environment variables that get logged to stdout.
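Here is what the MD5 anti-pattern looks like next to a safer alternative, as a minimal sketch using only Node's built-in crypto module. In production you would typically reach for a vetted bcrypt or argon2 library rather than rolling your own.

```typescript
import { createHash, randomBytes, scryptSync } from "node:crypto";

// What AI-generated code often reaches for: a fast, unsalted MD5 digest.
// Acceptable for checksums; never for passwords.
function weakHash(password: string): string {
  return createHash("md5").update(password).digest("hex");
}

// Safer sketch: scrypt is salted and deliberately slow, which is exactly
// what you want for password storage.
function saferHash(password: string): { salt: string; hash: string } {
  const salt = randomBytes(16).toString("hex");
  const hash = scryptSync(password, salt, 64).toString("hex");
  return { salt, hash };
}
```

Note that the weak version looks just as tidy as the safe one, which is the reviewer's problem in a nutshell.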

Excessive duplication. GitClear's 2025 analysis of 211 million changed lines of code found that copy/pasted code rose from 8.3% to 12.3% between 2021 and 2024, while refactoring lines dropped from 25% to under 10%. AI assistants are very good at generating new code. They're poor at recognizing when existing code should be reused instead. The result is codebases that grow faster in volume but slower in quality.

Why your current review process misses these

Most code review habits evolved around human-written code. Reviewers scan for style inconsistencies, check that the PR addresses the ticket, look at edge cases in the business logic, and verify test coverage. That process assumes the code author understood what they wrote and made deliberate choices.

With AI-assisted code, that assumption breaks down. The person submitting the PR may not fully understand every line the assistant generated. They accepted a suggestion that looked right, maybe tweaked a variable name, and moved on. The code passed linting and type checks. But nobody actually verified that the third-party method being called exists in the version pinned in package.json.

Here's what traditional review tends to skip:

  • Verifying imports actually resolve. Reviewers assume the author wouldn't import something that doesn't exist. With AI-generated code, that's no longer safe.
  • Checking for duplicated logic. The diff looks clean. But the same validation logic now exists in three places because the AI wrote fresh code instead of calling the existing utility.
  • Testing boundary conditions the AI assumed away. AI-generated tests tend to cover the happy path thoroughly and ignore the edges. If the tests pass and the diff is clean, reviewers approve.
  • Questioning confident-looking patterns. Human code that looks uncertain gets scrutinized. AI code that's wrong looks just as polished as AI code that's right. The visual signal that something needs a closer look is gone.

A review checklist for AI-generated code

This isn't about replacing your existing review process. It's a supplementary pass specifically designed for the failure modes AI introduces. Run through these when you suspect — or know — a PR includes AI-generated code.

1. Verify every external API and method call

For any new library import or method call, confirm it exists in the exact version your project uses. Don't just check that the package exists — check that the specific method signature matches. AI models frequently blend APIs across versions or invent convenience methods that seem like they should exist.

Quick test: search the project's node_modules (or equivalent) for the method name. If it doesn't appear in the library's type definitions or source, you've found a hallucination.
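A runtime spot-check works too. The sketch below probes the module object directly: `readFileAsync` sounds plausible, but plain `node:fs` has never exported it (the promise-based API lives in `fs.promises` / `node:fs/promises`).

```typescript
import * as fs from "node:fs";

// Does the method actually exist on the module object at runtime?
function hasMethod(mod: object, name: string): boolean {
  return typeof (mod as Record<string, unknown>)[name] === "function";
}

console.log(hasMethod(fs, "readFile"));      // true: real API
console.log(hasMethod(fs, "readFileAsync")); // false: hallucination
```

This won't catch wrong signatures, only missing names, so treat it as a complement to the type-definition search, not a replacement.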

2. Check for duplicated patterns

AI assistants don't have a deep understanding of your codebase's existing abstractions. They'll write a new email validation regex instead of calling validators.isEmail() from your shared utils. When reviewing, search the codebase for similar functionality. If the PR adds something that already exists elsewhere, request that the author use the existing implementation.
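A minimal sketch of the pattern. `validators.isEmail` here is a stand-in for whatever shared helper your codebase already has, and both regexes are illustrative, not recommended email validators:

```typescript
// Stand-in for a shared helper that already exists in the codebase.
const validators = {
  isEmail: (s: string): boolean => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s),
};

// What an assistant often generates: a fresh, subtly different regex
// inline at the call site. It silently disagrees with the shared rule:
// this one accepts "user@localhost", which the helper rejects.
function registerUserDuplicated(email: string): boolean {
  return /^\S+@\S+$/.test(email);
}

// The fix a reviewer should request: call the existing utility.
function registerUser(email: string): boolean {
  return validators.isEmail(email);
}
```

The real cost isn't the extra lines; it's that the two validation rules now drift independently.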

3. Scrutinize the security surface

Given the Perry et al. findings, treat any security-adjacent AI-generated code with extra suspicion. Specifically look for:

  • Weak or deprecated cryptographic functions (MD5, SHA1 for security purposes)
  • SQL queries built with string interpolation instead of parameterized queries
  • Secrets or tokens hardcoded or logged
  • Missing input validation on user-facing endpoints
  • Overly permissive CORS configurations or file permissions
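The SQL item is worth seeing concretely. These are hypothetical query builders for illustration: interpolation lets crafted input rewrite the query, while the parameterized shape that most database drivers accept keeps the input as data.

```typescript
// Dangerous: user input is spliced into the SQL text itself.
function unsafeQuery(username: string): string {
  return `SELECT * FROM users WHERE name = '${username}'`;
}

// Safer shape: the SQL carries a placeholder and the value travels
// separately, so it can never change the query's syntax.
function safeQuery(username: string): { sql: string; params: string[] } {
  return { sql: "SELECT * FROM users WHERE name = ?", params: [username] };
}

console.log(unsafeQuery("' OR '1'='1"));
// SELECT * FROM users WHERE name = '' OR '1'='1'   (matches every row)
```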

4. Test the boundaries the AI probably skipped

AI-generated code tends to handle the common case well and assume away the edges. When reviewing, mentally run through: What happens with empty input? Null? An array with 10 million items? A string containing unicode? A network timeout halfway through? If the PR doesn't address these cases and the tests don't cover them, that's a flag.
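A table-driven sketch makes this concrete. The `truncate` helper below is hypothetical: the naive version handles ASCII fine but splits surrogate pairs on emoji, exactly the kind of edge a happy-path test suite never touches.

```typescript
// Naive: slices UTF-16 code units, so it can cut an emoji in half.
function truncateNaive(s: string, n: number): string {
  return s.slice(0, n);
}

// Safer: iterate code points before slicing.
function truncate(s: string, n: number): string {
  return Array.from(s).slice(0, n).join("");
}

// Table-driven edge cases of the kind the AI probably skipped.
const cases: Array<[string, number, string]> = [
  ["", 3, ""],         // empty input
  ["abc", 0, ""],      // zero-length request
  ["ab", 10, "ab"],    // n larger than the input
  ["💡bulb", 1, "💡"], // astral-plane character
];
```

Writing the table takes a minute; it's the fastest way to force the edges the generator assumed away into the review conversation.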

5. Look for license and attribution risks

AI models trained on open-source code can reproduce substantial portions of GPL or AGPL-licensed code verbatim. If a chunk of code in the PR looks unusually specific or complex for the task at hand, consider running it through a code similarity search. This is less about paranoia and more about protecting your project's licensing position.

6. Ask: "Does the author understand this code?"

This is the most important question, and the hardest to automate. Ask the PR author to explain a specific implementation choice. Not as a gotcha — as a genuine check. If they can explain why they chose a particular data structure or error handling approach, great. If they can't, the code needs more scrutiny regardless of how clean it looks.

What automated reviewers can catch that humans miss

Some of these checks are tedious for humans but trivial for machines. Verifying that every imported method exists in the right library version? That's a grep, not a judgment call. Scanning for known insecure patterns like SQL string concatenation or weak hashing? Pattern matching. Detecting that a new function duplicates existing logic elsewhere in the repo? Static analysis.

This is where AI code review tools create real value. Not as replacements for human reviewers, but as a first pass that catches the mechanical issues so humans can focus on the judgment calls: Is this the right approach? Does the architecture make sense? Will this be maintainable in six months?

The most effective setup pairs automated review (catching hallucinated APIs, security anti-patterns, duplication) with human review (evaluating design decisions, verifying business logic, assessing maintainability). Neither alone is sufficient for the AI-assisted codebase era.

Team policies that actually work

Banning AI tools isn't realistic and probably isn't desirable. But "use Copilot however you want" isn't a policy — it's the absence of one. Here are concrete guidelines that teams have found effective.

Require human-written tests for AI-generated implementations. If the implementation came from an AI assistant, the tests should be written by the developer. This forces the author to think through the edge cases the AI glossed over. It's also a natural check on understanding — you can't write good tests for code you don't understand.

Label AI-assisted PRs. Not as a scarlet letter, but as a signal that tells reviewers to apply the expanded checklist. A simple GitHub label or a line in the PR description is enough. This works best when it's normalized across the team — when everyone uses AI tools, there's no stigma in flagging it.

Gate security-sensitive paths. Authentication, authorization, cryptography, payment processing — for these areas, require that a security-aware reviewer signs off, and explicitly check for the patterns from the security section above. Some teams go further and require these modules to be written without AI assistance, though enforcing that is tricky.

Run automated dependency and API verification in CI. Add a CI step that checks whether all imports resolve and all referenced methods exist. TypeScript's tsc --noEmit catches most of these in typed projects, but dynamic languages need additional tooling. This is the single cheapest defense against hallucinated APIs.
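For a TypeScript project on GitHub Actions, the step can be as small as this sketch (the step name is arbitrary; adapt the install command to your package manager):

```yaml
# tsc --noEmit fails the build when an import does not resolve or a
# called method is missing from the installed type definitions.
- name: Verify imports and API calls resolve
  run: |
    npm ci
    npx tsc --noEmit
```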

Track code duplication metrics over time. Set up a duplication threshold in your CI pipeline and monitor the trend. If copy/paste percentages start climbing after your team adopts AI tools, you'll want to catch that early rather than discovering it during a painful refactoring sprint six months later.
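One option is jscpd. A minimal `.jscpd.json` along these lines, with the keys checked against the version you install, fails the run when duplicated lines exceed 10%:

```json
{
  "threshold": 10,
  "reporters": ["console"],
  "ignore": ["**/node_modules/**"]
}
```

Then run it as a CI step (for example `npx jscpd src/`) and watch the trend line, not just the pass/fail signal.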

The real risk isn't AI-generated code. It's unreviewed AI-generated code.

AI coding assistants are a productivity multiplier. That's not in question. But the productivity gains disappear if the generated code introduces subtle bugs, security vulnerabilities, or maintenance debt that review should have caught.

The fix isn't complicated. Acknowledge that AI-generated code has different failure modes. Adjust your review checklist. Automate the mechanical checks. Keep the human judgment for the parts that matter most. Your review process adapted when your team started using TypeScript, or when you moved to microservices. This is the same kind of adaptation, just for a different shift in how code gets written.
