
AI Made Writing Code Cheap. Judgment Didn't Scale.

CI validates what the code does. Review validates what the code means. Conflating them is why bugs pass both gates undetected.
Every PR goes through two gates before it lands in main. CI runs your tests, linters, and type checks. A reviewer reads the diff, leaves comments, and clicks approve. Both gates exist to catch problems. But they're catching fundamentally different kinds of problems, and most teams don't think about that distinction at all.
CI validates what the code does. Review validates what the code means. These are parallel systems making independent guesses about quality, and treating them as interchangeable is why so many bugs slip through both.
A CI pipeline is a deterministic system. You give it inputs (your code, your config, your dependencies) and it produces binary outputs: pass or fail. There's no ambiguity. Either the tests pass or they don't. Either the types check or they don't. Either the linter is happy or it isn't.
This is CI's greatest strength and its hard boundary. It can tell you that a function returns the wrong value for a given input. It can't tell you that the function shouldn't exist in the first place.
Here's what CI is genuinely good at:
- Running the test suite and reporting failures
- Enforcing formatting and lint rules
- Type checking
- Scanning dependencies for known vulnerabilities
- Verifying that the project builds at all
All of these share a property: they're answerable with a boolean. The answer is yes or no, and the system doesn't need context about your business, your team, or your roadmap to produce it.
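That contract can be sketched in a few lines. This is a minimal, hypothetical model of a CI gate, not any real runner's implementation: each check is a command whose exit code is the boolean, and the gate is simply the conjunction of all of them.

```python
import subprocess
from typing import Sequence

def run_gate(checks: Sequence[Sequence[str]]) -> bool:
    """Run each check command; the gate passes only if every one exits 0.

    all() short-circuits, so the first failing check stops the run --
    real CI systems often run every check anyway to report all failures.
    """
    return all(subprocess.run(cmd).returncode == 0 for cmd in checks)

# Illustrative usage (tool names are assumptions, not prescriptions):
# run_gate([["ruff", "check", "."], ["mypy", "."], ["pytest"]])
```

Notice what the function signature admits: the output is `bool`. There is no slot for "this is probably fine but the abstraction worries me."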
Code review is a judgment system. It takes the same input (the diff) but produces something completely different: an opinion about whether this change is a good idea and whether it's implemented well.
A good reviewer evaluates things that CI can't see:
- Whether the change is the right approach at all
- Whether the abstraction fits the existing architecture
- Whether names and interfaces match the domain
- Whether the code will be maintainable as the system evolves
None of these can be reduced to pass/fail. They require understanding what the team is trying to build, where the codebase has been, and where it's going.
There's a narrow band where CI and review overlap. A reviewer might catch a bug that tests missed because the test coverage is thin. CI might flag a style issue that a reviewer would have mentioned. But these overlaps are incidental, not fundamental.
The real danger is the gap between them. Consider a PR that:
- Adds an in-memory cache to a hot code path
- Includes tests for the new code, all passing
- Sails through every lint, type, and security check
CI is green across the board. The problem is semantic: the caching strategy is wrong for the concurrency model, and no test covers this because the author didn't realize it was a risk. Only a reviewer who understands the system's concurrency patterns would catch it.
This class of bug is invisible to CI by design. CI answers "does the code do what the tests say it should?" It can't answer "are the tests asking the right questions?"
This is where the industry keeps making the same mistake. A lot of AI code review tools bolt onto your CI pipeline. They run as a GitHub Action or a pipeline step, they produce pass/fail results, and they block merges when they find "issues."
The problem? They're forcing a judgment system into a deterministic frame.
When you stuff review logic into CI, you get predictable failure modes:
- Subjective calls become binary blockers: a debatable naming opinion stops the merge the same way a broken build does
- False positives erode trust, and developers learn to override or ignore the gate
- There's no conversation: a blocked pipeline can't be argued with, only appeased
- Every push gets slower as "review" steps pile onto a pipeline that was supposed to be fast
Microsoft's own engineering team, which runs AI-powered review across 600,000+ PRs per month, learned this the hard way. Their internal tool doesn't replace CI or run inside it. It operates as a separate reviewer that leaves comments in the PR thread, like a human would. The AI handles repetitive checks (null references, error handling gaps) so human reviewers can focus on architecture and design intent. That separation is load-bearing.
AI coding assistants are making this distinction more urgent, not less. According to data from Faros AI covering 10,000+ developers across 1,255 teams, teams with high AI adoption complete 21% more tasks and merge 98% more pull requests. But PR review time increased by 91%.
Think about what that means. The volume of code going through both systems is exploding. CI handles the volume fine because it scales with compute. You buy more runners, you run more pipelines. Review doesn't scale the same way. You can't just buy more reviewers. And the review load is getting harder, not easier, because AI-generated code requires more scrutiny, not less.
The response from much of the industry has been: "Just automate review too. Put it in CI." But this collapses the distinction between the two systems. It takes the one gate that requires judgment and turns it into another deterministic check. You end up with two CI systems and zero review.
If you accept that CI and review are solving different problems, the architectural implication is clear: they should be separate systems with different interfaces, different feedback loops, and different failure modes.
CI should:
- Be fast, deterministic, and binary
- Block merges only on objective failures
- Scale with compute as volume grows

Review should:
- Carry codebase-level context
- Produce comments and conversation, not pass/fail verdicts
- Focus on design, intent, and maintainability
This is exactly the positioning Tenki takes with its AI code review. It doesn't live inside your CI pipeline. It doesn't add steps to your workflow YAML. It operates as a separate system that reviews PRs with codebase-level context, leaves comments where they matter, and stays quiet when there's nothing worth saying. Your CI stays fast and deterministic. Your review stays contextual and high-signal.
If you're rethinking how these two systems work together, here's a practical framework.
Move everything deterministic into CI. If a human reviewer is spending time on formatting, naming conventions, or catching type errors that a compiler would find, your CI is underpowered. Add the linter rule. Add the type check. Add the security scanner. Every minute a reviewer spends on something CI could have caught is a minute they're not spending on design.
Don't make review a blocker in the same way CI is. CI failures are objective: the build is broken, you can't merge. Review comments are subjective: "I think this abstraction is wrong" is an argument, not a fact. The review process should be mandatory, but the outcome shouldn't be binary. A reviewer might approve with comments that the author addresses in a follow-up PR. That's healthy.
Use AI review as a force multiplier for humans, not a replacement. AI review tools are most useful when they handle the mechanical work that currently falls through the cracks between CI and human review: missing error handling, potential null dereferences, inconsistencies with existing patterns. This frees up human reviewers to focus on the things only they can evaluate: is this the right approach, does this fit our architecture, will this be maintainable.
Keep the feedback loops separate. CI feedback should appear in the checks tab. Review feedback should appear as comments. Don't mix them. When a developer sees a CI failure, they should know immediately that something is objectively broken. When they see a review comment, they should know it's a point for discussion.
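In GitHub terms, the two channels even have different API surfaces. This sketch builds the minimal payloads for each; the endpoint paths in the comments are GitHub's real REST routes, while the payload contents are illustrative.

```python
def check_run_payload(head_sha: str, passed: bool) -> dict:
    # POST /repos/{owner}/{repo}/check-runs -> lands in the checks tab.
    # The vocabulary is binary: success or failure, nothing in between.
    return {
        "name": "ci",
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": "success" if passed else "failure",
    }

def review_comment_payload(text: str) -> dict:
    # POST /repos/{owner}/{repo}/issues/{pr_number}/comments -> lands in
    # the PR thread, where it can be discussed, pushed back on, or resolved.
    return {"body": text}
```

The shapes tell the story: one channel forces a verdict, the other carries an argument. Routing judgment through the verdict channel throws away exactly the part that made it judgment.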
Measure them differently. CI health is measurable: pass rate, flakiness, pipeline duration. Review quality is harder. Track time-to-first-review, number of review cycles, and post-merge defect rates. If defects are getting through despite green CI and approved reviews, the review process is the weak link, and the fix isn't more CI steps.
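The review-side metrics above can be computed from data most Git hosts already expose. A minimal sketch, assuming ISO-style timestamps on each PR record (field names and formats are illustrative, not any specific API's):

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%S"

def hours_to_first_review(opened_at: str, first_review_at: str) -> float:
    """Time-to-first-review in hours for a single PR."""
    delta = datetime.strptime(first_review_at, ISO) - datetime.strptime(opened_at, ISO)
    return delta.total_seconds() / 3600

def mean_review_cycles(cycles_per_pr: list[int]) -> float:
    """Average number of request-changes/re-review rounds per PR."""
    return sum(cycles_per_pr) / len(cycles_per_pr)
```

Trend these alongside post-merge defect rates. If defects climb while CI pass rates hold steady, the numbers point at the review process, not the pipeline.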
CI and PR review aren't two versions of the same thing. They're two different systems making two different guesses about code quality. CI guesses whether the code works. Review guesses whether the code is right.
Most tools in the AI code review space have collapsed this distinction by embedding themselves in CI. That makes them easier to sell ("just add this action to your workflow") but harder to trust. The signal degrades because the tool is constrained to CI's model: binary outputs, no conversation, no codebase-level context.
The strongest setup treats each system for what it is. Let CI be fast, deterministic, and ruthlessly binary. Let review be contextual, conversational, and focused on meaning. Don't try to make one system do the other's job.
