
AI Made Writing Code Cheap. Judgment Didn't Scale.

CI validates what the code does. Review validates what the code means. Conflating them is why bugs pass both gates undetected.
Every PR goes through two gates before it lands in main. CI runs your tests, linters, and type checks. A reviewer reads the diff, leaves comments, and clicks approve. Both gates exist to catch problems. But they're catching fundamentally different kinds of problems, and most teams don't think about that distinction at all.
CI validates what the code does. Review validates what the code means. These are parallel systems making independent guesses about quality, and treating them as interchangeable is why so many bugs slip through both.
A CI pipeline is a deterministic system. You give it inputs (your code, your config, your dependencies) and it produces binary outputs: pass or fail. There's no ambiguity. Either the tests pass or they don't. Either the types check or they don't. Either the linter is happy or it isn't.
This is CI's greatest strength and its hard boundary. It can tell you that a function returns the wrong value for a given input. It can't tell you that the function shouldn't exist in the first place.
Here's what CI is genuinely good at:
- Running the test suite and reporting failures
- Enforcing formatting and lint rules
- Type checking
- Scanning dependencies for known vulnerabilities
- Verifying that the project builds at all
All of these share a property: they're answerable with a boolean. The answer is yes or no, and the system doesn't need context about your business, your team, or your roadmap to produce it.
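That contract can be sketched in a few lines. This is a minimal, hypothetical model of a CI gate, not any real runner's implementation: each check is a command whose exit code is the boolean, and the gate is simply the conjunction of all of them.

```python
import subprocess
from typing import Sequence

def run_gate(checks: Sequence[Sequence[str]]) -> bool:
    """Run each check command; the gate passes only if every one exits 0.

    all() short-circuits, so the first failing check stops the run --
    real CI systems often run every check anyway to report all failures.
    """
    return all(subprocess.run(cmd).returncode == 0 for cmd in checks)

# Illustrative usage (tool names are assumptions, not prescriptions):
# run_gate([["ruff", "check", "."], ["mypy", "."], ["pytest"]])
```

Notice what the function signature admits: the output is `bool`. There is no slot for "this is probably fine but the abstraction worries me."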
Code review is a judgment system. It takes the same input (the diff) but produces something completely different: an opinion about whether this change is a good idea and whether it's implemented well.
A good reviewer evaluates things that CI can't see:
- Whether the change is the right approach at all
- Whether the abstraction fits the existing architecture
- Whether names and interfaces match the domain
- Whether the code will be maintainable as the system evolves
None of these can be reduced to pass/fail. They require understanding what the team is trying to build, where the codebase has been, and where it's going.
There's a narrow band where CI and review overlap. A reviewer might catch a bug that tests missed because the test coverage is thin. CI might flag a style issue that a reviewer would have mentioned. But these overlaps are incidental, not fundamental.
The real danger is the gap between them. Consider a PR that:
- Adds an in-memory cache to a hot code path
- Includes tests for the new code, all passing
- Sails through every lint, type, and security check
CI is green across the board. The problem is semantic: the caching strategy is wrong for the concurrency model, and no test covers this because the author didn't realize it was a risk. Only a reviewer who understands the system's concurrency patterns would catch it.
This class of bug is invisible to CI by design. CI answers "does the code do what the tests say it should?" It can't answer "are the tests asking the right questions?"
This is where the industry keeps making the same mistake. A lot of AI code review tools bolt onto your CI pipeline. They run as a GitHub Action or a pipeline step, they produce pass/fail results, and they block merges when they find "issues."
The problem? They're forcing a judgment system into a deterministic frame.
When you stuff review logic into CI, you get predictable failure modes:
- Subjective calls become binary blockers: a debatable naming opinion stops the merge the same way a broken build does
- False positives erode trust, and developers learn to override or ignore the gate
- There's no conversation: a blocked pipeline can't be argued with, only appeased
- Every push gets slower as "review" steps pile onto a pipeline that was supposed to be fast
Microsoft's own engineering team, which runs AI-powered review across 600,000+ PRs per month, learned this the hard way. Their internal tool doesn't replace CI or run inside it. It operates as a separate reviewer that leaves comments in the PR thread, like a human would. The AI handles repetitive checks (null references, error handling gaps) so human reviewers can focus on architecture and design intent. That separation is load-bearing.
AI coding assistants are making this distinction more urgent, not less. According to data from Faros AI covering 10,000+ developers across 1,255 teams, teams with high AI adoption complete 21% more tasks and merge 98% more pull requests. But PR review time increased by 91%.
Think about what that means. The volume of code going through both systems is exploding. CI handles the volume fine because it scales with compute. You buy more runners, you run more pipelines. Review doesn't scale the same way. You can't just buy more reviewers. And the review load is getting harder, not easier, because AI-generated code requires more scrutiny, not less.
The response from much of the industry has been: "Just automate review too. Put it in CI." But this collapses the distinction between the two systems. It takes the one gate that requires judgment and turns it into another deterministic check. You end up with two CI systems and zero review.
If you accept that CI and review are solving different problems, the architectural implication is clear: they should be separate systems with different interfaces, different feedback loops, and different failure modes.
CI should:
- Be fast, deterministic, and binary
- Block merges only on objective failures
- Scale with compute as volume grows

Review should:
- Carry codebase-level context
- Produce comments and conversation, not pass/fail verdicts
- Focus on design, intent, and maintainability
This is exactly the positioning Tenki takes with its AI code review. It doesn't live inside your CI pipeline. It doesn't add steps to your workflow YAML. It operates as a separate system that reviews PRs with codebase-level context, leaves comments where they matter, and stays quiet when there's nothing worth saying. Your CI stays fast and deterministic. Your review stays contextual and high-signal.
If you're rethinking how these two systems work together, here's a practical framework.
Move everything deterministic into CI. If a human reviewer is spending time on formatting, naming conventions, or catching type errors that a compiler would find, your CI is underpowered. Add the linter rule. Add the type check. Add the security scanner. Every minute a reviewer spends on something CI could have caught is a minute they're not spending on design.
Don't make review a blocker in the same way CI is. CI failures are objective: the build is broken, you can't merge. Review comments are subjective: "I think this abstraction is wrong" is an argument, not a fact. The review process should be mandatory, but the outcome shouldn't be binary. A reviewer might approve with comments that the author addresses in a follow-up PR. That's healthy.
Use AI review as a force multiplier for humans, not a replacement. AI review tools are most useful when they handle the mechanical work that currently falls through the cracks between CI and human review: missing error handling, potential null dereferences, inconsistencies with existing patterns. This frees up human reviewers to focus on the things only they can evaluate: is this the right approach, does this fit our architecture, will this be maintainable.
Keep the feedback loops separate. CI feedback should appear in the checks tab. Review feedback should appear as comments. Don't mix them. When a developer sees a CI failure, they should know immediately that something is objectively broken. When they see a review comment, they should know it's a point for discussion.
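In GitHub terms, the two channels even have different API surfaces. This sketch builds the minimal payloads for each; the endpoint paths in the comments are GitHub's real REST routes, while the payload contents are illustrative.

```python
def check_run_payload(head_sha: str, passed: bool) -> dict:
    # POST /repos/{owner}/{repo}/check-runs -> lands in the checks tab.
    # The vocabulary is binary: success or failure, nothing in between.
    return {
        "name": "ci",
        "head_sha": head_sha,
        "status": "completed",
        "conclusion": "success" if passed else "failure",
    }

def review_comment_payload(text: str) -> dict:
    # POST /repos/{owner}/{repo}/issues/{pr_number}/comments -> lands in
    # the PR thread, where it can be discussed, pushed back on, or resolved.
    return {"body": text}
```

The shapes tell the story: one channel forces a verdict, the other carries an argument. Routing judgment through the verdict channel throws away exactly the part that made it judgment.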
Measure them differently. CI health is measurable: pass rate, flakiness, pipeline duration. Review quality is harder. Track time-to-first-review, number of review cycles, and post-merge defect rates. If defects are getting through despite green CI and approved reviews, the review process is the weak link, and the fix isn't more CI steps.
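The review-side metrics above can be computed from data most Git hosts already expose. A minimal sketch, assuming ISO-style timestamps on each PR record (field names and formats are illustrative, not any specific API's):

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%S"

def hours_to_first_review(opened_at: str, first_review_at: str) -> float:
    """Time-to-first-review in hours for a single PR."""
    delta = datetime.strptime(first_review_at, ISO) - datetime.strptime(opened_at, ISO)
    return delta.total_seconds() / 3600

def mean_review_cycles(cycles_per_pr: list[int]) -> float:
    """Average number of request-changes/re-review rounds per PR."""
    return sum(cycles_per_pr) / len(cycles_per_pr)
```

Trend these alongside post-merge defect rates. If defects climb while CI pass rates hold steady, the numbers point at the review process, not the pipeline.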
CI and PR review aren't two versions of the same thing. They're two different systems making two different guesses about code quality. CI guesses whether the code works. Review guesses whether the code is right.
Most tools in the AI code review space have collapsed this distinction by embedding themselves in CI. That makes them easier to sell ("just add this action to your workflow") but harder to trust. The signal degrades because the tool is constrained to CI's model: binary outputs, no conversation, no codebase-level context.
The strongest setup treats each system for what it is. Let CI be fast, deterministic, and ruthlessly binary. Let review be contextual, conversational, and focused on meaning. Don't try to make one system do the other's job.
