Namespace Runs the Agent. Tenki Reviews What It Wrote.

Eddie Wang · Engineering

On May 19, Namespace shipped native support for running Claude Managed Agents inside Devboxes. It's a clean product move: give each agent session a fresh VM, defined by a Dockerfile, with Tailscale networking and ephemeral lifecycle baked in. The agent gets a real environment instead of a generic sandbox. Builds match CI. Internal services are reachable.

But here's the thing Namespace doesn't try to solve: what happens after the agent finishes? The agent writes code, opens a PR, and at that point the execution environment is gone. The Devbox is torn down. What remains is a diff sitting in a pull request, waiting for someone or something to decide whether it should merge.

That's where Tenki operates. Not inside the sandbox. At the CI boundary, reviewing the diff the agent produced.

What Namespace Devboxes actually solve

Anthropic's default sandbox for Claude Managed Agents lets you declare pip, npm, or apt packages. Fine for a script. Not great when your codebase needs a specific protobuf compiler, an internal code generator, or a base image that matches production.

Namespace's pitch is straightforward: replace the generic sandbox with your own Dockerfile-defined environment. The agent boots into the same toolchain your team uses in CI. Three capabilities stand out:

Custom Dockerfiles. Pin your compiler, bundle internal tools, match your production base image. The environment is defined in code and version-controlled. You build it once with the Devbox CLI, and every agent session boots from that pre-built image.
Tailscale private networking. The Devbox joins your tailnet at boot using a short-lived OIDC token. The agent can hit internal APIs, pull from private registries, and run tests against real infrastructure instead of stubs that drift.
Ephemeral by default. Each task gets a fresh VM. No state bleeds between runs. Parallel workstreams get fully isolated environments without branch conflicts or port collisions.

This matters because agent quality tracks environment quality. If a build takes three minutes because the toolchain doesn't match CI, the agent waits three minutes per iteration. If tests pass in the sandbox but fail in your pipeline because the environments diverge, you've gained nothing. Namespace is solving a real problem here.

What Devboxes don't include

Namespace gives the agent a controlled place to run. It doesn't review what the agent writes.

That's not a gap in their product. It's a different problem. Namespace is explicitly positioning itself as "the compute layer for code." They handle the orchestration infrastructure: where the agent runs, what tools it has, how fast builds execute, and what networks it can reach. Once the agent finishes and pushes a branch, Namespace's job is done.

But from a team's perspective, the agent's output still needs scrutiny. An agent running in a perfect environment can still produce code with subtle logic errors, security issues, or patterns that don't match the codebase's conventions. The execution environment guarantees that tests ran against the right toolchain. It doesn't guarantee the code is good.

Most teams handle this today by assigning a human reviewer. That works at low volume. When you're running multiple agent sessions per day, each producing PRs that need review, the human bottleneck returns. You've automated the writing but not the review.

Where Tenki's review layer sits

Tenki runs as a GitHub Actions check. When a PR is opened, Tenki reviews the diff. It doesn't care whether the code was written by a human, a Claude agent running in a Namespace Devbox, an OpenAI Codex session, or someone typing in vim at 2 AM. The input is the same: a set of changed files in a pull request.

This is intentional. By operating at the CI boundary rather than inside any specific agent's sandbox, Tenki stays independent of the execution platform. You don't need to install anything in your Devbox image. You don't need to configure a webhook from Anthropic's orchestration layer. You add Tenki to your GitHub Actions workflow, and every PR that triggers that workflow gets reviewed.

The review is context-aware. Tenki understands the codebase and catches issues that pass tests but that a careful human reviewer would flag: accidental API changes, missing error handling, patterns inconsistent with the rest of the repo. At $0.50 per review, it's cheap enough to run on every PR without thinking about it.
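
To put the per-review price in concrete terms, here is a back-of-the-envelope calculator. The $0.50 figure comes from the paragraph above; the PR volumes, the 22 working days, and the one-review-per-PR default are illustrative assumptions, and `monthly_review_cost` is a hypothetical helper. Prices are kept in integer cents to avoid floating-point rounding.

```python
PRICE_PER_REVIEW_CENTS = 50  # $0.50 per review, the price quoted above


def monthly_review_cost(prs_per_day: int, working_days: int = 22,
                        reviews_per_pr: int = 1) -> float:
    """Estimate monthly review spend in dollars.

    working_days=22 is an illustrative default. reviews_per_pr=1 assumes a
    single review per PR; raise it if your workflow re-runs on later pushes.
    """
    total_cents = prs_per_day * working_days * reviews_per_pr * PRICE_PER_REVIEW_CENTS
    return total_cents / 100


if __name__ == "__main__":
    for prs in (10, 40, 100):
        print(f"{prs:>3} PRs/day -> ${monthly_review_cost(prs):,.2f}/month")
```

At 40 PRs per working day, the assumed 22-day month comes to $440.00.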

Execution vs. output: two different problems

The clearest way to think about this is as two distinct layers in an agent-assisted workflow.

The execution layer is where the agent runs. It determines what tools are available, how fast builds complete, what services the agent can reach, and whether the environment matches production. Namespace Devboxes handle this. Anthropic's own sandbox handles a simpler version of it. E2B, Gitpod, and various DIY Docker setups handle it too. The execution layer answers: can the agent do its work correctly?

The review layer is what happens after the agent pushes its changes. It determines whether the output is safe, correct, and consistent with the rest of the codebase. This layer answers a different question: should this code merge?

You need both. A great execution environment that produces unreviewed merges is risky. A great review layer that catches bugs but only sees code from a broken sandbox is noisy. Namespace handles the first part. Tenki handles the second.
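
The "you need both" argument can be stated as a merge rule. The sketch below is a hypothetical model, not how Tenki, Namespace, or GitHub actually decide anything: `ExecutionResult`, `ReviewResult`, and `can_merge` are illustrative names, and treating any blocking review finding as a veto is an assumed policy.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionResult:
    """Execution-layer signal: did the agent's work build and test cleanly?"""
    build_ok: bool
    tests_ok: bool


@dataclass
class ReviewResult:
    """Review-layer signal: blocking findings posted against the diff."""
    blocking_findings: list[str] = field(default_factory=list)


def can_merge(execution: ExecutionResult, review: ReviewResult) -> bool:
    # Execution layer answers: can the agent do its work correctly?
    ran_correctly = execution.build_ok and execution.tests_ok
    # Review layer answers: should this code merge?
    review_clear = not review.blocking_findings
    # Both must hold; either signal alone is not enough.
    return ran_correctly and review_clear
```

For example, `can_merge(ExecutionResult(True, True), ReviewResult(["missing error handling"]))` returns `False`: the environment did its job, but the review layer still says no.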

Why platform independence matters

Teams rarely standardize on a single agent platform for long. You might use Claude Managed Agents for backend refactoring tasks, Codex for quick fixes, and a custom agent for migration scripts. Each might run in a different sandbox: Namespace Devboxes for one, GitHub's own runners for another, a local Docker container for a third.

A review gate that's tied to one execution platform only covers a slice of your agent-generated code. You'd need a different review integration for each sandbox, or you'd have some PRs going through review and others skipping it.

Tenki avoids this by sitting at the one point every agent's code must pass through: the pull request. It doesn't matter if the code came from a Devbox in Frankfurt or a laptop in Denver. If there's a PR, Tenki reviews it. One integration point, consistent coverage across all your agents.
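
The coverage argument is easy to check with a toy model. The PR numbers, file paths, and origin labels below are invented for illustration, and `sandbox_hook_coverage` and `pr_gate_coverage` are hypothetical helpers, not part of any product. The point is structural: a gate triggered by the pull request never needs to read `origin`, so it covers every PR by construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    origin: str  # where the code was produced (illustrative labels)
    changed_files: tuple[str, ...]


def sandbox_hook_coverage(prs: list[PullRequest], hooked_origin: str) -> list[int]:
    """Review wired into one sandbox only sees PRs produced in that sandbox."""
    return [pr.number for pr in prs if pr.origin == hooked_origin]


def pr_gate_coverage(prs: list[PullRequest]) -> list[int]:
    """Review triggered by the PR itself sees every PR; origin is never consulted."""
    return [pr.number for pr in prs]


PRS = [
    PullRequest(101, "namespace-devbox", ("api/handlers.py",)),
    PullRequest(102, "codex", ("web/form.ts",)),
    PullRequest(103, "local-docker", ("migrations/add_index.sql",)),
    PullRequest(104, "human", ("README.md",)),
]

if __name__ == "__main__":
    print("sandbox hook:", sandbox_hook_coverage(PRS, "namespace-devbox"))
    print("PR-level gate:", pr_gate_coverage(PRS))
```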

What a combined setup looks like

If you're using both, the flow is simple:

You define a Dockerfile for your agent's environment. Pin your toolchain, include internal dependencies, set up Tailscale access.
The Claude Managed Agent spins up a Namespace Devbox, clones your repo, runs the task, and pushes a branch.
The agent opens a PR. GitHub triggers your CI workflow.
Tenki runs as a step in that workflow, reviews the diff, and posts findings as PR comments.
The Devbox is already torn down. The review happens against the code, not the environment.
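
The ordering in these steps, where the environment is gone before the review starts, can be sketched as a small simulation. Everything here is a hypothetical stand-in: `ephemeral_devbox` models a VM's lifecycle with a Python context manager and is not Namespace's API, and `ci_review` models a CI step rather than Tenki's actual integration. What it shows is that the review function only ever receives the pull request.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Devbox:
    """Hypothetical stand-in for an ephemeral VM (not Namespace's API)."""
    name: str
    log: list[str]
    alive: bool = True


@dataclass
class PullRequest:
    branch: str
    diff: str


@contextmanager
def ephemeral_devbox(name: str, log: list[str]) -> Iterator[Devbox]:
    box = Devbox(name, log)
    log.append(f"boot {name}")
    try:
        yield box
    finally:
        box.alive = False  # torn down when the task ends, success or failure
        log.append(f"teardown {name}")


def agent_task(box: Devbox) -> PullRequest:
    box.log.append(f"agent works in {box.name}, pushes branch, opens PR")
    return PullRequest(branch="agent/fix-timeout", diff="+ retry on timeout")


def ci_review(pr: PullRequest, log: list[str]) -> list[str]:
    # Runs from the CI workflow; it receives only the PR, never the Devbox.
    log.append(f"review {pr.branch}")
    return []  # in practice, findings would be posted as PR comments


def simulate() -> tuple[Devbox, list[str]]:
    log: list[str] = []
    with ephemeral_devbox("task-1", log) as box:
        pr = agent_task(box)
    ci_review(pr, log)
    return box, log


if __name__ == "__main__":
    box, log = simulate()
    print("\n".join(log))
    print("devbox still alive:", box.alive)
```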

No changes to your Devbox image. No Namespace-specific configuration on the Tenki side. The two tools don't need to know about each other because they operate at different stages of the workflow.

The volume problem is coming

Here's the trajectory. Tools like Namespace are making it easier to run agents at scale: better environments, faster builds, more isolation. Anthropic is making the agents themselves more capable. The inevitable result is more agent-generated PRs per day, not fewer.

Teams that set up their execution layer now (Devboxes, custom images, private networking) will hit the review bottleneck next. A senior engineer can review maybe 5-10 PRs per day with real attention. An agent can produce that many in an hour. Without an automated review gate, you either slow down your agents to match human review capacity, or you start rubber-stamping merges. Neither is great.
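
The mismatch compounds quickly. The calculation below runs under stated assumptions: the reviewer's 10 PRs per day is the top of the range above, and an agent producing 5 PRs per hour over an 8-hour day is an illustrative figure consistent with "that many in an hour." `review_backlog` is a hypothetical helper; unreviewed PRs simply accumulate at the difference between the two rates.

```python
def review_backlog(days: int, agent_prs_per_hour: int, agent_hours_per_day: int,
                   reviewer_capacity_per_day: int) -> list[int]:
    """Return the unreviewed-PR backlog at the end of each day."""
    produced_per_day = agent_prs_per_hour * agent_hours_per_day
    backlog = 0
    history: list[int] = []
    for _ in range(days):
        # Today's arrivals join the leftovers; the reviewer clears what they can.
        backlog = max(0, backlog + produced_per_day - reviewer_capacity_per_day)
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(review_backlog(days=5, agent_prs_per_hour=5, agent_hours_per_day=8,
                         reviewer_capacity_per_day=10))
```

With these assumptions one agent puts a single reviewer 150 PRs behind after a five-day week (`review_backlog(5, 5, 8, 10)[-1] == 150`), which is exactly the slow-down-or-rubber-stamp choice described above.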

Adding Tenki as a CI review gate means your agents can keep producing PRs at whatever pace the execution layer supports, and every one of those PRs gets reviewed before it reaches main. The review doesn't replace human judgment for architecture-level decisions, but it catches the classes of issues that slow down code review in the first place: missed edge cases, inconsistent patterns, security oversights.

Namespace solved where agents run. That was the first bottleneck. The next one is reviewing what they produce. Different problem, different layer, and you'll want both covered before you let agents run at full speed.