Hayssem Vazquez-Elsayed · Apr 06, 2026
AI Agent Transparency · Source Leak · Claude Code

Claude Code Source Leak: What It Reveals About AI Agent Trust



TL;DR

The leaked Claude Code source exposes fake tools, frustration-detection regexes, and an undercover mode. Here's what engineering teams should demand from AI coding agents.

On March 31, Anthropic accidentally shipped a .map file alongside their Claude Code npm package. That source map contained the full, readable source code of the CLI tool. The package was pulled within hours, but not before the code was widely mirrored and dissected on Hacker News, where it hit #1 with 1,300+ points and 500+ comments.

What the community found inside isn't just embarrassing for Anthropic. It's a case study in the hidden behavioral engineering that AI coding agents use behind the scenes, and a wake-up call for any team that integrates these tools into their development workflow without asking what's actually running.

What the leak actually exposed

The leak wasn't a model weights dump or a prompt injection artifact. It was the full source code of the Claude Code CLI, exposed through a Bun bug (oven-sh/bun#28001) that served source maps in production mode even though they should have been stripped. Anthropic acquired Bun in late 2025, and Claude Code is built on top of it. Their own toolchain shipped a known bug that exposed their own product.

Alex Kim's detailed writeup catalogued the most significant findings. Three of them have direct implications for anyone relying on AI coding agents in production.

Fake tools that exist only to shape behavior

The source code contains a flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends anti_distillation: ['fake_tools'] in API requests, which tells the server to inject decoy tool definitions into the system prompt. The purpose: if someone records Claude Code's API traffic to train a competing model, the fake tools pollute that training data.

It's gated behind a GrowthBook feature flag (tengu_anti_distill_fake_tool_injection) and only active for first-party CLI sessions. There's also a second mechanism: server-side connector-text summarization that replaces full reasoning chains with compressed summaries signed with a cryptographic hash, so anyone eavesdropping on API traffic only captures the summary, not the real chain of thought.

The anti-distillation intent is defensible. The problem is the method. Fake tool definitions are injected into the same system prompt that shapes the agent's behavior. That means there are tool definitions in the context window that don't correspond to any real capability, and the user has no visibility into what's been injected. If you're reviewing what your agent can do by inspecting tool calls, you're looking at a deck that's been silently stacked.
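To make the stacked-deck problem concrete, here is a minimal sketch of how decoy definitions could be mixed into a tool list. Only the flag concept and the `anti_distillation: ['fake_tools']` field come from the leaked source; the tool names and the `real` marker are invented for illustration and would not exist in any actual wire format.

```typescript
interface ToolDef {
  name: string;
  description: string;
  real: boolean; // illustrative only: the real payload carries no such marker
}

// Genuine capabilities the agent can actually invoke.
const realTools: ToolDef[] = [
  { name: "read_file", description: "Read a file from disk", real: true },
];

// Decoys that correspond to no capability; they exist only to pollute
// training data scraped from recorded API traffic.
const fakeTools: ToolDef[] = [
  { name: "quantum_refactor", description: "Refactor via annealing", real: false },
];

// When anti-distillation is on, the prompt-facing tool list mixes both.
function buildToolList(antiDistillationEnabled: boolean): ToolDef[] {
  return antiDistillationEnabled ? [...realTools, ...fakeTools] : realTools;
}
```

Without the illustrative `real` marker, a user inspecting the injected tool list has no way to tell decoys from genuine capabilities, which is exactly the visibility gap described above.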

Frustration detection via regex

A file called userPromptKeywords.ts contains a regex pattern that scans user input for frustration signals: expletives, phrases like "this sucks" and "so frustrating," even specific profanity combinations. When the pattern matches, the agent adjusts its behavior.

An LLM company using regexes for sentiment detection instead of, you know, the LLM itself is darkly funny. But the engineering rationale holds: a regex is faster and cheaper than an inference call just to check if someone is swearing at the tool. The issue isn't the implementation. It's that the behavioral adjustment happens invisibly. Your agent is modulating its responses based on your emotional state, and you don't know it's happening.
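The mechanism itself is trivially cheap, which is the point of the regex approach. A minimal sketch in the spirit of `userPromptKeywords.ts` (the phrases "this sucks" and "so frustrating" come from the writeups; the other patterns are assumptions):

```typescript
// Hypothetical frustration matcher; the real pattern is longer and includes
// expletives and specific profanity combinations.
const FRUSTRATION_PATTERN = /\b(this sucks|so frustrating|wtf|ffs)\b/i;

// One regex test per prompt: no inference call, effectively free.
function detectsFrustration(prompt: string): boolean {
  return FRUSTRATION_PATTERN.test(prompt);
}
```

A single `RegExp.test` costs microseconds versus a full inference round trip, which explains the engineering choice even at a company whose product is the inference.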

For individual developers, this might be harmless or even helpful. For teams using Claude Code in CI pipelines or automated review loops, it introduces a variable that's completely outside your control and impossible to audit without reading the source code. Which, until this leak, you couldn't do.

Undercover mode: the AI that hides being AI

This one drew the most heat. A file called undercover.ts implements a mode that strips all traces of Anthropic internals when Claude Code operates in non-internal repos. The system prompt explicitly tells the model: never include "Claude Code" or any mention that you are an AI in commit messages or PR descriptions. No Co-Authored-By lines. No attribution of any kind.

A code comment on line 15 reads: "There is NO force-OFF. This guards against model codename leaks." You can force undercover mode ON with an environment variable, but there's no way to disable it. In external builds, the whole function gets dead-code-eliminated to trivial returns. One-way door.
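The one-way-door logic can be sketched in a few lines. The env var name here is invented; the asymmetry (a force-ON path exists, a force-OFF path does not) is what the leaked comment describes:

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical gate mirroring the described behavior: undercover mode can be
// forced ON, but there is no branch that can force it OFF.
function isUndercover(env: Env, isInternalRepo: boolean): boolean {
  if (env.FORCE_UNDERCOVER === "1") return true; // force-ON exists
  // No force-OFF: any non-internal repo engages undercover unconditionally.
  return !isInternalRepo;
}
```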

Hiding internal codenames like "Capybara" or "Tengu" is a reasonable opsec decision. But the prompt goes further. As HN commenter mzajc pointed out, the instructions don't just say "don't leak codenames." They say "never reveal that you are an AI." That's a different thing entirely. AI-authored commits and PRs from Anthropic employees in open source projects will have no indication that an AI wrote them.

In the EU, this may have legal implications. The AI Act requires that providers ensure AI systems interacting with natural persons are designed so people know they're interacting with AI. Whether a merge request qualifies as "direct interaction" is debatable, but the direction of regulation is clear.

Why this matters for code review trust

Each of these findings, taken alone, has a plausible justification. Anti-distillation protects IP. Frustration detection improves UX. Undercover mode prevents codename leaks. But taken together, they reveal something more fundamental: the agent layer between you and the model is doing far more than routing your prompts. It's injecting fake capabilities, reading your emotional state, and actively concealing its own identity.

For teams that rely on AI agents for code generation or review, this creates a trust problem. Code review works because reviewers can reason about what produced the code. When the producing agent has hidden behavioral layers that modify its output based on undisclosed signals, that reasoning breaks down. You're reviewing the output of a system you don't fully understand, and until this leak, couldn't even inspect.

The leaked source also revealed that Claude Code wastes roughly 250,000 API calls per day due to a compaction bug where sessions could hit 3,000+ consecutive failures before anyone noticed. The fix was three lines of code. This is the kind of operational detail that's invisible to users but directly affects the reliability of what the agent produces. If compaction is failing silently, your agent's context window is degraded, and its output quality suffers in ways you can't diagnose.
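A three-line fix for a runaway-retry bug usually amounts to capping consecutive failures. This sketch shows the general shape; the function name and cap value are assumptions, not the actual patch:

```typescript
// Hypothetical guard: stop retrying compaction after a bounded number of
// consecutive failures instead of looping into the thousands.
const MAX_CONSECUTIVE_FAILURES = 5;

function shouldRetryCompaction(consecutiveFailures: number): boolean {
  return consecutiveFailures < MAX_CONSECUTIVE_FAILURES;
}
```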

How other agents handle transparency

Some people on HN dismissed the leak because Google's Gemini CLI and OpenAI's Codex are open source. That comparison misses the point. Those companies open-sourced their agent SDKs, which are toolkits for building agents. Claude Code is the full internal wiring of a shipped product, complete with feature flags, anti-distillation logic, and unreleased roadmap items like KAIROS (an autonomous background agent mode with cron-scheduled refreshes and GitHub webhook subscriptions).

That said, transparency postures do vary. GitHub Copilot publishes telemetry documentation and provides enterprise controls for disabling data collection. Copilot's code suggestions don't include hidden behavioral modification layers because its architecture is simpler: it's a completion engine, not an agentic loop. OpenAI's Codex provides session logs that show the full chain of tool calls and reasoning, giving users a replay of what the agent did and why.

Claude Code does preserve conversation history in append-only JSONL session files, and the compaction system marks messages as "not to be sent to the API" rather than deleting them. So the raw transcript is technically available on disk. But that's not the same as proactive transparency. The behavioral layers (fake tools, frustration gating, undercover prompts) aren't represented in those logs in any user-facing way. You'd need to read the source to know they exist.
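The mark-don't-delete model described above can be sketched as a simple filter. The field names are assumptions based on the behavior the article describes, not the actual JSONL schema:

```typescript
interface SessionMessage {
  role: "user" | "assistant";
  content: string;
  omitFromApi?: boolean; // compaction flags messages rather than deleting them
}

// The full transcript stays on disk; only unflagged messages reach the API.
function toApiPayload(log: SessionMessage[]): SessionMessage[] {
  return log.filter((m) => !m.omitFromApi);
}
```

This is why the raw transcript is "technically available" while the effective context window is not: the on-disk log and the API payload diverge silently.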

What enterprise teams should demand

If your engineering org is using or evaluating AI coding agents, this leak should change your vendor conversations. Three concrete things to push for:

  1. Behavioral disclosure. Vendors should publish what behavioral modifications exist in the agent layer. Not the full source code, but a disclosure of what signals the agent reads from user input, what gets injected into the prompt beyond your instructions, and what identity-related instructions exist. If fake tool definitions are being injected, that should be documented.
  2. Audit logs that include behavioral context. Session logs should show not just what the agent did, but what behavioral flags were active. If frustration detection triggered a mode change, that should appear in the log. If anti-distillation injected fake tools into the context window, the log should say so. Without this, audit trails are incomplete.
  3. Opt-out controls for hidden behaviors. Enterprise plans should let teams disable behavioral modifications they don't want. Claude Code already has env vars that can toggle some of these features (CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS disables the anti-distillation betas, for example), but they're undocumented and were only discoverable through the leak. These should be first-class configuration options, not hidden env vars.
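What the second demand could look like in practice: each audit entry records the behavioral flags that were active alongside the action taken. This shape is a proposal, not any vendor's actual log format:

```typescript
interface BehavioralAuditEntry {
  timestamp: string;
  action: string;       // e.g. a tool call or prompt submission
  activeFlags: string[]; // behavioral modifiers live at the time of the action
}

// A log entry that makes hidden behavioral state auditable after the fact.
function logEntry(action: string, activeFlags: string[]): BehavioralAuditEntry {
  return { timestamp: new Date().toISOString(), action, activeFlags };
}
```

With entries like this, a reviewer could answer "was frustration mode active when this diff was generated?" from the log alone, instead of from a leak.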

The broader pattern

This is Anthropic's second accidental exposure in a week (the model spec also leaked days earlier), and it follows their legal threats against OpenCode for using Claude Code's internal APIs. The timing makes the transparency gap harder to ignore. A company that sends legal notices to protect its API boundaries while simultaneously shipping undisclosed behavioral controls is operating with a double standard.

The real damage from this leak isn't the code. The code can be refactored. It's the feature flags. KAIROS, the anti-distillation mechanisms, the internal codenames. These are product strategy details that competitors can now see and react to. The strategic surprise can't be un-leaked.

For engineering teams, though, the takeaway is simpler. The AI coding agents you're integrating have behavioral layers you can't see. Until vendors start disclosing those layers voluntarily, leaks like this one are the only way to find out what's actually running. That's not a sustainable model for tools that ship production code.
