If you use Claude Code on a paid plan, you probably felt this one. Your quota vanished. Your sessions died early. And for days, nobody at Anthropic said a word.
What happened with Claude Code's caching bugs?
Starting around March 23, 2026, users across all paid Claude Code tiers — Pro, Max 5x, Max 20x — began reporting catastrophic usage limit drain. Sessions that normally lasted eight hours were dying in one. Single prompts consumed 3-7% of an entire session quota. Max 20x subscribers, paying for the highest tier, reported usage jumping from 21% to 100% in a single interaction.
The complaints hit Reddit, GitHub, and Discord simultaneously. GitHub issue #41930 became the central thread, documenting what users called "widespread abnormal usage limit drain across all paid tiers since March 23, 2026." The title also noted: "no formal communication issued."
That last part matters. We will come back to it.
What were the two bugs that broke caching?
On April 2, Anthropic released a postmortem confirming two distinct caching bugs had hit production simultaneously. But the bugs had actually been identified days earlier — not by Anthropic, but by a community member who reverse-engineered the Claude Code standalone binary.
Bug one: the billing sentinel collision
Claude Code ships as a standalone binary built on a custom fork of the Bun JavaScript runtime. This fork performs a string replacement on every API request, targeting a billing attribution sentinel — essentially a marker that tells Anthropic's backend how to attribute usage for billing purposes.
The problem: if your conversation history happened to contain billing-related terms — words like "cost," "billing," "quota," or "usage" — the string replacement could hit the wrong position in the request payload. When that happened, it broke the cache prefix. The entire conversation context was forced into an uncached token rebuild.
Think about what this means. Developers discussing costs in their codebase — maybe reviewing a pricing module, or debugging a billing API — were the most likely to trigger a bug that silently multiplied their own costs. That is not just ironic. It is the kind of interaction bug that is almost impossible to catch in unit testing because it depends on the content of production conversations.
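To make the failure mode concrete, here is a deliberately naive sketch. None of these names or markers come from Anthropic's actual code — the sentinel string, the `rewrite_request` helper, and the single-occurrence replacement are all assumptions chosen to illustrate how a first-match rewrite can land in user content instead of the intended marker.

```python
# Hypothetical illustration only: the real sentinel and rewrite logic are
# not public. Assume the client appends a trailing sentinel and the runtime
# rewrites the FIRST occurrence it finds anywhere in the payload.
SENTINEL = "usage"  # imagine the sentinel overlaps with an ordinary word

def rewrite_request(payload: str, attribution: str) -> str:
    # Naive single-occurrence replacement: if the conversation already
    # contains the sentinel text, the rewrite hits that earlier position.
    return payload.replace(SENTINEL, attribution, 1)

clean = "conversation about refactoring the auth module\n" + SENTINEL
billing_talk = "let's debug the usage quota endpoint\n" + SENTINEL

ok = rewrite_request(clean, "acct-123")
bad = rewrite_request(billing_talk, "acct-123")

ok.endswith("acct-123")   # True: the trailing marker was rewritten as intended
bad.endswith("acct-123")  # False: the rewrite mutated the conversation body,
                          # so the cached prefix no longer matches
```

The point of the sketch is the shape of the bug, not its specifics: any in-band marker that shares an alphabet with user content can collide with it.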
Bug two: the resume flag cache invalidation
The second bug was simpler but equally destructive. When users resumed a previous Claude Code session using --resume or --continue flags, tool attachments were injected at a different position in the request payload than in fresh sessions. This position change invalidated the entire conversation cache.
Every resumed session started from scratch. The system prompt might get cached, but everything else — your entire conversation history, all tool outputs, every file read — was reprocessed as if Claude had never seen it before. For long sessions with substantial context, this meant thousands of tokens being billed at full uncached rates on every single interaction.
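A minimal model shows why a position change is so expensive under prefix caching. This is a sketch of the general technique, not Anthropic's implementation: the server can reuse cached work only for the longest identical prefix of request blocks, so everything after the first divergence is rebuilt and billed uncached.

```python
# Sketch of prefix-cache matching: reuse stops at the first block that
# differs from what the server cached for this session.
def cached_prefix_len(cached: list[str], request: list[str]) -> int:
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n

fresh   = ["system", "tools", "msg1", "msg2", "msg3"]
# A resumed session that injects tool attachments at a different position:
resumed = ["system", "msg1", "msg2", "msg3", "tools"]

cached_prefix_len(fresh, fresh)    # 5: identical replay, full cache hit
cached_prefix_len(fresh, resumed)  # 1: only the system prompt survives;
                                   # every later block is reprocessed
```

Moving one early block does not cost one block — it costs the entire tail of the conversation.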
“Downgrading to 2.1.34 made a very noticeable difference.”
The interaction between these two bugs was what made the incident so severe. Bug one was content-dependent and intermittent. Bug two was deterministic — every resumed session was affected. Together, they produced a system that reprocessed full conversation context on nearly every interaction and billed users for every uncached token.
Why did it take a week to acknowledge?
This is where the incident becomes a case study in communication failure rather than just engineering failure.
Users began reporting anomalies on March 23. GitHub issues piled up. Reddit threads gained traction. Discord was full of confused developers watching their quotas evaporate. For nearly eight days, there was no official communication from Anthropic.
On March 31, Lydia Hallie, product lead of Claude Code, posted on X: "We're aware people are hitting usage limits in Claude Code way faster than expected. Actively investigating, will share more when we have an update!"
That was eight days after users first reported the problem. The full postmortem did not arrive until April 2.
- March 23: First user reports of abnormal quota drain appear on GitHub and Reddit
- March 23-30: Community escalation across multiple platforms, no official response
- March 30: Community member publishes reverse-engineering analysis identifying both bugs
- March 31: Anthropic's first public acknowledgment via X/Twitter
- April 2: Full postmortem published, quota extensions promised for affected users
The minimum viable response time for a billing anomaly should be hours, not days. When users are being charged for something that is broken on your end, silence is not just bad communication — it is a trust problem. Every hour of silence is an hour where a paying customer is hemorrhaging money they should not be spending.
What should Anthropic have done differently?
The postmortem itself was solid engineering communication. Anthropic identified both bugs, explained the interaction, acknowledged the impact, and committed to quota extensions for affected users. The engineering response, once it happened, was competent.
But three decisions made this incident worse than it needed to be.
First: billing anomaly detection should be automated. If a cohort of users suddenly starts consuming tokens at 10-20x their historical rate, that should trigger an alert within hours, not wait for user complaints. Anthropic has the telemetry. Every API call is logged. A simple statistical anomaly detector on per-user token consumption rates would have caught this on day one.
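The kind of check described above fits in a few lines. This is a sketch with illustrative thresholds, not a recommendation: flag a user only when today's consumption is both a large multiple of their own baseline and far outside their historical variance.

```python
# Per-user consumption anomaly check. The 5x ratio and z-score cutoff are
# placeholder thresholds for illustration, not tuned values.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int,
                 min_ratio: float = 5.0, z_cutoff: float = 4.0) -> bool:
    baseline = mean(history)
    spread = stdev(history) or 1.0  # guard against zero variance
    ratio = today / baseline if baseline else float("inf")
    z = (today - baseline) / spread
    # Require both conditions so naturally bursty users do not page on-call
    # for ordinary day-to-day variance.
    return ratio >= min_ratio and z >= z_cutoff

normal_week = [90_000, 110_000, 95_000, 105_000, 100_000]
is_anomalous(normal_week, 120_000)    # False: within ordinary variance
is_anomalous(normal_week, 1_500_000)  # True: ~15x baseline, the bug's signature
```

Run per user per day against a rolling window and this catches a 10-20x cohort-wide spike on day one, not day eight.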
Second: the custom Bun fork introduced unnecessary risk. Shipping a billing-critical string replacement inside a forked runtime — where it runs on every API request and interacts with arbitrary user content — is a design decision that trades debuggability for convenience. The billing sentinel logic should live server-side, where it cannot be affected by client-side content and where it can be monitored independently.
Third: the communication delay was a choice. Anthropic chose to wait until they had a full answer before saying anything substantive. That is the wrong call when users are actively losing money. The right response is immediate acknowledgment, a temporary mitigation (even if it is just "we recommend starting fresh sessions instead of resuming"), and then the full postmortem when ready.
What can engineering teams learn from this?
This incident is useful beyond the Anthropic ecosystem. Every team building on LLM APIs faces similar risks. Here is what generalizes.
Cache invalidation is a billing event. In traditional systems, a cache miss means slower performance. In token-based billing systems, a cache miss means higher costs. If your architecture relies on prompt caching to keep costs predictable, then cache invalidation is not just a performance bug — it is a billing bug. Monitor it like one. Track cache hit rates as a financial metric, not just a performance metric.
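One way to operationalize "monitor it like a billing bug" is to denominate cache misses in dollars rather than percentages. The prices below are placeholders, not Anthropic's actual rates — the structure of the calculation is the point.

```python
# Sketch: translate cache misses into overspend so they alert like a
# billing bug. Both rates are assumed placeholder values.
UNCACHED_PER_MTOK = 3.00   # assumed $/million uncached input tokens
CACHED_PER_MTOK   = 0.30   # assumed $/million cached-read tokens

def cache_miss_cost(total_tokens: int, cached_tokens: int) -> float:
    """Extra dollars spent because tokens that should have been cache
    reads were billed at the full uncached rate."""
    missed = total_tokens - cached_tokens
    return missed * (UNCACHED_PER_MTOK - CACHED_PER_MTOK) / 1_000_000

# A healthy long session: 95% of a 2M-token day served from cache.
round(cache_miss_cost(2_000_000, 1_900_000), 2)   # 0.27
# The same day with caching fully broken:
round(cache_miss_cost(2_000_000, 0), 2)           # 5.4
```

A 20x cost jump per user per day is trivially alertable once it is expressed this way.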
Content-dependent bugs are the hardest to catch. Bug one only triggered when conversation content contained specific terms. This class of bug — where the payload itself determines whether the system behaves correctly — is almost invisible in testing because test conversations rarely look like production conversations. If your system processes arbitrary content and makes decisions based on string matching or replacement, fuzz it with real-world data.
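A property-style test makes this concrete. The sketch below is generic — `transform` is a stand-in for whatever string rewriting your client performs, and the marker is invented — but it shows the shape of the test: assert an invariant over a corpus that includes hostile-but-realistic content, rather than only clean fixtures.

```python
# Sketch of content fuzzing: check the invariant "the transform touches
# only the trailing marker" against realistic conversation bodies.
MARKER = "<<attrib>>"  # hypothetical in-band marker

def transform(payload: str, value: str) -> str:
    # Intended: replace only the trailing marker.
    # Buggy: replaces the FIRST occurrence, wherever it appears.
    return payload.replace(MARKER, value, 1)

def invariant_holds(body: str) -> bool:
    out = transform(body + MARKER, "acct")
    return out == body + "acct"  # the body must come through untouched

corpus = [
    "refactor the auth module",                      # clean fixture: passes
    f"why does {MARKER} appear in this log line?",   # realistic hostile content
]
[invariant_holds(b) for b in corpus]  # [True, False]: the fuzz corpus finds
                                      # what clean fixtures never would
```

Seeding the corpus with scrubbed production transcripts, rather than hand-written fixtures, is what makes this class of test effective.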
Forked runtimes carry compound risk. Every fork is a promise to maintain a divergent codebase. When that fork handles billing-critical logic and interacts with untrusted input, you are combining supply chain risk with financial risk. Unless there is a compelling technical reason for the fork, keep billing logic server-side and runtime-agnostic.
Your community will reverse-engineer your bugs faster than you will. The Claude Code source had already leaked via an npm packaging error weeks earlier. A community member used that knowledge plus MITM analysis to identify both bugs before Anthropic's own team. This is not embarrassing — it is predictable. In 2026, your binary is not a black box. Design your incident response assuming that external analysis will outpace internal triage.
“The minimum viable response to a billing anomaly is not a root cause analysis. It is the words: we know, we are looking, and here is what you can do right now.”
Is this a reason to stop using Claude Code?
No. Every developer tool has incidents. VS Code has had extensions that silently consumed CPU. GitHub Copilot has had billing anomalies of its own. The question is not whether incidents happen — it is how the vendor responds.
Anthropic's engineering response was eventually thorough. The postmortem was honest. The quota extensions were the right call. But the eight-day communication gap is the part that should concern you. If you are building your workflow around a tool with consumption-based billing, you need to monitor your own usage independently. Do not rely on the vendor to tell you when their system is overcharging you.
Set up your own token tracking. Compare daily consumption against historical baselines. Alert on anomalies. And if a vendor takes more than 48 hours to acknowledge a billing anomaly that your own monitoring has detected — that tells you something about their operational maturity that no feature list will.