If you use Claude Code on a paid plan, you probably felt this one. Your quota vanished. Your sessions died early. And for days, nobody at Anthropic said a word.
What happened with Claude Code's caching bugs?
Starting around March 23, 2026, users across all paid Claude Code tiers — Pro, Max 5x, Max 20x — began reporting catastrophic usage limit drain. Sessions that normally lasted eight hours were dying in one. Single prompts consumed 3-7% of an entire session quota. Max 20x subscribers, paying for the highest tier, reported usage jumping from 21% to 100% in a single interaction.
The complaints hit Reddit, GitHub, and Discord simultaneously. GitHub issue #41930 became the central thread, documenting what users called "widespread abnormal usage limit drain across all paid tiers since March 23, 2026." The title also noted: "no formal communication issued."
That last part matters. We will come back to it.
What were the two bugs that broke caching?
On April 2, Anthropic released a postmortem confirming two distinct caching bugs had hit production simultaneously. But the bugs had actually been identified days earlier — not by Anthropic, but by a community member who reverse-engineered the Claude Code standalone binary.
Bug one: the billing sentinel collision
Claude Code ships as a standalone binary built on a custom fork of the Bun JavaScript runtime. This fork performs a string replacement on every API request, targeting a billing attribution sentinel — essentially a marker that tells Anthropic's backend how to attribute usage for billing purposes.
The problem: if your conversation history happened to contain billing-related terms — words like "cost," "billing," "quota," or "usage" — the string replacement could hit the wrong position in the request payload. When that happened, it broke the cache prefix. The entire conversation context was forced into an uncached token rebuild.
Think about what this means. Developers discussing costs in their codebase — maybe reviewing a pricing module, or debugging a billing API — were the most likely to trigger a bug that silently multiplied their own costs. That is not just ironic. It is the kind of interaction bug that is almost impossible to catch in unit testing because it depends on the content of production conversations.
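To make the failure mode concrete, here is a deliberately naive sketch. None of these names or markers come from Anthropic's actual code — the sentinel string, the `rewrite_request` helper, and the single-occurrence replacement are all assumptions chosen to illustrate how a first-match rewrite can land in user content instead of the intended marker.

```python
# Hypothetical illustration only: the real sentinel and rewrite logic are
# not public. Assume the client appends a trailing sentinel and the runtime
# rewrites the FIRST occurrence it finds anywhere in the payload.
SENTINEL = "usage"  # imagine the sentinel overlaps with an ordinary word

def rewrite_request(payload: str, attribution: str) -> str:
    # Naive single-occurrence replacement: if the conversation already
    # contains the sentinel text, the rewrite hits that earlier position.
    return payload.replace(SENTINEL, attribution, 1)

clean = "conversation about refactoring the auth module\n" + SENTINEL
billing_talk = "let's debug the usage quota endpoint\n" + SENTINEL

ok = rewrite_request(clean, "acct-123")
bad = rewrite_request(billing_talk, "acct-123")

ok.endswith("acct-123")   # True: the trailing marker was rewritten as intended
bad.endswith("acct-123")  # False: the rewrite mutated the conversation body,
                          # so the cached prefix no longer matches
```

The point of the sketch is the shape of the bug, not its specifics: any in-band marker that shares an alphabet with user content can collide with it.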
Bug two: the resume flag cache invalidation
The second bug was simpler but equally destructive. When users resumed a previous Claude Code session using --resume or --continue flags, tool attachments were injected at a different position in the request payload than in fresh sessions. This position change invalidated the entire conversation cache.
Every resumed session started from scratch. The system prompt might get cached, but everything else — your entire conversation history, all tool outputs, every file read — was reprocessed as if Claude had never seen it before. For long sessions with substantial context, this meant thousands of tokens being billed at full uncached rates on every single interaction.
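A minimal model shows why a position change is so expensive under prefix caching. This is a sketch of the general technique, not Anthropic's implementation: the server can reuse cached work only for the longest identical prefix of request blocks, so everything after the first divergence is rebuilt and billed uncached.

```python
# Sketch of prefix-cache matching: reuse stops at the first block that
# differs from what the server cached for this session.
def cached_prefix_len(cached: list[str], request: list[str]) -> int:
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n

fresh   = ["system", "tools", "msg1", "msg2", "msg3"]
# A resumed session that injects tool attachments at a different position:
resumed = ["system", "msg1", "msg2", "msg3", "tools"]

cached_prefix_len(fresh, fresh)    # 5: identical replay, full cache hit
cached_prefix_len(fresh, resumed)  # 1: only the system prompt survives;
                                   # every later block is reprocessed
```

Moving one early block does not cost one block — it costs the entire tail of the conversation.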
“Downgrading to 2.1.34 made a very noticeable difference.”
The interaction between these two bugs was what made the incident so severe. Bug one was content-dependent and intermittent. Bug two was deterministic — every resumed session was affected. Together, they produced a system that reprocessed full conversation context on nearly every interaction and billed users for every uncached token.
Why did it take a week to acknowledge?
This is where the incident becomes a case study in communication failure rather than just engineering failure.
Users began reporting anomalies on March 23. GitHub issues piled up. Reddit threads gained traction. Discord was full of confused developers watching their quotas evaporate. For nearly eight days, there was no official communication from Anthropic.
On March 31, Lydia Hallie, product lead of Claude Code, posted on X: "We're aware people are hitting usage limits in Claude Code way faster than expected. Actively investigating, will share more when we have an update!"
That was eight days after users first reported the problem. The full postmortem did not arrive until April 2.
- March 23: First user reports of abnormal quota drain appear on GitHub and Reddit
- March 23-30: Community escalation across multiple platforms, no official response
- March 30: Community member publishes reverse-engineering analysis identifying both bugs
- March 31: Anthropic's first public acknowledgment via X/Twitter
- April 2: Full postmortem published, quota extensions promised for affected users
The minimum viable response time for a billing anomaly should be hours, not days. When users are being charged for something that is broken on your end, silence is not just bad communication — it is a trust problem. Every hour of silence is an hour where a paying customer is hemorrhaging money they should not be spending.
What should Anthropic have done differently?
The postmortem itself was solid engineering communication. Anthropic identified both bugs, explained the interaction, acknowledged the impact, and committed to quota extensions for affected users. The engineering response, once it happened, was competent.
But three decisions made this incident worse than it needed to be.
First: billing anomaly detection should be automated. If a cohort of users suddenly starts consuming tokens at 10-20x their historical rate, that should trigger an alert within hours, not wait for user complaints. Anthropic has the telemetry. Every API call is logged. A simple statistical anomaly detector on per-user token consumption rates would have caught this on day one.
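The kind of check described above fits in a few lines. This is a sketch with illustrative thresholds, not a recommendation: flag a user only when today's consumption is both a large multiple of their own baseline and far outside their historical variance.

```python
# Per-user consumption anomaly check. The 5x ratio and z-score cutoff are
# placeholder thresholds for illustration, not tuned values.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int,
                 min_ratio: float = 5.0, z_cutoff: float = 4.0) -> bool:
    baseline = mean(history)
    spread = stdev(history) or 1.0  # guard against zero variance
    ratio = today / baseline if baseline else float("inf")
    z = (today - baseline) / spread
    # Require both conditions so naturally bursty users do not page on-call
    # for ordinary day-to-day variance.
    return ratio >= min_ratio and z >= z_cutoff

normal_week = [90_000, 110_000, 95_000, 105_000, 100_000]
is_anomalous(normal_week, 120_000)    # False: within ordinary variance
is_anomalous(normal_week, 1_500_000)  # True: ~15x baseline, the bug's signature
```

Run per user per day against a rolling window and this catches a 10-20x cohort-wide spike on day one, not day eight.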
Second: the custom Bun fork introduced unnecessary risk. Shipping a billing-critical string replacement inside a forked runtime — where it runs on every API request and interacts with arbitrary user content — is a design decision that trades debuggability for convenience. The billing sentinel logic should live server-side, where it cannot be affected by client-side content and where it can be monitored independently.
Third: the communication delay was a choice. Anthropic chose to wait until they had a full answer before saying anything substantive. That is the wrong call when users are actively losing money. The right response is immediate acknowledgment, a temporary mitigation (even if it is just "we recommend starting fresh sessions instead of resuming"), and then the full postmortem when ready.
What can engineering teams learn from this?
This incident is useful beyond the Anthropic ecosystem. Every team building on LLM APIs faces similar risks. Here is what generalizes.
Cache invalidation is a billing event. In traditional systems, a cache miss means slower performance. In token-based billing systems, a cache miss means higher costs. If your architecture relies on prompt caching to keep costs predictable, then cache invalidation is not just a performance bug — it is a billing bug. Monitor it like one. Track cache hit rates as a financial metric, not just a performance metric.
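One way to operationalize "monitor it like a billing bug" is to denominate cache misses in dollars rather than percentages. The prices below are placeholders, not Anthropic's actual rates — the structure of the calculation is the point.

```python
# Sketch: translate cache misses into overspend so they alert like a
# billing bug. Both rates are assumed placeholder values.
UNCACHED_PER_MTOK = 3.00   # assumed $/million uncached input tokens
CACHED_PER_MTOK   = 0.30   # assumed $/million cached-read tokens

def cache_miss_cost(total_tokens: int, cached_tokens: int) -> float:
    """Extra dollars spent because tokens that should have been cache
    reads were billed at the full uncached rate."""
    missed = total_tokens - cached_tokens
    return missed * (UNCACHED_PER_MTOK - CACHED_PER_MTOK) / 1_000_000

# A healthy long session: 95% of a 2M-token day served from cache.
round(cache_miss_cost(2_000_000, 1_900_000), 2)   # 0.27
# The same day with caching fully broken:
round(cache_miss_cost(2_000_000, 0), 2)           # 5.4
```

A 20x cost jump per user per day is trivially alertable once it is expressed this way.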
Content-dependent bugs are the hardest to catch. Bug one only triggered when conversation content contained specific terms. This class of bug — where the payload itself determines whether the system behaves correctly — is almost invisible in testing because test conversations rarely look like production conversations. If your system processes arbitrary content and makes decisions based on string matching or replacement, fuzz it with real-world data.
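A property-style test makes this concrete. The sketch below is generic — `transform` is a stand-in for whatever string rewriting your client performs, and the marker is invented — but it shows the shape of the test: assert an invariant over a corpus that includes hostile-but-realistic content, rather than only clean fixtures.

```python
# Sketch of content fuzzing: check the invariant "the transform touches
# only the trailing marker" against realistic conversation bodies.
MARKER = "<<attrib>>"  # hypothetical in-band marker

def transform(payload: str, value: str) -> str:
    # Intended: replace only the trailing marker.
    # Buggy: replaces the FIRST occurrence, wherever it appears.
    return payload.replace(MARKER, value, 1)

def invariant_holds(body: str) -> bool:
    out = transform(body + MARKER, "acct")
    return out == body + "acct"  # the body must come through untouched

corpus = [
    "refactor the auth module",                      # clean fixture: passes
    f"why does {MARKER} appear in this log line?",   # realistic hostile content
]
[invariant_holds(b) for b in corpus]  # [True, False]: the fuzz corpus finds
                                      # what clean fixtures never would
```

Seeding the corpus with scrubbed production transcripts, rather than hand-written fixtures, is what makes this class of test effective.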
Forked runtimes carry compound risk. Every fork is a promise to maintain a divergent codebase. When that fork handles billing-critical logic and interacts with untrusted input, you are combining supply chain risk with financial risk. Unless there is a compelling technical reason for the fork, keep billing logic server-side and runtime-agnostic.
Your community will reverse-engineer your bugs faster than you will. The Claude Code source had already leaked via an npm packaging error weeks earlier. A community member used that knowledge plus MITM analysis to identify both bugs before Anthropic's own team. This is not embarrassing — it is predictable. In 2026, your binary is not a black box. Design your incident response assuming that external analysis will outpace internal triage.
“The minimum viable response to a billing anomaly is not a root cause analysis. It is the words: we know, we are looking, and here is what you can do right now.”
Is this a reason to stop using Claude Code?
No. Every developer tool has incidents. VS Code has had extensions that silently consumed CPU. GitHub Copilot has had billing anomalies of its own. The question is not whether incidents happen — it is how the vendor responds.
Anthropic's engineering response was eventually thorough. The postmortem was honest. The quota extensions were the right call. But the eight-day communication gap is the part that should concern you. If you are building your workflow around a tool with consumption-based billing, you need to monitor your own usage independently. Do not rely on the vendor to tell you when their system is overcharging you.
Set up your own token tracking. Compare daily consumption against historical baselines. Alert on anomalies. And if a vendor takes more than 48 hours to acknowledge a billing anomaly that your own monitoring has detected — that tells you something about their operational maturity that no feature list will.