Claude's Journey to Foolishness in Charts: The Cost of Thrift, or How an API Bill Increased 100-Fold

By: blockbeats|2026/04/13 19:04:34

A few days ago, Stella Laurenzo, Head of AI at AMD, posted an issue titled "Claude Code Unusable for Complex Engineering Tasks" in the official Claude Code repository. This was not an emotional user complaint but a quantitative analysis based on 6,800 sessions. It forced into the open an issue the AI community has been most reluctant to face, and one set of numbers stood out: after a cost-saving configuration tweak by Anthropic, this team's monthly API bill jumped from $345 to $42,121.

Laurenzo's team tracked 235,000 tool invocations and 18,000 prompts, documenting the systematic performance degradation of Claude Code since February 2026. The report was later covered by The Register, sparking a two-week uproar in the developer community.

Boris Cherny, Head of the Anthropic Claude Code team, provided an explanation on Hacker News. On February 9, with the release of Opus 4.6, an "adaptive thinking" mechanism was enabled by default, in which the model decides for itself how long to think. On March 3, Anthropic then lowered the default thinking effort to 85. The official explanation was "the optimal balance point between intelligence, latency, and cost." The actual impact of these two adjustments is evident from the data.

Thought Depth Plummets by Three Quarters

According to Stella Laurenzo's GitHub Issue data, Claude Code's average thought depth collapsed in three stages over two months: from a high of 2,200 characters at the end of January to 720 characters by the end of February, a 67% drop. By March, it had shrunk further to 560 characters, a 75% decrease from the peak.
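
The percentages quoted above follow directly from the three character counts. As a quick check, a minimal sketch (the helper name `percent_drop` is ours for illustration, not something from the issue):

```python
def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Average thought depth in characters, as quoted in the article.
peak, late_february, march = 2200, 720, 560

print(round(percent_drop(peak, late_february)))  # 67
print(round(percent_drop(peak, march)))          # 75
```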


Thought depth here is a proxy metric reflecting how much "internal deliberation" the model is willing to engage in before providing an answer. The difference between 2,200 and 560 characters is roughly equivalent to degrading from "drafting before responding" to "thinking for two seconds in your head before speaking."

Laurenzo also pointed out that the "Thought Content Redaction" feature (redact-thinking-2026-02-12) launched in early March coincidentally masked the model's thought process during this period, making the shrinkage less perceptible to users. Boris Cherny insists this was merely a UI change and did not affect the underlying reasoning. Both claims are technically valid, but from a user's perspective, the effect is indistinguishable.

Boris Cherny later acknowledged that even with effort manually set back to maximum, the adaptive thinking mechanism, in which the model decides its own thinking budget, may still allocate too little reasoning in some turns, leading to hallucinated content. "Restoring maximum effort" is therefore not a complete solution; it turns the knob back closer to its original position but does not restore the original determinism.

From "Research-Oriented Programmer" to "Blind Edit Programmer"

A detail in Stella Laurenzo's report is more explicit than thinking depth: how many relevant files the model actively reads before making changes to the code.

According to the GitHub Issue data, during the healthy period the average read-to-edit ratio was 6.6: before making a code change, the model read 6.6 files on average to understand the context. During the degraded period, this number dropped to 2.0, a 70% decrease. More critically, about one-third of code edits were made without the model having read the target file at all.
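
To make the two metrics concrete, here is a minimal sketch of how a read-to-edit ratio and a blind-edit share could be computed from a session's tool calls. The `ToolCall` record, the `"read"`/`"edit"` labels, and the sample session are hypothetical; the article does not describe the log format or the exact methodology used in the issue.

```python
from typing import Iterable, NamedTuple


class ToolCall(NamedTuple):
    tool: str  # "read" or "edit" (hypothetical labels)
    path: str  # file the call touched


def read_edit_stats(calls: Iterable[ToolCall]) -> tuple[float, float]:
    """Return (read-to-edit ratio, share of blind edits) for one session.

    An edit counts as "blind" if its target file was not read earlier
    in the same session.
    """
    reads = edits = blind = 0
    seen: set[str] = set()
    for call in calls:
        if call.tool == "read":
            reads += 1
            seen.add(call.path)
        elif call.tool == "edit":
            edits += 1
            if call.path not in seen:
                blind += 1
    if edits == 0:
        raise ValueError("session contains no edits")
    return reads / edits, blind / edits


# Made-up session: two reads, two edits, one of them blind.
session = [
    ToolCall("read", "api.py"),
    ToolCall("read", "models.py"),
    ToolCall("edit", "api.py"),
    ToolCall("edit", "utils.py"),  # never read beforehand: a blind edit
]
ratio, blind_share = read_edit_stats(session)
print(ratio, blind_share)  # 1.0 0.5
```

Under this definition, the reported figures would correspond to a ratio falling from 6.6 to 2.0 and a blind-edit share of roughly one-third.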

Laurenzo refers to this as "blind edits." In engineering terms, this is akin to a programmer writing code without looking at function signatures or knowing variable types. "Every senior engineer on my team has had similar first-hand experiences," she wrote in her report. "Claude can no longer be trusted to carry out complex engineering tasks."

The drop from a 6.6 read-to-edit ratio to 2.0 is not merely a shift in a behavioral metric; it signifies a collapse in task success rates. The complexity of modern code repositories means that almost any modification involves dependencies across multiple files. Skipping context exploration and making changes directly does not merely produce "incorrect answers" but rather "seemingly correct changes that trigger new errors downstream." The cost of debugging such errors far exceeds that of a single, obviously wrong answer.

The Paradox of "Saving Money"

One of the most counterintuitive sets of numbers in the entire incident comes from the same GitHub Issue data: Stella Laurenzo's team saw its monthly Claude Code API costs soar from $345 in February 2026 to $42,121 in March, a 122-fold increase.

The logic behind Anthropic's effort reduction was to lower token consumption per call and thus reduce costs. The outcome was the opposite. After the degradation, the model fell into numerous "reasoning loops": repeated self-negation within a single reply, constant restarts, and token consumption far exceeding the amount saved. According to Stella Laurenzo's data, the rate at which users voluntarily aborted tasks rose 12-fold during the same period, requiring developers to continuously intervene, correct, and resubmit.

The underlying reasoning was flawed at the system level. Cutting the compute spent on a complex task does not reduce costs proportionally. Once thinking falls below a certain threshold, the model starts to veer off track, and the overall cost ends up rising. Lowering effort saved money on simple queries, but on coding tasks it blew up the bill.
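
The threshold effect described above can be illustrated with a toy model. This is a sketch under strong assumptions, not the issue's methodology: each attempt at a task is treated as an independent try with a fixed success rate, so the expected number of attempts is the reciprocal of that rate. All token counts and success rates below are invented for illustration.

```python
def expected_tokens_per_task(tokens_per_attempt: float, success_rate: float) -> float:
    """Expected tokens spent until the first success.

    Assumes independent retries with a constant success rate, so the
    expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return tokens_per_attempt / success_rate


# Invented numbers, for illustration only (not from the GitHub issue).
high_effort = expected_tokens_per_task(tokens_per_attempt=10_000, success_rate=0.9)
low_effort = expected_tokens_per_task(tokens_per_attempt=6_000, success_rate=0.2)

print(round(high_effort), round(low_effort))  # 11111 30000
```

In this toy setup the low-effort configuration is 40% cheaper per attempt yet almost three times more expensive per completed task. With these numbers it would only pay off if its success rate stayed above 0.54, that is, 0.9 multiplied by 6,000/10,000.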

GPT-4 Went Through the Same "Dumbing Down" Three Years Ago

In July 2023, a research team from Stanford University and the University of California, Berkeley, published a paper on arXiv titled "How is ChatGPT's behavior changing over time?", documenting the same phenomenon in GPT-4.

According to the research data, in March 2023 over 50% of the code GPT-4 generated was directly runnable. By June, this proportion had dropped to 10%, an 80% relative decrease in three months. During the same period, its accuracy at identifying prime numbers plummeted from 97.6% to 2.4%. OpenAI's response was highly similar to Anthropic's: there had been optimizations in the background, part of normal iteration.

The structure of the two stories is almost identical: an AI company quietly adjusted parameters affecting the model's capabilities in the background, users noticed, and the company acknowledged the adjustment but explained it as "more reasonable resource allocation." GPT-4's degradation occurred in 2023 and Claude's in 2026. They are three years apart, but the script is the same.

This is not one company's peculiar mistake. The economics of AI subscription models mean that when inference costs exceed what the pricing can cover, every vendor faces the same pressure. Lowering the default thinking effort is currently the easiest knob to turn when trading off cost against performance. What users perceive is the model "getting dumber." What the vendor saves on the books is the marginal token cost per call.

Boris Cherny has provided a technical solution where users can manually restore the thought intensity to the highest level through the /effort high command or by modifying the configuration file. This solution is technically feasible, but it also means that "maximum performance" is no longer the default setting.

From $345 to $42,121, what was spent was not just the budget but also an assumption: that the changes a vendor makes to default configuration are intended to improve the user experience.
