Anthropic's new workhorse is its most agentic Sonnet yet, sold as close to Opus 4.8 for a fraction of the cost. The fine print sits in two places: the coding benchmark it did not publish, and the tokenizer change that quietly trims the discount.

This article was updated on June 30, 2026 to reflect Anthropic's announcement that U.S. export controls on Fable 5 have been lifted.
Anthropic released Claude Sonnet 5 on June 30, and the most revealing thing about the launch is a number that is not in it. For a model that will spend most of its working life writing and fixing code, Anthropic published agentic benchmarks, browser benchmarks, and safety benchmarks, but not the SWE-bench Verified score the industry uses to rank coding models. That omission is the key to reading the rest of the release. Sonnet 5 is not being sold as a smarter coder. It is being sold as a cheaper one that can run on its own.
The pitch is explicit. Anthropic calls Sonnet 5 "the most agentic Sonnet model yet," able to "make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models," and places its performance "close to that of Opus 4.8, but at lower prices."[1] That is the whole proposition in one sentence: take the autonomy that used to cost Opus money and deliver it in the Sonnet tier. Whether it holds up turns on whether the gains are reliability or marketing dressed up as a price cut.
The headline upgrade is persistence, in the good sense of the word. Anthropic says Sonnet 5 finishes complex multi-step tasks where its predecessors "would stop short," and that it checks its own work without being asked. The customer testimonials the company chose all point the same way: an engineer describes handing it a two-part job that "used to stall halfway" and watching it finish end to end; a Rust engineer asked it to investigate a bug and, unprompted, it "wrote a reproducing test, implemented the fix, then stashed it." These are reliability claims about autonomy more than capability claims about raw intelligence, which is the difference between a model that can do a step and one you can trust to do all of them. For anyone deploying agents, that distinction is the entire ballgame, and it is the right thing to have improved.
Underneath, Anthropic reports a substantial lift in reasoning and tool use over Sonnet 4.6, lower hallucination and sycophancy, and better resistance to prompt injection, all of which matter more to an agent than another point of exam performance. There is also a quieter change with a price tag attached. Sonnet 5 ships a new tokenizer that, in Anthropic's words, "changes how the model processes text," such that the same input now maps to roughly 1.0 to 1.35 times as many tokens as before. Hold that thought for the pricing section.
Going into the Sonnet 5 launch, Sonnet 4.6, the model this replaces, stood at 79.6 percent on SWE-bench Verified, the standard coding test.[2] For reference, Opus 4.8 sits at 88.6 percent, Gemini 3.1 Pro at 80.6 percent, and Fable 5 reached 95 percent before Anthropic suspended it under export controls on June 12. That suspension did not hold: hours after the Sonnet 5 launch, Anthropic said the Commerce Department had lifted the export controls on Fable 5 and Mythos 5 and that it would begin restoring access the following day, July 1.[6] A new Sonnet aimed squarely at software engineers is the most natural place imaginable to post an updated figure. Anthropic did not. Instead it led with BrowseComp and OSWorld-Verified, the benchmarks that measure whether a model can drive a browser and operate a real computer, where Sonnet 5 shows a clear gain over 4.6 and narrows the gap to Opus.[3]
There are two ways to read the silence, and they are not mutually exclusive. The generous reading is that single-shot tests like SWE-bench undersell the actual improvement, which is in finishing long agentic chains rather than nailing one isolated patch, so Anthropic emphasized the benchmarks that capture it. The skeptical reading is that the raw coding number is not a leap over 4.6 or over Gemini, and a company does not headline a figure that fails to impress. Both can be true. Either way the takeaway for a buyer is the same: judge Sonnet 5 on agentic reliability and cost rather than on a leaderboard rank, because the rank is the one thing Anthropic chose not to show you.
On paper, Sonnet 5 is cheap for what it claims to do. Through August 31 it runs at an introductory $2 per million input tokens and $10 per million output, after which it settles to $3 and $15, the same sticker as Sonnet 4.6.[4] Opus 4.8, the model it is measured against, costs $5 and $25. If Sonnet 5 truly delivers near-Opus autonomy, paying 40 percent of the Opus rate during the introductory window, and 60 percent after it, is a strong trade.
The sticker is not the whole bill, though, and this is where the new tokenizer comes back. Because the same text now becomes up to 1.35 times as many tokens, and tokens are the unit you are billed in, a workload that cost a dollar on Sonnet 4.6 can cost as much as $1.35 on Sonnet 5 at the identical per-token rate, before the model writes a single extra word. The introductory discount partly masks this, which is presumably part of why it exists. Set against the field, the picture is competitive without being a runaway: Gemini 3.1 Pro lists at $2 input and $12 output and posted a higher published coding score than Sonnet 4.6 did, so price-sensitive teams that do not need Anthropic's agentic edge have a genuine alternative.
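For readers who want to check the token math themselves, here is a minimal sketch of the arithmetic in Python. The per-token prices and the 1.0 to 1.35 multiplier are the figures quoted above; the workload size, the model labels, and the function name are hypothetical, and the sketch assumes the multiplier applies equally to input and output text, which Anthropic's wording does not spell out.

```python
# Illustrative arithmetic only. Prices and the 1.0-1.35x range are the figures
# quoted in this article; the labels, workload, and function are hypothetical.

PRICES_PER_MILLION = {  # USD per million tokens: (input, output)
    "sonnet-4.6": (3.00, 15.00),
    "sonnet-5-intro": (2.00, 10.00),     # introductory rate through August 31
    "sonnet-5-standard": (3.00, 15.00),  # rate after the introductory window
}


def workload_cost(model, input_tokens, output_tokens, token_multiplier=1.0):
    """Return the USD cost of a workload.

    Token counts are measured with the old (Sonnet 4.6) tokenizer;
    token_multiplier scales them to the new one. Applying the same
    multiplier to input and output is an assumption of this sketch.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    billed_input = input_tokens * token_multiplier
    billed_output = output_tokens * token_multiplier
    return (billed_input * input_price + billed_output * output_price) / 1_000_000


# Hypothetical workload: 1M input tokens and 200k output tokens on Sonnet 4.6.
baseline = workload_cost("sonnet-4.6", 1_000_000, 200_000)
for label, multiplier in [
    ("sonnet-5-intro", 1.0),
    ("sonnet-5-intro", 1.35),
    ("sonnet-5-standard", 1.35),
]:
    cost = workload_cost(label, 1_000_000, 200_000, multiplier)
    print(f"{label} at {multiplier}x: ${cost:.2f} (Sonnet 4.6 baseline ${baseline:.2f})")
```

On that hypothetical workload, the Sonnet 4.6 bill is $6.00. At the worst-case 1.35 multiplier, Sonnet 5 comes to $5.40 at the introductory rate and $8.10 once the standard rate applies. The introductory price is two-thirds of the standard one, so even the worst case lands about 10 percent below the old bill until August 31, whatever the input and output mix. A team's real multiplier will depend on its own text, so it is worth measuring on a sample of actual prompts.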
This is the question Sonnet 5 forces, because "close to Opus at lower prices" is, read uncharitably, Anthropic competing with itself. The honest answer is that it depends on the job. For the bulk of agentic software work, the browsing, the multi-step refactors, the bug-hunt-and-fix loops the launch testimonials describe, Sonnet 5 looks like enough, and the saving is real. Opus 4.8 keeps the edge on the number the launch did not publish: its 88.6 percent on SWE-bench Verified is nine points above what Sonnet 4.6 posted, and Anthropic still positions Opus for the security-sensitive work where its stronger cyber and exploit-development scores aid defenders most. Sonnet 5 ships the identical default cyber safeguards that Opus 4.7 and 4.8 use, but Anthropic's own guidance still steers security teams that need reduced guardrails toward Opus, not Sonnet. The shape of the decision is familiar from the rest of the lineup. Sonnet is the default you reach for; Opus is the model you escalate to when Sonnet stalls. What Sonnet 5 changes is how rarely you have to escalate.
For a model whose entire pitch is autonomy, safety is part of the spec, and the practical news is good. Anthropic reports better resistance to prompt injection, the attack that matters most when a model is reading untrusted web pages and running tools on your behalf, and more reliable refusal of outright malicious requests.[5] Cyber safeguards ship enabled by default, the same ones used on Opus 4.7 and 4.8. The one usability cost to budget for is over-refusal: the same dial that makes Sonnet 5 decline malicious requests more reliably also makes it turn down more benign ones, with the slightly preachier tone its own testers flagged. The deeper question, how aligned the model actually is and what its own safety card quietly concedes about its evaluations, is a bigger story than a deployment review can hold. We take it apart separately in our read of the Sonnet 5 system card.
Yes, with two costs you should price in before you commit. Sonnet 5 is, as of today, the most capable model you can actually put into production at scale: GPT-5.6 remains a government-gated preview off the public API, and Fable 5's export suspension, though lifted by the Commerce Department the evening of the Sonnet 5 launch, will not translate into restored access until Anthropic's stated July 1 timeline plays out.[6] Until then, Sonnet 5 and Opus 4.8 remain the frontier you can actually buy by the token; once Fable 5 is back, the comparison in this review is the first one worth rerunning. For agentic work Sonnet 5 is the new default in the meantime, and the autonomy gains are the kind that show up as fewer half-finished tasks rather than as a flashy demo. The first cost is the coding benchmark Anthropic did not publish, which means you should run your own evaluation on your own codebase before you believe the upgrade. The second is the tokenizer that makes each of those tasks cost more tokens than the headline price implies, like a discount that gets a little smaller every time you read the invoice. Neither is disqualifying, but together they are the gap between the marketing and the bill. Buy it for the autonomy, deploy it against your own benchmark, and check the token math before the introductory rate expires on August 31.