Claude Sonnet 5: Default Model Now, Beats Opus 4.8 on Two Benchmarks

Abhishek Gautam · July 4, 2026 · 6 min read

Quick summary

Claude Sonnet 5 launched June 30 at an introductory $2/M input tokens ($3/M standard from September 1) and became the default for all free and Pro users on July 1. It beats Opus 4.8 on Terminal-Bench 2.1 and GDPval-AA v2.

What Sonnet 5 Actually Is

Sonnet 5 is the most agentic model Anthropic has built at the Sonnet tier — designed to plan, use tools (browsers, terminals, code interpreters), and run autonomously at a level that previously required larger and more expensive models. It has a 1 million token context window, a 128,000 token max output (expandable to 300,000 via the batch API beta), and is live across Claude.ai, the Claude API, Claude Code, Cursor, VS Code, and GitHub Copilot.

Benchmark Numbers

Sonnet 5 scores 63.2% on SWE-bench Pro versus 69.2% for Opus 4.8, a 6-point gap on coding tasks. On OSWorld-Verified (computer use), it scores 81.2% versus Opus 4.8's 83.4%. On two benchmarks, though, it beats Opus 4.8: Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 knowledge work (1,618 Elo vs 1,615, a narrow 3-point margin). The Terminal-Bench result is the more notable of the two for developers: it suggests Sonnet 5 is better than Opus 4.8 at terminal-based agentic workflows, the primary use case in Claude Code.
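To make the size of each gap easier to compare, here is a minimal Python sketch that computes Sonnet 5 minus Opus 4.8 for the four benchmarks quoted above. The scores are copied from this article; the names `SCORES` and `deltas` are illustrative only.

```python
# Scores as quoted in this article: (Sonnet 5, Opus 4.8).
SCORES = {
    "SWE-bench Pro (%)": (63.2, 69.2),
    "OSWorld-Verified (%)": (81.2, 83.4),
    "Terminal-Bench 2.1 (%)": (80.4, 74.6),
    "GDPval-AA v2 (Elo)": (1618, 1615),
}


def deltas(scores):
    """Return {benchmark: Sonnet 5 score minus Opus 4.8 score}, rounded to 1 decimal."""
    return {name: round(sonnet - opus, 1) for name, (sonnet, opus) in scores.items()}


if __name__ == "__main__":
    for name, diff in deltas(SCORES).items():
        leader = "Sonnet 5" if diff > 0 else "Opus 4.8"
        print(f"{name}: {diff:+} ({leader} ahead)")
```

Note that the percentage benchmarks and the Elo rating are on different scales, so the raw differences are not directly comparable with each other.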

Pricing Compared to Previous Models

Model	Input (per M tokens)	Output (per M tokens)
Sonnet 5 (intro, through Aug 31)	$2	$10
Sonnet 5 (standard from Sep 1)	$3	$15
Sonnet 4.6	$3	$15
Opus 4.8	$15	$75

At introductory pricing, Sonnet 5 is cheaper than Sonnet 4.6 and 7.5x cheaper than Opus 4.8 on both input and output tokens; at standard pricing the gap to Opus 4.8 narrows to 5x. For developers running agentic workflows, this matters: Sonnet 5 beating Opus 4.8 on Terminal-Bench means you can switch workloads to the cheaper model without accepting a performance tradeoff on that specific task type.
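The price multiples follow directly from the table. A small Python sketch of the arithmetic, using only the prices listed above (the names `PRICES` and `price_ratio` are illustrative):

```python
# USD per million tokens as (input, output), taken from the pricing table above.
PRICES = {
    "Sonnet 5 (intro)": (2.0, 10.0),
    "Sonnet 5 (standard)": (3.0, 15.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Opus 4.8": (15.0, 75.0),
}


def price_ratio(expensive, cheap, prices=PRICES):
    """Return (input_ratio, output_ratio) of `expensive` relative to `cheap`."""
    exp_in, exp_out = prices[expensive]
    cheap_in, cheap_out = prices[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


if __name__ == "__main__":
    print(price_ratio("Opus 4.8", "Sonnet 5 (intro)"))     # (7.5, 7.5)
    print(price_ratio("Opus 4.8", "Sonnet 5 (standard)"))  # (5.0, 5.0)
```

The ratio is the same for input and output tokens because both models price output at five times their input rate.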

What Changed From Sonnet 4.6

Sonnet 4.6 was a strong coding and general-purpose model but not primarily designed for agentic operation. Sonnet 5 shifts the design emphasis: the model is optimized for multi-step plans, tool chains, and autonomous execution rather than single-turn generation quality. The 300,000 token output via batch API beta is new — it allows Sonnet 5 to write very long documents or generate large codebases in a single call without chunking.
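As a rough illustration of what the larger output cap means for chunking, the sketch below estimates how many calls a long generation would need under each limit. The two caps are the figures quoted in this article; the 250,000-token target is a hypothetical example, and the calculation assumes output splits evenly with no overlap or repeated context between chunks.

```python
import math

STANDARD_MAX_OUTPUT = 128_000    # max output tokens per call, as quoted in this article
BATCH_BETA_MAX_OUTPUT = 300_000  # max output tokens via the batch API beta, as quoted


def calls_needed(target_output_tokens, max_output_tokens):
    """Minimum number of calls needed to emit `target_output_tokens` of output."""
    if target_output_tokens <= 0:
        return 0
    return math.ceil(target_output_tokens / max_output_tokens)


if __name__ == "__main__":
    target = 250_000  # hypothetical long document or codebase, in output tokens
    print(calls_needed(target, STANDARD_MAX_OUTPUT))    # 2
    print(calls_needed(target, BATCH_BETA_MAX_OUTPUT))  # 1
```

In practice chunked generation usually needs some carried-over context per call, so the real call count under the smaller cap may be higher than this lower bound.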

The Default Model Decision

Making Sonnet 5 the default for free and Pro users rather than Opus 4.8 is a cost and capability trade-off call. Anthropic runs its own inference at scale, and defaulting every free-tier user to Opus 4.8 ($75/M output) would be significantly more expensive than Sonnet 5 ($10/M output at intro pricing). The benchmark numbers give Anthropic cover to make this call without appearing to downgrade the free tier: Sonnet 5 beats Opus 4.8 on two relevant benchmarks and is competitive on the rest.

Our Analysis

The Terminal-Bench result is the most useful data point here for developers. If Sonnet 5 outperforms Opus 4.8 on terminal-based agentic tasks, the sensible call for Claude Code users and DevOps automation pipelines is to move those workloads to Sonnet 5 now and capture the 7.5x cost reduction available at introductory pricing. The August 31 end date for introductory pricing creates a window: workloads validated on Sonnet 5 before September 1 will move to the standard $3/$15 pricing on that date, which is still 5x cheaper than Opus 4.8 but warrants re-evaluation for cost-sensitive pipelines. The 300k-token batch output expansion is underreported: it makes Sonnet 5 usable for document-generation and large-codebase tasks that previously required Opus.
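For pipelines that need to re-check budgets around the September 1 change, a minimal Python sketch of a date-aware cost estimate might look like this. The prices and the August 31, 2026 cutoff come from this article; the workload of 500M input and 100M output tokens per month is a hypothetical figure, and `sonnet5_rate` and `monthly_cost` are illustrative names.

```python
from datetime import date

INTRO_END = date(2026, 8, 31)   # last day of introductory pricing, per this article
SONNET5_INTRO = (2.0, 10.0)     # USD per million tokens (input, output)
SONNET5_STANDARD = (3.0, 15.0)
OPUS_48 = (15.0, 75.0)


def sonnet5_rate(on_date):
    """Return the Sonnet 5 (input, output) price in effect on `on_date`."""
    return SONNET5_INTRO if on_date <= INTRO_END else SONNET5_STANDARD


def monthly_cost(input_mtok, output_mtok, rate):
    """Cost in USD for a month's usage, with token volumes given in millions."""
    in_price, out_price = rate
    return input_mtok * in_price + output_mtok * out_price


if __name__ == "__main__":
    workload = (500, 100)  # hypothetical: 500M input and 100M output tokens per month
    print(monthly_cost(*workload, sonnet5_rate(date(2026, 8, 15))))  # 2000.0
    print(monthly_cost(*workload, sonnet5_rate(date(2026, 9, 1))))   # 3000.0
    print(monthly_cost(*workload, OPUS_48))                          # 15000.0
```

For this hypothetical workload the bill rises by 50% on September 1 while staying at one fifth of the Opus 4.8 cost, which is the re-evaluation the paragraph above describes.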

Key Takeaways

Sonnet 5 launched June 30: default for all free and Pro claude.ai users from July 1
Intro pricing through Aug 31: $2/$10 per M tokens (input/output) — cheaper than Sonnet 4.6
Beats Opus 4.8 on Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 (1,618 vs 1,615 Elo)
1M context, 128k output (300k via batch API beta)
For developers: switch terminal-based agentic workloads from Opus 4.8 to Sonnet 5 now for a higher Terminal-Bench score at 7.5x lower cost (5x once intro pricing ends)
What to watch: standard pricing takes effect September 1 — validate cost models before then

Frequently Asked Questions

What is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's most agentic mid-tier model, launched June 30, 2026. It's designed for multi-step autonomous workflows (browsing, terminal use, code execution) and became the default model for all free and Pro claude.ai users on July 1. It has a 1 million token context window and beats Opus 4.8 on two benchmarks while costing 7.5x less at introductory pricing.

How does Claude Sonnet 5 compare to Opus 4.8?

Sonnet 5 scores lower than Opus 4.8 on SWE-bench Pro (63.2% vs 69.2%) and OSWorld-Verified (81.2% vs 83.4%). But it beats Opus 4.8 on Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 knowledge work (1,618 vs 1,615 Elo). At introductory pricing, Sonnet 5 costs $2/$10 per million tokens input/output vs Opus 4.8's $15/$75.

What is Claude Sonnet 5 pricing?

Sonnet 5 costs $2 per million input tokens and $10 per million output tokens through August 31, 2026. From September 1, standard pricing is $3/$15 per million tokens — the same as Sonnet 4.6's standard rate. Opus 4.8 costs $15/$75 per million tokens.

Is Claude Sonnet 5 good for coding and agentic tasks?

Yes. Sonnet 5 is specifically optimized for agentic operation (planning, tool use, and autonomous execution) and beats Opus 4.8 on Terminal-Bench 2.1, which measures terminal-based agentic workflows. For developers using Claude Code or building DevOps automation, Sonnet 5 is the stronger choice on that benchmark at a fraction of the cost. It scores lower than Opus 4.8 on SWE-bench Pro, so complex single-session coding tasks may still benefit from Opus.

Written by

Abhishek Gautam

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 993+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.