Claude Sonnet 5: Default Model Now, Beats Opus 4.8 on Two Benchmarks

Abhishek Gautam, 6 min read

Quick summary

Claude Sonnet 5 launched June 30 at an introductory $2/M input tokens ($3/M from September 1) and became the default for all free and Pro users on July 1. It beats Opus 4.8 on Terminal-Bench 2.1 and GDPval-AA v2.

Claude Sonnet 5 launched on June 30, 2026 and became the default model for every free and Pro user on claude.ai on July 1. At its introductory price of $2 per million input tokens and $10 per million output tokens, valid through August 31, it costs less than its predecessor Sonnet 4.6 while beating Anthropic's flagship Opus 4.8 on two specific benchmarks.

What Sonnet 5 Actually Is

Sonnet 5 is the most agentic model Anthropic has built at the Sonnet tier — designed to plan, use tools (browsers, terminals, code interpreters), and run autonomously at a level that previously required larger and more expensive models. It has a 1 million token context window, a 128,000 token max output (expandable to 300,000 via the batch API beta), and is live across Claude.ai, the Claude API, Claude Code, Cursor, VS Code, and GitHub Copilot.
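The limits quoted above can be turned into a quick pre-flight check. This is a minimal sketch, not an official client: `RequestPlan` and `fits_limits` are hypothetical names, the numbers are the ones stated in this article, and the sketch assumes the prompt and output limits apply independently (the article does not say whether output tokens count against the context window).

```python
from dataclasses import dataclass

# Limits as quoted in this article; not verified against official documentation.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS_BATCH_BETA = 300_000


@dataclass(frozen=True)
class RequestPlan:
    """Hypothetical description of a planned request (not an API type)."""
    prompt_tokens: int
    requested_output_tokens: int
    uses_batch_beta: bool = False


def fits_limits(plan: RequestPlan) -> bool:
    """Return True if the plan stays within the article's quoted limits.

    Assumption: prompt and output limits are checked independently.
    """
    output_cap = (
        MAX_OUTPUT_TOKENS_BATCH_BETA if plan.uses_batch_beta else MAX_OUTPUT_TOKENS
    )
    return (
        0 <= plan.prompt_tokens <= CONTEXT_WINDOW_TOKENS
        and 0 < plan.requested_output_tokens <= output_cap
    )
```

For example, a plan asking for 200,000 output tokens fails the check on the regular path and passes only when `uses_batch_beta` is set.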

Benchmark Numbers

Sonnet 5 scores 63.2% on SWE-bench Pro versus 69.2% for Opus 4.8, a 6-point gap on coding tasks. On OSWorld-Verified (computer use), it scores 81.2% versus Opus 4.8's 83.4%. On two benchmarks it comes out ahead: Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 knowledge work (1,618 vs 1,615 Elo, a 3-point margin). The Terminal-Bench result is the more notable one for developers: it suggests Sonnet 5 is better than Opus 4.8 at terminal-based agentic workflows, which is the primary use case in Claude Code.
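To see the four results side by side, the gaps can be computed directly. This is a small sketch using only the scores reported in this article; the function name `sonnet_minus_opus` is illustrative. Note that the Elo gap and the percentage-point gaps are on different scales and should not be compared with each other.

```python
# Scores as reported in this article: (Sonnet 5, Opus 4.8).
SCORES = {
    "SWE-bench Pro (%)": (63.2, 69.2),
    "OSWorld-Verified (%)": (81.2, 83.4),
    "Terminal-Bench 2.1 (%)": (80.4, 74.6),
    "GDPval-AA v2 (Elo)": (1618, 1615),
}


def sonnet_minus_opus(scores):
    """Return the Sonnet 5 minus Opus 4.8 gap per benchmark, rounded to 1 decimal."""
    return {name: round(sonnet - opus, 1) for name, (sonnet, opus) in scores.items()}


gaps = sonnet_minus_opus(SCORES)
for name, gap in gaps.items():
    leader = "Sonnet 5" if gap > 0 else "Opus 4.8"
    print(f"{name}: {gap:+} ({leader} ahead)")
```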

Pricing Compared to Previous Models

Model                              Input (per M tokens)   Output (per M tokens)
Sonnet 5 (intro, through Aug 31)   $2                     $10
Sonnet 5 (standard from Sep 1)     $3                     $15
Sonnet 4.6                         $3                     $15
Opus 4.8                           $15                    $75

At introductory pricing, Sonnet 5 is cheaper than Sonnet 4.6 and 7.5x cheaper than Opus 4.8 on both input and output tokens; at standard pricing the gap to Opus 4.8 narrows to 5x. For developers running agentic workflows, this matters: because Sonnet 5 beats Opus 4.8 on Terminal-Bench, you can move terminal-heavy workloads to the cheaper model without accepting a performance tradeoff on that specific task type.
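The price ratios are easy to verify for a concrete workload. The sketch below uses the per-million-token prices from the table; the tier keys (such as `"sonnet-5-intro"`) are labels made up for this example and are not API model identifiers, and the 400k-input, 50k-output session is a hypothetical workload.

```python
from decimal import Decimal

# USD per million tokens (input, output), as listed in the article's table.
# The keys are illustrative labels, not real model IDs.
PRICES = {
    "sonnet-5-intro": (Decimal("2"), Decimal("10")),
    "sonnet-5-standard": (Decimal("3"), Decimal("15")),
    "sonnet-4.6": (Decimal("3"), Decimal("15")),
    "opus-4.8": (Decimal("15"), Decimal("75")),
}
MILLION = Decimal(1_000_000)


def cost_usd(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one workload at the given tier's listed prices."""
    in_price, out_price = PRICES[tier]
    return (in_price * input_tokens + out_price * output_tokens) / MILLION


# Hypothetical agentic session: 400k input tokens, 50k output tokens.
session = (400_000, 50_000)
intro = cost_usd("sonnet-5-intro", *session)
standard = cost_usd("sonnet-5-standard", *session)
opus = cost_usd("opus-4.8", *session)

print(f"Sonnet 5 intro:    ${intro}")
print(f"Sonnet 5 standard: ${standard}")
print(f"Opus 4.8:          ${opus}")
print(f"Opus / intro ratio:    {opus / intro}")
print(f"Opus / standard ratio: {opus / standard}")
```

Because the input and output prices differ by the same factor, the ratio does not depend on the input/output mix of the workload.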

What Changed From Sonnet 4.6

Sonnet 4.6 was a strong coding and general-purpose model but not primarily designed for agentic operation. Sonnet 5 shifts the design emphasis: the model is optimized for multi-step plans, tool chains, and autonomous execution rather than single-turn generation quality. The 300,000 token output via batch API beta is new — it allows Sonnet 5 to write very long documents or generate large codebases in a single call without chunking.

The Default Model Decision

Making Sonnet 5 the default for free and Pro users rather than Opus 4.8 is a cost and capability trade-off. Anthropic runs its own inference at scale, and if API list prices are any guide to serving cost, defaulting every free-tier user to Opus 4.8 ($75/M output) would be significantly more expensive than Sonnet 5 ($10/M output at intro pricing). The benchmark numbers give Anthropic cover to make this call without appearing to downgrade the free tier: Sonnet 5 beats Opus 4.8 on two relevant benchmarks and is competitive on the rest.

Our Analysis

The Terminal-Bench result is the most useful data point here for developers. If Sonnet 5 outperforms Opus 4.8 on terminal-based agentic tasks, the sensible call for Claude Code users and DevOps automation pipelines is to switch to Sonnet 5 now and capture the 7.5x cost reduction available at introductory pricing. That pricing ends August 31: from September 1, the same workloads are billed at the standard $3/$15 rate, which is still 5x cheaper than Opus 4.8 but means cost-sensitive pipelines should re-check their budgets. The 300k batch output token expansion is underreported. It quietly makes Sonnet 5 useful for document-generation and large codebase tasks that previously required Opus.
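The effect of the September 1 switch on a budget can be sketched with a date-aware price lookup. The dates and prices are the ones given in this article; `sonnet5_prices`, `monthly_cost`, and the monthly volume of 500M input and 60M output tokens are hypothetical, chosen only to illustrate the calculation.

```python
from datetime import date

# Dates and prices as stated in this article.
LAUNCH = date(2026, 6, 30)
INTRO_END = date(2026, 8, 31)  # last day of introductory pricing


def sonnet5_prices(on: date) -> tuple[float, float]:
    """Return (input, output) USD per million tokens for Sonnet 5 on a given day."""
    if on < LAUNCH:
        raise ValueError("Sonnet 5 was not available before launch")
    return (2.0, 10.0) if on <= INTRO_END else (3.0, 15.0)


def monthly_cost(on: date, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a month's volume, given in millions of tokens."""
    in_price, out_price = sonnet5_prices(on)
    return in_price * input_mtok + out_price * output_mtok


# Hypothetical pipeline: 500M input tokens and 60M output tokens per month.
august = monthly_cost(date(2026, 8, 15), 500, 60)
september = monthly_cost(date(2026, 9, 15), 500, 60)
print(f"August:    ${august:,.0f}")
print(f"September: ${september:,.0f}")
print(f"Increase:  {(september / august - 1):.0%}")
```

Whatever the volume, the move from $2/$10 to $3/$15 raises the bill by the same proportion, so a pipeline budgeted at intro prices should plan for a 50% increase in September.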

Key Takeaways

  • Sonnet 5 launched June 30: default for all free and Pro claude.ai users from July 1
  • Intro pricing through Aug 31: $2/$10 per M tokens (input/output) — cheaper than Sonnet 4.6
  • Beats Opus 4.8 on Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 (1,618 vs 1,615 Elo)
  • 1M context, 128k output (300k via batch API beta)
  • For developers: consider switching terminal-based agentic workloads from Opus 4.8 to Sonnet 5 now. It scores higher on Terminal-Bench 2.1 at 7.5x lower cost during the intro period (5x lower from September 1)
  • What to watch: standard pricing takes effect September 1 — validate cost models before then

Frequently Asked Questions

What is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's most agentic mid-tier model, launched June 30, 2026. It is designed for multi-step autonomous workflows (browsing, terminal use, code execution) and became the default model for all free and Pro claude.ai users on July 1. It has a 1 million token context window and beats Opus 4.8 on two benchmarks while costing 7.5x less at introductory pricing.

How does Claude Sonnet 5 compare to Opus 4.8?

Sonnet 5 scores lower than Opus 4.8 on SWE-bench Pro (63.2% vs 69.2%) and OSWorld-Verified (81.2% vs 83.4%). But it beats Opus 4.8 on Terminal-Bench 2.1 (80.4% vs 74.6%) and GDPval-AA v2 knowledge work (1,618 vs 1,615 Elo). At introductory pricing, Sonnet 5 costs $2/$10 per million tokens input/output vs Opus 4.8's $15/$75.

What is Claude Sonnet 5 pricing?

Sonnet 5 costs $2 per million input tokens and $10 per million output tokens through August 31, 2026. From September 1, standard pricing is $3/$15 per million tokens — the same as Sonnet 4.6's standard rate. Opus 4.8 costs $15/$75 per million tokens.

Is Claude Sonnet 5 good for coding and agentic tasks?

Yes. Sonnet 5 is specifically optimized for agentic operation — planning, tool use, and autonomous execution — and beats Opus 4.8 on Terminal-Bench 2.1, which measures terminal-based agentic workflows. For developers using Claude Code or building DevOps automation, Sonnet 5 is the better choice at a fraction of the cost. It scores lower than Opus 4.8 on SWE-bench Pro, so complex single-session coding tasks may still benefit from Opus.

Written by

Software Engineer based in Delhi, India. Writes about AI models, semiconductor supply chains, and tech geopolitics — covering the intersection of infrastructure and global events. 993+ posts cited by ChatGPT, Perplexity, and Gemini. Read in 167 countries.