AI Flash Report

Claude Sonnet 4.6 vs Gemini 3.1 Pro: Benchmarks, Pricing & Capabilities Compared

TL;DR — Claude Sonnet 4.6 wins for coding · Gemini 3.1 Pro wins for reasoning + long-context.

Claude Sonnet 4.6 Anthropic
Released
2026-02-17
Context window
500K tokens
Input price
$3.00 / Mtok
Output price
$15.00 / Mtok
Key features
  • Agent Teams: orchestrate 2-16 Claude instances
  • Near-Opus performance at 1/5th cost
  • 80.8% SWE-bench Verified
Released
2026-02-19
Context window
2M tokens
Input price
$2.50 / Mtok
Output price
$10.00 / Mtok
Key features
  • 2x reasoning improvement
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding

Benchmark comparison

Benchmark Claude Sonnet 4.6 Gemini 3.1 Pro
GPQA Diamond 78.4% 84.2%
SWE-bench Verified 80.8% 72.3%

Pricing comparison

Metric Claude Sonnet 4.6 Gemini 3.1 Pro
Input ($/Mtok) $3.00 $2.50
Output ($/Mtok) $15.00 $10.00
Cached input ($/Mtok) $0.30 $0.25
Cost per 1M-token roundtrip (1M in + 1M out) $18.00 $12.50

Context window & modalities

Attribute Claude Sonnet 4.6 Gemini 3.1 Pro
Context window 500K tokens 2M tokens
Input modalities text, image, PDF text, image, audio, video, PDF
Output modalities text text
Knowledge cutoff 2025-10 2025-12

Verdict by use case

Coding
→ Claude Sonnet 4.6
Basis: SWE-bench

Claude Sonnet 4.6 80.8% vs Gemini 3.1 Pro 72.3% on SWE-bench.

Reasoning
→ Gemini 3.1 Pro
Basis: GPQA Diamond

Claude Sonnet 4.6 78.4% vs Gemini 3.1 Pro 84.2% on GPQA Diamond.

Math
Insufficient data
Basis: MATH / AIME

No shared math benchmark.

Long context
→ Gemini 3.1 Pro
Basis: Context window

Claude Sonnet 4.6 500K tokens vs Gemini 3.1 Pro 2M tokens.

Cost
→ Gemini 3.1 Pro
Basis: Input $/Mtok

Claude Sonnet 4.6 $3/Mtok vs Gemini 3.1 Pro $2.5/Mtok input.

Changelog & releases

Claude Sonnet 4.6
Released 2026-02-17
  • Agent Teams: orchestrate 2–16 Claude instances in parallel
  • +8.5pt on SWE-bench Verified vs Sonnet 4
  • 1/5 the cost of Opus 4.5 at ~95% of coding quality
  • Fast mode research preview for lower-latency inference
Gemini 3.1 Pro
Released 2026-02-19
Predecessor: google-gemini-3-pro
  • 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
  • Context window expanded to 2M tokens
  • Deep Think mode enabled by default on the Pro tier
  • Lower latency on first-token despite larger context

Related comparisons