AI Flash Report

Claude Opus 4.5 vs Gemini 3.1 Pro: Benchmarks, Pricing & Capabilities Compared

TL;DR — Claude Opus 4.5 wins for coding · Gemini 3.1 Pro wins for cost + long-context.

Claude Opus 4.5 (Anthropic)
Released
2025-11-24
Context window
500K tokens
Input price
$15.00 / Mtok
Output price
$75.00 / Mtok
Key features
  • First model to surpass 80% on SWE-bench Verified (80.9%)
  • 67% price reduction vs previous Opus
  • Extended reasoning capabilities
Gemini 3.1 Pro (Google)
Released
2026-02-19
Context window
2M tokens
Input price
$2.50 / Mtok
Output price
$10.00 / Mtok
Key features
  • 2x ARC-AGI-2 reasoning score vs Gemini 3 Pro
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding

Benchmark comparison

Benchmark           | Claude Opus 4.5 | Gemini 3.1 Pro
GPQA Diamond        | 82.4%           | 84.2%
SWE-bench Verified  | 78.9%           | 72.3%

Pricing comparison

Metric                                       | Claude Opus 4.5 | Gemini 3.1 Pro
Input ($/Mtok)                               | $15.00          | $2.50
Output ($/Mtok)                              | $75.00          | $10.00
Cached input ($/Mtok)                        | $1.50           | $0.25
Cost per 1M-token roundtrip (1M in + 1M out) | $90.00          | $12.50
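The roundtrip row above is simple arithmetic over the per-Mtok prices. A minimal cost-estimator sketch, using only the prices from the table; the model-name keys are illustrative labels, not official API identifiers:

```python
# Rough per-request cost estimator built from the pricing table above.
# Prices are USD per million tokens ($/Mtok).
PRICES = {
    "claude-opus-4.5": {"input": 15.00, "output": 75.00, "cached_input": 1.50},
    "gemini-3.1-pro":  {"input": 2.50,  "output": 10.00, "cached_input": 0.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimated USD cost for one request; cached_tokens is the portion
    of the input billed at the cached-input rate."""
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * p["input"]
            + cached_tokens * p["cached_input"]
            + output_tokens * p["output"]) / 1_000_000

# 1M in + 1M out, no caching — reproduces the roundtrip row:
print(request_cost("claude-opus-4.5", 1_000_000, 1_000_000))  # 90.0
print(request_cost("gemini-3.1-pro", 1_000_000, 1_000_000))   # 12.5
```

Note that output tokens dominate both totals, so output-heavy workloads feel the price gap hardest.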

Context window & modalities

Attribute         | Claude Opus 4.5  | Gemini 3.1 Pro
Context window    | 500K tokens      | 2M tokens
Input modalities  | text, image, PDF | text, image, audio, video, PDF
Output modalities | text             | text
Knowledge cutoff  | 2025-08          | 2025-12

Verdict by use case

Coding
→ Claude Opus 4.5
Basis: SWE-bench

Claude Opus 4.5 78.9% vs Gemini 3.1 Pro 72.3% on SWE-bench.

Reasoning
→ Gemini 3.1 Pro
Basis: GPQA Diamond

Claude Opus 4.5 82.4% vs Gemini 3.1 Pro 84.2% on GPQA Diamond.

Math
Insufficient data
Basis: MATH / AIME

No shared math benchmark.

Long context
→ Gemini 3.1 Pro
Basis: Context window

Claude Opus 4.5 500K tokens vs Gemini 3.1 Pro 2M tokens.

Cost
→ Gemini 3.1 Pro
Basis: Input $/Mtok

Claude Opus 4.5 $15.00/Mtok vs Gemini 3.1 Pro $2.50/Mtok input.
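One wrinkle the headline input prices hide: Claude's cached-input rate ($1.50/Mtok) is below Gemini's uncached input rate ($2.50/Mtok). A quick sketch, using only the prices from the pricing table, of the cache-hit rate at which Claude's blended input price dips under Gemini's:

```python
# Break-even prompt-cache hit rate, from the pricing table above ($/Mtok).
CLAUDE_IN, CLAUDE_CACHED = 15.00, 1.50
GEMINI_IN = 2.50

def claude_effective_input(hit_rate: float) -> float:
    """Blended Claude input $/Mtok when hit_rate of input tokens are cached."""
    return CLAUDE_IN * (1 - hit_rate) + CLAUDE_CACHED * hit_rate

# Solve 15(1-h) + 1.5h = 2.5 for h:
break_even = (CLAUDE_IN - GEMINI_IN) / (CLAUDE_IN - CLAUDE_CACHED)
print(round(break_even, 3))  # 0.926
```

So only at a cache-hit rate above roughly 93% does Claude's input side get competitive, and the $75 vs $10 output pricing still dominates total cost, which is why the cost verdict stands.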

Changelog & releases

Claude Opus 4.5
Released 2025-11-24
Gemini 3.1 Pro
Released 2026-02-19
Predecessor: google-gemini-3-pro
  • 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
  • Context window expanded to 2M tokens
  • Deep Think mode enabled by default on the Pro tier
  • Lower first-token latency despite the larger context window
