Claude Opus 4.5 vs Gemini 3.1 Pro: Benchmarks, Pricing & Capabilities Compared
TL;DR — Claude Opus 4.5 wins for coding · Gemini 3.1 Pro wins for cost + long-context.
Claude Opus 4.5 (Anthropic)
- Released: 2025-11-24
- Context window: 500K tokens
- Input price: $15.00 / Mtok
- Output price: $75.00 / Mtok

Key features
- First model to break 80% on SWE-bench Verified (scoring 80.9%)
- 67% price reduction vs the previous Opus generation
- Extended reasoning capabilities
Gemini 3.1 Pro (Google)
- Released: 2026-02-19
- Context window: 2M tokens
- Input price: $2.50 / Mtok
- Output price: $10.00 / Mtok

Key features
- 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
- ARC-AGI-2 score of 77.1%
- Enhanced multimodal understanding
Benchmark comparison
| Benchmark | Claude Opus 4.5 | Gemini 3.1 Pro |
|---|---|---|
| GPQA Diamond | 82.4% | 84.2% ✓ |
| SWE-bench Verified | 78.9% ✓ | 72.3% |
Pricing comparison
| Metric | Claude Opus 4.5 | Gemini 3.1 Pro |
|---|---|---|
| Input ($/Mtok) | $15.00 | $2.50 |
| Output ($/Mtok) | $75.00 | $10.00 |
| Cached input ($/Mtok) | $1.50 | $0.25 |
| Cost per 1M-token roundtrip (1M in + 1M out) | $90.00 | $12.50 |
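The roundtrip figures above are straightforward arithmetic on the list prices; a minimal sketch, using the table's prices (the model keys and the `roundtrip_cost` helper are illustrative, not an official SDK):

```python
# Per-million-token list prices from the tables above (USD / Mtok).
PRICES = {
    "claude-opus-4.5": {"input": 15.00, "output": 75.00, "cached_input": 1.50},
    "gemini-3.1-pro":  {"input": 2.50,  "output": 10.00, "cached_input": 0.25},
}

def roundtrip_cost(model: str, in_mtok: float, out_mtok: float,
                   cached_mtok: float = 0.0) -> float:
    """USD cost for one request, splitting input into fresh vs cache-hit tokens."""
    p = PRICES[model]
    fresh = in_mtok - cached_mtok
    return fresh * p["input"] + cached_mtok * p["cached_input"] + out_mtok * p["output"]

# 1M tokens in + 1M tokens out, matching the roundtrip row above:
print(roundtrip_cost("claude-opus-4.5", 1, 1))  # 90.0
print(roundtrip_cost("gemini-3.1-pro", 1, 1))   # 12.5
```

The `cached_mtok` parameter shows why cached-input pricing matters: re-sending 0.5 Mtok of cached prompt to Gemini 3.1 Pro is billed at $0.25/Mtok instead of $2.50/Mtok.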
Context window & modalities
| Attribute | Claude Opus 4.5 | Gemini 3.1 Pro |
|---|---|---|
| Context window | 500K tokens | 2M tokens |
| Input modalities | text, image, PDF | text, image, audio, video, PDF |
| Output modalities | text | text |
| Knowledge cutoff | 2025-08 | 2025-12 |
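When the deciding factor is whether a corpus fits in the window at all, a rough pre-check is often enough. This sketch uses the common ~4 characters/token heuristic for English text (an approximation only; real tokenizer counts vary, and the helper names are hypothetical):

```python
# Context windows from the table above (tokens).
WINDOWS = {"claude-opus-4.5": 500_000, "gemini-3.1-pro": 2_000_000}

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits(model: str, text: str, reserve_for_output: int = 8_000) -> bool:
    """True if the estimated prompt plus an output budget fits the window."""
    return estimate_tokens(text) + reserve_for_output <= WINDOWS[model]

doc = "x" * 3_000_000  # ~750K estimated tokens
print(fits("claude-opus-4.5", doc))  # False: over the 500K window
print(fits("gemini-3.1-pro", doc))   # True: well inside 2M
```

For anything borderline, count with the provider's actual tokenizer or token-counting endpoint rather than a character heuristic.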
Verdict by use case
- Coding → Claude Opus 4.5. Basis: SWE-bench (Claude Opus 4.5 78.9% vs Gemini 3.1 Pro 72.3%).
- Reasoning → Gemini 3.1 Pro. Basis: GPQA Diamond (Claude Opus 4.5 82.4% vs Gemini 3.1 Pro 84.2%).
- Math → insufficient data. Basis: MATH / AIME (no shared math benchmark).
- Long context → Gemini 3.1 Pro. Basis: context window (Claude Opus 4.5 500K tokens vs Gemini 3.1 Pro 2M tokens).
- Cost → Gemini 3.1 Pro. Basis: input price (Claude Opus 4.5 $15.00/Mtok vs Gemini 3.1 Pro $2.50/Mtok).
Changelog & releases
Gemini 3.1 Pro
Released 2026-02-19
Predecessor: google-gemini-3-pro
- 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
- Context window expanded to 2M tokens
- Deep Think mode enabled by default on the Pro tier
- Lower first-token latency despite the larger context window