Anthropic's most capable model, built for complex research and agentic workflows.
Delivers breakthrough coding performance at a significantly reduced price.
| Benchmark | Claude Opus 4.5 | Claude Opus 4.1 | Δ (pp) |
|---|---|---|---|
| SWE-bench | 80.9% | 75.2% | +5.7 |
| MMLU | 92.8% | 91.2% | +1.6 |
| HumanEval | 95.0% | 94.0% | +1.0 |
| SWE-bench Verified | 78.9% | 74.5% | +4.4 |
| GPQA Diamond | 82.4% | 79.1% | +3.3 |
| AIME 2025 | 90.5% | — | — |
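The Δ column above is simply the difference in percentage points between the two model versions. A minimal sketch recomputing it from the scores in the table (the dictionary below just mirrors the rows; the AIME 2025 row is omitted because no Opus 4.1 score is reported):

```python
# Scores copied from the table: benchmark -> (Opus 4.5, Opus 4.1), in percent.
scores = {
    "SWE-bench": (80.9, 75.2),
    "MMLU": (92.8, 91.2),
    "HumanEval": (95.0, 94.0),
    "SWE-bench Verified": (78.9, 74.5),
    "GPQA Diamond": (82.4, 79.1),
}

for name, (opus_45, opus_41) in scores.items():
    # Round to one decimal to avoid floating-point artifacts (e.g. 5.7000...3).
    delta = round(opus_45 - opus_41, 1)
    print(f"{name}: +{delta} pp")
```

Running this reproduces the +5.7, +1.6, +1.0, +4.4, and +3.3 deltas shown in the table.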