Claude 3 Opus: Anthropic's most capable model at its March 2024 launch, reported to outperform GPT-4 on several reasoning benchmarks
The most capable model in the Claude 3 family (alongside Sonnet and Haiku), which Anthropic describes as showing near-human levels of comprehension and fluency on complex tasks
| Benchmark | Claude 3 Opus | Claude 2.1 | Δ (points, Opus minus 2.1) |
|---|---|---|---|
| MMLU | 86.8% | 73.1% | +13.7 |
| HumanEval | 84.9% | 70.0% | +14.9 |
| MATH | 60.1% | 71.1% | -11.0 |
| GSM8K | 95.0% | — | — |
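The Δ column is a simple subtraction: Opus score minus Claude 2.1 score, in percentage points. The sketch below recomputes it from the table's own numbers and treats a dash as a missing value. The `scores` dict and the `delta` helper are illustrative names only. They are not part of any benchmark harness.

```python
# Scores copied from the table; None marks a cell shown as a dash (not reported).
scores = {
    "MMLU": (86.8, 73.1),
    "HumanEval": (84.9, 70.0),
    "MATH": (60.1, 71.1),
    "GSM8K": (95.0, None),
}


def delta(opus, claude21):
    """Return Opus minus Claude 2.1 in percentage points, or None if either score is missing."""
    if opus is None or claude21 is None:
        return None
    # Round to one decimal to absorb floating-point noise (e.g. 13.700000000000003).
    return round(opus - claude21, 1)


for name, (opus, claude21) in scores.items():
    d = delta(opus, claude21)
    shown = "n/a" if d is None else f"{d:+.1f}"  # explicit sign, like the table
    print(f"{name}: {shown}")
```

A positive value means Opus scored higher. MATH is the only row where the table shows Claude 2.1 ahead.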