Claude Sonnet 4 is the mid-tier model in the Claude 4 family, with strong coding performance and reliable long-horizon agentic behavior.
This latest-generation Claude model delivers significant performance improvements. Benchmark scores are listed below.

| Benchmark | Claude Sonnet 4 |
|---|---|
| MMLU | 88.7% |
| HumanEval | 94.5% |
| MATH | 76.8% |
| SWE-bench Verified | 72.3% |
| GPQA Diamond | 74.0% |