Ranked by SWE-bench Verified and HumanEval scores (models scored on both benchmarks are listed first). Covers GPT-5, Claude Sonnet, Gemini, DeepSeek, and more.
Updated April 2026 · Source: AI Flash Report model database
| # | Model | Provider | SWE-bench Verified | HumanEval | Input price ($/M tokens) |
|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.6 | Anthropic | 80.8% | 95.2% | $3.00 |
| 2 | Claude Opus 4.5 | Anthropic | 78.9% | 95.0% | $15.00 |
| 3 | Claude Opus 4.1 | Anthropic | 74.5% | 94.0% | $15.00 |
| 4 | GPT-5.2 | OpenAI | 72.5% | 95.8% | $2.00 |
| 5 | Claude Sonnet 4 | Anthropic | 72.3% | 94.5% | $3.00 |
| 6 | GPT-5 | OpenAI | 67.4% | 93.5% | $2.50 |
| 7 | Claude Sonnet 3.7 | Anthropic | 62.3% | 93.2% | $3.00 |
| 8 | Claude 3.5 Sonnet | Anthropic | 49.0% | 92.0% | $3.00 |
| 9 | Gemini 3.1 Pro | Google | 72.3% | — | $2.50 |
| 10 | GPT-5.1 | OpenAI | 70.1% | — | $2.25 |
| 11 | Gemini 3 Pro | Google | 68.2% | — | $2.50 |
| 12 | Kimi K2 | Moonshot AI | 65.8% | — | $0.15 |
| 13 | DeepSeek V3.2 | DeepSeek | — | 92.5% | $0.27 |
| 14 | DeepSeek-V3 | DeepSeek | — | 90.2% | $0.27 |
| 15 | Claude 3 Opus | Anthropic | — | 84.9% | $15.00 |
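If you want to query these numbers yourself, here is a minimal Python sketch for filtering the leaderboard and estimating input cost from the $/M-token prices. The `MODELS` list is hand-copied from the table above (only a few rows shown for brevity), and the helper names `cheapest_above` and `input_cost` are illustrative, not part of any official API.

```python
# Minimal sketch: query the leaderboard data from the table above.
# MODELS is hand-copied and truncated; names here are illustrative.

MODELS = [
    # (model, provider, swe_bench_verified, humaneval, input_price_per_mtok)
    ("Claude Sonnet 4.6", "Anthropic", 80.8, 95.2, 3.00),
    ("GPT-5.2", "OpenAI", 72.5, 95.8, 2.00),
    ("Gemini 3.1 Pro", "Google", 72.3, None, 2.50),
    ("Kimi K2", "Moonshot AI", 65.8, None, 0.15),
    ("DeepSeek V3.2", "DeepSeek", None, 92.5, 0.27),
]

def cheapest_above(min_swe_bench: float):
    """Cheapest model whose SWE-bench Verified score meets the bar."""
    candidates = [m for m in MODELS if m[2] is not None and m[2] >= min_swe_bench]
    return min(candidates, key=lambda m: m[4], default=None)

def input_cost(price_per_mtok: float, tokens: int) -> float:
    """Input cost in dollars for a given prompt token count."""
    return price_per_mtok * tokens / 1_000_000

if __name__ == "__main__":
    print(cheapest_above(70.0))  # ('GPT-5.2', 'OpenAI', 72.5, 95.8, 2.0)
    # e.g. a 50k-token prompt on Kimi K2 at $0.15/M tokens:
    print(f"${input_cost(0.15, 50_000):.4f}")  # $0.0075
```

The price gap this exposes is the main takeaway: at these list prices, the same 50k-token prompt costs roughly 100× more on Claude Opus 4.5 than on Kimi K2.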