GPQA Diamond leaderboard
GPQA Diamond is the hardest tier of the GPQA benchmark: graduate-level science questions designed to be Google-proof, meaning they stay hard to answer even with unrestricted web search.
101 models ranked, highest score first.
| # | Model | Company | Score |
|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | 93.5% |
| 2 | MiniMax-M3 | MiniMax | 92.9% |
| 3 | Claude Fable 5 | Anthropic | 92.6% |
| 4 | Qwen3.7 Max | Alibaba | 92.3% |
| 5 | Claude Opus 4.8 | Anthropic | 92.0% |
| 6 | GPT-5.4 | OpenAI | 92.0% |
| 7 | GPT-5.3 Codex | OpenAI | 91.5% |
| 8 | Claude Opus 4.7 | Anthropic | 91.4% |
| 9 | Kimi K2.6 | Moonshot AI | 91.1% |
| 10 | Grok 4.20 0309 v2 | xAI | 91.1% |
| 11 | GPT-5.2 | OpenAI | 90.3% |
| 12 | Grok 4.3 | xAI | 90.1% |
| 13 | Qwen3.7 Plus | Alibaba | 90.0% |
| 14 | DeepSeek V4 Flash | DeepSeek | 89.4% |
| 15 | DeepSeek V4 Pro | DeepSeek | 88.8% |
| 16 | Qwen3.6 Max Preview | Alibaba | 88.8% |
| 17 | Grok 4.20 0309 | xAI | 88.5% |
| 18 | Muse Spark | Meta | 88.4% |
| 19 | Qwen3.6 Plus | Alibaba | 88.2% |
| 20 | GPT-5.4 mini | OpenAI | 87.5% |
| 21 | MiniMax-M2.7 | MiniMax | 87.4% |
| 22 | GPT-5.1 | OpenAI | 87.3% |
| 23 | MiMo-V2-Pro | Xiaomi | 87.0% |
| 24 | GLM-5.1 | Z AI | 86.8% |
| 25 | Nemotron 3 Ultra 550B A55B | NVIDIA | 86.7% |
| 26 | Hy3-preview | Tencent | 86.7% |
| 27 | MiMo-V2.5-Pro | Xiaomi | 86.6% |
| 28 | Claude Opus 4.5 | Anthropic | 86.6% |
| 29 | Qwen3.5 27B | Alibaba | 85.8% |
| 30 | Ring-2.6-1T | InclusionAI | 85.7% |
| 31 | Gemma 4 31B | Google | 85.7% |
| 32 | Qwen3.5 122B A10B | Alibaba | 85.7% |
| 33 | MiMo-V2-Omni-0327 | Xiaomi | 85.5% |
| 34 | MiMo-V2.5 | Xiaomi | 84.9% |
| 35 | GLM-5-Turbo | Z AI | 84.7% |
| 36 | GPT-5.5 Instant | OpenAI | 84.6% |
| 37 | Qwen3.5 35B A3B | Alibaba | 84.5% |
| 38 | Qwen3.6 27B | Alibaba | 84.2% |
| 39 | Gemini 3.1 Pro | Google | 84.2% |
| 40 | Qwen3.6 35B A3B | Alibaba | 84.1% |
| 41 | DeepSeek V3.2 | DeepSeek | 84.0% |
| 42 | JT-35B-Flash | China Mobile | 82.9% |
| 43 | Gemini 3.5 Flash | Google | 82.8% |
| 44 | MiMo-V2-Omni | Xiaomi | 82.8% |
| 45 | Step 3.5 Flash 2603 | StepFun | 82.6% |
| 46 | Qwen3.5 Omni Plus | Alibaba | 82.6% |
| 47 | Gemini 3.1 Flash-Lite Preview | Google | 82.2% |
| 48 | GPT-5.4 nano | OpenAI | 81.7% |
| 49 | Step 3.7 Flash | StepFun | 80.9% |
| 50 | GLM 5V Turbo | Z AI | 80.9% |
| 51 | Qwen3.5 9B | Alibaba | 80.6% |
| 52 | NVIDIA Nemotron 3 Super 120B A12B | NVIDIA | 80.0% |
| 53 | Claude Sonnet 4.6 | Anthropic | 79.7% |
| 54 | EXAONE 4.5 33B | LG AI Research | 79.4% |
| 55 | Gemma 4 26B A4B | Google | 79.2% |
| 56 | Claude Opus 4.1 | Anthropic | 79.1% |
| 57 | Gemini 2.5 Flash | Google | 79.0% |
| 58 | Gemini 3 Pro | Google | 78.5% |
| 59 | Qwen3.5 4B | Alibaba | 77.1% |
| 60 | Mistral Small 4 | Mistral | 76.9% |
| 61 | Nemotron Cascade 2 30B A3B | NVIDIA | 75.8% |
| 62 | North Mini Code | Cohere | 75.7% |
| 63 | Gemma 4 12B | Google | 75.3% |
| 64 | Ling-2.6-1T | InclusionAI | 75.2% |
| 65 | Trinity Large Thinking | Arcee AI | 75.2% |
| 66 | Mistral Medium 3.5 | Mistral | 74.8% |
| 67 | Qwen3.5 Omni Flash | Alibaba | 74.2% |
| 68 | Kimi K2 | Moonshot AI | 74.1% |
| 69 | Claude Sonnet 4 | Anthropic | 74.0% |
| 70 | Sarvam 105B | Sarvam | 73.8% |
| 71 | HyperNova 60B 2605 | Multiverse Computing | 73.3% |
| 72 | Solar Pro 3 | Upstage | 72.4% |
| 73 | Claude Sonnet 3.7 | Anthropic | 68.3% |
| 74 | Mistral Large 3 | Mistral | 68.0% |
| 75 | JT-MINI | China Mobile | 67.6% |
| 76 | GPT-5 | OpenAI | 67.3% |
| 77 | Gemini 2.0 Flash | Google | 63.6% |
| 78 | Sarvam 30B | Sarvam | 63.3% |
| 79 | Ling 2.6 Flash | InclusionAI | 59.3% |
| 80 | Gemma 4 E4B | Google | 57.6% |
| 81 | Claude 3.5 Sonnet | Anthropic | 56.0% |
| 82 | DeepSeek-V3 | DeepSeek | 55.7% |
| 83 | LFM2.5-8B-A1B | Liquid AI | 51.3% |
| 84 | NVIDIA Nemotron 3 Nano 4B | NVIDIA | 51.3% |
| 85 | Grok-2 | xAI | 51.0% |
| 86 | Claude 3 Opus | Anthropic | 48.9% |
| 87 | Granite 4.1 30B | IBM | 48.1% |
| 88 | LFM2 24B A2B | Liquid AI | 47.4% |
| 89 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 46.9% |
| 90 | Qwen3.5 2B | Alibaba | 45.6% |
| 91 | Granite 4.1 8B | IBM | 43.3% |
| 92 | Gemma 4 E2B | Google | 43.3% |
| 93 | Claude 3 Sonnet | Anthropic | 40.0% |
| 94 | Claude 3 Haiku | Anthropic | 37.4% |
| 95 | Gemini 1.5 Pro | Google | 37.1% |
| 96 | Mistral Large | Mistral | 35.1% |
| 97 | Claude 2.1 | Anthropic | 31.9% |
| 98 | Granite 4.1 3B | IBM | 31.4% |
| 99 | MiniCPM-V 4.6 1.3B | OpenBMB | 30.5% |
| 100 | MiniCPM5-1B | OpenBMB | 27.8% |
| 101 | Qwen3.5 0.8B | Alibaba | 11.1% |