AI benchmarks
Sortable leaderboards for the benchmarks that frontier labs report. Pick one to see every model's score.
GPQA Diamond
101 models scored
Leader: Gemini 3.1 Pro Preview 94.1%
AA Intelligence Index
99 models scored
Leader: Claude Opus 4.8 61.4
Chatbot Arena Elo
52 models scored
Leader: Qwen3.7 Max 1537
MMLU-Pro
36 models scored
Leader: Gemini 3.1 Pro 93.8%
HLE
97 models scored
Leader: GLM-5 50.4%
TAU2-bench
88 models scored
Leader: JT-35B-Flash 99.1%
LiveCodeBench
99 models scored
Leader: GPT-5.2 88.9%
SciCode
96 models scored
Leader: Gemini 3.1 Pro Preview 58.9%
ARC-AGI-2
1 model scored
Leader: Gemini 3.1 Pro 77.1%
SWE-bench Verified
14 models scored
Leader: GPT-5.3 Codex 82.4%
HumanEval
26 models scored
Leader: GPT-5.3 Codex 96.8%