AI benchmarks
Sortable leaderboards for the benchmarks that frontier labs report. Pick a benchmark to see every model's score.
GPQA Diamond
101 models scored
Leader: GPT-5.5 93.5%
AA Intelligence Index
0 models scored
Chatbot Arena Elo
0 models scored
MMLU-Pro
36 models scored
Leader: Gemini 3.1 Pro 93.8%
HLE
97 models scored
Leader: Claude Fable 5 53.3%
TAU2-bench
88 models scored
Leader: JT-35B-Flash 99.1%
LiveCodeBench
99 models scored
Leader: GPT-5.2 88.9%
SciCode
96 models scored
Leader: Claude Fable 5 60.2%
ARC-AGI-2
1 model scored
Leader: Gemini 3.1 Pro 77.1%
SWE-bench Verified
14 models scored
Leader: GPT-5.3 Codex 82.4%
HumanEval
26 models scored
Leader: GPT-5.3 Codex 96.8%