AI Flash Report

AI benchmarks

Sortable leaderboards for the benchmarks that frontier labs report. Pick a benchmark to see every model's score.

GPQA Diamond

101 models scored

Leader: Gemini 3.1 Pro Preview 94.1%

AA Intelligence Index

99 models scored

Leader: Claude Opus 4.8 61.4

Chatbot Arena Elo

52 models scored

Leader: Qwen3.7 Max 1537

MMLU-Pro

36 models scored

Leader: Gemini 3.1 Pro 93.8%

HLE

97 models scored

Leader: GLM-5 50.4%

TAU2-bench

88 models scored

Leader: JT-35B-Flash 99.1%

LiveCodeBench

99 models scored

Leader: GPT-5.2 88.9%

SciCode

96 models scored

Leader: Gemini 3.1 Pro Preview 58.9%

ARC-AGI-2

1 model scored

Leader: Gemini 3.1 Pro 77.1%

SWE-bench Verified

14 models scored

Leader: GPT-5.3 Codex 82.4%

HumanEval

26 models scored

Leader: GPT-5.3 Codex 96.8%