AI Flash Report

HLE leaderboard

97 models ranked by score on Humanity's Last Exam (HLE), highest first.

Rank | Model | Company | Score
1 | GLM-5 | Zhipu AI | 50.4%
2 | Claude Opus 4.8 | Anthropic | 45.7%
3 | Gemini 3.1 Pro Preview | Google | 44.7%
4 | GPT-5.5 | OpenAI | 44.3%
5 | GPT-5.4 | OpenAI | 41.6%
6 | Muse Spark | Meta | 39.9%
7 | GPT-5.3 Codex | OpenAI | 39.9%
8 | Claude Opus 4.7 | Anthropic | 39.6%
9 | Qwen3.7 Max | Alibaba | 38.1%
10 | MiniMax-M3 | MiniMax | 37.1%
11 | DeepSeek V4 Pro | DeepSeek | 35.9%
12 | Kimi K2.6 | Kimi | 35.9%
13 | GPT-5.2 | OpenAI | 35.4%
14 | Grok 4.3 | xAI | 35.0%
15 | MiMo-V2.5-Pro | Xiaomi | 33.8%
16 | Qwen3.7 Plus | Alibaba | 33.4%
17 | Grok 4.20 0309 v2 | xAI | 32.2%
18 | DeepSeek V4 Flash | DeepSeek | 32.1%
19 | Grok 4.20 0309 | xAI | 30.0%
20 | Qwen3.6 Max Preview | Alibaba | 28.9%
21 | Claude Opus 4.5 | Anthropic | 28.4%
22 | MiMo-V2-Pro | Xiaomi | 28.3%
23 | MiniMax-M2.7 | MiniMax | 28.1%
24 | GLM-5.1 | Z AI | 28.0%
25 | Qwen3.5 397B A17B | Alibaba | 27.3%
26 | Nemotron 3 Ultra 550B A55B | NVIDIA | 26.6%
27 | GPT-5.4 mini | OpenAI | 26.6%
28 | GPT-5.4 nano | OpenAI | 26.5%
29 | GPT-5.1 | OpenAI | 26.5%
30 | Qwen3.6 Plus | Alibaba | 25.7%
31 | Hy3-preview | Tencent | 25.5%
32 | GLM-5-Turbo | Z AI | 25.4%
33 | MiMo-V2.5 | Xiaomi | 25.2%
34 | Qwen3.5 122B A10B | Alibaba | 23.4%
35 | Gemini 3.5 Flash | Google | 23.1%
36 | Gemma 4 31B | Google | 22.7%
37 | Step 3.5 Flash 2603 | StepFun | 22.6%
38 | Qwen3.5 27B | Alibaba | 22.2%
39 | DeepSeek V3.2 | DeepSeek | 22.2%
40 | Qwen3.6 27B | Alibaba | 21.6%
41 | MiMo-V2-Omni-0327 | Xiaomi | 20.4%
42 | GPT-5.5 Instant | OpenAI | 20.3%
43 | Qwen3.6 35B A3B | Alibaba | 20.2%
44 | Step 3.7 Flash | StepFun | 19.9%
45 | MiMo-V2-Omni | Xiaomi | 19.9%
46 | Qwen3.5 35B A3B | Alibaba | 19.7%
47 | NVIDIA Nemotron 3 Super 120B A12B | NVIDIA | 19.2%
48 | Ring-2.6-1T | InclusionAI | 18.3%
49 | Gemma 4 26B A4B | Google | 18.3%
50 | Gemini 3.1 Flash-Lite Preview | Google | 16.2%
51 | GLM 5V Turbo | Z AI | 15.8%
52 | Mercury 2 | Inception | 15.5%
53 | Trinity Large Thinking | Arcee AI | 14.7%
54 | Gemma 4 12B | Google | 14.6%
55 | Qwen3.5 Omni Plus | Alibaba | 13.9%
56 | Qwen3.5 9B | Alibaba | 13.3%
57 | Mistral Medium 3.5 | Mistral | 12.8%
58 | EXAONE 4.5 33B | LG AI Research | 11.6%
59 | Nemotron Cascade 2 30B A3B | NVIDIA | 11.4%
60 | Gemini 2.5 Flash | Google | 11.1%
61 | Claude Sonnet 4.6 | Anthropic | 10.8%
62 | Solar Pro 3 | Upstage | 10.1%
63 | Sarvam 105B | Sarvam | 10.1%
64 | Mistral Small 4 | Mistral | 9.5%
65 | Ling-2.6-1T | InclusionAI | 8.2%
66 | Qwen3.5 4B | Alibaba | 7.8%
67 | Qwen3.5 Omni Flash | Alibaba | 7.1%
68 | Sarvam 30B | Sarvam | 7.0%
69 | JT-MINI | China Mobile | 6.6%
70 | MiniCPM5-1B | OpenBMB | 6.5%
71 | Ling 2.6 Flash | InclusionAI | 6.2%
72 | JT-35B-Flash | China Mobile | 6.1%
73 | GPT-5 | OpenAI | 5.4%
74 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 5.3%
75 | Tiny Aya Global | Cohere | 5.2%
76 | MiniCPM-V 4.6 1.3B | OpenBMB | 4.9%
77 | Gemma 4 E2B | Google | 4.8%
78 | NVIDIA Nemotron 3 Nano 4B | NVIDIA | 4.8%
79 | Gemini 2.0 Flash | Google | 4.7%
80 | LFM2 24B A2B | Liquid AI | 4.4%
81 | Granite 4.1 30B | IBM | 4.2%
82 | Claude 2.1 | Anthropic | 4.2%
83 | Mistral Large 3 | Mistral | 4.1%
84 | Claude 3 Haiku | Anthropic | 3.9%
85 | Gemini 1.5 Pro | Google | 3.9%
86 | Granite 4.1 8B | IBM | 3.8%
87 | Grok-2 | xAI | 3.8%
88 | Claude 3 Sonnet | Anthropic | 3.8%
89 | Gemma 4 E4B | Google | 3.7%
90 | Claude 3.5 Sonnet | Anthropic | 3.7%
91 | DeepSeek-V3 | DeepSeek | 3.6%
92 | Granite 4.1 3B | IBM | 3.4%
93 | Mistral Large | Mistral | 3.4%
94 | GPT-4 Turbo | OpenAI | 3.3%
95 | Claude 3 Opus | Anthropic | 3.1%
96 | Qwen3.5 2B | Alibaba | 2.1%
97 | Qwen3.5 0.8B | Alibaba | 1.2%