AI Flash Report

HLE leaderboard

97 models ranked, highest score first.

# | Model | Company | Score
1 | Claude Fable 5 | Anthropic | 53.3%
2 | GLM-5 | Zhipu AI | 50.4%
3 | Claude Opus 4.8 | Anthropic | 45.7%
4 | GPT-5.5 | OpenAI | 44.3%
5 | GPT-5.4 | OpenAI | 41.6%
6 | Muse Spark | Meta | 39.9%
7 | GPT-5.3 Codex | OpenAI | 39.9%
8 | Claude Opus 4.7 | Anthropic | 39.6%
9 | Qwen3.7 Max | Alibaba | 38.1%
10 | MiniMax-M3 | MiniMax | 37.1%
11 | DeepSeek V4 Pro | DeepSeek | 35.9%
12 | Kimi K2.6 | Kimi | 35.9%
13 | GPT-5.2 | OpenAI | 35.4%
14 | Grok 4.3 | xAI | 35.0%
15 | MiMo-V2.5-Pro | Xiaomi | 33.8%
16 | Qwen3.7 Plus | Alibaba | 33.4%
17 | Grok 4.20 0309 v2 | xAI | 32.2%
18 | DeepSeek V4 Flash | DeepSeek | 32.1%
19 | Grok 4.20 0309 | xAI | 30.0%
20 | Qwen3.6 Max Preview | Alibaba | 28.9%
21 | Claude Opus 4.5 | Anthropic | 28.4%
22 | MiMo-V2-Pro | Xiaomi | 28.3%
23 | MiniMax-M2.7 | MiniMax | 28.1%
24 | GLM-5.1 | Z AI | 28.0%
25 | Nemotron 3 Ultra 550B A55B | NVIDIA | 26.6%
26 | GPT-5.4 mini | OpenAI | 26.6%
27 | GPT-5.4 nano | OpenAI | 26.5%
28 | GPT-5.1 | OpenAI | 26.5%
29 | Qwen3.6 Plus | Alibaba | 25.7%
30 | Hy3-preview | Tencent | 25.5%
31 | GLM-5-Turbo | Z AI | 25.4%
32 | MiMo-V2.5 | Xiaomi | 25.2%
33 | Qwen3.5 122B A10B | Alibaba | 23.4%
34 | Gemini 3.5 Flash | Google | 23.1%
35 | Gemma 4 31B | Google | 22.7%
36 | Step 3.5 Flash 2603 | StepFun | 22.6%
37 | Qwen3.5 27B | Alibaba | 22.2%
38 | DeepSeek V3.2 | DeepSeek | 22.2%
39 | Qwen3.6 27B | Alibaba | 21.6%
40 | MiMo-V2-Omni-0327 | Xiaomi | 20.4%
41 | GPT-5.5 Instant | OpenAI | 20.3%
42 | Qwen3.6 35B A3B | Alibaba | 20.2%
43 | Step 3.7 Flash | StepFun | 19.9%
44 | MiMo-V2-Omni | Xiaomi | 19.9%
45 | Qwen3.5 35B A3B | Alibaba | 19.7%
46 | NVIDIA Nemotron 3 Super 120B A12B | NVIDIA | 19.2%
47 | Ring-2.6-1T | InclusionAI | 18.3%
48 | Gemma 4 26B A4B | Google | 18.3%
49 | Gemini 3.1 Flash-Lite Preview | Google | 16.2%
50 | GLM 5V Turbo | Z AI | 15.8%
51 | HyperNova 60B 2605 | Multiverse Computing | 15.1%
52 | Gemma 4 12B | Google | 14.8%
53 | Trinity Large Thinking | Arcee AI | 14.7%
54 | Qwen3.5 Omni Plus | Alibaba | 13.9%
55 | Qwen3.5 9B | Alibaba | 13.3%
56 | Mistral Medium 3.5 | Mistral | 12.8%
57 | EXAONE 4.5 33B | LG AI Research | 11.6%
58 | Nemotron Cascade 2 30B A3B | NVIDIA | 11.4%
59 | Gemini 2.5 Flash | Google | 11.1%
60 | Claude Sonnet 4.6 | Anthropic | 10.8%
61 | Solar Pro 3 | Upstage | 10.1%
62 | Sarvam 105B | Sarvam | 10.1%
63 | North Mini Code | Cohere | 9.9%
64 | Mistral Small 4 | Mistral | 9.5%
65 | Ling-2.6-1T | InclusionAI | 8.2%
66 | Qwen3.5 4B | Alibaba | 7.8%
67 | Qwen3.5 Omni Flash | Alibaba | 7.1%
68 | Sarvam 30B | Sarvam | 7.0%
69 | LFM2.5-8B-A1B | Liquid AI | 6.9%
70 | JT-MINI | China Mobile | 6.6%
71 | MiniCPM5-1B | OpenBMB | 6.5%
72 | Ling 2.6 Flash | InclusionAI | 6.2%
73 | JT-35B-Flash | China Mobile | 6.1%
74 | GPT-5 | OpenAI | 5.4%
75 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 5.3%
76 | MiniCPM-V 4.6 1.3B | OpenBMB | 4.9%
77 | Gemma 4 E2B | Google | 4.8%
78 | NVIDIA Nemotron 3 Nano 4B | NVIDIA | 4.8%
79 | Gemini 2.0 Flash | Google | 4.7%
80 | LFM2 24B A2B | Liquid AI | 4.4%
81 | Granite 4.1 30B | IBM | 4.2%
82 | Claude 2.1 | Anthropic | 4.2%
83 | Mistral Large 3 | Mistral | 4.1%
84 | Claude 3 Haiku | Anthropic | 3.9%
85 | Gemini 1.5 Pro | Google | 3.9%
86 | Granite 4.1 8B | IBM | 3.8%
87 | Grok-2 | xAI | 3.8%
88 | Claude 3 Sonnet | Anthropic | 3.8%
89 | Gemma 4 E4B | Google | 3.7%
90 | Claude 3.5 Sonnet | Anthropic | 3.7%
91 | DeepSeek-V3 | DeepSeek | 3.6%
92 | Granite 4.1 3B | IBM | 3.4%
93 | Mistral Large | Mistral | 3.4%
94 | GPT-4 Turbo | OpenAI | 3.3%
95 | Claude 3 Opus | Anthropic | 3.1%
96 | Qwen3.5 2B | Alibaba | 2.1%
97 | Qwen3.5 0.8B | Alibaba | 1.2%