AI Model Release Timeline 2025–2026

Every LLM launch tracked — GPT-5, Claude 4, Gemini 2, Llama 4 and more. Updated weekly with launch dates, benchmarks, and capabilities.

120 model releases tracked | Last updated: 2026-05-18 10:44 UTC

2026

May

Major Release

MiniCPM-V 4.6 1.3B

OpenBMB
Released: 2026-05-11
Type: LLM
Context: 262K tokens
License: Apache 2.0
OpenBMB MiniCPM-V 4.6 1.3B — AA Intelligence Index 12.7, 262K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

30.5%
GPQA Diamond
4.9%
HLE
2.1%
SciCode
87.7%
TAU2-bench
0.0%
TerminalBench-Hard
26.7%
IF-Bench
6.3%
LiveCodeBench Reasoning
12.7
AA Intelligence Index
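Each benchmark block in this timeline alternates a value line with the label line that follows it. As a minimal sketch of reading such a block into a mapping, assuming that alternating layout holds (the helper name `parse_benchmarks` is hypothetical and not part of the tracker):

```python
def parse_benchmarks(lines):
    """Pair each value line with the label line that follows it.

    Assumes the flattened layout used in this timeline: value, label,
    value, label, ... Percent signs are stripped and values become floats.
    """
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) % 2 != 0:
        raise ValueError("expected an even number of value/label lines")
    scores = {}
    for value, label in zip(cleaned[0::2], cleaned[1::2]):
        scores[label] = float(value.rstrip("%"))
    return scores


# Values copied from the MiniCPM-V 4.6 1.3B entry above.
minicpm_block = """
30.5%
GPQA Diamond
4.9%
HLE
2.1%
SciCode
87.7%
TAU2-bench
0.0%
TerminalBench-Hard
26.7%
IF-Bench
6.3%
LiveCodeBench Reasoning
12.7
AA Intelligence Index
""".splitlines()

minicpm_scores = parse_benchmarks(minicpm_block)
print(minicpm_scores["GPQA Diamond"])  # prints 30.5
```

Non-numeric values that appear in some later entries (for example "1M+ tokens" or "$0.11/M tokens") would raise an error here and would need separate handling.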

April

Major Release

Grok 4.3

xAI
Released: 2026-04-30
Type: LLM
Context: 1M tokens
License: Proprietary
xAI Grok 4.3 — AA Intelligence Index 53.2, 1M tokens context, reasoning model.
💰 $1.25 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

90.1%
GPQA Diamond
35.0%
HLE
47.3%
SciCode
97.7%
TAU2-bench
37.9%
TerminalBench-Hard
81.3%
IF-Bench
64.3%
LiveCodeBench Reasoning
53.2
AA Intelligence Index
1454
Chatbot Arena Elo
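The pricing line in each entry lists separate input and output rates per 1M tokens. A rough sketch of turning such a line into a per-request cost estimate follows; the function names and the example workload are assumptions for illustration, and only the rates come from the Grok 4.3 entry above.

```python
import re

# Matches the "$X in / $Y out per 1M tok" part of a pricing line.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out per 1M tok")


def parse_prices(line):
    """Return (input_usd, output_usd) per 1M tokens from a pricing line."""
    match = PRICE_RE.search(line)
    if match is None:
        raise ValueError("no pricing found in line")
    return float(match.group(1)), float(match.group(2))


def request_cost(line, input_tokens, output_tokens):
    """Estimate the USD cost of one request at the listed per-1M-token rates."""
    price_in, price_out = parse_prices(line)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


grok_line = "💰 $1.25 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text"
# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
grok_cost = request_cost(grok_line, 200_000, 20_000)
print(f"${grok_cost:.2f}")  # prints $0.30
```

The sketch treats a listed price of $0.00 literally; the timeline does not say whether that means free access or simply no recorded price.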
Major Release

Granite 4.1 30B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 30B — AA Intelligence Index 14.7, 131K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

48.1%
GPQA Diamond
4.2%
HLE
25.8%
SciCode
42.1%
TAU2-bench
2.3%
TerminalBench-Hard
44.4%
IF-Bench
18.7%
LiveCodeBench Reasoning
14.7
AA Intelligence Index
Major Release

Granite 4.1 3B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 3B — AA Intelligence Index 8.5, 131K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

31.4%
GPQA Diamond
3.4%
HLE
11.9%
SciCode
19.6%
TAU2-bench
2.3%
TerminalBench-Hard
33.7%
IF-Bench
3.0%
LiveCodeBench Reasoning
8.5
AA Intelligence Index
Major Release

Granite 4.1 8B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 8B — AA Intelligence Index 12.4, 131K tokens context.
💰 $0.05 in / $0.10 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

43.3%
GPQA Diamond
3.8%
HLE
21.8%
SciCode
27.8%
TAU2-bench
0.0%
TerminalBench-Hard
38.6%
IF-Bench
12.0%
LiveCodeBench Reasoning
12.4
AA Intelligence Index
1199
Chatbot Arena Elo
Major Release

Mistral Medium 3.5

Mistral
Released: 2026-04-29
Type: LLM
Context: 256K tokens
License: Other
Mistral Medium 3.5 — AA Intelligence Index 39.2, 256K tokens context, reasoning model.
💰 $1.50 in / $7.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

74.8%
GPQA Diamond
12.8%
HLE
39.6%
SciCode
94.2%
TAU2-bench
33.3%
TerminalBench-Hard
68.8%
IF-Bench
61.0%
LiveCodeBench Reasoning
39.2
AA Intelligence Index
Major Release

Nemotron 3 Nano Omni 30B A3B Reasoning

NVIDIA
Released: 2026-04-29
Type: LLM
Context: 256K tokens
License: NVIDIA Open Model License Agreement
NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning — AA Intelligence Index 21.4, 256K tokens context, reasoning model.
💰 $0.07 in / $0.30 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

46.9%
GPQA Diamond
5.3%
HLE
27.8%
SciCode
45.3%
TAU2-bench
8.3%
TerminalBench-Hard
63.2%
IF-Bench
35.7%
LiveCodeBench Reasoning
21.4
AA Intelligence Index
Major Release

DeepSeek V4 Flash

DeepSeek
Released: 2026-04-24
Type: LLM
Context: 1M tokens
License: MIT
DeepSeek V4 Flash — AA Intelligence Index 46.5, 1M tokens context, reasoning model.
💰 $0.14 in / $0.28 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

89.4%
GPQA Diamond
32.1%
HLE
44.9%
SciCode
95.0%
TAU2-bench
35.6%
TerminalBench-Hard
79.2%
IF-Bench
63.0%
LiveCodeBench Reasoning
46.5
AA Intelligence Index
1441
Chatbot Arena Elo
Major Release

DeepSeek V4 Pro

DeepSeek
Released: 2026-04-24
Type: LLM
Context: 1M tokens
License: MIT
DeepSeek V4 Pro — AA Intelligence Index 51.5, 1M tokens context, reasoning model.
💰 $1.74 in / $3.48 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

88.8%
GPQA Diamond
35.9%
HLE
50.0%
SciCode
96.2%
TAU2-bench
46.2%
TerminalBench-Hard
76.5%
IF-Bench
66.3%
LiveCodeBench Reasoning
51.5
AA Intelligence Index
1459
Chatbot Arena Elo
Major Release

Ling-2.6-1T

InclusionAI
Released: 2026-04-23
Type: LLM
Context: 262K tokens
License: MIT
InclusionAI Ling-2.6-1T — AA Intelligence Index 33.6, 262K tokens context.
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

75.2%
GPQA Diamond
8.2%
HLE
37.0%
SciCode
89.8%
TAU2-bench
31.1%
TerminalBench-Hard
56.9%
IF-Bench
34.7%
LiveCodeBench Reasoning
33.6
AA Intelligence Index
Major Release

GPT-5.5

OpenAI
Released: 2026-04-23
Type: LLM
Context: 922K tokens
License: Proprietary
OpenAI GPT-5.5 — AA Intelligence Index 60.2, 922K tokens context, reasoning model.
💰 $5.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

93.5%
GPQA Diamond
44.3%
HLE
56.1%
SciCode
93.9%
TAU2-bench
60.6%
TerminalBench-Hard
75.9%
IF-Bench
74.3%
LiveCodeBench Reasoning
60.2
AA Intelligence Index
1476
Chatbot Arena Elo
Major Release

Hy3-preview

Tencent
Released: 2026-04-23
Type: LLM
Context: 256K tokens
License: Tencent HY Community License Agreement
Tencent Hy3-preview — AA Intelligence Index 41.9, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

86.7%
GPQA Diamond
25.5%
HLE
41.2%
SciCode
92.7%
TAU2-bench
34.1%
TerminalBench-Hard
63.1%
IF-Bench
54.7%
LiveCodeBench Reasoning
41.9
AA Intelligence Index
Major Release

Qwen3.6 27B

Alibaba
Released: 2026-04-22
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.6 27B — AA Intelligence Index 45.8, 262K tokens context, reasoning model.
💰 $0.60 in / $3.60 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

84.2%
GPQA Diamond
21.6%
HLE
39.8%
SciCode
94.2%
TAU2-bench
34.8%
TerminalBench-Hard
67.6%
IF-Bench
68.7%
LiveCodeBench Reasoning
45.8
AA Intelligence Index
Major Release

MiMo-V2.5

Xiaomi
Released: 2026-04-22
Type: LLM
Context: 1M tokens
License: MIT
Xiaomi MiMo-V2.5 — AA Intelligence Index 49.0, 1M tokens context, reasoning model.
💰 $0.36 in / $1.80 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.9%
GPQA Diamond
25.2%
HLE
43.1%
SciCode
90.6%
TAU2-bench
41.7%
TerminalBench-Hard
67.1%
IF-Bench
62.7%
LiveCodeBench Reasoning
49.0
AA Intelligence Index
1427
Chatbot Arena Elo
Major Release

MiMo-V2.5-Pro

Xiaomi
Released: 2026-04-22
Type: LLM
Context: 1M tokens
License: MIT
Xiaomi MiMo-V2.5-Pro — AA Intelligence Index 53.8, 1M tokens context, reasoning model.
💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

86.6%
GPQA Diamond
33.8%
HLE
50.2%
SciCode
94.2%
TAU2-bench
43.2%
TerminalBench-Hard
79.9%
IF-Bench
73.3%
LiveCodeBench Reasoning
53.8
AA Intelligence Index
1463
Chatbot Arena Elo
Major Release

Ling 2.6 Flash

InclusionAI
Released: 2026-04-21
Type: LLM
Context: 262K tokens
License: MIT
InclusionAI Ling 2.6 Flash — AA Intelligence Index 26.2, 262K tokens context.
💰 $0.10 in / $0.30 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

59.3%
GPQA Diamond
6.2%
HLE
27.1%
SciCode
86.0%
TAU2-bench
21.2%
TerminalBench-Hard
57.4%
IF-Bench
25.0%
LiveCodeBench Reasoning
26.2
AA Intelligence Index
Major Release

Qwen3.6 Max Preview

Alibaba
Released: 2026-04-20
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.6 Max Preview — AA Intelligence Index 51.8, 256K tokens context, reasoning model.
💰 $1.30 in / $7.80 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

88.8%
GPQA Diamond
28.9%
HLE
46.9%
SciCode
95.9%
TAU2-bench
43.9%
TerminalBench-Hard
76.6%
IF-Bench
69.7%
LiveCodeBench Reasoning
51.8
AA Intelligence Index
1458
Chatbot Arena Elo
Major Release

Kimi K2.6

Kimi
Released: 2026-04-20
Type: LLM
Context: 256K tokens
License: Modified MIT
Kimi K2.6 — AA Intelligence Index 53.9, 256K tokens context, reasoning model.
💰 $0.95 in / $4.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

91.1%
GPQA Diamond
35.9%
HLE
53.5%
SciCode
95.9%
TAU2-bench
43.9%
TerminalBench-Hard
76.0%
IF-Bench
69.7%
LiveCodeBench Reasoning
53.9
AA Intelligence Index
1461
Chatbot Arena Elo
Major Release

Qwen3.6 35B A3B

Alibaba
Released: 2026-04-16
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.6 35B A3B — AA Intelligence Index 43.5, 262K tokens context, reasoning model.
💰 $0.25 in / $1.49 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.1%
GPQA Diamond
20.2%
HLE
35.8%
SciCode
95.3%
TAU2-bench
34.8%
TerminalBench-Hard
64.4%
IF-Bench
63.7%
LiveCodeBench Reasoning
43.5
AA Intelligence Index
Major Release

Claude Opus 4.7

Anthropic
Released: 2026-04-16
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2026-01-01
License: Proprietary
Anthropic Claude Opus 4.7 — AA Intelligence Index 57.3, 1M tokens context, reasoning model.
💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

91.4%
GPQA Diamond
39.6%
HLE
54.5%
SciCode
88.6%
TAU2-bench
51.5%
TerminalBench-Hard
58.6%
IF-Bench
70.3%
LiveCodeBench Reasoning
57.3
AA Intelligence Index
1492
Chatbot Arena Elo
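To read the gap between two Chatbot Arena Elo figures, one option is the standard Elo expected-score formula. The sketch below assumes the listed ratings behave like conventional Elo ratings on a 400-point scale, which the timeline itself does not state; the function name is illustrative.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score of A against B (0.5 means evenly matched)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


opus_elo = 1492   # Claude Opus 4.7, from the entry above
gpt55_elo = 1476  # GPT-5.5, from its entry earlier in this timeline

p_opus = expected_score(opus_elo, gpt55_elo)
print(round(p_opus, 3))
```

Under that assumption, a 16-point gap corresponds to an expected preference rate of roughly 52% for the higher-rated model, so small Elo differences between entries should be read as near ties.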
Major Release

EXAONE 4.5 33B

LG AI Research
Released: 2026-04-09
Type: LLM
Context: 262K tokens
License: EXAONE AI Model License Agreement 1.2 - NC
LG AI Research EXAONE 4.5 33B — AA Intelligence Index 30.2, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

79.4%
GPQA Diamond
11.6%
HLE
28.0%
SciCode
78.1%
TAU2-bench
20.5%
TerminalBench-Hard
58.0%
IF-Bench
49.3%
LiveCodeBench Reasoning
30.2
AA Intelligence Index
Major Release

Muse Spark

Meta
Released: 2026-04-08
Type: LLM
Context: 262K tokens
License: Proprietary
Meta Muse Spark — AA Intelligence Index 52.1, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text

📊 Benchmarks

88.4%
GPQA Diamond
39.9%
HLE
51.5%
SciCode
91.5%
TAU2-bench
45.5%
TerminalBench-Hard
75.9%
IF-Bench
69.7%
LiveCodeBench Reasoning
52.2
AA Intelligence Index
1490
Chatbot Arena Elo
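Each entry states the AA Intelligence Index twice: once in the summary line and once in the benchmark list. In the Muse Spark entry above the two figures differ (52.1 in the summary, 52.2 in the list), and the timeline does not say which is correct. A small consistency check of the following kind would flag such cases; the helper names and the tolerance are assumptions for illustration.

```python
import re

INDEX_RE = re.compile(r"AA Intelligence Index (\d+(?:\.\d+)?)")


def summary_index(summary_line):
    """Extract the index value quoted in an entry's summary line."""
    match = INDEX_RE.search(summary_line)
    if match is None:
        raise ValueError("summary line has no AA Intelligence Index")
    return float(match.group(1))


def index_matches(summary_line, listed_value, tolerance=0.05):
    """True when the summary and the benchmark list agree within tolerance."""
    return abs(summary_index(summary_line) - listed_value) <= tolerance


# Tails of the summary lines, copied from the entries in this timeline.
muse_summary = "AA Intelligence Index 52.1, 262K tokens context, reasoning model."
grok_summary = "AA Intelligence Index 53.2, 1M tokens context, reasoning model."

print(index_matches(grok_summary, 53.2))  # Grok 4.3: list value 53.2
print(index_matches(muse_summary, 52.2))  # Muse Spark: list value 52.2
```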
Major Release

GLM-5.1

Z AI
Released: 2026-04-07
Type: LLM
Context: 200K tokens
License: MIT
Z AI GLM-5.1 — AA Intelligence Index 51.4, 200K tokens context, reasoning model.
💰 $1.40 in / $4.40 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

86.8%
GPQA Diamond
28.0%
HLE
43.8%
SciCode
97.7%
TAU2-bench
43.2%
TerminalBench-Hard
76.3%
IF-Bench
62.3%
LiveCodeBench Reasoning
51.4
AA Intelligence Index
1472
Chatbot Arena Elo
Major Release

Grok 4.20 0309 v2

xAI
Released: 2026-04-07
Type: LLM
Context: 2M tokens
License: Proprietary
xAI Grok 4.20 0309 v2 — AA Intelligence Index 49.3, 2M tokens context, reasoning model.
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

91.1%
GPQA Diamond
32.2%
HLE
45.6%
SciCode
93.0%
TAU2-bench
37.9%
TerminalBench-Hard
81.2%
IF-Bench
58.0%
LiveCodeBench Reasoning
49.3
AA Intelligence Index
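Context sizes are listed with K and M suffixes, which makes them awkward to compare directly. The sketch below converts a "Context:" line to a token count, assuming decimal multipliers (K = 1,000, M = 1,000,000). Some listed figures, such as 262K, may be rounded from powers of two, and the timeline does not say either way.

```python
import re

CONTEXT_RE = re.compile(r"Context:\s*(\d+(?:\.\d+)?)([KM])\s+tokens")
MULTIPLIER = {"K": 1_000, "M": 1_000_000}  # assumed decimal multipliers


def context_tokens(line):
    """Convert a line such as 'Context: 922K tokens' to an integer count."""
    match = CONTEXT_RE.search(line)
    if match is None:
        raise ValueError("no context size found in line")
    number, suffix = match.groups()
    return int(float(number) * MULTIPLIER[suffix])


# Context lines copied from three entries in this timeline.
entries = {
    "Grok 4.20 0309 v2": "Context: 2M tokens",
    "GPT-5.5": "Context: 922K tokens",
    "GLM-5.1": "Context: 200K tokens",
}
sizes = {name: context_tokens(line) for name, line in entries.items()}
largest = max(sizes, key=sizes.get)
print(largest, sizes[largest])  # prints Grok 4.20 0309 v2 2000000
```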
Major Release

Solar Pro 3

Upstage
Released: 2026-04-06
Type: LLM
Context: 128K tokens
License: Proprietary
Upstage Solar Pro 3 — AA Intelligence Index 25.9, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

72.4%
GPQA Diamond
10.1%
HLE
24.7%
SciCode
86.3%
TAU2-bench
7.6%
TerminalBench-Hard
71.2%
IF-Bench
27.0%
LiveCodeBench Reasoning
25.9
AA Intelligence Index
Major Release

Gemma 4 E4B

Google
Released: 2026-04-03
Type: LLM
Context: 128K tokens
License: Apache 2.0
Google Gemma 4 E4B — AA Intelligence Index 18.8, 128K tokens context, reasoning model.
💰 $0.30 in / $1.25 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

57.6%
GPQA Diamond
3.7%
HLE
24.4%
SciCode
20.8%
TAU2-bench
8.3%
TerminalBench-Hard
44.2%
IF-Bench
30.7%
LiveCodeBench Reasoning
18.8
AA Intelligence Index
Major Release

Qwen3.6 Plus

Alibaba
Released: 2026-04-02
Type: LLM
Context: 1M tokens
License: Proprietary
Alibaba Qwen3.6 Plus — AA Intelligence Index 50.0, 1M tokens context, reasoning model.
💰 $0.50 in / $3.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

88.2%
GPQA Diamond
25.7%
HLE
40.7%
SciCode
97.7%
TAU2-bench
43.9%
TerminalBench-Hard
75.2%
IF-Bench
69.7%
LiveCodeBench Reasoning
50.0
AA Intelligence Index
1446
Chatbot Arena Elo
Major Release

Gemma 4 26B A4B

Google
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Apache 2.0
Google Gemma 4 26B A4B — AA Intelligence Index 31.2, 256K tokens context, reasoning model.
💰 $0.13 in / $0.40 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

79.2%
GPQA Diamond
18.3%
HLE
40.0%
SciCode
43.6%
TAU2-bench
13.6%
TerminalBench-Hard
72.4%
IF-Bench
55.7%
LiveCodeBench Reasoning
31.2
AA Intelligence Index
1438
Chatbot Arena Elo
Major Release

Gemma 4 31B

Google
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Apache 2.0
Google Gemma 4 31B — AA Intelligence Index 39.2, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

85.7%
GPQA Diamond
22.7%
HLE
43.4%
SciCode
59.9%
TAU2-bench
36.4%
TerminalBench-Hard
75.6%
IF-Bench
62.0%
LiveCodeBench Reasoning
39.2
AA Intelligence Index
1451
Chatbot Arena Elo
Major Release

Gemma 4 E2B

Google
Released: 2026-04-02
Type: LLM
Context: 128K tokens
License: Apache 2.0
Google Gemma 4 E2B — AA Intelligence Index 15.2, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

43.3%
GPQA Diamond
4.8%
HLE
20.9%
SciCode
20.8%
TAU2-bench
3.0%
TerminalBench-Hard
38.0%
IF-Bench
15.0%
LiveCodeBench Reasoning
15.2
AA Intelligence Index
Major Release

Step 3.5 Flash 2603

StepFun
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Proprietary
StepFun Step 3.5 Flash 2603 — AA Intelligence Index 38.5, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

82.6%
GPQA Diamond
22.6%
HLE
38.5%
SciCode
87.4%
TAU2-bench
32.6%
TerminalBench-Hard
66.5%
IF-Bench
54.3%
LiveCodeBench Reasoning
38.5
AA Intelligence Index
1395
Chatbot Arena Elo
Major Release

Trinity Large Thinking

Arcee AI
Released: 2026-04-01
Type: LLM
Context: 512K tokens
License: Apache 2.0
Arcee AI Trinity Large Thinking — AA Intelligence Index 31.9, 512K tokens context, reasoning model.
💰 $0.23 in / $0.88 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

75.2%
GPQA Diamond
14.7%
HLE
36.1%
SciCode
90.1%
TAU2-bench
22.7%
TerminalBench-Hard
56.3%
IF-Bench
33.0%
LiveCodeBench Reasoning
31.9
AA Intelligence Index
1374
Chatbot Arena Elo
Major Release

GLM 5V Turbo

Z AI
Released: 2026-04-01
Type: LLM
Context: 200K tokens
License: Proprietary
Z AI GLM 5V Turbo — AA Intelligence Index 42.9, 200K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

80.9%
GPQA Diamond
15.8%
HLE
43.5%
SciCode
98.5%
TAU2-bench
32.6%
TerminalBench-Hard
61.1%
IF-Bench
61.0%
LiveCodeBench Reasoning
42.9
AA Intelligence Index
1229
Chatbot Arena Elo

March

Major Release

Qwen3.5 Omni Flash

Alibaba
Released: 2026-03-30
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.5 Omni Flash — AA Intelligence Index 25.9, 256K tokens context.
💰 $0.10 in / $0.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

74.2%
GPQA Diamond
7.1%
HLE
25.5%
SciCode
84.5%
TAU2-bench
8.3%
TerminalBench-Hard
38.0%
IF-Bench
44.0%
LiveCodeBench Reasoning
25.9
AA Intelligence Index
Major Release

Qwen3.5 Omni Plus

Alibaba
Released: 2026-03-30
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.5 Omni Plus — AA Intelligence Index 38.6, 256K tokens context.
💰 $0.40 in / $4.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

82.6%
GPQA Diamond
13.9%
HLE
40.5%
SciCode
88.3%
TAU2-bench
21.2%
TerminalBench-Hard
51.2%
IF-Bench
52.7%
LiveCodeBench Reasoning
38.6
AA Intelligence Index
Major Release

MiMo-V2-Omni-0327

Xiaomi
Released: 2026-03-27
Type: LLM
Context: 256K tokens
License: Proprietary
Xiaomi MiMo-V2-Omni-0327 — AA Intelligence Index 44.9, 256K tokens context, reasoning model.
💰 $0.40 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

85.5%
GPQA Diamond
20.4%
HLE
39.5%
SciCode
88.0%
TAU2-bench
35.6%
TerminalBench-Hard
67.3%
IF-Bench
63.7%
LiveCodeBench Reasoning
44.9
AA Intelligence Index
Major Release

Nemotron Cascade 2 30B A3B

NVIDIA
Released: 2026-03-19
Type: LLM
Context: 1M tokens
License: NVIDIA Open Model License
NVIDIA Nemotron Cascade 2 30B A3B — AA Intelligence Index 28.4, 1M tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

75.8%
GPQA Diamond
11.4%
HLE
34.8%
SciCode
53.2%
TAU2-bench
21.2%
TerminalBench-Hard
80.4%
IF-Bench
34.0%
LiveCodeBench Reasoning
28.4
AA Intelligence Index
Major Release

MiMo-V2-Omni

Xiaomi
Released: 2026-03-19
Type: LLM
Context: 256K tokens
License: Proprietary
Xiaomi MiMo-V2-Omni — AA Intelligence Index 43.4, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

82.8%
GPQA Diamond
19.9%
HLE
36.7%
SciCode
91.2%
TAU2-bench
34.8%
TerminalBench-Hard
53.5%
IF-Bench
66.7%
LiveCodeBench Reasoning
43.4
AA Intelligence Index
1210
Chatbot Arena Elo
Major Release

MiniMax-M2.7

MiniMax
Released: 2026-03-18
Type: LLM
Context: 204K tokens
License: Non-Commercial License
MiniMax-M2.7 — AA Intelligence Index 49.6, 204K tokens context, reasoning model.
💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

87.4%
GPQA Diamond
28.1%
HLE
47.0%
SciCode
84.8%
TAU2-bench
39.4%
TerminalBench-Hard
75.7%
IF-Bench
68.7%
LiveCodeBench Reasoning
49.6
AA Intelligence Index
1408
Chatbot Arena Elo
Major Release

MiMo-V2-Pro

Xiaomi
Released: 2026-03-18
Type: LLM
Context: 1M tokens
License: Proprietary
Xiaomi MiMo-V2-Pro — AA Intelligence Index 49.2, 1M tokens context, reasoning model.
💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

87.0%
GPQA Diamond
28.3%
HLE
42.5%
SciCode
95.0%
TAU2-bench
40.9%
TerminalBench-Hard
68.8%
IF-Bench
60.7%
LiveCodeBench Reasoning
49.2
AA Intelligence Index
1447
Chatbot Arena Elo
Major Release

GPT-5.4 mini

OpenAI
Released: 2026-03-17
Type: LLM
Context: 400K tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 mini — AA Intelligence Index 48.9, 400K tokens context, reasoning model.
💰 $0.75 in / $4.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

87.5%
GPQA Diamond
26.6%
HLE
49.9%
SciCode
83.3%
TAU2-bench
52.3%
TerminalBench-Hard
73.3%
IF-Bench
69.3%
LiveCodeBench Reasoning
48.9
AA Intelligence Index
1455
Chatbot Arena Elo
Major Release

GPT-5.4 nano

OpenAI
Released: 2026-03-17
Type: LLM
Context: 400K tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 nano — AA Intelligence Index 44.0, 400K tokens context, reasoning model.
💰 $0.20 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

81.7%
GPQA Diamond
26.5%
HLE
46.9%
SciCode
76.0%
TAU2-bench
42.4%
TerminalBench-Hard
75.9%
IF-Bench
66.0%
LiveCodeBench Reasoning
44.0
AA Intelligence Index
1407
Chatbot Arena Elo
Major Release

Mistral Small 4

Mistral
Released: 2026-03-16
Type: LLM
Context: 256K tokens
License: Apache 2.0
Mistral Small 4 — AA Intelligence Index 27.8, 256K tokens context, reasoning model.
💰 $0.15 in / $0.60 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

76.9%
GPQA Diamond
9.5%
HLE
38.0%
SciCode
41.2%
TAU2-bench
17.4%
TerminalBench-Hard
48.2%
IF-Bench
44.7%
LiveCodeBench Reasoning
27.8
AA Intelligence Index
Major Release

NVIDIA Nemotron 3 Nano 4B

NVIDIA
Released: 2026-03-16
Type: LLM
Context: 262K tokens
License: NVIDIA Nemotron Open Model License
NVIDIA Nemotron 3 Nano 4B — AA Intelligence Index 14.7, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

51.3%
GPQA Diamond
4.8%
HLE
16.4%
SciCode
28.1%
TAU2-bench
6.8%
TerminalBench-Hard
58.2%
IF-Bench
16.7%
LiveCodeBench Reasoning
14.7
AA Intelligence Index
Major Release

GLM-5-Turbo

Z AI
Released: 2026-03-15
Type: LLM
Context: 200K tokens
License: Proprietary
Z AI GLM-5-Turbo — AA Intelligence Index 46.8, 200K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

84.7%
GPQA Diamond
25.4%
HLE
43.6%
SciCode
98.5%
TAU2-bench
33.3%
TerminalBench-Hard
73.2%
IF-Bench
60.7%
LiveCodeBench Reasoning
46.8
AA Intelligence Index
Major Release

NVIDIA Nemotron 3 Super 120B A12B

NVIDIA
Released: 2026-03-11
Type: LLM
Context: 1M tokens
License: NVIDIA Nemotron Open Model License
NVIDIA Nemotron 3 Super 120B A12B — AA Intelligence Index 36.0, 1M tokens context, reasoning model.
💰 $0.30 in / $0.75 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

80.0%
GPQA Diamond
19.2%
HLE
36.0%
SciCode
67.8%
TAU2-bench
28.8%
TerminalBench-Hard
71.5%
IF-Bench
60.0%
LiveCodeBench Reasoning
36.0
AA Intelligence Index
1361
Chatbot Arena Elo
Major Release

Grok 4.20 0309

xAI
Released: 2026-03-10
Type: LLM
Context: 2M tokens
License: Proprietary
xAI Grok 4.20 0309 — AA Intelligence Index 48.5, 2M tokens context, reasoning model.
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

88.5%
GPQA Diamond
30.0%
HLE
44.7%
SciCode
96.5%
TAU2-bench
40.9%
TerminalBench-Hard
82.9%
IF-Bench
59.0%
LiveCodeBench Reasoning
48.5
AA Intelligence Index
Major Release

Sarvam 105B

Sarvam
Released: 2026-03-06
Type: LLM
Context: 128K tokens
License: Apache 2.0
Sarvam 105B — AA Intelligence Index 18.2, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

73.8%
GPQA Diamond
10.1%
HLE
26.4%
SciCode
46.8%
TAU2-bench
1.5%
TerminalBench-Hard
34.4%
IF-Bench
0.0%
LiveCodeBench Reasoning
18.2
AA Intelligence Index
Major Release

Sarvam 30B

Sarvam
Released: 2026-03-06
Type: LLM
Context: 65K tokens
License: Apache 2.0
Sarvam 30B — AA Intelligence Index 12.3, 65K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

63.3%
GPQA Diamond
7.0%
HLE
19.2%
SciCode
34.5%
TAU2-bench
2.3%
TerminalBench-Hard
26.5%
IF-Bench
0.0%
LiveCodeBench Reasoning
12.3
AA Intelligence Index
Major Release

GPT-5.4

OpenAI
Released: 2026-03-05
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 — AA Intelligence Index 56.8, 1M tokens context, reasoning model.
💰 $2.50 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

92.0%
GPQA Diamond
41.6%
HLE
56.6%
SciCode
87.1%
TAU2-bench
57.6%
TerminalBench-Hard
73.9%
IF-Bench
74.0%
LiveCodeBench Reasoning
56.8
AA Intelligence Index
1467
Chatbot Arena Elo
Major Release

Gemini 3.1 Flash-Lite Preview

Google
Released: 2026-03-03
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2025-01-01
License: Proprietary
Google Gemini 3.1 Flash-Lite Preview — AA Intelligence Index 33.5, 1M tokens context, reasoning model.
💰 $0.25 in / $1.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

82.2%
GPQA Diamond
16.2%
HLE
41.9%
SciCode
31.3%
TAU2-bench
24.2%
TerminalBench-Hard
77.2%
IF-Bench
65.3%
LiveCodeBench Reasoning
33.5
AA Intelligence Index
1438
Chatbot Arena Elo
Major Release

Qwen3.5 0.8B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 0.8B — AA Intelligence Index 10.5, 262K tokens context, reasoning model.
💰 $0.01 in / $0.05 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

11.1%
GPQA Diamond
1.2%
HLE
0.0%
SciCode
47.7%
TAU2-bench
0.0%
TerminalBench-Hard
21.5%
IF-Bench
5.3%
LiveCodeBench Reasoning
10.5
AA Intelligence Index
Major Release

Qwen3.5 2B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 2B — AA Intelligence Index 16.3, 262K tokens context, reasoning model.
💰 $0.02 in / $0.10 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

45.6%
GPQA Diamond
2.1%
HLE
2.8%
SciCode
69.0%
TAU2-bench
3.8%
TerminalBench-Hard
31.5%
IF-Bench
23.7%
LiveCodeBench Reasoning
16.3
AA Intelligence Index
Major Release

Qwen3.5 4B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 4B — AA Intelligence Index 27.1, 262K tokens context, reasoning model.
💰 $0.03 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

77.1%
GPQA Diamond
7.8%
HLE
16.1%
SciCode
92.1%
TAU2-bench
18.2%
TerminalBench-Hard
52.0%
IF-Bench
55.7%
LiveCodeBench Reasoning
27.1
AA Intelligence Index
Major Release

Qwen3.5 9B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 9B — AA Intelligence Index 32.4, 262K tokens context, reasoning model.
💰 $0.10 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

80.6%
GPQA Diamond
13.3%
HLE
27.5%
SciCode
86.8%
TAU2-bench
24.2%
TerminalBench-Hard
66.7%
IF-Bench
59.0%
LiveCodeBench Reasoning
32.4
AA Intelligence Index

February

Major Release

LFM2 24B A2B

Liquid AI
Released: 2026-02-25
Type: LLM
Context: 32K tokens
License: LFM 1.0
Liquid AI LFM2 24B A2B — AA Intelligence Index 10.5, 32K tokens context.
💰 $0.03 in / $0.12 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

47.4%
GPQA Diamond
4.4%
HLE
10.9%
SciCode
11.1%
TAU2-bench
0.0%
TerminalBench-Hard
45.9%
IF-Bench
0.0%
LiveCodeBench Reasoning
10.5
AA Intelligence Index
Major Release

Qwen3.5 122B A10B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 122B A10B — AA Intelligence Index 41.6, 262K tokens context, reasoning model.
💰 $0.40 in / $3.20 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

85.7%
GPQA Diamond
23.4%
HLE
42.0%
SciCode
93.6%
TAU2-bench
31.1%
TerminalBench-Hard
75.7%
IF-Bench
66.7%
LiveCodeBench Reasoning
41.6
AA Intelligence Index
1419
Chatbot Arena Elo
Major Release

Qwen3.5 27B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 27B — AA Intelligence Index 42.1, 262K tokens context, reasoning model.
💰 $0.30 in / $2.40 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

85.8%
GPQA Diamond
22.2%
HLE
39.5%
SciCode
93.9%
TAU2-bench
32.6%
TerminalBench-Hard
75.6%
IF-Bench
67.3%
LiveCodeBench Reasoning
42.1
AA Intelligence Index
1408
Chatbot Arena Elo
Major Release

Qwen3.5 35B A3B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 35B A3B — AA Intelligence Index 37.1, 262K tokens context, reasoning model.
💰 $0.25 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.5%
GPQA Diamond
19.7%
HLE
37.7%
SciCode
89.2%
TAU2-bench
26.5%
TerminalBench-Hard
72.5%
IF-Bench
62.7%
LiveCodeBench Reasoning
37.1
AA Intelligence Index
1398
Chatbot Arena Elo
Major Release

Mercury 2

Inception
Released: 2026-02-20
Type: LLM
Context: 128K tokens
License: Proprietary
Inception Mercury 2 — AA Intelligence Index 32.8, 128K tokens context, reasoning model.
💰 $0.25 in / $0.75 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

77.0%
GPQA Diamond
15.5%
HLE
38.7%
SciCode
70.8%
TAU2-bench
26.5%
TerminalBench-Hard
69.8%
IF-Bench
36.3%
LiveCodeBench Reasoning
32.8
AA Intelligence Index
1347
Chatbot Arena Elo
Major Release

Gemini 3.1 Pro

Google
Released: 2026-02-19
Type: LLM
Size: ~1T MoE
Architecture: Sparse Mixture-of-Experts (MoE)
Context: 2M tokens
Knowledge cutoff: 2025-12
License: Proprietary
Google's flagship reasoning model, with a 2x jump in reasoning capability on hard multi-step tasks.
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 2x reasoning improvement
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding
  • Deep Think mode

📊 Benchmarks

77.1%
ARC-AGI-2
93.8%
MMLU
89.4%
MATH
93.8%
MMLU-Pro
84.2%
GPQA Diamond
72.3%
SWE-bench Verified
78.9%
LiveCodeBench

🔄 What's new vs previous version

  • 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
  • Context window expanded to 2M tokens
  • Deep Think mode enabled by default on the Pro tier
  • Lower latency on first-token despite larger context
Major Release

Gemini 3.1 Pro Preview

Google
Released: 2026-02-19
Type: LLM
Context: 1M tokens
License: Proprietary
Google Gemini 3.1 Pro Preview — AA Intelligence Index 57.2, 1M tokens context, reasoning model.
💰 $2.00 in / $12.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

94.1%
GPQA Diamond
44.7%
HLE
58.9%
SciCode
95.6%
TAU2-bench
53.8%
TerminalBench-Hard
77.1%
IF-Bench
72.7%
LiveCodeBench Reasoning
57.2
AA Intelligence Index
1489
Chatbot Arena Elo
Major Release

Claude Sonnet 4.6

Anthropic
Released: 2026-02-17
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-10
License: Proprietary
Anthropic's latest Sonnet: near-Opus performance at a fraction of the cost, with Agent Teams orchestration.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Agent Teams: orchestrate 2–16 Claude instances
  • Near-Opus performance at 1/5 the cost
  • 80.8% SWE-bench Verified
  • Fast mode research preview

📊 Benchmarks

92.1%
MMLU
95.2%
HumanEval
80.8%
SWE-bench Verified
79.7%
GPQA Diamond
88.5%
AIME 2025
71.2%
TAU-bench
10.8%
HLE
44.1%
SciCode
78.9%
TAU2-bench
42.4%
TerminalBench-Hard
42.4%
IF-Bench
58.7%
LiveCodeBench Reasoning
42.6
AA Intelligence Index
1468
Chatbot Arena Elo

🔄 What's new vs previous version

  • Agent Teams: orchestrate 2–16 Claude instances in parallel
  • +8.5pt on SWE-bench Verified vs Sonnet 4
  • 1/5 the cost of Opus 4.5 at ~95% of coding quality
  • Fast mode research preview for lower-latency inference
Major Release

Tiny Aya Global

Cohere
Released: 2026-02-17
Type: LLM
Context: 8K tokens
License: CC BY-NC 4.0
Cohere Tiny Aya Global — AA Intelligence Index 4.7, 8K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

30.5%
GPQA Diamond
5.2%
HLE
3.6%
SciCode
0.0%
TAU2-bench
0.0%
TerminalBench-Hard
20.1%
IF-Bench
0.0%
LiveCodeBench Reasoning
4.7
AA Intelligence Index
Major Release

Qwen3.5 397B A17B

Alibaba
Released: 2026-02-16
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 397B A17B — AA Intelligence Index 45.0, 262K tokens context, reasoning model.
💰 $0.60 in / $3.60 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

89.3%
GPQA Diamond
27.3%
HLE
42.0%
SciCode
95.6%
TAU2-bench
40.9%
TerminalBench-Hard
78.8%
IF-Bench
65.7%
LiveCodeBench Reasoning
45.0
AA Intelligence Index
1445
Chatbot Arena Elo
Update

DeepSeek V3.2

DeepSeek
Released: 2026-02-12
Type: LLM
Size: 671B MoE
Architecture: Sparse MoE (37B active / 671B total)
Context: 1M tokens
Knowledge cutoff: 2025-09
License: DeepSeek License (open weights, commercial OK)
Open-weight MoE model with strong coding; this major update expands the context window 10x, to over 1 million tokens.
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI · Fireworks AI

✨ Key Features

  • 1M+ token context window (10x expansion)
  • Improved reasoning capabilities
  • Open-weight release
  • Cost-effective inference

📊 Benchmarks

90.1%
MMLU
92.5%
HumanEval
1M+ tokens
Context Window
86.2%
MMLU-Pro
84.0%
GPQA Diamond
85.6%
MATH
68.4%
GPQA
86.2%
LiveCodeBench
22.2%
HLE
38.9%
SciCode
90.6%
TAU2-bench
35.6%
TerminalBench-Hard
60.7%
IF-Bench
65.0%
LiveCodeBench Reasoning
41.7
AA Intelligence Index
1424
Chatbot Arena Elo

🔄 What's new vs previous version

  • 10x context window expansion (128K → 1M+ tokens)
  • Sliding-window attention for long-context throughput
  • Improved chain-of-thought reasoning
  • Native FP8 inference support
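The DeepSeek V3.2 entry above lists its architecture as "Sparse MoE (37B active / 671B total)". As a quick worked example of what that ratio means, the sketch below computes the share of parameters active per token from that line. The helper name is hypothetical, and the reading that per-token compute tracks the active count while memory tracks the total is a general rule of thumb for MoE models, not something the timeline states.

```python
import re

MOE_RE = re.compile(r"\((\d+(?:\.\d+)?)B active / (\d+(?:\.\d+)?)B total\)")


def active_fraction(architecture_line):
    """Return active / total parameters from an MoE architecture line."""
    match = MOE_RE.search(architecture_line)
    if match is None:
        raise ValueError("no 'active / total' parameter counts found")
    active, total = (float(g) for g in match.groups())
    return active / total


arch = "Architecture: Sparse MoE (37B active / 671B total)"
frac = active_fraction(arch)
print(f"{frac:.1%}")  # prints 5.5%
```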
Major Release

MiniMax-M2.5

MiniMax
Released: 2026-02-12
Type: LLM
Context: 204K tokens
License: MIT
MiniMax MiniMax-M2.5 — AA Intelligence Index 41.9, 204K tokens context, reasoning model.
💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

84.8%
GPQA Diamond
19.1%
HLE
42.6%
SciCode
95.3%
TAU2-bench
34.8%
TerminalBench-Hard
71.6%
IF-Bench
66.0%
LiveCodeBench Reasoning
41.9
AA Intelligence Index
1394
Chatbot Arena Elo
Major Release

GLM-5

Zhipu AI
Released: 2026-02-11
Type: LLM
Size: 744B
Architecture: Dense Transformer (744B)
Context: 200K tokens
Knowledge cutoff: 2025-11
License: Proprietary (open weights for non-frontier sizes)
First frontier model trained entirely on Huawei Ascend silicon. — First frontier AI model trained entirely without NVIDIA GPUs, using Huawei Ascend chips
💰 $0.11 in / $0.28 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Zhipu BigModel API

✨ Key Features

  • First frontier model trained on Huawei Ascend chips (no NVIDIA)
  • #1 HLE score (50.4%)
  • 1.2% hallucination rate via Slime RL
  • 136x cheaper than Claude Opus 4.5

📊 Benchmarks

50.4%
HLE
1.2%
Hallucination Rate
$0.11/M tokens
Cost
88.7%
MMLU
92.1%
C-Eval
94.8%
GSM8K

🔄 What's new vs previous version

  • Trained entirely on Huawei Ascend 910B clusters (no NVIDIA)
  • Slime RL fine-tuning drops hallucination rate to 1.2%
  • 136x cheaper than Claude Opus 4.5 at comparable quality
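
The "136x cheaper" figure can be checked against the prices listed on this page. The sketch below assumes the comparison uses the per-1M-token prices shown on the GLM-5 and Claude Opus 4.5 cards; the helper name `price_ratio` is illustrative and not part of any vendor API.

```python
# Listed prices in USD per 1M tokens, copied from the cards on this page.
GLM5_IN, GLM5_OUT = 0.11, 0.28          # GLM-5 (Zhipu AI card)
OPUS45_IN, OPUS45_OUT = 15.00, 75.00    # Claude Opus 4.5 card


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times larger the first price is than the second."""
    if cheap <= 0:
        raise ValueError("cheap price must be positive")
    return expensive / cheap


input_ratio = price_ratio(OPUS45_IN, GLM5_IN)
output_ratio = price_ratio(OPUS45_OUT, GLM5_OUT)

print(f"input:  {input_ratio:.1f}x")   # input:  136.4x
print(f"output: {output_ratio:.1f}x")  # output: 267.9x
```

Under these assumptions the 136x figure matches the input-price ratio; the output-price ratio is roughly twice as large.
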
Major Release

Nanbeige4.1-3B

Nanbeige
Released: 2026-02-11
Type: LLM
Context: 256K tokens
License: Apache 2.0
Nanbeige Nanbeige4.1-3B — AA Intelligence Index 16.1, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

84.9%
GPQA Diamond
10.0%
HLE
26.6%
SciCode
21.6%
TAU2-bench
0.0%
TerminalBench-Hard
35.4%
IF-Bench
0.0%
LiveCodeBench Reasoning
16.1
AA Intelligence Index
Major Release

GLM-5

Z AI
Released: 2026-02-11
Type: LLM
Context: 200K tokens
License: MIT
Z AI GLM-5 — AA Intelligence Index 49.8, 200K tokens context, reasoning model.
💰 $1.00 in / $3.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

82.0%
GPQA Diamond
27.2%
HLE
46.2%
SciCode
98.2%
TAU2-bench
43.2%
TerminalBench-Hard
72.3%
IF-Bench
63.3%
LiveCodeBench Reasoning
49.8
AA Intelligence Index
1456
Chatbot Arena Elo
Major Release

GPT-5.3 Codex

OpenAI
Released: 2026-02-05
Type: Code
Size: ~200B
Architecture: MoE (coding-specialized fine-tune)
Context: 400K tokens
Knowledge cutoff: 2025-11
License: Proprietary
Coding-specialized variant of GPT-5.3, tuned for agentic IDE workflows. — OpenAI's specialized self-improving coding model with state-of-the-art software engineering performance
💰 $1.25 in / $10.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI · GitHub Copilot

✨ Key Features

  • Self-improving agentic coding
  • 25% faster than GPT-5.2-Codex
  • 1,000+ tokens/sec generation
  • First OpenAI model flagged 'high' on cybersecurity framework

📊 Benchmarks

77.3%
Terminal-Bench
SOTA
SWE-Bench Pro
1,000+ tok/s
Speed
82.4%
SWE-bench Verified
96.8%
HumanEval
91.5%
GPQA Diamond
84.2%
LiveCodeBench
79.5%
Aider Polyglot
39.9%
HLE
53.2%
SciCode
86.0%
TAU2-bench
53.0%
TerminalBench-Hard
75.4%
IF-Bench
74.0%
LiveCodeBench Reasoning
53.6
AA Intelligence Index
1406
Chatbot Arena Elo

🔄 What's new vs previous version

  • +4pt on SWE-bench Verified vs GPT-5.2 Codex
  • Native IDE tool-calling at reduced latency
  • Extended max output to 100K for multi-file patches
Major Release

Claude Opus 4.6

Anthropic
Released: 2026-02-05
Type: LLM
Context: 1M tokens
License: Proprietary
Anthropic Claude Opus 4.6 — AA Intelligence Index 46.5, 1M tokens context.
💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.0%
GPQA Diamond
18.6%
HLE
45.7%
SciCode
84.8%
TAU2-bench
48.5%
TerminalBench-Hard
44.6%
IF-Bench
58.3%
LiveCodeBench Reasoning
46.5
AA Intelligence Index
1498
Chatbot Arena Elo
Major Release

Qwen3 Coder Next

Alibaba
Released: 2026-02-03
Type: LLM
Context: 256K tokens
License: Apache 2.0
Alibaba Qwen3 Coder Next — AA Intelligence Index 28.3, 256K tokens context.
💰 $0.35 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

73.7%
GPQA Diamond
9.3%
HLE
32.3%
SciCode
79.5%
TAU2-bench
18.2%
TerminalBench-Hard
35.2%
IF-Bench
40.0%
LiveCodeBench Reasoning
28.3
AA Intelligence Index
Major Release

Step 3.5 Flash

StepFun
Released: 2026-02-02
Type: LLM
Context: 256K tokens
License: Apache 2.0
StepFun Step 3.5 Flash — AA Intelligence Index 37.8, 256K tokens context, reasoning model.
💰 $0.10 in / $0.30 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

83.1%
GPQA Diamond
19.1%
HLE
40.4%
SciCode
94.4%
TAU2-bench
27.3%
TerminalBench-Hard
64.6%
IF-Bench
43.0%
LiveCodeBench Reasoning
37.8
AA Intelligence Index
1395
Chatbot Arena Elo

January

Major Release

LongCat Flash Lite

LongCat
Released: 2026-01-28
Type: LLM
Context: 256K tokens
License: MIT
LongCat LongCat Flash Lite — AA Intelligence Index 23.9, 256K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

63.6%
GPQA Diamond
6.0%
HLE
28.4%
SciCode
79.5%
TAU2-bench
10.6%
TerminalBench-Hard
43.1%
IF-Bench
25.7%
LiveCodeBench Reasoning
23.9
AA Intelligence Index
Major Release

Kimi K2.5

Kimi
Released: 2026-01-27
Type: LLM
Context: 256K tokens
License: Modified MIT License
Kimi Kimi K2.5 — AA Intelligence Index 46.8, 256K tokens context, reasoning model.
💰 $0.54 in / $2.92 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

87.9%
GPQA Diamond
29.4%
HLE
49.0%
SciCode
95.9%
TAU2-bench
34.8%
TerminalBench-Hard
70.2%
IF-Bench
65.3%
LiveCodeBench Reasoning
46.8
AA Intelligence Index
1449
Chatbot Arena Elo
Major Release

Qwen3 Max Thinking

Alibaba
Released: 2026-01-26
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3 Max Thinking — AA Intelligence Index 39.9, 256K tokens context, reasoning model.
💰 $1.20 in / $6.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

77.6%
GPQA Diamond
12.0%
HLE
38.7%
SciCode
83.6%
TAU2-bench
17.4%
TerminalBench-Hard
53.8%
IF-Bench
57.7%
LiveCodeBench Reasoning
32.5
AA Intelligence Index
82.4%
MMLU-Pro
53.5%
LiveCodeBench
Major Release

Kimi K2

Moonshot AI
Released: 2026-01-20
Type: LLM
Size: 1.04T MoE
Architecture: MoE (32B active / ~1T total)
Context: 2M tokens
Knowledge cutoff: 2025-10
License: Modified MIT (open weights)
Moonshot's open-weight frontier MoE with strong agentic benchmarks. — First open-weight model to rank #1 on LMSYS Chatbot Arena with over 1 trillion parameters
💰 $0.15 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Moonshot API · Hugging Face · Together AI

✨ Key Features

  • First open-weight model #1 on LMSYS Chatbot Arena
  • 1.04 trillion parameters
  • K2.5 agent swarms with up to 100 sub-agents
  • $0.15/M input tokens

📊 Benchmarks

#1
LMSYS Arena
1.04T
Parameters
$0.15/M tokens
Cost
91.3%
MMLU
65.8%
SWE-bench Verified
74.1%
GPQA Diamond
68.9%
LiveCodeBench

🔄 What's new vs previous version

  • 2M token context window (20x vs first Kimi)
  • Agentic tool-use tuning via MuonClip optimizer
  • Open weights under modified MIT
Major Release

LFM2.5-1.2B-Thinking

Liquid AI
Released: 2026-01-20
Type: LLM
Context: 32K tokens
License: LFM 1.0
Liquid AI LFM2.5-1.2B-Thinking — AA Intelligence Index 8.1, 32K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

33.9%
GPQA Diamond
6.1%
HLE
4.2%
SciCode
19.6%
TAU2-bench
0.0%
TerminalBench-Hard
41.8%
IF-Bench
0.0%
LiveCodeBench Reasoning
8.1
AA Intelligence Index
Major Release

Step3 VL 10B

StepFun
Released: 2026-01-20
Type: LLM
Context: 65K tokens
License: Apache 2.0
StepFun Step3 VL 10B — AA Intelligence Index 15.4, 65K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

69.0%
GPQA Diamond
10.2%
HLE
31.1%
SciCode
16.1%
TAU2-bench
5.3%
TerminalBench-Hard
50.2%
IF-Bench
0.0%
LiveCodeBench Reasoning
15.5
AA Intelligence Index
Major Release

GLM-4.7-Flash

Z AI
Released: 2026-01-19
Type: LLM
Context: 200K tokens
License: MIT
Z AI GLM-4.7-Flash — AA Intelligence Index 30.1, 200K tokens context, reasoning model.
💰 $0.07 in / $0.40 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

58.1%
GPQA Diamond
7.1%
HLE
33.7%
SciCode
98.8%
TAU2-bench
22.0%
TerminalBench-Hard
60.8%
IF-Bench
35.0%
LiveCodeBench Reasoning
30.1
AA Intelligence Index
1368
Chatbot Arena Elo
Major Release

LFM2.5-1.2B-Instruct

Liquid AI
Released: 2026-01-05
Type: LLM
Context: 32K tokens
License: LFM 1.0
Liquid AI LFM2.5-1.2B-Instruct — AA Intelligence Index 8.0, 32K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

32.6%
GPQA Diamond
6.8%
HLE
2.3%
SciCode
10.8%
TAU2-bench
0.0%
TerminalBench-Hard
43.8%
IF-Bench
0.0%
LiveCodeBench Reasoning
8.0
AA Intelligence Index

2025

December

Update

GPT-5.2 Codex

OpenAI
Released: 2025-12-18
Type: Code
Size: ~200B
Architecture: MoE (coding fine-tune)
Context: 256K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Prior-gen Codex variant of GPT-5.2 for agentic coding. — Specialized coding variant of GPT-5.2 focused on software engineering tasks
💰 $1.50 in / $12.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Specialized for software engineering
  • Enhanced agentic coding
  • Multi-file refactoring
  • Advanced debugging capabilities

📊 Benchmarks

SOTA
SWE-Bench
95.1%
HumanEval
72.8%
Terminal-Bench
78.2%
SWE-bench Verified
80.4%
LiveCodeBench
Major Release

Mistral Large 3

Mistral
Released: 2025-12-15
Type: LLM
Size: ~123B
Architecture: Dense Transformer
Context: 256K tokens
Knowledge cutoff: 2025-07
License: Mistral Commercial License
Mistral's flagship proprietary model, tuned for European enterprise. — Mistral's flagship model competing with GPT-5 class models at a fraction of the cost
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Mistral La Plateforme · Azure · AWS Bedrock

✨ Key Features

  • 256K context window
  • Improved multilingual capabilities
  • Enhanced function calling
  • Competitive with GPT-5 class models

📊 Benchmarks

89.4%
MMLU
91.2%
HumanEval
82.1%
MATH
80.7%
MMLU-Pro
68.0%
GPQA Diamond
76.8%
MMMU
46.5%
LiveCodeBench
4.1%
HLE
36.2%
SciCode
24.6%
TAU2-bench
15.9%
TerminalBench-Hard
36.2%
IF-Bench
34.7%
LiveCodeBench Reasoning
22.8
AA Intelligence Index
1415
Chatbot Arena Elo
Update

GPT-5.2

OpenAI
Released: 2025-12-11
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Late-2025 GPT-5 refresh with improved reasoning and steerability. — Iterative improvement on GPT-5.1 with enhanced reasoning and faster performance
💰 $2.00 in / $10.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved adaptive reasoning
  • Better multimodal understanding
  • Faster inference

📊 Benchmarks

92.8%
MMLU
88.5%
MATH
95.8%
HumanEval
87.4%
MMLU-Pro
72.5%
SWE-bench Verified
90.3%
GPQA Diamond
92.1%
AIME 2025
88.9%
LiveCodeBench
35.4%
HLE
52.1%
SciCode
84.8%
TAU2-bench
47.0%
TerminalBench-Hard
75.4%
IF-Bench
72.7%
LiveCodeBench Reasoning
51.3
AA Intelligence Index
1440
Chatbot Arena Elo

November

Major Release

Claude Opus 4.5

Anthropic
Released: 2025-11-24
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Anthropic's top-tier reasoning model for complex research and agents. — Anthropic's most capable model with breakthrough coding performance and major price reduction
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • First model to exceed 80% on SWE-bench Verified (80.9%)
  • 67% price reduction vs previous Opus
  • Extended reasoning capabilities
  • Advanced coding performance

📊 Benchmarks

80.9%
SWE-bench
92.8%
MMLU
95.0%
HumanEval
89.5%
MMLU-Pro
78.9%
SWE-bench Verified
86.6%
GPQA Diamond
90.5%
AIME 2025
87.1%
LiveCodeBench
28.4%
HLE
49.5%
SciCode
89.5%
TAU2-bench
47.0%
TerminalBench-Hard
58.0%
IF-Bench
74.0%
LiveCodeBench Reasoning
49.7
AA Intelligence Index
1467
Chatbot Arena Elo
Major Release

Gemini 3 Pro

Google
Released: 2025-11-18
Type: Multimodal
Size: ~1T MoE
Architecture: Sparse MoE
Context: 1M tokens
Knowledge cutoff: 2025-09
License: Proprietary
First Gemini 3 tier release; strong multimodal + long-context. — Google's flagship model with Deep Think mode, ranked #1 on LMSYS Arena at launch
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 1M token context window
  • Deep Think reasoning mode
  • Solved 5/6 IMO 2025 problems
  • #1 on LMSYS Arena

📊 Benchmarks

87.5%
ARC-AGI
93.2%
MMLU
#1
LMSYS Arena
89.4%
MMLU-Pro
82.1%
MMMU
78.5%
GPQA Diamond
68.2%
SWE-bench Verified
Major Release

GPT-5.1

OpenAI
Released: 2025-11-12
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-06
License: Proprietary
Maintenance update to GPT-5 with steerability + latency improvements. — Major GPT-5 iteration with adaptive reasoning and perfect scores on math competitions
💰 $2.25 in / $11.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning modes
  • Perfect 100% on AIME 2025
  • 87.5% on ARC-AGI
  • Enhanced multimodal capabilities

📊 Benchmarks

87.5%
ARC-AGI
100%
AIME 2025
92.5%
MMLU
87.0%
MMLU-Pro
70.1%
SWE-bench Verified
87.3%
GPQA Diamond
86.8%
LiveCodeBench
26.5%
HLE
43.3%
SciCode
81.9%
TAU2-bench
45.5%
TerminalBench-Hard
72.9%
IF-Bench
75.0%
LiveCodeBench Reasoning
47.7
AA Intelligence Index
1439
Chatbot Arena Elo

August

Major Release

GPT-5

OpenAI
Released: 2025-08-15
Type: LLM
Size: ~200B
Architecture: MoE with unified reasoning router
Context: 400K tokens
Knowledge cutoff: 2025-05
License: Proprietary
OpenAI's flagship unified reasoning + chat model replacing the GPT-4 line. — OpenAI's next-generation flagship model with adaptive reasoning capabilities
💰 $2.50 in / $12.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning (routes between quick and deep thinking)
  • Improved math and coding
  • Enhanced multimodal reasoning
  • New safety architecture

📊 Benchmarks

91.0%
MMLU
95.1%
HumanEval
90.1%
MATH
80.6%
MMLU-Pro
67.4%
SWE-bench Verified
67.3%
GPQA Diamond
55.8%
LiveCodeBench
86.1%
MATH-500
36.7%
AIME 2025
5.4%
HLE
38.8%
SciCode
67.0%
TAU2-bench
18.2%
TerminalBench-Hard
45.6%
IF-Bench
25.0%
LiveCodeBench Reasoning
23.9
AA Intelligence Index
1434
Chatbot Arena Elo

July

Update

Claude Opus 4.1

Anthropic
Released: 2025-07-15
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Mid-2025 Opus refresh focused on agentic coding reliability. — Iterative improvement on Claude Opus 4 with enhanced multi-file refactoring
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved multi-file refactoring
  • Enhanced agentic capabilities
  • Better long-context performance
  • Reduced hallucinations

📊 Benchmarks

75.2%
SWE-bench
91.2%
MMLU
94.0%
HumanEval
74.5%
SWE-bench Verified
79.1%
GPQA Diamond

June

Update

Gemini 2.5 Flash

Google
Released: 2025-06-20
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2025-01
License: Proprietary
Google's cost-optimized multimodal model with thinking mode. — Google's fast and cost-effective model with enhanced image capabilities
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Enhanced image editing stabilization
  • Faster inference
  • Improved multimodal understanding
  • Cost-effective deployment

📊 Benchmarks

87.5%
MMLU
2x Gemini 2.0 Flash
Speed
High
Image Quality
83.2%
MMLU-Pro
96.2%
HumanEval
79.0%
GPQA Diamond
79.7%
MMMU
69.5%
LiveCodeBench
98.1%
MATH-500
82.3%
AIME 2025
11.1%
HLE
39.4%
SciCode
31.6%
TAU2-bench
13.6%
TerminalBench-Hard
50.3%
IF-Bench
61.7%
LiveCodeBench Reasoning
27.0
AA Intelligence Index
1411
Chatbot Arena Elo

May

Major Release

Claude Sonnet 4

Anthropic
Released: 2025-05-22
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Claude 4 mid-tier with strong coding and long-horizon agentic reliability. — Latest generation Claude model with significant performance improvements
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved safety measures
  • Advanced multimodal understanding
  • Extended context window

📊 Benchmarks

88.7%
MMLU
94.5%
HumanEval
76.8%
MATH
72.3%
SWE-bench Verified
74.0%
GPQA Diamond

February

Update

Claude Sonnet 3.7

Anthropic
Released: 2025-02-24
Type: LLM
Size: ~300B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-11
License: Proprietary
Extended-thinking update to Sonnet 3.5 with visible reasoning toggle. — Iterative improvement on Claude 3.5 with enhanced capabilities
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved reasoning
  • Better code generation
  • Enhanced safety
  • Reduced hallucinations

📊 Benchmarks

86.1%
MMLU
93.2%
HumanEval
74.1%
MATH
62.3%
SWE-bench Verified
68.3%
GPQA Diamond

2024

December

Major Release

DeepSeek-V3

DeepSeek
Released: 2024-12-26
Type: LLM
Size: 671B
Architecture: Sparse MoE (37B active / 671B total)
Context: 128K tokens
Knowledge cutoff: 2024-07
License: DeepSeek License (open weights)
DeepSeek's breakthrough open-weight MoE rivaling GPT-4-class quality. — DeepSeek's most advanced open-source model with MoE architecture
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI

✨ Key Features

  • Mixture of Experts architecture
  • Cost-effective training
  • Open source release
  • Strong reasoning capabilities

📊 Benchmarks

88.5%
MMLU
90.6%
HumanEval
61.6%
MATH
75.2%
MMLU-Pro
55.7%
GPQA Diamond
35.9%
LiveCodeBench
88.7%
MATH-500
25.3%
AIME 2025
3.6%
HLE
35.4%
SciCode
22.8%
TAU2-bench
6.8%
TerminalBench-Hard
34.8%
IF-Bench
29.0%
LiveCodeBench Reasoning
16.5
AA Intelligence Index
1358
Chatbot Arena Elo
Major Release

Gemini 2.0 Flash

Google
Released: 2024-12-11
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2024-08
License: Proprietary
First Gemini 2 model — fast, cheap, multimodal, with tool use native. — Google's next-generation model with native multimodal capabilities
💰 $0.10 in / $0.40 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text, image, audio 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Native multimodal generation
  • Real-time API
  • Agentic capabilities
  • Enhanced speed

📊 Benchmarks

85.8%
MMLU
90.7%
HumanEval
58.8%
MATH
78.2%
MMLU-Pro
63.6%
GPQA Diamond
70.7%
MMMU
21.0%
LiveCodeBench
91.1%
MATH-500
30.0%
AIME 2025
4.7%
HLE
34.0%
SciCode
29.5%
TAU2-bench
3.8%
TerminalBench-Hard
40.2%
IF-Bench
28.3%
LiveCodeBench Reasoning
16.8
AA Intelligence Index

August

Major Release

Grok-2

xAI
Released: 2024-08-13
Type: LLM
Size: ~314B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: Real-time (X feed)
License: Proprietary
Elon's second-gen Grok — real-time X/Twitter data access — xAI's flagship model with real-time web access and multimodal capabilities
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 xAI API · x.com (Grok)

✨ Key Features

  • Real-time information access
  • Multimodal understanding
  • X platform integration
  • Conversational AI

📊 Benchmarks

84.0%
MMLU
86.3%
HumanEval
56.0%
MATH
70.9%
MMLU-Pro
51.0%
GPQA Diamond
26.7%
LiveCodeBench
77.8%
MATH-500
13.3%
AIME 2025
3.8%
HLE
28.5%
SciCode
13.9
AA Intelligence Index

🔄 What's new vs previous version

  • Vision input
  • Real-time X data
  • Improved reasoning

June

Major Release

Claude 3.5 Sonnet

Anthropic
Released: 2024-06-20
Type: LLM
Size: ~175B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-04
License: Proprietary
The mid-2024 Sonnet release that set the SOTA bar for coding and agents. — Anthropic's most intelligent model with significantly improved capabilities
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Improved coding capabilities
  • Enhanced reasoning
  • Vision capabilities

📊 Benchmarks

88.7%
MMLU
89.9%
HumanEval
71.1%
MATH
75.1%
MMLU-Pro
49.0%
SWE-bench Verified
56.0%
GPQA Diamond
38.1%
LiveCodeBench
69.5%
MATH-500
9.7%
AIME 2025
3.7%
HLE
31.6%
SciCode
14.2
AA Intelligence Index
1372
Chatbot Arena Elo

March

Major Release

Claude 3 Opus

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Anthropic's most powerful pre-Claude 4 model — tops GPT-4 on reasoning — Most capable model in the Claude 3 family with near-human performance on complex tasks
💰 $18.75 in / $75.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Advanced reasoning
  • Multimodal capabilities
  • Constitutional AI training

📊 Benchmarks

86.8%
MMLU
84.8%
HumanEval
60.1%
MATH
69.6%
MMLU-Pro
48.9%
GPQA Diamond
95.0%
GSM8K
27.9%
LiveCodeBench
64.1%
MATH-500
3.3%
AIME 2025
3.1%
HLE
23.3%
SciCode
18.0
AA Intelligence Index
1063
Chatbot Arena Elo

🔄 What's new vs previous version

  • 200K context
  • Vision input
  • +15% MMLU vs Claude 2
  • Tool use
Major Release

Claude 3 Sonnet

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Balanced Claude 3 variant — best price/performance in the family — Balanced model offering strong performance with faster response times
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Balanced capability and speed
  • Multimodal input
  • Strong reasoning

📊 Benchmarks

79.0%
MMLU
71.3%
HumanEval
40.5%
MATH
57.9%
MMLU-Pro
40.0%
GPQA Diamond
17.5%
LiveCodeBench
41.4%
MATH-500
4.7%
AIME 2025
3.8%
HLE
22.9%
SciCode
10.3
AA Intelligence Index
1018
Chatbot Arena Elo
Major Release

Claude 3 Haiku

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~25B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Fastest and cheapest Claude 3 — sub-second latency at $0.25/M — Fastest and most compact model in the Claude 3 family
💰 $0.25 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Fastest response times
  • Multimodal input
  • Cost-effective

📊 Benchmarks

75.2%
MMLU
75.7%
HumanEval
38.9%
MATH
37.4%
GPQA Diamond
15.4%
LiveCodeBench
39.4%
MATH-500
1.0%
AIME 2025
3.9%
HLE
18.6%
SciCode
21.1%
TAU2-bench
0.8%
TerminalBench-Hard
36.1%
IF-Bench
21.0%
LiveCodeBench Reasoning
12.3
AA Intelligence Index
1001
Chatbot Arena Elo

February

Major Release

Mistral Large

Mistral
Released: 2024-02-26
Type: LLM
Size: ~70B
Context: 32K tokens
Top-tier reasoning model with strong multilingual capabilities
💰 $4.00 in / $12.00 out per 1M tok 🎛 In: text 📤 Out: text

✨ Key Features

  • 32K context window
  • Multilingual capabilities
  • Function calling
  • JSON mode

📊 Benchmarks

81.2%
MMLU
70.6%
HumanEval
89.2%
HellaSwag
51.5%
MMLU-Pro
35.1%
GPQA Diamond
17.8%
LiveCodeBench
52.7%
MATH-500
0.0%
AIME 2025
3.4%
HLE
20.8%
SciCode
9.9
AA Intelligence Index
Major Release

Gemini 1.5 Pro

Google
Released: 2024-02-15
Type: Multimodal
Size: ~175B
Architecture: MoE Transformer
Context: 1M tokens
Knowledge cutoff: 2023-11
License: Proprietary
Google's first 1M-context model — multimodal needle-in-haystack champion — Google's next-generation model with breakthrough long context capabilities
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • 1M token context window
  • Multimodal understanding
  • Video analysis
  • Audio processing

📊 Benchmarks

81.9%
MMLU
83.4%
HumanEval
58.5%
MATH
65.7%
MMLU-Pro
37.1%
GPQA Diamond
24.4%
LiveCodeBench
67.3%
MATH-500
8.0%
AIME 2025
3.9%
HLE
27.4%
SciCode
12.0
AA Intelligence Index

🔄 What's new vs previous version

  • 1M token context
  • Multi-hour video understanding
  • MoE architecture

January

Major Release

text-embedding-3-large

OpenAI
Released: 2024-01-25
Type: Embedding
Size: ~7B
Architecture: Transformer encoder
License: Proprietary
OpenAI's best embedding model — 3× cheaper than ada-002 with better MTEB — OpenAI's most powerful text embedding model
🎛 In: text 📤 Out: embeddings 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 3072 embedding dimensions
  • Improved retrieval performance
  • Reduced hallucinations
  • Multi-language support

📊 Benchmarks

64.6%
MTEB Score
3072
Dimensions
100+
Languages
Major Release

GPT-4 Turbo

OpenAI
Released: 2024-01-25
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: 2023-04
License: Proprietary
GPT-4 with 128K context and knowledge through April 2023 — 3× cheaper than GPT-4 — Latest iteration of GPT-4 with improved performance and longer context window
💰 $10.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 128K context window
  • Improved instruction following
  • Enhanced reasoning capabilities
  • Reduced hallucinations

📊 Benchmarks

86.4%
MMLU
91.8%
HumanEval
95.3%
HellaSwag
69.4%
MMLU-Pro
29.1%
LiveCodeBench
73.7%
MATH-500
15.0%
AIME 2025
3.3%
HLE
31.9%
SciCode
13.7
AA Intelligence Index

🔄 What's new vs previous version

  • 128K context (16× the 8K base GPT-4, 4× gpt-4-32k)
  • Updated knowledge cutoff
  • 3× cheaper than GPT-4
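
The price lines on these cards can be turned into a per-request cost estimate. The sketch below is one way to do that, assuming the "$X in / $Y out per 1M tok" format used on this page; the function names and the 10,000-in / 1,000-out workload are hypothetical.

```python
import re

# Matches the pricing format used on the cards, e.g. "$10.00 in / $30.00 out per 1M tok".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out per 1M tok")


def parse_price_line(line: str) -> tuple[float, float]:
    """Extract (input, output) USD prices per 1M tokens from a card's price line."""
    match = PRICE_RE.search(line)
    if match is None:
        raise ValueError(f"no price found in: {line!r}")
    return float(match.group(1)), float(match.group(2))


def request_cost(line: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed prices."""
    price_in, price_out = parse_price_line(line)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Price lines as listed on the GPT-4 and GPT-4 Turbo cards.
GPT4 = "$30.00 in / $60.00 out per 1M tok"
GPT4_TURBO = "$10.00 in / $30.00 out per 1M tok"

# Hypothetical workload: 10,000 input tokens and 1,000 output tokens.
old = request_cost(GPT4, 10_000, 1_000)
new = request_cost(GPT4_TURBO, 10_000, 1_000)

print(f"GPT-4:       ${old:.2f}")   # GPT-4:       $0.36
print(f"GPT-4 Turbo: ${new:.2f}")   # GPT-4 Turbo: $0.13
```

For this workload the saving is about 2.8×. At the listed prices the 3× figure holds for input tokens, while output tokens are 2× cheaper.
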

2023

December

Major Release

Grok-1

xAI
Released: 2023-12-07
Type: LLM
Size: ~314B
Architecture: MoE Transformer
Context: 8K tokens
Knowledge cutoff: 2023-10
License: Apache 2.0
xAI's open-source release — 314B MoE, first frontier model fully open-sourced — xAI's first major language model with real-time internet access
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Self-hosted (Hugging Face)

✨ Key Features

  • Real-time information
  • Conversational interface
  • X platform integration
  • Uncensored responses

📊 Benchmarks

73.0%
MMLU
63.2%
HumanEval
62.9%
GSM8K
11.7
AA Intelligence Index
Research

AlphaCode 2

Google DeepMind
Released: 2023-12-06
Type: Code
Size: ~340B
Architecture: Transformer (Gemini-based)
License: Proprietary (research)
DeepMind's coding specialist — top 15% of competitive programmers — DeepMind's advanced code generation system for competitive programming
🎛 In: text, code 📤 Out: code

✨ Key Features

  • Advanced code generation
  • Competitive programming
  • Multi-language support
  • Problem decomposition

📊 Benchmarks

1747
Codeforces Rating
85th percentile
Problem Solving
10+ languages
Language Support
Top 15%
Codeforces percentile
Major Release

Gemini Ultra

Google
Released: 2023-12-06
Type: Multimodal
Size: ~540B
Architecture: MoE Transformer
Knowledge cutoff: 2023-06
License: Proprietary
Google's first model to beat GPT-4 on MMLU — 90%+ with CoT — Google's most capable multimodal AI model
🎛 In: text, image, audio, video 📤 Out: text 🌐 Google One AI Premium · Vertex AI

✨ Key Features

  • Multimodal reasoning
  • Text, image, audio, video understanding
  • Advanced mathematical reasoning
  • Code generation

📊 Benchmarks

90.0%
MMLU
74.4%
HumanEval
53.2%
MATH

🔄 What's new vs previous version

  • First model to exceed human expert on MMLU
  • Native multimodal
  • 32K context
Major Release

Gemini Pro

Google
Released: 2023-12-06
Type: Multimodal
Size: ~175B
Architecture: Transformer
Context: 32K tokens
Knowledge cutoff: 2023-06
License: Proprietary
Google's workhorse Gemini model — free tier in Google AI Studio — Google's balanced model for wide range of tasks
🎛 In: text, image 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • Multimodal capabilities
  • 32K context window
  • Fast inference
  • Scalable deployment

📊 Benchmarks

79.1%
MMLU
67.7%
HumanEval
32.6%
MATH

November

Update

Claude 2.1

Anthropic
Released: 2023-11-21
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude 2 update — 200K context and reduced hallucinations — Significant improvements in accuracy and honesty over Claude 2
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 200K context window
  • Reduced hallucination rates
  • Enhanced accuracy
  • Tool use capabilities

📊 Benchmarks

73.1%
MMLU
15.9%
HumanEval
71.1%
MATH
49.5%
MMLU-Pro
31.9%
GPQA Diamond
19.5%
LiveCodeBench
37.4%
MATH-500
3.3%
AIME 2025
4.2%
HLE
18.4%
SciCode
9.3
AA Intelligence Index

🔄 What's new vs previous version

  • 200K context (2× Claude 2)
  • 50% fewer hallucinations
  • Tool use beta
Major Release

Whisper v3

OpenAI
Released: 2023-11-06
Type: Audio
Size: ~1.55B
Architecture: Transformer encoder-decoder
License: MIT
State-of-the-art open speech recognition — 99 languages, open weights — OpenAI's multilingual speech recognition system
🎛 In: audio 📤 Out: text 🌐 OpenAI API · Self-hosted (Hugging Face) · Groq

✨ Key Features

  • Multilingual speech recognition
  • 99 language support
  • Robust noise handling
  • Real-time transcription

📊 Benchmarks

5.1%
WER English
99 languages
Language Coverage
0.8x
Real-time Factor
~2.7%
WER (English)

August

Major Release

Code Llama 34B

Meta
Released: 2023-08-24
Type: Code
Size: 34B
Architecture: Transformer (Llama 2 fine-tune)
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's code-specialized open model — top open-source coding at launch — Specialized model for code generation built on Llama 2
🎛 In: text, code 📤 Out: code, text 🌐 Self-hosted · Together AI · Fireworks AI

✨ Key Features

  • Code generation
  • Code completion
  • Multiple programming languages
  • Large context window

📊 Benchmarks

48.8%
HumanEval
55.0%
MBPP
45.9%
MultiPL-E

July

Major Release

Llama 2 70B

Meta
Released: 2023-07-18
Type: LLM
Size: 70B
Architecture: Transformer
Context: 4K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's open-weight landmark — 70B that matched GPT-3.5 and ignited open AI — Meta's open-source large language model with commercial license
🎛 In: text 📤 Out: text 🌐 Self-hosted · Together AI · AWS Bedrock · Azure AI

✨ Key Features

  • Open source
  • Commercial license
  • Improved safety
  • Enhanced performance

📊 Benchmarks

68.9%
MMLU
29.9%
HumanEval
13.5%
MATH
Major Release

Claude 2

Anthropic
Released: 2023-07-11
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude's first major leap — 100K context and better at instructions — Significant improvement over Claude 1 with enhanced capabilities
🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 100K context window
  • Improved safety
  • Enhanced reasoning
  • Better code generation

📊 Benchmarks

78.5%
MMLU
71.2%
HumanEval
88.0%
MATH

🔄 What's new vs previous version

  • 100K context (10× Claude 1)
  • Improved reasoning
  • Reduced refusals

May

Major Release

PaLM 2

Google
Released: 2023-05-10
Type: LLM
Size: ~340B
Architecture: Transformer
Context: 8K tokens
Knowledge cutoff: 2023-02
License: Proprietary
Google's multilingual flagship — powers Bard 2023, 100+ languages — Google's improved large language model powering Bard and other services
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Google Cloud Vertex AI · Google AI Studio

✨ Key Features

  • Multilingual capabilities
  • Reasoning improvements
  • Coding abilities
  • Multiple model sizes

📊 Benchmarks

78.3%
MMLU
77.2%
HumanEval
34.3%
MATH
8.6
AA Intelligence Index

March

Major Release

GPT-4

OpenAI
Released: 2023-03-14
Type: LLM
Size: Undisclosed (OpenAI has not published a parameter count)
Architecture: Transformer (reported MoE)
Context: 8K tokens (32K with gpt-4-32k)
Knowledge cutoff: 2021-09
License: Proprietary
The model that changed everything — GPT-4 set the standard for capable AI — OpenAI's most advanced system producing safer and more useful responses
💰 $30.00 in / $60.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 8K context window
  • Multimodal capabilities
  • Enhanced reasoning
  • Improved factual accuracy

📊 Benchmarks

86.4%
MMLU
67.0%
HumanEval
52.9%
MATH
~90th percentile
Bar exam
12.8
AA Intelligence Index

🔄 What's new vs previous version

  • Passed a simulated bar exam (around the top 10% of test takers)
  • Vision input (GPT-4V)
  • Multimodal
Major Release

Claude 1.3

Anthropic
Released: 2023-03-14
Type: LLM
Size: ~52B
Architecture: Transformer
Context: 100K tokens
Knowledge cutoff: 2022-12
License: Proprietary
Anthropic's first public model — 100K context ahead of its time — Anthropic's AI assistant built using Constitutional AI methods
🎛 In: text 📤 Out: text 🌐 Anthropic API

✨ Key Features

  • Constitutional AI
  • Helpful and harmless
  • Long conversations
  • Improved reasoning

📊 Benchmarks

75.0%
MMLU
56.0%
HumanEval
36.0%
MATH

2022

November

Major Release

ChatGPT (GPT-3.5 Turbo)

OpenAI
Released: 2022-11-30
Type: LLM
Size: ~175B
Architecture: Transformer (RLHF fine-tune of GPT-3.5)
Context: 4K tokens (16K with gpt-3.5-turbo-16k)
Knowledge cutoff: 2021-09
License: Proprietary
The product that launched the AI era — 100M users in 2 months — Conversational AI that sparked mainstream adoption of large language models
🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Conversational interface
  • Fine-tuned for chat
  • RLHF training
  • Fast response times

📊 Benchmarks

70.0%
MMLU
48.1%
HumanEval
34.1%
MATH

🔄 What's new vs previous version

  • Conversational interface
  • RLHF alignment
  • Faster and cheaper than text-davinci-003

April

Research

PaLM

Google
Released: 2022-04-04
Type: LLM
Size: 540B
Architecture: Pathways Transformer
License: Proprietary (research)
Google's 540B pathways model — first to demonstrate chain-of-thought at scale — Google's 540-billion parameter language model demonstrating breakthrough capabilities
🎛 In: text 📤 Out: text

✨ Key Features

  • Large parameter count
  • Few-shot learning
  • Reasoning capabilities
  • Code generation

📊 Benchmarks

69.3%
MMLU
26.2%
HumanEval
8.8%
MATH
58.1%
BIG-bench

2021

August

Major Release

Codex

OpenAI
Released: 2021-08-10
Type: Code
Size: ~12B
Architecture: Transformer (GPT-3 fine-tune)
Context: 4K tokens
License: Proprietary
GPT-3 trained on code — the engine behind GitHub Copilot v1 — AI system that translates natural language to code
🎛 In: text, code 📤 Out: code 🌐 OpenAI API (deprecated) · GitHub Copilot

✨ Key Features

  • Code generation
  • Natural language to code
  • Multiple programming languages
  • GitHub Copilot integration

📊 Benchmarks

28.8%
HumanEval
59.6%
MBPP
25.0%
APPS

2020

June

Major Release

GPT-3

OpenAI
Released: 2020-06-11
Type: LLM
Size: 175B
Architecture: Transformer
Context: 2K tokens
Knowledge cutoff: 2019-10
License: Proprietary (API)
The model that showed scaling works — 175B parameters, few-shot learning pioneer — Breakthrough large language model that demonstrated emergent capabilities
🎛 In: text 📤 Out: text 🌐 OpenAI API

✨ Key Features

  • 175 billion parameters
  • Few-shot learning
  • Text generation
  • Multiple capabilities

📊 Benchmarks

43.9%
MMLU
0.0%
HumanEval
5.2%
MATH