AI Model Release Timeline 2025–2026

Every LLM launch tracked — GPT-5, Claude 4, Gemini 2, Llama 4 and more. Updated weekly with launch dates, benchmarks, and capabilities.

120 model releases tracked | Last updated: 2026-06-01 11:00 UTC

AI Flash Report tracks every major AI model release with launch date, benchmarks and pricing, updated weekly. The most recent tracked launches are Anthropic Claude Opus 4.8 (2026-05-28), OpenBMB MiniCPM5-1B, and Alibaba Qwen3.7 Max. New AI models currently arrive roughly every 3 days.

59 models released in the last 90 days.
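
As a rough check on the cadence figures above, the 90-day count can be turned into an average gap between releases. The sketch below is only arithmetic on the numbers quoted on this page; the function name is hypothetical, and the simple average need not match the "roughly every 3 days" figure, which may rest on a different window or counting rule.

```python
def average_gap_days(releases: int, window_days: int) -> float:
    """Average number of days between releases over a window."""
    if releases <= 0:
        raise ValueError("releases must be positive")
    return window_days / releases

# Figures quoted on this page: 59 releases in the last 90 days.
gap = average_gap_days(releases=59, window_days=90)
print(f"{gap:.2f} days between releases on average")  # prints 1.53
```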

2026

May

Major Release

Claude Opus 4.8

Anthropic
Released: 2026-05-28
Type: LLM
Context: 1M tokens
License: Proprietary
Anthropic Claude Opus 4.8 — AA Intelligence Index 61.4, 1M tokens context, reasoning model.
💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.0%
HLE: 45.7%
SciCode: 53.5%
TAU2-bench: 94.4%
TerminalBench-Hard: 58.3%
IF-Bench: 62.2%
LiveCodeBench Reasoning: 67.7%
AA Intelligence Index: 61.4

Major Release

MiniCPM5-1B

OpenBMB
Released: 2026-05-25
Type: LLM
Context: 128K tokens
License: Apache 2.0
OpenBMB MiniCPM5-1B — AA Intelligence Index 17.9, 128K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 26.9%
HLE: 4.6%
SciCode: 1.4%
TAU2-bench: 82.5%
TerminalBench-Hard: 0.0%
IF-Bench: 35.2%
LiveCodeBench Reasoning: 4.7%
AA Intelligence Index: 17.9

Major Release

Qwen3.7 Max

Alibaba
Released: 2026-05-19
Type: LLM
Context: 1M tokens
License: Proprietary
Alibaba Qwen3.7 Max — AA Intelligence Index 56.6, 1M tokens context, reasoning model.
💰 $2.50 in / $7.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.3%
HLE: 38.1%
SciCode: 48.8%
TAU2-bench: 94.7%
TerminalBench-Hard: 50.8%
IF-Bench: 80.5%
LiveCodeBench Reasoning: 69.0%
AA Intelligence Index: 56.6
Chatbot Arena Elo: 1541

Major Release

Gemini 3.5 Flash

Google
Released: 2026-05-19
Type: LLM
Context: 1M tokens
License: Proprietary
Google Gemini 3.5 Flash — AA Intelligence Index 55.3, 1M tokens context, reasoning model.
💰 $1.50 in / $9.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.8%
HLE: 23.1%
SciCode: 48.8%
TAU2-bench: 58.8%
TerminalBench-Hard: 46.2%
IF-Bench: 47.3%
LiveCodeBench Reasoning: 53.3%
AA Intelligence Index: 43.3
Chatbot Arena Elo: 1479

Major Release

JT-35B-Flash

China Mobile
Released: 2026-05-14
Type: LLM
Context: 256K tokens
License: Proprietary
China Mobile JT-35B-Flash — AA Intelligence Index 36.1, 256K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.9%
HLE: 6.1%
SciCode: 29.1%
TAU2-bench: 99.1%
TerminalBench-Hard: 28.8%
IF-Bench: 42.0%
LiveCodeBench Reasoning: 55.3%
AA Intelligence Index: 36.1

Major Release

MiniCPM-V 4.6 1.3B

OpenBMB
Released: 2026-05-11
Type: LLM
Context: 262K tokens
License: Apache 2.0
OpenBMB MiniCPM-V 4.6 1.3B — AA Intelligence Index 12.7, 262K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 30.5%
HLE: 4.9%
SciCode: 2.1%
TAU2-bench: 87.7%
TerminalBench-Hard: 0.0%
IF-Bench: 26.7%
LiveCodeBench Reasoning: 6.3%
AA Intelligence Index: 12.7

Major Release

Ring-2.6-1T

InclusionAI
Released: 2026-05-08
Type: LLM
Context: 262K tokens
License: MIT
InclusionAI Ring-2.6-1T — AA Intelligence Index 38.5, 262K tokens context, reasoning model.
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.7%
HLE: 18.3%
SciCode: 42.4%
TAU2-bench: 92.4%
TerminalBench-Hard: 28.8%
IF-Bench: 44.6%
LiveCodeBench Reasoning: 64.3%
AA Intelligence Index: 38.5

Major Release

GPT-5.5 Instant

OpenAI
Released: 2026-05-05
Type: LLM
Context: 400K tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.5 Instant — AA Intelligence Index 41.8, 400K tokens context, reasoning model.
💰 $5.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.6%
HLE: 20.3%
SciCode: 50.3%
TAU2-bench: 49.4%
TerminalBench-Hard: 42.4%
IF-Bench: 71.5%
LiveCodeBench Reasoning: 55.7%
AA Intelligence Index: 41.8
Chatbot Arena Elo: 1474
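
The Chatbot Arena Elo figures listed for some models can be read as relative strengths in head-to-head comparisons. A minimal sketch, assuming the standard Elo expected-score formula (base 10, scale 400) applies to these ratings; the leaderboard's actual fitting method may differ, and the function name is hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings as listed in the May entries:
# Qwen3.7 Max (1541) and GPT-5.5 Instant (1474).
p = expected_win_rate(1541, 1474)
print(f"Expected win rate for the higher-rated model: {p:.1%}")
```

Under this assumption, a gap of a few dozen Elo points corresponds to a modest edge rather than a decisive one.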

April

Major Release

Grok 4.3

xAI
Released: 2026-04-30
Type: LLM
Context: 1M tokens
License: Proprietary
xAI Grok 4.3 — AA Intelligence Index 53.2, 1M tokens context, reasoning model.
💰 $1.25 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 90.1%
HLE: 35.0%
SciCode: 47.3%
TAU2-bench: 97.7%
TerminalBench-Hard: 37.9%
IF-Bench: 81.3%
LiveCodeBench Reasoning: 64.3%
AA Intelligence Index: 53.2
Chatbot Arena Elo: 1447

Major Release

Granite 4.1 30B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 30B — AA Intelligence Index 14.7, 131K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 48.1%
HLE: 4.2%
SciCode: 25.8%
TAU2-bench: 42.1%
TerminalBench-Hard: 2.3%
IF-Bench: 44.4%
LiveCodeBench Reasoning: 18.7%
AA Intelligence Index: 14.7

Major Release

Granite 4.1 3B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 3B — AA Intelligence Index 8.5, 131K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 31.4%
HLE: 3.4%
SciCode: 11.9%
TAU2-bench: 19.6%
TerminalBench-Hard: 2.3%
IF-Bench: 33.7%
LiveCodeBench Reasoning: 3.0%
AA Intelligence Index: 8.5

Major Release

Granite 4.1 8B

IBM
Released: 2026-04-29
Type: LLM
Context: 131K tokens
License: Apache 2.0
IBM Granite 4.1 8B — AA Intelligence Index 12.4, 131K tokens context.
💰 $0.05 in / $0.10 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 43.3%
HLE: 3.8%
SciCode: 21.8%
TAU2-bench: 27.8%
TerminalBench-Hard: 0.0%
IF-Bench: 38.6%
LiveCodeBench Reasoning: 12.0%
AA Intelligence Index: 12.4
Chatbot Arena Elo: 1202

Major Release

Mistral Medium 3.5

Mistral
Released: 2026-04-29
Type: LLM
Context: 256K tokens
License: Other
Mistral Medium 3.5 — AA Intelligence Index 39.2, 256K tokens context, reasoning model.
💰 $1.50 in / $7.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 74.8%
HLE: 12.8%
SciCode: 39.6%
TAU2-bench: 94.2%
TerminalBench-Hard: 33.3%
IF-Bench: 68.8%
LiveCodeBench Reasoning: 61.0%
AA Intelligence Index: 39.2

Major Release

Nemotron 3 Nano Omni 30B A3B Reasoning

NVIDIA
Released: 2026-04-29
Type: LLM
Context: 256K tokens
License: NVIDIA Open Model License Agreement
NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning — AA Intelligence Index 21.4, 256K tokens context, reasoning model.
💰 $0.07 in / $0.30 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 46.9%
HLE: 5.3%
SciCode: 27.8%
TAU2-bench: 45.3%
TerminalBench-Hard: 8.3%
IF-Bench: 63.2%
LiveCodeBench Reasoning: 35.7%
AA Intelligence Index: 21.4

Major Release

DeepSeek V4 Flash

DeepSeek
Released: 2026-04-24
Type: LLM
Context: 1M tokens
License: MIT
DeepSeek V4 Flash — AA Intelligence Index 46.5, 1M tokens context, reasoning model.
💰 $0.14 in / $0.28 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 89.4%
HLE: 32.1%
SciCode: 44.9%
TAU2-bench: 95.0%
TerminalBench-Hard: 35.6%
IF-Bench: 79.2%
LiveCodeBench Reasoning: 63.0%
AA Intelligence Index: 46.5
Chatbot Arena Elo: 1433

Major Release

DeepSeek V4 Pro

DeepSeek
Released: 2026-04-24
Type: LLM
Context: 1M tokens
License: MIT
DeepSeek V4 Pro — AA Intelligence Index 51.5, 1M tokens context, reasoning model.
💰 $1.74 in / $3.48 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.8%
HLE: 35.9%
SciCode: 50.0%
TAU2-bench: 96.2%
TerminalBench-Hard: 46.2%
IF-Bench: 76.5%
LiveCodeBench Reasoning: 66.3%
AA Intelligence Index: 51.5
Chatbot Arena Elo: 1454

Major Release

Ling-2.6-1T

InclusionAI
Released: 2026-04-23
Type: LLM
Context: 262K tokens
License: MIT
InclusionAI Ling-2.6-1T — AA Intelligence Index 33.6, 262K tokens context.
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.2%
HLE: 8.2%
SciCode: 37.0%
TAU2-bench: 89.8%
TerminalBench-Hard: 31.1%
IF-Bench: 56.9%
LiveCodeBench Reasoning: 34.7%
AA Intelligence Index: 33.6

Major Release

GPT-5.5

OpenAI
Released: 2026-04-23
Type: LLM
Context: 922K tokens
License: Proprietary
OpenAI GPT-5.5 — AA Intelligence Index 60.2, 922K tokens context, reasoning model.
💰 $5.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 93.5%
HLE: 44.3%
SciCode: 56.1%
TAU2-bench: 93.9%
TerminalBench-Hard: 60.6%
IF-Bench: 75.9%
LiveCodeBench Reasoning: 74.3%
AA Intelligence Index: 60.2
Chatbot Arena Elo: 1474

Major Release

Hy3-preview

Tencent
Released: 2026-04-23
Type: LLM
Context: 256K tokens
License: Tencent Hy Community License Agreement
Tencent Hy3-preview — AA Intelligence Index 41.9, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.7%
HLE: 25.5%
SciCode: 41.2%
TAU2-bench: 92.7%
TerminalBench-Hard: 34.1%
IF-Bench: 63.1%
LiveCodeBench Reasoning: 54.7%
AA Intelligence Index: 41.9

Major Release

Qwen3.6 27B

Alibaba
Released: 2026-04-22
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.6 27B — AA Intelligence Index 45.8, 262K tokens context, reasoning model.
💰 $0.60 in / $3.60 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.2%
HLE: 21.6%
SciCode: 39.8%
TAU2-bench: 94.2%
TerminalBench-Hard: 34.8%
IF-Bench: 67.6%
LiveCodeBench Reasoning: 68.7%
AA Intelligence Index: 45.8

Major Release

MiMo-V2.5

Xiaomi
Released: 2026-04-22
Type: LLM
Context: 1M tokens
License: MIT
Xiaomi MiMo-V2.5 — AA Intelligence Index 49.0, 1M tokens context, reasoning model.
💰 $0.36 in / $1.80 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.9%
HLE: 25.2%
SciCode: 43.1%
TAU2-bench: 90.6%
TerminalBench-Hard: 41.7%
IF-Bench: 67.1%
LiveCodeBench Reasoning: 62.7%
AA Intelligence Index: 49.0
Chatbot Arena Elo: 1434

Major Release

MiMo-V2.5-Pro

Xiaomi
Released: 2026-04-22
Type: LLM
Context: 1M tokens
License: MIT
Xiaomi MiMo-V2.5-Pro — AA Intelligence Index 53.8, 1M tokens context, reasoning model.
💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.6%
HLE: 33.8%
SciCode: 50.2%
TAU2-bench: 94.2%
TerminalBench-Hard: 43.2%
IF-Bench: 79.9%
LiveCodeBench Reasoning: 73.3%
AA Intelligence Index: 53.8
Chatbot Arena Elo: 1465

Major Release

Ling 2.6 Flash

InclusionAI
Released: 2026-04-21
Type: LLM
Context: 262K tokens
License: MIT
InclusionAI Ling 2.6 Flash — AA Intelligence Index 26.2, 262K tokens context.
💰 $0.10 in / $0.30 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 59.3%
HLE: 6.2%
SciCode: 27.1%
TAU2-bench: 86.0%
TerminalBench-Hard: 21.2%
IF-Bench: 57.4%
LiveCodeBench Reasoning: 25.0%
AA Intelligence Index: 26.2

Major Release

Qwen3.6 Max Preview

Alibaba
Released: 2026-04-20
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.6 Max Preview — AA Intelligence Index 51.8, 256K tokens context, reasoning model.
💰 $1.30 in / $7.80 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.8%
HLE: 28.9%
SciCode: 46.9%
TAU2-bench: 95.9%
TerminalBench-Hard: 43.9%
IF-Bench: 76.6%
LiveCodeBench Reasoning: 69.7%
AA Intelligence Index: 51.8
Chatbot Arena Elo: 1459

Major Release

Kimi K2.6

Kimi
Released: 2026-04-20
Type: LLM
Context: 256K tokens
License: Modified MIT
Kimi K2.6 — AA Intelligence Index 53.9, 256K tokens context, reasoning model.
💰 $0.95 in / $4.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.1%
HLE: 35.9%
SciCode: 53.5%
TAU2-bench: 95.9%
TerminalBench-Hard: 43.9%
IF-Bench: 76.0%
LiveCodeBench Reasoning: 69.7%
AA Intelligence Index: 53.9
Chatbot Arena Elo: 1462

Major Release

Qwen3.6 35B A3B

Alibaba
Released: 2026-04-16
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.6 35B A3B — AA Intelligence Index 43.5, 262K tokens context, reasoning model.
💰 $0.25 in / $1.49 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.1%
HLE: 20.2%
SciCode: 35.8%
TAU2-bench: 95.3%
TerminalBench-Hard: 34.8%
IF-Bench: 64.4%
LiveCodeBench Reasoning: 63.7%
AA Intelligence Index: 43.5

Major Release

Claude Opus 4.7

Anthropic
Released: 2026-04-16
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2026-01-01
License: Proprietary
Anthropic Claude Opus 4.7 — AA Intelligence Index 57.3, 1M tokens context, reasoning model.
💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.4%
HLE: 39.6%
SciCode: 54.5%
TAU2-bench: 88.6%
TerminalBench-Hard: 51.5%
IF-Bench: 58.6%
LiveCodeBench Reasoning: 70.3%
AA Intelligence Index: 57.3
Chatbot Arena Elo: 1494

Major Release

JT-MINI

China Mobile
Released: 2026-04-15
Type: LLM
Context: 128K tokens
License: Proprietary
China Mobile JT-MINI — AA Intelligence Index 25.4, 128K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 67.6%
HLE: 6.6%
SciCode: 27.2%
TAU2-bench: 93.0%
TerminalBench-Hard: 18.2%
IF-Bench: 36.7%
LiveCodeBench Reasoning: 11.7%
AA Intelligence Index: 25.4

Major Release

EXAONE 4.5 33B

LG AI Research
Released: 2026-04-09
Type: LLM
Context: 262K tokens
License: EXAONE AI Model License Agreement 1.2 - NC
LG AI Research EXAONE 4.5 33B — AA Intelligence Index 30.2, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 79.4%
HLE: 11.6%
SciCode: 28.0%
TAU2-bench: 78.1%
TerminalBench-Hard: 20.5%
IF-Bench: 58.0%
LiveCodeBench Reasoning: 49.3%
AA Intelligence Index: 30.2

Major Release

Muse Spark

Meta
Released: 2026-04-08
Type: LLM
Context: 262K tokens
License: Proprietary
Meta Muse Spark — AA Intelligence Index 52.1, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.4%
HLE: 39.9%
SciCode: 51.5%
TAU2-bench: 91.5%
TerminalBench-Hard: 45.5%
IF-Bench: 75.9%
LiveCodeBench Reasoning: 69.7%
AA Intelligence Index: 52.2
Chatbot Arena Elo: 1489

Major Release

GLM-5.1

Z AI
Released: 2026-04-07
Type: LLM
Context: 200K tokens
License: MIT
Z AI GLM-5.1 — AA Intelligence Index 51.4, 200K tokens context, reasoning model.
💰 $1.40 in / $4.40 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.8%
HLE: 28.0%
SciCode: 43.8%
TAU2-bench: 97.7%
TerminalBench-Hard: 43.2%
IF-Bench: 76.3%
LiveCodeBench Reasoning: 62.3%
AA Intelligence Index: 51.4
Chatbot Arena Elo: 1474

Major Release

Grok 4.20 0309 v2

xAI
Released: 2026-04-07
Type: LLM
Context: 2M tokens
License: Proprietary
xAI Grok 4.20 0309 v2 — AA Intelligence Index 49.3, 2M tokens context, reasoning model.
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.1%
HLE: 32.2%
SciCode: 45.6%
TAU2-bench: 93.0%
TerminalBench-Hard: 37.9%
IF-Bench: 81.2%
LiveCodeBench Reasoning: 58.0%
AA Intelligence Index: 49.3

Major Release

Solar Pro 3

Upstage
Released: 2026-04-06
Type: LLM
Context: 128K tokens
License: Proprietary
Upstage Solar Pro 3 — AA Intelligence Index 25.9, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 72.4%
HLE: 10.1%
SciCode: 24.7%
TAU2-bench: 86.3%
TerminalBench-Hard: 7.6%
IF-Bench: 71.2%
LiveCodeBench Reasoning: 27.0%
AA Intelligence Index: 25.9

Major Release

Gemma 4 E4B

Google
Released: 2026-04-03
Type: LLM
Context: 128K tokens
License: Apache 2.0
Google Gemma 4 E4B — AA Intelligence Index 18.8, 128K tokens context, reasoning model.
💰 $0.30 in / $1.25 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 57.6%
HLE: 3.7%
SciCode: 24.4%
TAU2-bench: 20.8%
TerminalBench-Hard: 8.3%
IF-Bench: 44.2%
LiveCodeBench Reasoning: 30.7%
AA Intelligence Index: 18.8

Major Release

Qwen3.6 Plus

Alibaba
Released: 2026-04-02
Type: LLM
Context: 1M tokens
License: Proprietary
Alibaba Qwen3.6 Plus — AA Intelligence Index 50.0, 1M tokens context, reasoning model.
💰 $0.50 in / $3.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.2%
HLE: 25.7%
SciCode: 40.7%
TAU2-bench: 97.7%
TerminalBench-Hard: 43.9%
IF-Bench: 75.2%
LiveCodeBench Reasoning: 69.7%
AA Intelligence Index: 50.0
Chatbot Arena Elo: 1444

Major Release

Gemma 4 26B A4B

Google
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Apache 2.0
Google Gemma 4 26B A4B — AA Intelligence Index 31.2, 256K tokens context, reasoning model.
💰 $0.13 in / $0.40 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 79.2%
HLE: 18.3%
SciCode: 40.0%
TAU2-bench: 43.6%
TerminalBench-Hard: 13.6%
IF-Bench: 72.4%
LiveCodeBench Reasoning: 55.7%
AA Intelligence Index: 31.2
Chatbot Arena Elo: 1439

Major Release

Gemma 4 31B

Google
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Apache 2.0
Google Gemma 4 31B — AA Intelligence Index 39.2, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.7%
HLE: 22.7%
SciCode: 43.4%
TAU2-bench: 59.9%
TerminalBench-Hard: 36.4%
IF-Bench: 75.6%
LiveCodeBench Reasoning: 62.0%
AA Intelligence Index: 39.2
Chatbot Arena Elo: 1452

Major Release

Gemma 4 E2B

Google
Released: 2026-04-02
Type: LLM
Context: 128K tokens
License: Apache 2.0
Google Gemma 4 E2B — AA Intelligence Index 15.2, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 43.3%
HLE: 4.8%
SciCode: 20.9%
TAU2-bench: 20.8%
TerminalBench-Hard: 3.0%
IF-Bench: 38.0%
LiveCodeBench Reasoning: 15.0%
AA Intelligence Index: 15.2

Major Release

Step 3.5 Flash 2603

StepFun
Released: 2026-04-02
Type: LLM
Context: 256K tokens
License: Proprietary
StepFun Step 3.5 Flash 2603 — AA Intelligence Index 38.5, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.6%
HLE: 22.6%
SciCode: 38.5%
TAU2-bench: 87.4%
TerminalBench-Hard: 32.6%
IF-Bench: 66.5%
LiveCodeBench Reasoning: 54.3%
AA Intelligence Index: 38.5
Chatbot Arena Elo: 1394

Major Release

Trinity Large Thinking

Arcee AI
Released: 2026-04-01
Type: LLM
Context: 512K tokens
License: Apache 2.0
Arcee AI Trinity Large Thinking — AA Intelligence Index 31.9, 512K tokens context, reasoning model.
💰 $0.23 in / $0.88 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.2%
HLE: 14.7%
SciCode: 36.1%
TAU2-bench: 90.1%
TerminalBench-Hard: 22.7%
IF-Bench: 56.3%
LiveCodeBench Reasoning: 33.0%
AA Intelligence Index: 31.9
Chatbot Arena Elo: 1371

Major Release

GLM 5V Turbo

Z AI
Released: 2026-04-01
Type: LLM
Context: 200K tokens
License: Proprietary
Z AI GLM 5V Turbo — AA Intelligence Index 42.9, 200K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.9%
HLE: 15.8%
SciCode: 43.5%
TAU2-bench: 98.5%
TerminalBench-Hard: 32.6%
IF-Bench: 61.1%
LiveCodeBench Reasoning: 61.0%
AA Intelligence Index: 42.9
Chatbot Arena Elo: 1227

March

Major Release

Qwen3.5 Omni Flash

Alibaba
Released: 2026-03-30
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.5 Omni Flash — AA Intelligence Index 25.9, 256K tokens context.
💰 $0.10 in / $0.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 74.2%
HLE: 7.1%
SciCode: 25.5%
TAU2-bench: 84.5%
TerminalBench-Hard: 8.3%
IF-Bench: 38.0%
LiveCodeBench Reasoning: 44.0%
AA Intelligence Index: 25.9

Major Release

Qwen3.5 Omni Plus

Alibaba
Released: 2026-03-30
Type: LLM
Context: 256K tokens
License: Proprietary
Alibaba Qwen3.5 Omni Plus — AA Intelligence Index 38.6, 256K tokens context.
💰 $0.40 in / $4.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.6%
HLE: 13.9%
SciCode: 40.5%
TAU2-bench: 88.3%
TerminalBench-Hard: 21.2%
IF-Bench: 51.2%
LiveCodeBench Reasoning: 52.7%
AA Intelligence Index: 38.6

Major Release

MiMo-V2-Omni-0327

Xiaomi
Released: 2026-03-27
Type: LLM
Context: 256K tokens
License: Proprietary
Xiaomi MiMo-V2-Omni-0327 — AA Intelligence Index 44.9, 256K tokens context, reasoning model.
💰 $0.40 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.5%
HLE: 20.4%
SciCode: 39.5%
TAU2-bench: 88.0%
TerminalBench-Hard: 35.6%
IF-Bench: 67.3%
LiveCodeBench Reasoning: 63.7%
AA Intelligence Index: 44.9

Major Release

Nemotron Cascade 2 30B A3B

NVIDIA
Released: 2026-03-19
Type: LLM
Context: 1M tokens
License: NVIDIA Open Model License
NVIDIA Nemotron Cascade 2 30B A3B — AA Intelligence Index 28.4, 1M tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.8%
HLE: 11.4%
SciCode: 34.8%
TAU2-bench: 53.2%
TerminalBench-Hard: 21.2%
IF-Bench: 80.4%
LiveCodeBench Reasoning: 34.0%
AA Intelligence Index: 28.4

Major Release

MiMo-V2-Omni

Xiaomi
Released: 2026-03-19
Type: LLM
Context: 256K tokens
License: Proprietary
Xiaomi MiMo-V2-Omni — AA Intelligence Index 43.4, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.8%
HLE: 19.9%
SciCode: 36.7%
TAU2-bench: 91.2%
TerminalBench-Hard: 34.8%
IF-Bench: 53.5%
LiveCodeBench Reasoning: 66.7%
AA Intelligence Index: 43.4
Chatbot Arena Elo: 1414

Major Release

MiniMax-M2.7

MiniMax
Released: 2026-03-18
Type: LLM
Context: 204K tokens
License: Non-commercial license
MiniMax-M2.7 — AA Intelligence Index 49.6, 204K tokens context, reasoning model.
💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.4%
HLE: 28.1%
SciCode: 47.0%
TAU2-bench: 84.8%
TerminalBench-Hard: 39.4%
IF-Bench: 75.7%
LiveCodeBench Reasoning: 68.7%
AA Intelligence Index: 49.6
Chatbot Arena Elo: 1413

Major Release

MiMo-V2-Pro

Xiaomi
Released: 2026-03-18
Type: LLM
Context: 1M tokens
License: Proprietary
Xiaomi MiMo-V2-Pro — AA Intelligence Index 49.2, 1M tokens context, reasoning model.
💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.0%
HLE: 28.3%
SciCode: 42.5%
TAU2-bench: 95.0%
TerminalBench-Hard: 40.9%
IF-Bench: 68.8%
LiveCodeBench Reasoning: 60.7%
AA Intelligence Index: 49.2
Chatbot Arena Elo: 1448

Major Release

GPT-5.4 mini

OpenAI
Released: 2026-03-17
Type: LLM
Context: 400K tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 mini — AA Intelligence Index 48.9, 400K tokens context, reasoning model.
💰 $0.75 in / $4.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.5%
HLE: 26.6%
SciCode: 49.9%
TAU2-bench: 83.3%
TerminalBench-Hard: 52.3%
IF-Bench: 73.3%
LiveCodeBench Reasoning: 69.3%
AA Intelligence Index: 48.9
Chatbot Arena Elo: 1451

Major Release

GPT-5.4 nano

OpenAI
Released: 2026-03-17
Type: LLM
Context: 400K tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 nano — AA Intelligence Index 44.0, 400K tokens context, reasoning model.
💰 $0.20 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 81.7%
HLE: 26.5%
SciCode: 46.9%
TAU2-bench: 76.0%
TerminalBench-Hard: 42.4%
IF-Bench: 75.9%
LiveCodeBench Reasoning: 66.0%
AA Intelligence Index: 44.0
Chatbot Arena Elo: 1403

Major Release

Mistral Small 4

Mistral
Released: 2026-03-16
Type: LLM
Context: 256K tokens
License: Apache 2.0
Mistral Small 4 — AA Intelligence Index 27.8, 256K tokens context, reasoning model.
💰 $0.15 in / $0.60 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 76.9%
HLE: 9.5%
SciCode: 38.0%
TAU2-bench: 41.2%
TerminalBench-Hard: 17.4%
IF-Bench: 48.2%
LiveCodeBench Reasoning: 44.7%
AA Intelligence Index: 27.8

Major Release

NVIDIA Nemotron 3 Nano 4B

NVIDIA
Released: 2026-03-16
Type: LLM
Context: 262K tokens
License: NVIDIA Nemotron Open Model License
NVIDIA Nemotron 3 Nano 4B — AA Intelligence Index 14.7, 262K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 51.3%
HLE: 4.8%
SciCode: 16.4%
TAU2-bench: 28.1%
TerminalBench-Hard: 6.8%
IF-Bench: 58.2%
LiveCodeBench Reasoning: 16.7%
AA Intelligence Index: 14.7

Major Release

GLM-5-Turbo

Z AI
Released: 2026-03-15
Type: LLM
Context: 200K tokens
License: Proprietary
Z AI GLM-5-Turbo — AA Intelligence Index 46.8, 200K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.7%
HLE: 25.4%
SciCode: 43.6%
TAU2-bench: 98.5%
TerminalBench-Hard: 33.3%
IF-Bench: 73.2%
LiveCodeBench Reasoning: 60.7%
AA Intelligence Index: 46.8

Major Release

NVIDIA Nemotron 3 Super 120B A12B

NVIDIA
Released: 2026-03-11
Type: LLM
Context: 1M tokens
License: NVIDIA Nemotron Open Model License
NVIDIA Nemotron 3 Super 120B A12B — AA Intelligence Index 36.0, 1M tokens context, reasoning model.
💰 $0.30 in / $0.75 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.0%
HLE: 19.2%
SciCode: 36.0%
TAU2-bench: 67.8%
TerminalBench-Hard: 28.8%
IF-Bench: 71.5%
LiveCodeBench Reasoning: 60.0%
AA Intelligence Index: 36.0
Chatbot Arena Elo: 1361

Major Release

Grok 4.20 0309

xAI
Released: 2026-03-10
Type: LLM
Context: 2M tokens
License: Proprietary
xAI Grok 4.20 0309 — AA Intelligence Index 48.5, 2M tokens context, reasoning model.
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.5%
HLE: 30.0%
SciCode: 44.7%
TAU2-bench: 96.5%
TerminalBench-Hard: 40.9%
IF-Bench: 82.9%
LiveCodeBench Reasoning: 59.0%
AA Intelligence Index: 48.5

Major Release

Sarvam 105B

Sarvam
Released: 2026-03-06
Type: LLM
Context: 128K tokens
License: Apache 2.0
Sarvam 105B — AA Intelligence Index 18.2, 128K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 73.8%
HLE: 10.1%
SciCode: 26.4%
TAU2-bench: 46.8%
TerminalBench-Hard: 1.5%
IF-Bench: 34.4%
LiveCodeBench Reasoning: 0.0%
AA Intelligence Index: 18.2

Major Release

Sarvam 30B

Sarvam
Released: 2026-03-06
Type: LLM
Context: 65K tokens
License: Apache 2.0
Sarvam 30B — AA Intelligence Index 12.3, 65K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 63.3%
HLE: 7.0%
SciCode: 19.2%
TAU2-bench: 34.5%
TerminalBench-Hard: 2.3%
IF-Bench: 26.5%
LiveCodeBench Reasoning: 0.0%
AA Intelligence Index: 12.3

Major Release

GPT-5.4

OpenAI
Released: 2026-03-05
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2025-08-31
License: Proprietary
OpenAI GPT-5.4 — AA Intelligence Index 56.8, 1M tokens context, reasoning model.
💰 $2.50 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.0%
HLE: 41.6%
SciCode: 56.6%
TAU2-bench: 87.1%
TerminalBench-Hard: 57.6%
IF-Bench: 73.9%
LiveCodeBench Reasoning: 74.0%
AA Intelligence Index: 56.8
Chatbot Arena Elo: 1469

Major Release

Gemini 3.1 Flash-Lite Preview

Google
Released: 2026-03-03
Type: LLM
Context: 1M tokens
Knowledge cutoff: 2025-01-01
License: Proprietary
Google Gemini 3.1 Flash-Lite Preview — AA Intelligence Index 33.5, 1M tokens context, reasoning model.
💰 $0.25 in / $1.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.2%
HLE: 16.2%
SciCode: 41.9%
TAU2-bench: 31.3%
TerminalBench-Hard: 24.2%
IF-Bench: 77.2%
LiveCodeBench Reasoning: 65.3%
AA Intelligence Index: 33.5
Chatbot Arena Elo: 1433

Major Release

Qwen3.5 0.8B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 0.8B — AA Intelligence Index 10.5, 262K tokens context, reasoning model.
💰 $0.01 in / $0.05 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 11.1%
HLE: 1.2%
SciCode: 0.0%
TAU2-bench: 47.7%
TerminalBench-Hard: 0.0%
IF-Bench: 21.5%
LiveCodeBench Reasoning: 5.3%
AA Intelligence Index: 10.5

Major Release

Qwen3.5 2B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 2B — AA Intelligence Index 16.3, 262K tokens context, reasoning model.
💰 $0.02 in / $0.10 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 45.6%
HLE: 2.1%
SciCode: 2.8%
TAU2-bench: 69.0%
TerminalBench-Hard: 3.8%
IF-Bench: 31.5%
LiveCodeBench Reasoning: 23.7%
AA Intelligence Index: 16.3

Major Release

Qwen3.5 4B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 4B — AA Intelligence Index 27.1, 262K tokens context, reasoning model.
💰 $0.03 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 77.1%
HLE: 7.8%
SciCode: 16.1%
TAU2-bench: 92.1%
TerminalBench-Hard: 18.2%
IF-Bench: 52.0%
LiveCodeBench Reasoning: 55.7%
AA Intelligence Index: 27.1

Major Release

Qwen3.5 9B

Alibaba
Released: 2026-03-02
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 9B — AA Intelligence Index 32.4, 262K tokens context, reasoning model.
💰 $0.10 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.6%
HLE: 13.3%
SciCode: 27.5%
TAU2-bench: 86.8%
TerminalBench-Hard: 24.2%
IF-Bench: 66.7%
LiveCodeBench Reasoning: 59.0%
AA Intelligence Index: 32.4

February

Major Release

LFM2 24B A2B

Liquid AI
Released: 2026-02-25
Type: LLM
Context: 32K tokens
License: LFM 1.0
Liquid AI LFM2 24B A2B — AA Intelligence Index 10.5, 32K tokens context.
💰 $0.03 in / $0.12 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 47.4%
HLE: 4.4%
SciCode: 10.9%
TAU2-bench: 11.1%
TerminalBench-Hard: 0.0%
IF-Bench: 45.9%
LiveCodeBench Reasoning: 0.0%
AA Intelligence Index: 10.5

Major Release

Qwen3.5 122B A10B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 122B A10B — AA Intelligence Index 41.6, 262K tokens context, reasoning model.
💰 $0.40 in / $3.20 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.7%
HLE: 23.4%
SciCode: 42.0%
TAU2-bench: 93.6%
TerminalBench-Hard: 31.1%
IF-Bench: 75.7%
LiveCodeBench Reasoning: 66.7%
AA Intelligence Index: 41.6
Chatbot Arena Elo: 1417

Major Release

Qwen3.5 27B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 27B — AA Intelligence Index 42.1, 262K tokens context, reasoning model.
💰 $0.30 in / $2.40 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.8%
HLE: 22.2%
SciCode: 39.5%
TAU2-bench: 93.9%
TerminalBench-Hard: 32.6%
IF-Bench: 75.6%
LiveCodeBench Reasoning: 67.3%
AA Intelligence Index: 42.1
Chatbot Arena Elo: 1408

Major Release

Qwen3.5 35B A3B

Alibaba
Released: 2026-02-24
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 35B A3B — AA Intelligence Index 37.1, 262K tokens context, reasoning model.
💰 $0.25 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.5%
HLE: 19.7%
SciCode: 37.7%
TAU2-bench: 89.2%
TerminalBench-Hard: 26.5%
IF-Bench: 72.5%
LiveCodeBench Reasoning: 62.7%
AA Intelligence Index: 37.1
Chatbot Arena Elo: 1396

Major Release

Mercury 2

Inception
Released: 2026-02-20
Type: LLM
Context: 128K tokens
License: Proprietary
Inception Mercury 2 — AA Intelligence Index 32.8, 128K tokens context, reasoning model.
💰 $0.25 in / $0.75 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 77.0%
HLE: 15.5%
SciCode: 38.7%
TAU2-bench: 70.8%
TerminalBench-Hard: 26.5%
IF-Bench: 69.8%
LiveCodeBench Reasoning: 36.3%
AA Intelligence Index: 32.8
Chatbot Arena Elo: 1347

Major Release

Gemini 3.1 Pro

Google
Released: 2026-02-19
Type: LLM
Size: ~1T MoE
Architecture: Sparse Mixture-of-Experts (MoE)
Context: 2M tokens
Knowledge cutoff: 2025-12
License: Proprietary
Google's latest flagship reasoning model, with a major 2x jump in reasoning capabilities on hard multi-step tasks.
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 2x reasoning improvement
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding
  • Deep Think mode

📊 Benchmarks

ARC-AGI-2: 77.1%
MMLU: 93.8%
MATH: 89.4%
MMLU-Pro: 93.8%
GPQA Diamond: 84.2%
SWE-bench Verified: 72.3%
LiveCodeBench: 78.9%

🔄 What's new vs previous version

  • 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
  • Context window expanded to 2M tokens
  • Deep Think mode enabled by default on the Pro tier
  • Lower latency on first-token despite larger context
Major Release

Gemini 3.1 Pro Preview

Google
Released: 2026-02-19
Type: LLM
Context: 1M tokens
License: Proprietary
Google Gemini 3.1 Pro Preview — AA Intelligence Index 57.2, 1M tokens context, reasoning model.
💰 $2.00 in / $12.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

94.1%
GPQA Diamond
44.7%
HLE
58.9%
SciCode
95.6%
TAU2-bench
53.8%
TerminalBench-Hard
77.1%
IF-Bench
72.7%
LiveCodeBench Reasoning
57.2
AA Intelligence Index
1487
Chatbot Arena Elo
Major Release

Claude Sonnet 4.6

Anthropic
Released: 2026-02-17
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-10
License: Proprietary
Anthropic's latest Sonnet: near-Opus performance at a fraction of the cost, with Agent Teams orchestration.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Agent Teams: orchestrate 2-16 Claude instances
  • Near-Opus performance at 1/5th cost
  • 80.8% SWE-bench Verified
  • Fast mode research preview

📊 Benchmarks

92.1%
MMLU
95.2%
HumanEval
80.8%
SWE-bench Verified
79.7%
GPQA Diamond
88.5%
AIME 2025
71.2%
TAU-bench
10.8%
HLE
44.1%
SciCode
78.9%
TAU2-bench
42.4%
TerminalBench-Hard
42.4%
IF-Bench
58.7%
LiveCodeBench Reasoning
42.6
AA Intelligence Index
1470
Chatbot Arena Elo

🔄 What's new vs previous version

  • Agent Teams: orchestrate 2–16 Claude instances in parallel
  • +8.5pt on SWE-bench Verified vs Sonnet 4
  • 1/5 the cost of Opus 4.5 at ~95% of coding quality
  • Fast mode research preview for lower-latency inference
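
The "1/5 the cost" comparison follows from the per-1M-token prices listed on this page. A minimal Python sketch, assuming a hypothetical workload of 200,000 input and 20,000 output tokens; the `request_cost` helper and the token counts are illustrative and not part of any provider SDK:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Return the USD cost of one request given per-1M-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Prices as listed on this page (USD per 1M tokens: input, output).
SONNET_4_6 = (3.00, 15.00)
OPUS_4_5 = (15.00, 75.00)

# Hypothetical workload, chosen only for illustration.
tokens_in, tokens_out = 200_000, 20_000

sonnet = request_cost(tokens_in, tokens_out, *SONNET_4_6)
opus = request_cost(tokens_in, tokens_out, *OPUS_4_5)
print(f"Sonnet 4.6: ${sonnet:.2f}  Opus 4.5: ${opus:.2f}  ratio: {sonnet / opus:.2f}")
```

Because the listed input and output prices both differ by the same factor of five, the ratio should hold for any mix of input and output tokens.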
Major Release

Tiny Aya Global

Cohere
Released: 2026-02-17
Type: LLM
Context: 8K tokens
License: CC BY-NC 4.0
Cohere Tiny Aya Global — AA Intelligence Index 4.7, 8K tokens context.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

30.5%
GPQA Diamond
5.2%
HLE
3.6%
SciCode
0.0%
TAU2-bench
0.0%
TerminalBench-Hard
20.1%
IF-Bench
0.0%
LiveCodeBench Reasoning
4.7
AA Intelligence Index
Major Release

Qwen3.5 397B A17B

Alibaba
Released: 2026-02-16
Type: LLM
Context: 262K tokens
License: Apache 2.0
Alibaba Qwen3.5 397B A17B — AA Intelligence Index 45.0, 262K tokens context, reasoning model.
💰 $0.60 in / $3.60 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

89.3%
GPQA Diamond
27.3%
HLE
42.0%
SciCode
95.6%
TAU2-bench
40.9%
TerminalBench-Hard
78.8%
IF-Bench
65.7%
LiveCodeBench Reasoning
45.0
AA Intelligence Index
1445
Chatbot Arena Elo
Update

DeepSeek V3.2

DeepSeek
Released: 2026-02-12
Type: LLM
Size: 671B MoE
Architecture: Sparse MoE (37B active / 671B total)
Context: 1M tokens
Knowledge cutoff: 2025-09
License: DeepSeek License (open weights, commercial OK)
Open-weight MoE update with strong coding and a 10x context window expansion to over 1 million tokens.
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI · Fireworks AI

✨ Key Features

  • 1M+ token context window (10x expansion)
  • Improved reasoning capabilities
  • Open source release
  • Cost-effective inference

📊 Benchmarks

90.1%
MMLU
92.5%
HumanEval
1M+ tokens
Context Window
86.2%
MMLU-Pro
84.0%
GPQA Diamond
85.6%
MATH
68.4%
GPQA
86.2%
LiveCodeBench
22.2%
HLE
38.9%
SciCode
90.6%
TAU2-bench
35.6%
TerminalBench-Hard
60.7%
IF-Bench
65.0%
LiveCodeBench Reasoning
41.7
AA Intelligence Index
1424
Chatbot Arena Elo

🔄 What's new vs previous version

  • 10x context window expansion (128K → 1M+ tokens)
  • Sliding-window attention for long-context throughput
  • Improved chain-of-thought reasoning
  • Native FP8 inference support
Major Release

MiniMax-M2.5

MiniMax
Released: 2026-02-12
Type: LLM
Context: 204K tokens
License: MIT
MiniMax MiniMax-M2.5 — AA Intelligence Index 41.9, 204K tokens context, reasoning model.
💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

84.8%
GPQA Diamond
19.1%
HLE
42.6%
SciCode
95.3%
TAU2-bench
34.8%
TerminalBench-Hard
71.6%
IF-Bench
66.0%
LiveCodeBench Reasoning
41.9
AA Intelligence Index
1391
Chatbot Arena Elo
Major Release

GLM-5

Zhipu AI
Released: 2026-02-11
Type: LLM
Size: 744B
Architecture: Dense Transformer (744B)
Context: 200K tokens
Knowledge cutoff: 2025-11
License: Proprietary (open weights for non-frontier sizes)
First frontier model trained entirely on Huawei Ascend chips, without NVIDIA GPUs.
💰 $0.11 in / $0.28 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Zhipu BigModel API

✨ Key Features

  • First frontier model trained on Huawei Ascend chips (no NVIDIA)
  • #1 HLE score (50.4%)
  • 1.2% hallucination rate via Slime RL
  • 136x cheaper than Claude Opus 4.5

📊 Benchmarks

50.4%
HLE
1.2%
Hallucination Rate
$0.11/M tokens
Cost
88.7%
MMLU
92.1%
C-Eval
94.8%
GSM8K

🔄 What's new vs previous version

  • Trained entirely on Huawei Ascend 910B clusters (no NVIDIA)
  • Slime RL fine-tuning drops hallucination rate to 1.2%
  • 136x cheaper than Claude Opus 4.5 at comparable quality
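
The "136x cheaper" figure can be checked against the prices listed on this page. The sketch below uses a hypothetical `price_ratio` helper and suggests that the figure matches the input-price ratio, while the output-price ratio comes out larger:

```python
def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more expensive the first price is than the second."""
    if cheap <= 0:
        raise ValueError("cheap price must be positive")
    return expensive / cheap


# Prices as listed on this page (USD per 1M tokens).
OPUS_4_5 = {"in": 15.00, "out": 75.00}
GLM_5 = {"in": 0.11, "out": 0.28}

input_ratio = price_ratio(OPUS_4_5["in"], GLM_5["in"])
output_ratio = price_ratio(OPUS_4_5["out"], GLM_5["out"])
print(f"input: {input_ratio:.0f}x, output: {output_ratio:.0f}x")
```

For a real workload the effective ratio would fall somewhere between the two, depending on the share of input and output tokens.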
Major Release

Nanbeige4.1-3B

Nanbeige
Released: 2026-02-11
Type: LLM
Context: 256K tokens
License: Apache 2.0
Nanbeige Nanbeige4.1-3B — AA Intelligence Index 16.1, 256K tokens context, reasoning model.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

84.9%
GPQA Diamond
10.0%
HLE
26.6%
SciCode
21.6%
TAU2-bench
0.0%
TerminalBench-Hard
35.4%
IF-Bench
0.0%
LiveCodeBench Reasoning
16.1
AA Intelligence Index
Major Release

GLM-5

Z AI
Released: 2026-02-11
Type: LLM
Context: 200K tokens
License: MIT
Z AI GLM-5 — AA Intelligence Index 49.8, 200K tokens context, reasoning model.
💰 $1.00 in / $3.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

82.0%
GPQA Diamond
27.2%
HLE
46.2%
SciCode
98.2%
TAU2-bench
43.2%
TerminalBench-Hard
72.3%
IF-Bench
63.3%
LiveCodeBench Reasoning
49.8
AA Intelligence Index
1457
Chatbot Arena Elo
Major Release

GPT-5.3 Codex

OpenAI
Released: 2026-02-05
Type: Code
Size: ~200B
Architecture: MoE (coding-specialized fine-tune)
Context: 400K tokens
Knowledge cutoff: 2025-11
License: Proprietary
OpenAI's coding-specialized, self-improving variant of GPT-5.3, tuned for agentic IDE workflows, with state-of-the-art software engineering performance.
💰 $1.25 in / $10.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI · GitHub Copilot

✨ Key Features

  • Self-improving agentic coding
  • 25% faster than GPT-5.2-Codex
  • 1,000+ tokens/sec generation
  • First OpenAI model flagged 'high' on cybersecurity framework

📊 Benchmarks

77.3%
Terminal-Bench
SOTA
SWE-Bench Pro
1,000+ tok/s
Speed
82.4%
SWE-bench Verified
96.8%
HumanEval
91.5%
GPQA Diamond
84.2%
LiveCodeBench
79.5%
Aider Polyglot
39.9%
HLE
53.2%
SciCode
86.0%
TAU2-bench
53.0%
TerminalBench-Hard
75.4%
IF-Bench
74.0%
LiveCodeBench Reasoning
53.6
AA Intelligence Index
1407
Chatbot Arena Elo

🔄 What's new vs previous version

  • +4pt on SWE-bench Verified vs GPT-5.2 Codex
  • Native IDE tool-calling at reduced latency
  • Extended max output to 100K for multi-file patches
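
Deltas such as "+4pt" are percentage-point differences, which are not the same as relative improvements. A small Python sketch using the SWE-bench Verified scores listed on this page for GPT-5.3 Codex (82.4%) and GPT-5.2 Codex (78.2%); the helper names are illustrative:

```python
def point_delta(new: float, old: float) -> float:
    """Absolute difference in percentage points."""
    return new - old


def relative_change(new: float, old: float) -> float:
    """Relative change, expressed as a percentage of the old score."""
    return (new - old) / old * 100


new_score, old_score = 82.4, 78.2  # SWE-bench Verified, as listed on this page
print(f"{point_delta(new_score, old_score):+.1f} points")
print(f"{relative_change(new_score, old_score):+.1f}% relative")
```

The same distinction applies to the other "+N pt" entries in this timeline.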
Major Release

Claude Opus 4.6

Anthropic
Released: 2026-02-05
Type: LLM
Context: 1M tokens
License: Proprietary
Anthropic Claude Opus 4.6 — AA Intelligence Index 46.5, 1M tokens context.
💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.0%
GPQA Diamond
18.6%
HLE
45.7%
SciCode
84.8%
TAU2-bench
48.5%
TerminalBench-Hard
44.6%
IF-Bench
58.3%
LiveCodeBench Reasoning
46.5
AA Intelligence Index
1498
Chatbot Arena Elo
Major Release

Qwen3 Coder Next

Alibaba
Released: 2026-02-03
Type: LLM
Context: 256K tokens
License: Apache 2.0
Alibaba Qwen3 Coder Next — AA Intelligence Index 28.3, 256K tokens context.
💰 $0.35 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

73.7%
GPQA Diamond
9.3%
HLE
32.3%
SciCode
79.5%
TAU2-bench
18.2%
TerminalBench-Hard
35.2%
IF-Bench
40.0%
LiveCodeBench Reasoning
28.3
AA Intelligence Index

January

Major Release

Kimi K2

Moonshot AI
Released: 2026-01-20
Type: LLM
Size: 1.04T MoE
Architecture: MoE (32B active / ~1T total)
Context: 2M tokens
Knowledge cutoff: 2025-10
License: Modified MIT (open weights)
Moonshot's open-weight frontier MoE with over 1 trillion parameters and strong agentic benchmarks; the first open-weight model to rank #1 on LMSYS Chatbot Arena.
💰 $0.15 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Moonshot API · Hugging Face · Together AI

✨ Key Features

  • First open-weight model #1 on LMSYS Chatbot Arena
  • 1.04 trillion parameters
  • K2.5 agent swarms with up to 100 sub-agents
  • $0.15/M input tokens

📊 Benchmarks

#1
LMSYS Arena
1.04T
Parameters
$0.15/M tokens
Cost
91.3%
MMLU
65.8%
SWE-bench Verified
74.1%
GPQA Diamond
68.9%
LiveCodeBench

🔄 What's new vs previous version

  • 2M token context window (20x vs first Kimi)
  • Agentic tool-use tuning via MuonClip optimizer
  • Open weights under modified MIT

2025

December

Update

GPT-5.2 Codex

OpenAI
Released: 2025-12-18
Type: Code
Size: ~200B
Architecture: MoE (coding fine-tune)
Context: 256K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Prior-generation Codex variant of GPT-5.2, specialized for agentic coding and software engineering tasks.
💰 $1.50 in / $12.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Specialized for software engineering
  • Enhanced agentic coding
  • Multi-file refactoring
  • Advanced debugging capabilities

📊 Benchmarks

SOTA
SWE-Bench
95.1%
HumanEval
72.8%
Terminal-Bench
78.2%
SWE-bench Verified
80.4%
LiveCodeBench
Major Release

Mistral Large 3

Mistral
Released: 2025-12-15
Type: LLM
Size: ~123B
Architecture: Dense Transformer
Context: 256K tokens
Knowledge cutoff: 2025-07
License: Mistral Commercial License
Mistral's flagship proprietary model, tuned for European enterprise and competing with GPT-5 class models at a fraction of the cost.
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Mistral La Plateforme · Azure · AWS Bedrock

✨ Key Features

  • 256K context window
  • Improved multilingual capabilities
  • Enhanced function calling
  • Competitive with GPT-5 class models

📊 Benchmarks

89.4%
MMLU
91.2%
HumanEval
82.1%
MATH
80.7%
MMLU-Pro
68.0%
GPQA Diamond
76.8%
MMMU
46.5%
LiveCodeBench
4.1%
HLE
36.2%
SciCode
24.6%
TAU2-bench
15.9%
TerminalBench-Hard
36.2%
IF-Bench
34.7%
LiveCodeBench Reasoning
22.8
AA Intelligence Index
1415
Chatbot Arena Elo
Update

GPT-5.2

OpenAI
Released: 2025-12-11
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Late-2025 GPT-5 refresh: an iterative improvement on GPT-5.1 with enhanced reasoning, better steerability, and faster performance.
💰 $2.00 in / $10.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved adaptive reasoning
  • Better multimodal understanding
  • Faster inference

📊 Benchmarks

92.8%
MMLU
88.5%
MATH
95.8%
HumanEval
87.4%
MMLU-Pro
72.5%
SWE-bench Verified
90.3%
GPQA Diamond
92.1%
AIME 2025
88.9%
LiveCodeBench
35.4%
HLE
52.1%
SciCode
84.8%
TAU2-bench
47.0%
TerminalBench-Hard
75.4%
IF-Bench
72.7%
LiveCodeBench Reasoning
51.3
AA Intelligence Index
1435
Chatbot Arena Elo

November

Major Release

Claude Opus 4.5

Anthropic
Released: 2025-11-24
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Anthropic's most capable model for complex research and agents, with breakthrough coding performance and a major price reduction.
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • First model to exceed 80% on SWE-bench Verified (80.9%)
  • 67% price reduction vs previous Opus
  • Extended reasoning capabilities
  • Advanced coding performance

📊 Benchmarks

80.9%
SWE-bench
92.8%
MMLU
95.0%
HumanEval
89.5%
MMLU-Pro
78.9%
SWE-bench Verified
86.6%
GPQA Diamond
90.5%
AIME 2025
87.1%
LiveCodeBench
28.4%
HLE
49.5%
SciCode
89.5%
TAU2-bench
47.0%
TerminalBench-Hard
58.0%
IF-Bench
74.0%
LiveCodeBench Reasoning
49.7
AA Intelligence Index
1469
Chatbot Arena Elo
Major Release

Gemini 3 Pro

Google
Released: 2025-11-18
Type: Multimodal
Size: ~1T MoE
Architecture: Sparse MoE
Context: 1M tokens
Knowledge cutoff: 2025-09
License: Proprietary
First Gemini 3 tier release: Google's flagship model with Deep Think mode and strong multimodal and long-context performance, ranked #1 on LMSYS Arena at launch.
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 1M token context window
  • Deep Think reasoning mode
  • Solved 5/6 IMO 2025 problems
  • #1 on LMSYS Arena

📊 Benchmarks

87.5%
ARC-AGI
93.2%
MMLU
#1
LMSYS Arena
89.4%
MMLU-Pro
82.1%
MMMU
78.5%
GPQA Diamond
68.2%
SWE-bench Verified
Major Release

GPT-5.1

OpenAI
Released: 2025-11-12
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-06
License: Proprietary
GPT-5 iteration with adaptive reasoning, steerability and latency improvements, and a perfect score on AIME 2025.
💰 $2.25 in / $11.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning modes
  • Perfect 100% on AIME 2025
  • 87.5% on ARC-AGI
  • Enhanced multimodal capabilities

📊 Benchmarks

87.5%
ARC-AGI
100%
AIME 2025
92.5%
MMLU
87.0%
MMLU-Pro
70.1%
SWE-bench Verified
87.3%
GPQA Diamond
86.8%
LiveCodeBench
26.5%
HLE
43.3%
SciCode
81.9%
TAU2-bench
45.5%
TerminalBench-Hard
72.9%
IF-Bench
75.0%
LiveCodeBench Reasoning
47.7
AA Intelligence Index
1439
Chatbot Arena Elo

August

Major Release

GPT-5

OpenAI
Released: 2025-08-15
Type: LLM
Size: ~200B
Architecture: MoE with unified reasoning router
Context: 400K tokens
Knowledge cutoff: 2025-05
License: Proprietary
OpenAI's next-generation flagship: a unified reasoning and chat model with adaptive reasoning that replaces the GPT-4 line.
💰 $2.50 in / $12.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning (routes between quick and deep thinking)
  • Improved math and coding
  • Enhanced multimodal reasoning
  • New safety architecture

📊 Benchmarks

91.0%
MMLU
95.1%
HumanEval
90.1%
MATH
80.6%
MMLU-Pro
67.4%
SWE-bench Verified
67.3%
GPQA Diamond
55.8%
LiveCodeBench
86.1%
MATH-500
36.7%
AIME 2025
5.4%
HLE
38.8%
SciCode
67.0%
TAU2-bench
18.2%
TerminalBench-Hard
45.6%
IF-Bench
25.0%
LiveCodeBench Reasoning
23.9
AA Intelligence Index
1434
Chatbot Arena Elo

July

Update

Claude Opus 4.1

Anthropic
Released: 2025-07-15
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Mid-2025 refresh of Claude Opus 4, focused on agentic coding reliability and enhanced multi-file refactoring.
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved multi-file refactoring
  • Enhanced agentic capabilities
  • Better long-context performance
  • Reduced hallucinations

📊 Benchmarks

75.2%
SWE-bench
91.2%
MMLU
94.0%
HumanEval
74.5%
SWE-bench Verified
79.1%
GPQA Diamond

June

Update

Gemini 2.5 Flash

Google
Released: 2025-06-20
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2025-01
License: Proprietary
Google's fast, cost-optimized multimodal model with thinking mode and enhanced image capabilities.
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Enhanced image editing stabilization
  • Faster inference
  • Improved multimodal understanding
  • Cost-effective deployment

📊 Benchmarks

87.5%
MMLU
2x Gemini 2.0 Flash
Speed
High
Image Quality
83.2%
MMLU-Pro
96.2%
HumanEval
79.0%
GPQA Diamond
79.7%
MMMU
69.5%
LiveCodeBench
98.1%
MATH-500
82.3%
AIME 2025
11.1%
HLE
39.4%
SciCode
31.6%
TAU2-bench
13.6%
TerminalBench-Hard
50.3%
IF-Bench
61.7%
LiveCodeBench Reasoning
27.0
AA Intelligence Index
1411
Chatbot Arena Elo

May

Major Release

Claude Sonnet 4

Anthropic
Released: 2025-05-22
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Claude 4 mid-tier model with strong coding, long-horizon agentic reliability, and significant performance improvements over the previous generation.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved safety measures
  • Advanced multimodal understanding
  • Extended context window

📊 Benchmarks

88.7%
MMLU
94.5%
HumanEval
76.8%
MATH
72.3%
SWE-bench Verified
74.0%
GPQA Diamond

February

Update

Claude Sonnet 3.7

Anthropic
Released: 2025-02-24
Type: LLM
Size: ~300B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-11
License: Proprietary
Extended-thinking update to Claude 3.5 Sonnet with a visible reasoning toggle.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved reasoning
  • Better code generation
  • Enhanced safety
  • Reduced hallucinations

📊 Benchmarks

86.1%
MMLU
93.2%
HumanEval
74.1%
MATH
62.3%
SWE-bench Verified
68.3%
GPQA Diamond

2024

December

Major Release

DeepSeek-V3

DeepSeek
Released: 2024-12-26
Type: LLM
Size: 671B
Architecture: Sparse MoE (37B active / 671B total)
Context: 128K tokens
Knowledge cutoff: 2024-07
License: DeepSeek License (open weights)
DeepSeek's breakthrough open-weight MoE model, rivaling GPT-4-class quality.
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI

✨ Key Features

  • Mixture of Experts architecture
  • Cost-effective training
  • Open source release
  • Strong reasoning capabilities

📊 Benchmarks

88.5%
MMLU
90.6%
HumanEval
61.6%
MATH
75.2%
MMLU-Pro
55.7%
GPQA Diamond
35.9%
LiveCodeBench
88.7%
MATH-500
25.3%
AIME 2025
3.6%
HLE
35.4%
SciCode
22.8%
TAU2-bench
6.8%
TerminalBench-Hard
34.8%
IF-Bench
29.0%
LiveCodeBench Reasoning
16.5
AA Intelligence Index
1358
Chatbot Arena Elo
Major Release

Gemini 2.0 Flash

Google
Released: 2024-12-11
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2024-08
License: Proprietary
First Gemini 2 model: Google's fast, cheap next-generation model with native multimodal capabilities and native tool use.
💰 $0.10 in / $0.40 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text, image, audio 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Native multimodal generation
  • Real-time API
  • Agentic capabilities
  • Enhanced speed

📊 Benchmarks

85.8%
MMLU
90.7%
HumanEval
58.8%
MATH
78.2%
MMLU-Pro
63.6%
GPQA Diamond
70.7%
MMMU
21.0%
LiveCodeBench
91.1%
MATH-500
30.0%
AIME 2025
4.7%
HLE
34.0%
SciCode
29.5%
TAU2-bench
3.8%
TerminalBench-Hard
40.2%
IF-Bench
28.3%
LiveCodeBench Reasoning
16.8
AA Intelligence Index

August

Major Release

Grok-2

xAI
Released: 2024-08-13
Type: LLM
Size: ~314B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: Real-time (X feed)
License: Proprietary
xAI's second-generation Grok flagship, with real-time web and X/Twitter data access and multimodal capabilities.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 xAI API · x.com (Grok)

✨ Key Features

  • Real-time information access
  • Multimodal understanding
  • X platform integration
  • Conversational AI

📊 Benchmarks

84.0%
MMLU
86.3%
HumanEval
56.0%
MATH
70.9%
MMLU-Pro
51.0%
GPQA Diamond
26.7%
LiveCodeBench
77.8%
MATH-500
13.3%
AIME 2025
3.8%
HLE
28.5%
SciCode
13.9
AA Intelligence Index

🔄 What's new vs previous version

  • Vision input
  • Real-time X data
  • Improved reasoning

June

Major Release

Claude 3.5 Sonnet

Anthropic
Released: 2024-06-20
Type: LLM
Size: ~175B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-04
License: Proprietary
The mid-2024 Sonnet release that set the state-of-the-art bar for coding and agents; Anthropic's most intelligent model at launch.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Improved coding capabilities
  • Enhanced reasoning
  • Vision capabilities

📊 Benchmarks

88.7%
MMLU
89.9%
HumanEval
71.1%
MATH
75.1%
MMLU-Pro
49.0%
SWE-bench Verified
56.0%
GPQA Diamond
38.1%
LiveCodeBench
69.5%
MATH-500
9.7%
AIME 2025
3.7%
HLE
31.6%
SciCode
14.2
AA Intelligence Index
1372
Chatbot Arena Elo

March

Major Release

Claude 3 Opus

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Most capable model in the Claude 3 family and Anthropic's most powerful pre-Claude 4 model, topping GPT-4 on reasoning with near-human performance on complex tasks.
💰 $18.75 in / $75.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Advanced reasoning
  • Multimodal capabilities
  • Constitutional AI training

📊 Benchmarks

86.8%
MMLU
84.8%
HumanEval
60.1%
MATH
69.6%
MMLU-Pro
48.9%
GPQA Diamond
95.0%
GSM8K
27.9%
LiveCodeBench
64.1%
MATH-500
3.3%
AIME 2025
3.1%
HLE
23.3%
SciCode
18.0
AA Intelligence Index
1063
Chatbot Arena Elo

🔄 What's new vs previous version

  • 200K context
  • Vision input
  • +15% MMLU vs Claude 2
  • Tool use
Major Release

Claude 3 Sonnet

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Balanced Claude 3 variant with the best price/performance in the family, offering strong performance with faster response times.
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Balanced capability and speed
  • Multimodal input
  • Strong reasoning

📊 Benchmarks

79.0%
MMLU
71.3%
HumanEval
40.5%
MATH
57.9%
MMLU-Pro
40.0%
GPQA Diamond
17.5%
LiveCodeBench
41.4%
MATH-500
4.7%
AIME 2025
3.8%
HLE
22.9%
SciCode
10.3
AA Intelligence Index
1018
Chatbot Arena Elo
Major Release

Claude 3 Haiku

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~25B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Fastest, cheapest, and most compact model in the Claude 3 family, with sub-second latency at $0.25 per 1M input tokens.
💰 $0.25 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Fastest response times
  • Multimodal input
  • Cost-effective

📊 Benchmarks

75.2%
MMLU
75.7%
HumanEval
38.9%
MATH
37.4%
GPQA Diamond
15.4%
LiveCodeBench
39.4%
MATH-500
1.0%
AIME 2025
3.9%
HLE
18.6%
SciCode
21.1%
TAU2-bench
0.8%
TerminalBench-Hard
36.1%
IF-Bench
21.0%
LiveCodeBench Reasoning
12.3
AA Intelligence Index
1001
Chatbot Arena Elo

February

Major Release

Mistral Large

Mistral
Released: 2024-02-26
Type: LLM
Size: ~70B
Context: 32K tokens
Top-tier reasoning model with strong multilingual capabilities
💰 $4.00 in / $12.00 out per 1M tok 🎛 In: text 📤 Out: text

✨ Key Features

  • 32K context window
  • Multilingual capabilities
  • Function calling
  • JSON mode

📊 Benchmarks

81.2%
MMLU
70.6%
HumanEval
89.2%
HellaSwag
51.5%
MMLU-Pro
35.1%
GPQA Diamond
17.8%
LiveCodeBench
52.7%
MATH-500
0.0%
AIME 2025
3.4%
HLE
20.8%
SciCode
9.9
AA Intelligence Index
Major Release

Gemini 1.5 Pro

Google
Released: 2024-02-15
Type: Multimodal
Size: ~175B
Architecture: MoE Transformer
Context: 1M tokens
Knowledge cutoff: 2023-11
License: Proprietary
Google's first 1M-context model, with breakthrough long-context capabilities and strong multimodal needle-in-a-haystack retrieval.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • 1M token context window
  • Multimodal understanding
  • Video analysis
  • Audio processing

📊 Benchmarks

81.9%
MMLU
83.4%
HumanEval
58.5%
MATH
65.7%
MMLU-Pro
37.1%
GPQA Diamond
24.4%
LiveCodeBench
67.3%
MATH-500
8.0%
AIME 2025
3.9%
HLE
27.4%
SciCode
12.0
AA Intelligence Index

🔄 What's new vs previous version

  • 1M token context
  • Multi-hour video understanding
  • MoE architecture

January

Major Release

text-embedding-3-large

OpenAI
Released: 2024-01-25
Type: Embedding
Size: ~7B
Architecture: Transformer encoder
License: Proprietary
OpenAI's most powerful text embedding model, with better MTEB scores than ada-002.
🎛 In: text 📤 Out: embeddings 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 3072 embedding dimensions
  • Improved retrieval performance
  • Reduced hallucinations
  • Multi-language support

📊 Benchmarks

64.6%
MTEB Score
3072
Dimensions
100+
Languages
Major Release

GPT-4 Turbo

OpenAI
Released: 2024-01-25
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: 2023-04
License: Proprietary
GPT-4 iteration with a 128K context window, knowledge through April 2023, improved performance, and input pricing 3× cheaper than GPT-4.
💰 $10.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 128K context window
  • Improved instruction following
  • Enhanced reasoning capabilities
  • Reduced hallucinations

📊 Benchmarks

86.4%
MMLU
91.8%
HumanEval
95.3%
HellaSwag
69.4%
MMLU-Pro
29.1%
LiveCodeBench
73.7%
MATH-500
15.0%
AIME 2025
3.3%
HLE
31.9%
SciCode
13.7
AA Intelligence Index

🔄 What's new vs previous version

  • 128K context (16× GPT-4's 8K base context)
  • Updated knowledge cutoff
  • 3× cheaper than GPT-4

2023

December

Major Release

Grok-1

xAI
Released: 2023-12-07
Type: LLM
Size: ~314B
Architecture: MoE Transformer
Context: 8K tokens
Knowledge cutoff: 2023-10-01
License: Apache 2.0
xAI's first major language model: a 314B MoE with real-time internet access, and the first frontier model to be fully open-sourced.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Self-hosted (HuggingFace)

✨ Key Features

  • Real-time information
  • Conversational interface
  • X platform integration
  • Uncensored responses

📊 Benchmarks

73.0%
MMLU
63.2%
HumanEval
62.9%
GSM8K
11.7
AA Intelligence Index
Research

AlphaCode 2

Google DeepMind
Released: 2023-12-06
Type: Code
Size: ~340B
Architecture: Transformer (Gemini-based)
License: Proprietary (research)
DeepMind's code generation system for competitive programming, performing in the top 15% of competitive programmers.
🎛 In: text, code 📤 Out: code

✨ Key Features

  • Advanced code generation
  • Competitive programming
  • Multi-language support
  • Problem decomposition

📊 Benchmarks

1747
Codeforces Rating
85th percentile
Problem Solving
10+ languages
Language Support
Top 15%
Codeforces percentile
Major Release

Gemini Ultra

Google
Released: 2023-12-06
Type: Multimodal
Size: ~540B
Architecture: MoE Transformer
Knowledge cutoff: 2023-06
License: Proprietary
Google's most capable multimodal model and its first to beat GPT-4 on MMLU, scoring 90%+ with chain-of-thought prompting.
🎛 In: text, image, audio, video 📤 Out: text 🌐 Google One AI Premium · Vertex AI

✨ Key Features

  • Multimodal reasoning
  • Text, image, audio, video understanding
  • Advanced mathematical reasoning
  • Code generation

📊 Benchmarks

90.0%
MMLU
74.4%
HumanEval
53.2%
MATH

🔄 What's new vs previous version

  • First model to exceed human expert on MMLU
  • Native multimodal
  • 32K context
Major Release

Gemini Pro

Google
Released: 2023-12-06
Type: Multimodal
Size: ~175B
Architecture: Transformer
Context: 32K tokens
Knowledge cutoff: 2023-06
License: Proprietary
Google's balanced workhorse Gemini model for a wide range of tasks, with a free tier in Google AI Studio.
🎛 In: text, image 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • Multimodal capabilities
  • 32K context window
  • Fast inference
  • Scalable deployment

📊 Benchmarks

79.1%
MMLU
67.7%
HumanEval
32.6%
MATH

November

Update

Claude 2.1

Anthropic
Released: 2023-11-21
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude 2 update with a 200K context window, reduced hallucinations, and significant improvements in accuracy and honesty.
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 200K context window
  • Reduced hallucination rates
  • Enhanced accuracy
  • Tool use capabilities

📊 Benchmarks

73.1%
MMLU
15.9%
HumanEval
71.1%
MATH
49.5%
MMLU-Pro
31.9%
GPQA Diamond
19.5%
LiveCodeBench
37.4%
MATH-500
3.3%
AIME 2025
4.2%
HLE
18.4%
SciCode
9.3
AA Intelligence Index

🔄 What's new vs previous version

  • 200K context (2× Claude 2)
  • 50% fewer hallucinations
  • Tool use beta
Major Release

Whisper v3

OpenAI
Released: 2023-11-06
Type: Audio
Size: ~1.55B
Architecture: Transformer encoder-decoder
License: MIT
OpenAI's state-of-the-art multilingual speech recognition system, with open weights and support for 99 languages.
🎛 In: audio 📤 Out: text 🌐 OpenAI API · Self-hosted (HuggingFace) · Groq

✨ Key Features

  • Multilingual speech recognition
  • 99 language support
  • Robust noise handling
  • Real-time transcription

📊 Benchmarks

5.1%
WER English
99 languages
Language Coverage
0.8x
Real-time Factor
~2.7%
WER (English)

August

Major Release

Code Llama 34B

Meta
Released: 2023-08-24
Type: Code
Size: 34B
Architecture: Transformer (Llama 2 fine-tune)
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's code-specialized open model built on Llama 2, the top open-source coding model at launch.
🎛 In: text, code 📤 Out: code, text 🌐 Self-hosted · Together AI · Fireworks AI

✨ Key Features

  • Code generation
  • Code completion
  • Multiple programming languages
  • Large context window

📊 Benchmarks

48.8%
HumanEval
55.0%
MBPP
45.9%
MultiPL-E

July

Major Release

Llama 2 70B

Meta
Released: 2023-07-18
Type: LLM
Size: 70B
Architecture: Transformer
Context: 4K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's landmark open-weight 70B model with a commercial license, which matched GPT-3.5 and ignited open AI development.
🎛 In: text 📤 Out: text 🌐 Self-hosted · Together AI · AWS Bedrock · Azure AI

✨ Key Features

  • Open source
  • Commercial license
  • Improved safety
  • Enhanced performance

📊 Benchmarks

68.9%
MMLU
29.9%
HumanEval
13.5%
MATH
Major Release

Claude 2

Anthropic
Released: 2023-07-11
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude's first major leap: a significant improvement over Claude 1, with 100K context and better instruction following.
🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 100K context window
  • Improved safety
  • Enhanced reasoning
  • Better code generation

📊 Benchmarks

78.5%
MMLU
71.2%
HumanEval
88.0%
GSM8K

🔄 What's new vs previous version

  • 100K context (10× Claude 1)
  • Improved reasoning
  • Reduced refusals

May

Major Release

PaLM 2

Google
Released: 2023-05-10
Type: LLM
Size: ~340B
Architecture: Transformer
Context: 8K tokens
Knowledge cutoff: 2023-02
License: Proprietary
Google's multilingual flagship — powers Bard 2023, 100+ languages — Google's improved large language model powering Bard and other services
💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Google Cloud Vertex AI · Google AI Studio

✨ Key Features

  • Multilingual capabilities
  • Reasoning improvements
  • Coding abilities
  • Multiple model sizes

📊 Benchmarks

78.3%
MMLU
77.2%
HumanEval
34.3%
MATH
8.6
AA Intelligence Index

March

Major Release

GPT-4

OpenAI
Released: 2023-03-14
Type: LLM
Size: Undisclosed
Architecture: Transformer (reported MoE)
Context: 8K tokens (32K with gpt-4-32k)
Knowledge cutoff: 2021-09
License: Proprietary
The model that changed everything — GPT-4 set the standard for capable AI — OpenAI's most advanced system producing safer and more useful responses
💰 $30.00 in / $60.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 8K context window
  • Multimodal capabilities
  • Enhanced reasoning
  • Improved factual accuracy

📊 Benchmarks

86.4%
MMLU
67.0%
HumanEval
52.9%
MATH
~90th percentile
Bar exam
12.8
AA Intelligence Index

🔄 What's new vs previous version

  • Passed bar exam (top 10%)
  • Multimodal: accepts image input (GPT-4V)
Major Release

Claude 1.3

Anthropic
Released: 2023-03-14
Type: LLM
Size: ~52B
Architecture: Transformer
Context: 9K tokens (100K from May 2023)
Knowledge cutoff: 2022-12
License: Proprietary
Anthropic's first public model — 100K context ahead of its time — Anthropic's AI assistant built using Constitutional AI methods
🎛 In: text 📤 Out: text 🌐 Anthropic API

✨ Key Features

  • Constitutional AI
  • Helpful and harmless
  • Long conversations
  • Improved reasoning

📊 Benchmarks

75.0%
MMLU
56.0%
HumanEval
36.0%
MATH

2022

November

Major Release

ChatGPT (GPT-3.5 Turbo)

OpenAI
Released: 2022-11-30
Type: LLM
Size: ~175B
Architecture: Transformer (RLHF fine-tune of GPT-3.5)
Context: 4K tokens (16K with gpt-3.5-turbo-16k)
Knowledge cutoff: 2021-09
License: Proprietary
The product that launched the AI era — 100M users in 2 months — Conversational AI that sparked mainstream adoption of large language models
🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Conversational interface
  • Fine-tuned for chat
  • RLHF training
  • Fast response times

📊 Benchmarks

70.0%
MMLU
48.1%
HumanEval
34.1%
MATH

🔄 What's new vs previous version

  • Conversational interface
  • RLHF alignment
  • Faster and cheaper than GPT-4

April

Research

PaLM

Google
Released: 2022-04-04
Type: LLM
Size: 540B
Architecture: Pathways Transformer
License: Proprietary (research)
Google's 540B pathways model — first to demonstrate chain-of-thought at scale — Google's 540-billion parameter language model demonstrating breakthrough capabilities
🎛 In: text 📤 Out: text

✨ Key Features

  • Large parameter count
  • Few-shot learning
  • Reasoning capabilities
  • Code generation

📊 Benchmarks

69.3%
MMLU
26.2%
HumanEval
8.8%
MATH
58.1%
BIG-bench

2021

August

Major Release

Codex

OpenAI
Released: 2021-08-10
Type: Code
Size: ~12B
Architecture: Transformer (GPT-3 fine-tune)
Context: 4K tokens
License: Proprietary
GPT-3 trained on code — the engine behind GitHub Copilot v1 — AI system that translates natural language to code
🎛 In: text, code 📤 Out: code 🌐 OpenAI API (deprecated) · GitHub Copilot

✨ Key Features

  • Code generation
  • Natural language to code
  • Multiple programming languages
  • GitHub Copilot integration

📊 Benchmarks

28.8%
HumanEval
59.6%
MBPP
25.0%
APPS

2020

June

Major Release

GPT-3

OpenAI
Released: 2020-06-11
Type: LLM
Size: 175B
Architecture: Transformer
Context: 2K tokens
Knowledge cutoff: 2019-10
License: Proprietary (API)
The model that showed scaling works — 175B parameters, few-shot learning pioneer — Breakthrough large language model that demonstrated emergent capabilities
🎛 In: text 📤 Out: text 🌐 OpenAI API

✨ Key Features

  • 175 billion parameters
  • Few-shot learning
  • Text generation
  • Multiple capabilities

📊 Benchmarks

43.9%
MMLU
0.0%
HumanEval
5.2%
MATH

Frequently asked questions

When are new AI models coming out?

On average, new AI models arrive every 3 days. The most recent release tracked here is Anthropic Claude Opus 4.8, released on 2026-05-28. Bookmark this page and check back regularly for new additions.

What is the next AI model release?

This tracker logs models as they ship, so it lists only confirmed releases. The newest is Anthropic Claude Opus 4.8 (2026-05-28). New releases are added within days of launch.

What are the upcoming and latest AI model releases?

The five most recent AI model releases tracked here:

  • Anthropic Claude Opus 4.8 (2026-05-28)
  • OpenBMB MiniCPM5-1B (2026-05-25)
  • Alibaba Qwen3.7 Max (2026-05-19)
  • Google Gemini 3.5 Flash (2026-05-19)
  • China Mobile JT-35B-Flash (2026-05-14)

How often are new AI models released?

Based on recent history, new AI models arrive roughly every 3 days. 59 models have been added to this tracker in the last 90 days.
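As a rough check on the cadence figures above, the sketch below divides a time window by the number of releases in it. The function name `mean_days_between_releases` is hypothetical and is not part of this tracker. Plain division of the quoted 59 releases over 90 days gives about 1.5 days, so the "roughly every 3 days" figure presumably rests on a different basis (for example a longer window or a narrower set of releases); that is an assumption, not something the page states.

```python
def mean_days_between_releases(release_count: int, window_days: int) -> float:
    """Average gap in days between releases over a window (hypothetical helper)."""
    if release_count <= 0:
        raise ValueError("release_count must be positive")
    return window_days / release_count


# Figures quoted on this page: 59 models added in the last 90 days.
interval = mean_days_between_releases(59, 90)
print(f"{interval:.1f} days between releases")
```

The same function can be applied to any other window and count to see how sensitive the headline cadence is to the period chosen.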

Where can I see AI model releases from the last 24 hours or this week?

Scroll to the top of the timeline on this page for the most recent releases. This tracker was last updated 2026-06-01 11:00 UTC. For same-day AI news and announcements, visit the AI Flash Report homepage.