When are new AI models coming out?

On average, a new AI model arrives roughly every 2 days. The most recent release tracked here is Anthropic Claude Fable 5, released on 2026-06-09. Bookmark this page and check back regularly for new additions.

What is the next AI model release?

This tracker logs models as they ship. The newest confirmed release is Anthropic Claude Fable 5 (2026-06-09). New releases are added within days of launch. Bookmark this page to catch the next one.

What are the upcoming and latest AI model releases?

The five most recent AI model releases tracked here are Anthropic Claude Fable 5 (2026-06-09), Cohere North Mini Code (2026-06-09), NVIDIA Nemotron 3 Ultra 550B A55B (2026-06-04), Google Gemma 4 12B (2026-06-03), and Alibaba Qwen3.7 Plus (2026-06-01). See the full timeline on this page.

How often are new AI models released?

Based on recent history, new AI models arrive roughly every 2 days. 41 models have been added to this tracker in the last 90 days.
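The "roughly every 2 days" figure is consistent with dividing the 90-day window by the 41 models added in it. A minimal sketch of that arithmetic, assuming the tracker derives its cadence this way (the function name `average_interval_days` is illustrative and not part of the site):

```python
def average_interval_days(window_days: int, releases: int) -> float:
    """Average number of days between releases within a window."""
    if releases <= 0:
        raise ValueError("releases must be positive")
    return window_days / releases


# Figures quoted above: 41 models added in the last 90 days.
interval = average_interval_days(90, 41)
print(f"{interval:.1f} days between releases")  # prints "2.2 days between releases"
```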

Where can I see AI model releases from the last 24 hours or this week?

Scroll to the top of the timeline on this page for the most recent releases. This tracker was last updated 2026-07-06 10:36 UTC. For same-day AI news and announcements, visit the AI Flash Report homepage.

AI Model Release Timeline 2025–2026 — Every LLM Launch Tracked

Covers every tracked LLM launch, including GPT-5, Claude 4, Gemini 2, and Llama 4. Updated weekly with launch dates, benchmarks, and capabilities.

2026

June

Major Release

Claude Fable 5

Anthropic

Released: 2026-06-09

Type: LLM

Context: 1M tokens

License: Proprietary

Anthropic Claude Fable 5 — 1M tokens context, reasoning model.

💰 $10.00 in / $50.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.6%

HLE: 53.3%

SciCode: 60.2%

TAU2-bench: 98.5%

TerminalBench-Hard: 62.9%

IF-Bench: 63.5%

LiveCodeBench Reasoning: 70.0%

API Only

🔍 Full spec sheet → 📢 Announcement
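Each card quotes prices in US dollars per 1M tokens, separately for input and output. A rough sketch of turning those two numbers into a per-request estimate, assuming simple linear billing with no caching discounts, batch pricing, or long-context tiers (those are assumptions, and the function name `request_cost_usd` is hypothetical):

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float,
    out_price_per_m: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


# Claude Fable 5 prices from the card above: $10.00 in / $50.00 out per 1M tok.
# The token counts are made-up example values.
cost = request_cost_usd(200_000, 4_000, 10.00, 50.00)
print(f"${cost:.2f}")  # prints "$2.20"
```

The same function applies to any card on this page; a listed price of $0.00 gives an estimate of zero, which may only mean that no hosted price is recorded.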

Major Release

North Mini Code

Cohere

Released: 2026-06-09

Type: LLM

Context: 256K tokens

License: Apache 2.0

Cohere North Mini Code — 256K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.7%

HLE: 9.9%

SciCode: 38.2%

TAU2-bench: 37.4%

TerminalBench-Hard: 31.1%

IF-Bench: 57.6%

LiveCodeBench Reasoning: 32.3%

Open Source

🔍 Full spec sheet →

Major Release

Nemotron 3 Ultra 550B A55B

NVIDIA

Released: 2026-06-04

Type: LLM

Context: 262K tokens

License: OpenMDW

NVIDIA Nemotron 3 Ultra 550B A55B — 262K tokens context, reasoning model.

💰 $0.60 in / $2.60 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.7%

HLE: 26.6%

SciCode: 39.9%

TAU2-bench: 83.3%

TerminalBench-Hard: 36.4%

IF-Bench: 81.4%

LiveCodeBench Reasoning: 67.0%

Open Source

🔍 Full spec sheet →

Major Release

Gemma 4 12B

Google

Released: 2026-06-03

Type: LLM

Context: 131K tokens

License: Apache 2.0

Google Gemma 4 12B — 131K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.3%

HLE: 14.8%

SciCode: 38.2%

TAU2-bench: 36.3%

TerminalBench-Hard: 18.2%

IF-Bench: 73.5%

LiveCodeBench Reasoning: 55.3%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.7 Plus

Alibaba

Released: 2026-06-01

Type: LLM

Context: 1M tokens

License: Proprietary

Alibaba Qwen3.7 Plus — 1M tokens context, reasoning model.

💰 $0.40 in / $1.16 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 90.0%

HLE: 33.4%

SciCode: 45.5%

TAU2-bench: 93.0%

TerminalBench-Hard: 47.0%

IF-Bench: 78.0%

LiveCodeBench Reasoning: 65.0%

API Only

🔍 Full spec sheet →

Major Release

MiniMax-M3

MiniMax

Released: 2026-06-01

Type: LLM

Context: 1M tokens

License: Proprietary

MiniMax MiniMax-M3 — 1M tokens context, reasoning model.

💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.9%

HLE: 37.1%

SciCode: 45.4%

TAU2-bench: 88.9%

TerminalBench-Hard: 42.4%

IF-Bench: 82.9%

LiveCodeBench Reasoning: 74.0%

API Only

🔍 Full spec sheet → 📢 Announcement
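To compare the cards in a section, the benchmark figures can be collected into a small mapping and sorted. A minimal sketch using the GPQA Diamond scores of the six June 2026 releases listed above (the variable names are illustrative, and the choice of GPQA Diamond as the ranking key is an arbitrary assumption):

```python
# GPQA Diamond scores (percent) copied from the June 2026 cards above.
june_gpqa = {
    "Claude Fable 5": 92.6,
    "North Mini Code": 75.7,
    "Nemotron 3 Ultra 550B A55B": 86.7,
    "Gemma 4 12B": 75.3,
    "Qwen3.7 Plus": 90.0,
    "MiniMax-M3": 92.9,
}

# Sort from highest to lowest score.
ranked = sorted(june_gpqa.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{score:5.1f}%  {name}")
```

A single benchmark gives only one view of a model; swapping in another column, such as HLE or TerminalBench-Hard, may change the order.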

May

Major Release

Step 3.7 Flash

StepFun

Released: 2026-05-29

Type: LLM

Context: 256K tokens

License: Apache 2.0

StepFun Step 3.7 Flash — 256K tokens context, reasoning model.

💰 $0.20 in / $1.15 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.9%

HLE: 19.9%

SciCode: 40.0%

TAU2-bench: 98.5%

TerminalBench-Hard: 35.6%

IF-Bench: 67.3%

LiveCodeBench Reasoning: 63.7%

Open Source

🔍 Full spec sheet →

Major Release

Claude Opus 4.8

Anthropic

Released: 2026-05-28

Type: LLM

Context: 1M tokens

License: Proprietary

Anthropic Claude Opus 4.8 — 1M tokens context, reasoning model.

💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.0%

HLE: 45.7%

SciCode: 53.5%

TAU2-bench: 94.4%

TerminalBench-Hard: 58.3%

IF-Bench: 62.2%

LiveCodeBench Reasoning: 67.7%

API Only

🔍 Full spec sheet →

Major Release

LFM2.5-8B-A1B

Liquid AI

Released: 2026-05-28

Type: LLM

Context: 32K tokens

License: LFM 1.0

Liquid AI LFM2.5-8B-A1B — 32K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 51.3%

HLE: 6.9%

SciCode: 7.8%

TAU2-bench: 16.1%

TerminalBench-Hard: 4.5%

IF-Bench: 55.6%

LiveCodeBench Reasoning: 0.0%

Open Source

🔍 Full spec sheet →

Major Release

HyperNova 60B 2605

Multiverse Computing

Released: 2026-05-26

Type: LLM

Context: 131K tokens

License: Apache 2.0

Multiverse Computing HyperNova 60B 2605 — 131K tokens context, reasoning model.

💰 $0.04 in / $0.14 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 73.3%

HLE: 15.1%

SciCode: 33.0%

TAU2-bench: 63.2%

TerminalBench-Hard: 23.5%

IF-Bench: 66.5%

LiveCodeBench Reasoning: 31.7%

Open Source

🔍 Full spec sheet →

Major Release

MiniCPM5-1B

OpenBMB

Released: 2026-05-25

Type: LLM

Context: 128K tokens

License: Apache 2.0

OpenBMB MiniCPM5-1B — 128K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 27.8%

HLE: 6.5%

SciCode: 4.4%

TAU2-bench: 81.0%

TerminalBench-Hard: 0.0%

IF-Bench: 49.3%

LiveCodeBench Reasoning: 3.7%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.7 Max

Alibaba

Released: 2026-05-19

Type: LLM

Context: 1M tokens

License: Proprietary

Alibaba Qwen3.7 Max — 1M tokens context, reasoning model.

💰 $2.50 in / $7.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.3%

HLE: 38.1%

SciCode: 48.8%

TAU2-bench: 94.7%

TerminalBench-Hard: 50.8%

IF-Bench: 80.5%

LiveCodeBench Reasoning: 69.0%

API Only

🔍 Full spec sheet →

Major Release

Gemini 3.5 Flash

Google

Released: 2026-05-19

Type: LLM

Context: 1M tokens

License: Proprietary

Google Gemini 3.5 Flash — 1M tokens context, reasoning model.

💰 $1.50 in / $9.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.8%

HLE: 23.1%

SciCode: 48.8%

TAU2-bench: 58.8%

TerminalBench-Hard: 46.2%

IF-Bench: 47.3%

LiveCodeBench Reasoning: 53.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

JT-35B-Flash

China Mobile

Released: 2026-05-14

Type: LLM

Context: 256K tokens

License: Proprietary

China Mobile JT-35B-Flash — 256K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.9%

HLE: 6.1%

SciCode: 29.1%

TAU2-bench: 99.1%

TerminalBench-Hard: 28.8%

IF-Bench: 42.0%

LiveCodeBench Reasoning: 55.3%

API Only

🔍 Full spec sheet →

Major Release

MiniCPM-V 4.6 1.3B

OpenBMB

Released: 2026-05-11

Type: LLM

Context: 262K tokens

License: Apache 2.0

OpenBMB MiniCPM-V 4.6 1.3B — 262K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 30.5%

HLE: 4.9%

SciCode: 2.1%

TAU2-bench: 87.7%

TerminalBench-Hard: 0.0%

IF-Bench: 26.7%

LiveCodeBench Reasoning: 6.3%

Open Source

🔍 Full spec sheet →

Major Release

Ring-2.6-1T

InclusionAI

Released: 2026-05-08

Type: LLM

Context: 262K tokens

License: MIT

InclusionAI Ring-2.6-1T — 262K tokens context, reasoning model.

💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.7%

HLE: 18.3%

SciCode: 42.4%

TAU2-bench: 92.4%

TerminalBench-Hard: 28.8%

IF-Bench: 44.6%

LiveCodeBench Reasoning: 64.3%

Open Source

🔍 Full spec sheet →

Major Release

GPT-5.5 Instant

OpenAI

Released: 2026-05-05

Type: LLM

Context: 400K tokens

Knowledge cutoff: 2025-08-31

License: Proprietary

OpenAI GPT-5.5 Instant — 400K tokens context, reasoning model.

💰 $5.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.6%

HLE: 20.3%

SciCode: 50.3%

TAU2-bench: 49.4%

TerminalBench-Hard: 42.4%

IF-Bench: 71.5%

LiveCodeBench Reasoning: 55.7%

API Only

🔍 Full spec sheet → 📢 Announcement

April

Major Release

Grok 4.3

xAI

Released: 2026-04-30

Type: LLM

Context: 1M tokens

License: Proprietary

xAI Grok 4.3 — 1M tokens context, reasoning model.

💰 $1.25 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 90.1%

HLE: 35.0%

SciCode: 47.3%

TAU2-bench: 97.7%

TerminalBench-Hard: 37.9%

IF-Bench: 81.3%

LiveCodeBench Reasoning: 64.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Granite 4.1 30B

IBM

Released: 2026-04-29

Type: LLM

Context: 131K tokens

License: Apache 2.0

IBM Granite 4.1 30B — 131K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 48.1%

HLE: 4.2%

SciCode: 25.8%

TAU2-bench: 42.1%

TerminalBench-Hard: 2.3%

IF-Bench: 44.4%

LiveCodeBench Reasoning: 18.7%

Open Source

🔍 Full spec sheet →

Major Release

Granite 4.1 3B

IBM

Released: 2026-04-29

Type: LLM

Context: 131K tokens

License: Apache 2.0

IBM Granite 4.1 3B — 131K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 31.4%

HLE: 3.4%

SciCode: 11.9%

TAU2-bench: 19.6%

TerminalBench-Hard: 2.3%

IF-Bench: 33.7%

LiveCodeBench Reasoning: 3.0%

Open Source

🔍 Full spec sheet →

Major Release

Granite 4.1 8B

IBM

Released: 2026-04-29

Type: LLM

Context: 131K tokens

License: Apache 2.0

IBM Granite 4.1 8B — 131K tokens context.

💰 $0.05 in / $0.10 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 43.3%

HLE: 3.8%

SciCode: 21.8%

TAU2-bench: 27.8%

TerminalBench-Hard: 0.0%

IF-Bench: 38.6%

LiveCodeBench Reasoning: 12.0%

Open Source

🔍 Full spec sheet →

Major Release

Mistral Medium 3.5

Mistral

Released: 2026-04-29

Type: LLM

Context: 256K tokens

License: Other

Mistral Mistral Medium 3.5 — 256K tokens context, reasoning model.

💰 $1.50 in / $7.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 74.8%

HLE: 12.8%

SciCode: 39.6%

TAU2-bench: 94.2%

TerminalBench-Hard: 33.3%

IF-Bench: 68.8%

LiveCodeBench Reasoning: 61.0%

Open Source

🔍 Full spec sheet →

Major Release

Nemotron 3 Nano Omni 30B A3B Reasoning

NVIDIA

Released: 2026-04-29

Type: LLM

Context: 256K tokens

License: NVIDIA Open Model License Agreement

NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning — 256K tokens context, reasoning model.

💰 $0.07 in / $0.30 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 46.9%

HLE: 5.3%

SciCode: 27.8%

TAU2-bench: 45.3%

TerminalBench-Hard: 8.3%

IF-Bench: 63.2%

LiveCodeBench Reasoning: 35.7%

Open Source

🔍 Full spec sheet →

Major Release

DeepSeek V4 Flash

DeepSeek

Released: 2026-04-24

Type: LLM

Context: 1M tokens

License: MIT

DeepSeek DeepSeek V4 Flash — 1M tokens context, reasoning model.

💰 $0.14 in / $0.28 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 89.4%

HLE: 32.1%

SciCode: 44.9%

TAU2-bench: 95.0%

TerminalBench-Hard: 35.6%

IF-Bench: 79.2%

LiveCodeBench Reasoning: 63.0%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

DeepSeek V4 Pro

DeepSeek

Released: 2026-04-24

Type: LLM

Context: 1M tokens

License: MIT

DeepSeek DeepSeek V4 Pro — 1M tokens context, reasoning model.

💰 $1.74 in / $3.48 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.8%

HLE: 35.9%

SciCode: 50.0%

TAU2-bench: 96.2%

TerminalBench-Hard: 46.2%

IF-Bench: 76.5%

LiveCodeBench Reasoning: 66.3%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Ling-2.6-1T

InclusionAI

Released: 2026-04-23

Type: LLM

Context: 262K tokens

License: MIT

InclusionAI Ling-2.6-1T — 262K tokens context.

💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.2%

HLE: 8.2%

SciCode: 37.0%

TAU2-bench: 89.8%

TerminalBench-Hard: 31.1%

IF-Bench: 56.9%

LiveCodeBench Reasoning: 34.7%

Open Source

🔍 Full spec sheet →

Major Release

GPT-5.5

OpenAI

Released: 2026-04-23

Type: LLM

Context: 922K tokens

License: Proprietary

OpenAI GPT-5.5 — 922K tokens context, reasoning model.

💰 $5.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 93.5%

HLE: 44.3%

SciCode: 56.1%

TAU2-bench: 93.9%

TerminalBench-Hard: 60.6%

IF-Bench: 75.9%

LiveCodeBench Reasoning: 74.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Hy3-preview

Tencent

Released: 2026-04-23

Type: LLM

Context: 256K tokens

License: Tencent HY Community License Agreement

Tencent Hy3-preview — 256K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.7%

HLE: 25.5%

SciCode: 41.2%

TAU2-bench: 92.7%

TerminalBench-Hard: 34.1%

IF-Bench: 63.1%

LiveCodeBench Reasoning: 54.7%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.6 27B

Alibaba

Released: 2026-04-22

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.6 27B — 262K tokens context, reasoning model.

💰 $0.60 in / $3.60 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.2%

HLE: 21.6%

SciCode: 39.8%

TAU2-bench: 94.2%

TerminalBench-Hard: 34.8%

IF-Bench: 67.6%

LiveCodeBench Reasoning: 68.7%

Open Source

🔍 Full spec sheet →

Major Release

MiMo-V2.5

Xiaomi

Released: 2026-04-22

Type: LLM

Context: 1M tokens

License: MIT

Xiaomi MiMo-V2.5 — 1M tokens context, reasoning model.

💰 $0.36 in / $1.80 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.9%

HLE: 25.2%

SciCode: 43.1%

TAU2-bench: 90.6%

TerminalBench-Hard: 41.7%

IF-Bench: 67.1%

LiveCodeBench Reasoning: 62.7%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

MiMo-V2.5-Pro

Xiaomi

Released: 2026-04-22

Type: LLM

Context: 1M tokens

License: MIT

Xiaomi MiMo-V2.5-Pro — 1M tokens context, reasoning model.

💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.6%

HLE: 33.8%

SciCode: 50.2%

TAU2-bench: 94.2%

TerminalBench-Hard: 43.2%

IF-Bench: 79.9%

LiveCodeBench Reasoning: 73.3%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Ling 2.6 Flash

InclusionAI

Released: 2026-04-21

Type: LLM

Context: 262K tokens

License: MIT

InclusionAI Ling 2.6 Flash — 262K tokens context.

💰 $0.10 in / $0.30 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 59.3%

HLE: 6.2%

SciCode: 27.1%

TAU2-bench: 86.0%

TerminalBench-Hard: 21.2%

IF-Bench: 57.4%

LiveCodeBench Reasoning: 25.0%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.6 Max Preview

Alibaba

Released: 2026-04-20

Type: LLM

Context: 256K tokens

License: Proprietary

Alibaba Qwen3.6 Max Preview — 256K tokens context, reasoning model.

💰 $1.30 in / $7.80 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.8%

HLE: 28.9%

SciCode: 46.9%

TAU2-bench: 95.9%

TerminalBench-Hard: 43.9%

IF-Bench: 76.6%

LiveCodeBench Reasoning: 69.7%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Kimi K2.6

Kimi

Released: 2026-04-20

Type: LLM

Context: 256K tokens

License: Modified MIT

Kimi Kimi K2.6 — 256K tokens context, reasoning model.

💰 $0.95 in / $4.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.1%

HLE: 35.9%

SciCode: 53.5%

TAU2-bench: 95.9%

TerminalBench-Hard: 43.9%

IF-Bench: 76.0%

LiveCodeBench Reasoning: 69.7%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Qwen3.6 35B A3B

Alibaba

Released: 2026-04-16

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.6 35B A3B — 262K tokens context, reasoning model.

💰 $0.25 in / $1.49 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.1%

HLE: 20.2%

SciCode: 35.8%

TAU2-bench: 95.3%

TerminalBench-Hard: 34.8%

IF-Bench: 64.4%

LiveCodeBench Reasoning: 63.7%

Open Source

🔍 Full spec sheet →

Major Release

Claude Opus 4.7

Anthropic

Released: 2026-04-16

Type: LLM

Context: 1M tokens

Knowledge cutoff: 2026-01-01

License: Proprietary

Anthropic Claude Opus 4.7 — 1M tokens context, reasoning model.

💰 $6.25 in / $25.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.4%

HLE: 39.6%

SciCode: 54.5%

TAU2-bench: 88.6%

TerminalBench-Hard: 51.5%

IF-Bench: 58.6%

LiveCodeBench Reasoning: 70.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

JT-MINI

China Mobile

Released: 2026-04-15

Type: LLM

Context: 128K tokens

License: Proprietary

China Mobile JT-MINI — 128K tokens context.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 67.6%

HLE: 6.6%

SciCode: 27.2%

TAU2-bench: 93.0%

TerminalBench-Hard: 18.2%

IF-Bench: 36.7%

LiveCodeBench Reasoning: 11.7%

API Only

🔍 Full spec sheet →

Major Release

EXAONE 4.5 33B

LG AI Research

Released: 2026-04-09

Type: LLM

Context: 262K tokens

License: EXAONE AI Model License Agreement 1.2 - NC

LG AI Research EXAONE 4.5 33B — 262K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 79.4%

HLE: 11.6%

SciCode: 28.0%

TAU2-bench: 78.1%

TerminalBench-Hard: 20.5%

IF-Bench: 58.0%

LiveCodeBench Reasoning: 49.3%

Open Source

🔍 Full spec sheet →

Major Release

Muse Spark

📊 Benchmarks

GPQA Diamond: 88.4%

HLE: 39.9%

SciCode: 51.5%

TAU2-bench: 91.5%

TerminalBench-Hard: 45.5%

IF-Bench: 75.9%

LiveCodeBench Reasoning: 69.7%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

GLM-5.1

Z AI

Released: 2026-04-07

Type: LLM

Context: 200K tokens

License: MIT

Z AI GLM-5.1 — 200K tokens context, reasoning model.

💰 $1.40 in / $4.40 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 86.8%

HLE: 28.0%

SciCode: 43.8%

TAU2-bench: 97.7%

TerminalBench-Hard: 43.2%

IF-Bench: 76.3%

LiveCodeBench Reasoning: 62.3%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Grok 4.20 0309 v2

xAI

Released: 2026-04-07

Type: LLM

Context: 2M tokens

License: Proprietary

xAI Grok 4.20 0309 v2 — 2M tokens context, reasoning model.

💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 91.1%

HLE: 32.2%

SciCode: 45.6%

TAU2-bench: 93.0%

TerminalBench-Hard: 37.9%

IF-Bench: 81.2%

LiveCodeBench Reasoning: 58.0%

API Only

🔍 Full spec sheet →

Major Release

Solar Pro 3

Upstage

Released: 2026-04-06

Type: LLM

Context: 128K tokens

License: Proprietary

Upstage Solar Pro 3 — 128K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 72.4%

HLE: 10.1%

SciCode: 24.7%

TAU2-bench: 86.3%

TerminalBench-Hard: 7.6%

IF-Bench: 71.2%

LiveCodeBench Reasoning: 27.0%

API Only

🔍 Full spec sheet →

Major Release

Gemma 4 E4B

Google

Released: 2026-04-03

Type: LLM

Context: 128K tokens

License: Apache 2.0

Google Gemma 4 E4B — 128K tokens context, reasoning model.

💰 $0.30 in / $1.25 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 57.6%

HLE: 3.7%

SciCode: 24.4%

TAU2-bench: 20.8%

TerminalBench-Hard: 8.3%

IF-Bench: 44.2%

LiveCodeBench Reasoning: 30.7%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.6 Plus

Alibaba

Released: 2026-04-02

Type: LLM

Context: 1M tokens

License: Proprietary

Alibaba Qwen3.6 Plus — 1M tokens context, reasoning model.

💰 $0.50 in / $3.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.2%

HLE: 25.7%

SciCode: 40.7%

TAU2-bench: 97.7%

TerminalBench-Hard: 43.9%

IF-Bench: 75.2%

LiveCodeBench Reasoning: 69.7%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemma 4 26B A4B

Google

Released: 2026-04-02

Type: LLM

Context: 256K tokens

License: Apache 2.0

Google Gemma 4 26B A4B — 256K tokens context, reasoning model.

💰 $0.13 in / $0.40 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 79.2%

HLE: 18.3%

SciCode: 40.0%

TAU2-bench: 43.6%

TerminalBench-Hard: 13.6%

IF-Bench: 72.4%

LiveCodeBench Reasoning: 55.7%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemma 4 31B

Google

Released: 2026-04-02

Type: LLM

Context: 256K tokens

License: Apache 2.0

Google Gemma 4 31B — 256K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.7%

HLE: 22.7%

SciCode: 43.4%

TAU2-bench: 59.9%

TerminalBench-Hard: 36.4%

IF-Bench: 75.6%

LiveCodeBench Reasoning: 62.0%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemma 4 E2B

Google

Released: 2026-04-02

Type: LLM

Context: 128K tokens

License: Apache 2.0

Google Gemma 4 E2B — 128K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 43.3%

HLE: 4.8%

SciCode: 20.9%

TAU2-bench: 20.8%

TerminalBench-Hard: 3.0%

IF-Bench: 38.0%

LiveCodeBench Reasoning: 15.0%

Open Source

🔍 Full spec sheet →

Major Release

Step 3.5 Flash 2603

StepFun

Released: 2026-04-02

Type: LLM

Context: 256K tokens

License: Proprietary

StepFun Step 3.5 Flash 2603 — 256K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.6%

HLE: 22.6%

SciCode: 38.5%

TAU2-bench: 87.4%

TerminalBench-Hard: 32.6%

IF-Bench: 66.5%

LiveCodeBench Reasoning: 54.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Trinity Large Thinking

Arcee AI

Released: 2026-04-01

Type: LLM

Context: 512K tokens

License: Apache 2.0

Arcee AI Trinity Large Thinking — 512K tokens context, reasoning model.

💰 $0.23 in / $0.88 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.2%

HLE: 14.7%

SciCode: 36.1%

TAU2-bench: 90.1%

TerminalBench-Hard: 22.7%

IF-Bench: 56.3%

LiveCodeBench Reasoning: 33.0%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

GLM 5V Turbo

Z AI

Released: 2026-04-01

Type: LLM

Context: 200K tokens

License: Proprietary

Z AI GLM 5V Turbo — 200K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.9%

HLE: 15.8%

SciCode: 43.5%

TAU2-bench: 98.5%

TerminalBench-Hard: 32.6%

IF-Bench: 61.1%

LiveCodeBench Reasoning: 61.0%

API Only

🔍 Full spec sheet → 📢 Announcement

March

Major Release

Qwen3.5 Omni Flash

Alibaba

Released: 2026-03-30

Type: LLM

Context: 256K tokens

License: Proprietary

Alibaba Qwen3.5 Omni Flash — 256K tokens context.

💰 $0.10 in / $0.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 74.2%

HLE: 7.1%

SciCode: 25.5%

TAU2-bench: 84.5%

TerminalBench-Hard: 8.3%

IF-Bench: 38.0%

LiveCodeBench Reasoning: 44.0%

API Only

🔍 Full spec sheet →

Major Release

Qwen3.5 Omni Plus

Alibaba

Released: 2026-03-30

Type: LLM

Context: 256K tokens

License: Proprietary

Alibaba Qwen3.5 Omni Plus — 256K tokens context.

💰 $0.40 in / $4.80 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.6%

HLE: 13.9%

SciCode: 40.5%

TAU2-bench: 88.3%

TerminalBench-Hard: 21.2%

IF-Bench: 51.2%

LiveCodeBench Reasoning: 52.7%

API Only

🔍 Full spec sheet →

Major Release

MiMo-V2-Omni-0327

Xiaomi

Released: 2026-03-27

Type: LLM

Context: 256K tokens

License: Proprietary

Xiaomi MiMo-V2-Omni-0327 — 256K tokens context, reasoning model.

💰 $0.40 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 85.5%

HLE: 20.4%

SciCode: 39.5%

TAU2-bench: 88.0%

TerminalBench-Hard: 35.6%

IF-Bench: 67.3%

LiveCodeBench Reasoning: 63.7%

API Only

🔍 Full spec sheet →

Major Release

Nemotron Cascade 2 30B A3B

NVIDIA

Released: 2026-03-19

Type: LLM

Context: 1M tokens

License: NVIDIA Open Model License

NVIDIA Nemotron Cascade 2 30B A3B — 1M tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 75.8%

HLE: 11.4%

SciCode: 34.8%

TAU2-bench: 53.2%

TerminalBench-Hard: 21.2%

IF-Bench: 80.4%

LiveCodeBench Reasoning: 34.0%

Open Source

🔍 Full spec sheet →

Major Release

MiMo-V2-Omni

Xiaomi

Released: 2026-03-19

Type: LLM

Context: 256K tokens

License: Proprietary

Xiaomi MiMo-V2-Omni — 256K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.8%

HLE: 19.9%

SciCode: 36.7%

TAU2-bench: 91.2%

TerminalBench-Hard: 34.8%

IF-Bench: 53.5%

LiveCodeBench Reasoning: 66.7%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

MiniMax-M2.7

MiniMax

Released: 2026-03-18

Type: LLM

Context: 204K tokens

License: Non-Commercial License

MiniMax MiniMax-M2.7 — 204K tokens context, reasoning model.

💰 $0.30 in / $1.20 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.4%

HLE: 28.1%

SciCode: 47.0%

TAU2-bench: 84.8%

TerminalBench-Hard: 39.4%

IF-Bench: 75.7%

LiveCodeBench Reasoning: 68.7%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

MiMo-V2-Pro

Xiaomi

Released: 2026-03-18

Type: LLM

Context: 1M tokens

License: Proprietary

Xiaomi MiMo-V2-Pro — 1M tokens context, reasoning model.

💰 $1.00 in / $3.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.0%

HLE: 28.3%

SciCode: 42.5%

TAU2-bench: 95.0%

TerminalBench-Hard: 40.9%

IF-Bench: 68.8%

LiveCodeBench Reasoning: 60.7%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

GPT-5.4 mini

OpenAI

Released: 2026-03-17

Type: LLM

Context: 400K tokens

Knowledge cutoff: 2025-08-31

License: Proprietary

OpenAI GPT-5.4 mini — 400K tokens context, reasoning model.

💰 $0.75 in / $4.50 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 87.5%

HLE: 26.6%

SciCode: 49.9%

TAU2-bench: 83.3%

TerminalBench-Hard: 52.3%

IF-Bench: 73.3%

LiveCodeBench Reasoning: 69.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

GPT-5.4 nano

OpenAI

Released: 2026-03-17

Type: LLM

Context: 400K tokens

Knowledge cutoff: 2025-08-31

License: Proprietary

OpenAI GPT-5.4 nano — 400K tokens context, reasoning model.

💰 $0.20 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 81.7%

HLE: 26.5%

SciCode: 46.9%

TAU2-bench: 76.0%

TerminalBench-Hard: 42.4%

IF-Bench: 75.9%

LiveCodeBench Reasoning: 66.0%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Mistral Small 4

Mistral

Released: 2026-03-16

Type: LLM

Context: 256K tokens

License: Apache 2.0

Mistral Mistral Small 4 — 256K tokens context, reasoning model.

💰 $0.15 in / $0.60 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 76.9%

HLE: 9.5%

SciCode: 38.0%

TAU2-bench: 41.2%

TerminalBench-Hard: 17.4%

IF-Bench: 48.2%

LiveCodeBench Reasoning: 44.7%

Open Source

🔍 Full spec sheet →

Major Release

NVIDIA Nemotron 3 Nano 4B

NVIDIA

Released: 2026-03-16

Type: LLM

Context: 262K tokens

License: NVIDIA Nemotron Open Model License

NVIDIA NVIDIA Nemotron 3 Nano 4B — 262K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 51.3%

HLE: 4.8%

SciCode: 16.4%

TAU2-bench: 28.1%

TerminalBench-Hard: 6.8%

IF-Bench: 58.2%

LiveCodeBench Reasoning: 16.7%

Open Source

🔍 Full spec sheet →

Major Release

GLM-5-Turbo

Z AI

Released: 2026-03-15

Type: LLM

Context: 200K tokens

License: Proprietary

Z AI GLM-5-Turbo — 200K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 84.7%

HLE: 25.4%

SciCode: 43.6%

TAU2-bench: 98.5%

TerminalBench-Hard: 33.3%

IF-Bench: 73.2%

LiveCodeBench Reasoning: 60.7%

API Only

🔍 Full spec sheet →

Major Release

NVIDIA Nemotron 3 Super 120B A12B

NVIDIA

Released: 2026-03-11

Type: LLM

Context: 1M tokens

License: NVIDIA Nemotron Open Model License

NVIDIA NVIDIA Nemotron 3 Super 120B A12B — 1M tokens context, reasoning model.

💰 $0.30 in / $0.75 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 80.0%

HLE: 19.2%

SciCode: 36.0%

TAU2-bench: 67.8%

TerminalBench-Hard: 28.8%

IF-Bench: 71.5%

LiveCodeBench Reasoning: 60.0%

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Grok 4.20 0309

xAI

Released: 2026-03-10

Type: LLM

Context: 2M tokens

License: Proprietary

xAI Grok 4.20 0309 — 2M tokens context, reasoning model.

💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 88.5%

HLE: 30.0%

SciCode: 44.7%

TAU2-bench: 96.5%

TerminalBench-Hard: 40.9%

IF-Bench: 82.9%

LiveCodeBench Reasoning: 59.0%

API Only

🔍 Full spec sheet →

Major Release

Sarvam 105B

Sarvam

Released: 2026-03-06

Type: LLM

Context: 128K tokens

License: Apache 2.0

Sarvam Sarvam 105B — 128K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 73.8%

HLE: 10.1%

SciCode: 26.4%

TAU2-bench: 46.8%

TerminalBench-Hard: 1.5%

IF-Bench: 34.4%

LiveCodeBench Reasoning: 0.0%

Open Source

🔍 Full spec sheet →

Major Release

Sarvam 30B

Sarvam

Released: 2026-03-06

Type: LLM

Context: 65K tokens

License: Apache 2.0

Sarvam Sarvam 30B — 65K tokens context, reasoning model.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

GPQA Diamond: 63.3%

HLE: 7.0%

SciCode: 19.2%

TAU2-bench: 34.5%

TerminalBench-Hard: 2.3%

IF-Bench: 26.5%

LiveCodeBench Reasoning: 0.0%

Open Source

🔍 Full spec sheet →

Major Release

GPT-5.4

OpenAI

Released: 2026-03-05

Type: LLM

Context: 1M tokens

Knowledge cutoff: 2025-08-31

License: Proprietary

OpenAI GPT-5.4 — 1M tokens context, reasoning model.

💰 $2.50 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

GPQA Diamond: 92.0%

HLE: 41.6%

SciCode: 56.6%

TAU2-bench: 87.1%

TerminalBench-Hard: 57.6%

IF-Bench: 73.9%

LiveCodeBench Reasoning: 74.0%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemini 3.1 Flash-Lite Preview

Google

Released: 2026-03-03

Type: LLM

Context: 1M tokens

Knowledge cutoff: 2025-01-01

License: Proprietary

Google Gemini 3.1 Flash-Lite Preview — 1M tokens context, reasoning model.

💰 $0.25 in / $1.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 82.2%

HLE: 16.2%

SciCode: 41.9%

TAU2-bench: 31.3%

TerminalBench-Hard: 24.2%

IF-Bench: 77.2%

LiveCodeBench Reasoning: 65.3%

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Qwen3.5 0.8B

Alibaba

Released: 2026-03-02

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 0.8B — 262K tokens context, reasoning model.

💰 $0.01 in / $0.05 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

GPQA Diamond: 11.1%

HLE: 1.2%

SciCode: 0.0%

TAU2-bench: 47.7%

TerminalBench-Hard: 0.0%

IF-Bench: 21.5%

LiveCodeBench Reasoning: 5.3%

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.5 2B

Alibaba

Released: 2026-03-02

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 2B — 262K tokens context, reasoning model.

💰 $0.02 in / $0.10 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

45.6%

GPQA Diamond

2.1%

HLE

2.8%

SciCode

69.0%

TAU2-bench

3.8%

TerminalBench-Hard

31.5%

IF-Bench

23.7%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.5 4B

Alibaba

Released: 2026-03-02

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 4B — 262K tokens context, reasoning model.

💰 $0.03 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

77.1%

GPQA Diamond

7.8%

HLE

16.1%

SciCode

92.1%

TAU2-bench

18.2%

TerminalBench-Hard

52.0%

IF-Bench

55.7%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.5 9B

Alibaba

Released: 2026-03-02

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 9B — 262K tokens context, reasoning model.

💰 $0.10 in / $0.15 out per 1M tok 🎛 In: text, image, video 📤 Out: text

📊 Benchmarks

80.6%

GPQA Diamond

13.3%

HLE

27.5%

SciCode

86.8%

TAU2-bench

24.2%

TerminalBench-Hard

66.7%

IF-Bench

59.0%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet →

February

Major Release

LFM2 24B A2B

Liquid AI

Released: 2026-02-25

Type: LLM

Context: 32K tokens

License: lfm 1.0

Liquid AI LFM2 24B A2B — 32K tokens context.

💰 $0.03 in / $0.12 out per 1M tok 🎛 In: text 📤 Out: text

📊 Benchmarks

47.4%

GPQA Diamond

4.4%

HLE

10.9%

SciCode

11.1%

TAU2-bench

0.0%

TerminalBench-Hard

45.9%

IF-Bench

0.0%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet →

Major Release

Qwen3.5 122B A10B

Alibaba

Released: 2026-02-24

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 122B A10B — 262K tokens context, reasoning model.

💰 $0.40 in / $3.20 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

85.7%

GPQA Diamond

23.4%

HLE

42.0%

SciCode

93.6%

TAU2-bench

31.1%

TerminalBench-Hard

75.7%

IF-Bench

66.7%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Qwen3.5 27B

Alibaba

Released: 2026-02-24

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 27B — 262K tokens context, reasoning model.

💰 $0.30 in / $2.40 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

85.8%

GPQA Diamond

22.2%

HLE

39.5%

SciCode

93.9%

TAU2-bench

32.6%

TerminalBench-Hard

75.6%

IF-Bench

67.3%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Qwen3.5 35B A3B

Alibaba

Released: 2026-02-24

Type: LLM

Context: 262K tokens

License: Apache 2.0

Alibaba Qwen3.5 35B A3B — 262K tokens context, reasoning model.

💰 $0.25 in / $2.00 out per 1M tok 🎛 In: text, image 📤 Out: text

📊 Benchmarks

84.5%

GPQA Diamond

19.7%

HLE

37.7%

SciCode

89.2%

TAU2-bench

26.5%

TerminalBench-Hard

72.5%

IF-Bench

62.7%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemini 3.1 Pro

Google

Released: 2026-02-19

Type: LLM

Size: ~1T MoE

Architecture: Sparse Mixture-of-Experts (MoE)

Context: 2M tokens

Knowledge cutoff: 2025-12

License: Proprietary

Google's latest flagship reasoning model, with a 2x jump in reasoning on hard multi-step tasks.

💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

2x reasoning improvement
ARC-AGI-2 score of 77.1%
Enhanced multimodal understanding
Deep Think mode

📊 Benchmarks

77.1%

ARC-AGI-2

93.8%

MMLU

89.4%

MATH

93.8%

MMLU-Pro

84.2%

GPQA Diamond

72.3%

SWE-bench Verified

78.9%

LiveCodeBench

🔄 What's new vs previous version

2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
Context window expanded to 2M tokens
Deep Think mode enabled by default on the Pro tier
Lower latency on first-token despite larger context

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Claude Sonnet 4.6

Anthropic

Released: 2026-02-17

Type: LLM

Size: ~500B

Architecture: Dense Transformer (proprietary)

Context: 500K tokens

Knowledge cutoff: 2025-10

License: Proprietary

Anthropic's latest Sonnet: near-Opus performance at a fraction of the cost, with Agent Teams orchestration.

💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

Agent Teams: orchestrate 2-16 Claude instances
Near-Opus performance at 1/5th cost
80.8% SWE-bench Verified
Fast mode research preview

📊 Benchmarks

92.1%

MMLU

95.2%

HumanEval

80.8%

SWE-bench Verified

79.7%

GPQA Diamond

88.5%

AIME 2025

71.2%

TAU-bench

10.8%

HLE

44.1%

SciCode

78.9%

TAU2-bench

42.4%

TerminalBench-Hard

42.4%

IF-Bench

58.7%

LiveCodeBench Reasoning

🔄 What's new vs previous version

Agent Teams: orchestrate 2–16 Claude instances in parallel
+8.5pt on SWE-bench Verified vs Sonnet 4
1/5 the cost of Opus 4.5 at ~95% of coding quality
Fast mode research preview for lower-latency inference

API Only

🔍 Full spec sheet → 📢 Announcement 🛡 System card

Update

DeepSeek V3.2

DeepSeek

Released: 2026-02-12

Type: LLM

Size: 671B MoE

Architecture: Sparse MoE (37B active / 671B total)

Context: 1M tokens

Knowledge cutoff: 2025-09

License: DeepSeek License (open weights, commercial OK)

Open-weight MoE update with strong coding and a 10x context window expansion to over 1 million tokens.

💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI · Fireworks AI

✨ Key Features

1M+ token context window (10x expansion)
Improved reasoning capabilities
Open source release
Cost-effective inference

📊 Benchmarks

90.1%

MMLU

92.5%

HumanEval

1M+ tokens

Context Window

86.2%

MMLU-Pro

84.0%

GPQA Diamond

85.6%

MATH

68.4%

GPQA

86.2%

LiveCodeBench

22.2%

HLE

38.9%

SciCode

90.6%

TAU2-bench

35.6%

TerminalBench-Hard

60.7%

IF-Bench

65.0%

LiveCodeBench Reasoning

🔄 What's new vs previous version

10x context window expansion (128K → 1M+ tokens)
Sliding-window attention for long-context throughput
Improved chain-of-thought reasoning
Native FP8 inference support

Open Source

🔍 Full spec sheet → 📢 Announcement 📄 Paper
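The "37B active / 671B total" figure on the Architecture line means only a small share of the weights is used for any given token. A minimal sketch of that arithmetic follows; the helper name `active_fraction` is an assumption introduced for illustration.

```python
def active_fraction(active_params_b, total_params_b):
    """Share of a sparse MoE model's parameters that are active per token."""
    return active_params_b / total_params_b


# Figures from the DeepSeek V3.2 card above: 37B active / 671B total.
frac = active_fraction(37, 671)
print(f"{frac:.1%} of parameters active per token")  # prints 5.5% ...
```

The same calculation applies to any card that lists both an active and a total parameter count.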

Major Release

GLM-5

Zhipu AI

Released: 2026-02-11

Type: LLM

Size: 744B

Architecture: Dense Transformer (744B)

Context: 200K tokens

Knowledge cutoff: 2025-11

License: Proprietary (open weights for non-frontier sizes)

First frontier model trained entirely on Huawei Ascend chips, without NVIDIA GPUs.

💰 $0.11 in / $0.28 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Zhipu BigModel API

✨ Key Features

First frontier model trained on Huawei Ascend chips (no NVIDIA)
#1 HLE score (50.4%)
1.2% hallucination rate via Slime RL
136x cheaper than Claude Opus 4.5

📊 Benchmarks

50.4%

HLE

1.2%

Hallucination Rate

$0.11/M tokens

Cost

88.7%

MMLU

92.1%

C-Eval

94.8%

GSM8K

🔄 What's new vs previous version

Trained entirely on Huawei Ascend 910B clusters (no NVIDIA)
Slime RL fine-tuning drops hallucination rate to 1.2%
136x cheaper than Claude Opus 4.5 at comparable quality

API Only

🔍 Full spec sheet → 📢 Announcement 📄 Paper
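The "136x cheaper" figure appears to match the ratio of input prices listed on this page ($15.00 for Claude Opus 4.5 versus $0.11 for GLM-5). The sketch below reproduces that arithmetic; the function name `price_ratio` is an assumption, and whether the tracker derived the figure this way is not stated.

```python
def price_ratio(reference_price, model_price):
    """How many times cheaper model_price is than reference_price."""
    return reference_price / model_price


# Prices per 1M tokens as listed on this page.
opus_in, opus_out = 15.00, 75.00   # Claude Opus 4.5 card
glm_in, glm_out = 0.11, 0.28       # GLM-5 card

print(f"input:  {price_ratio(opus_in, glm_in):.1f}x")    # prints input:  136.4x
print(f"output: {price_ratio(opus_out, glm_out):.1f}x")  # prints output: 267.9x
```

On these listed prices, the multiple is larger for output tokens than for input tokens, so the headline ratio depends on which side of the price is compared.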

Major Release

GPT-5.3 Codex

OpenAI

Released: 2026-02-05

Type: Code

Size: ~200B

Architecture: MoE (coding-specialized fine-tune)

Context: 400K tokens

Knowledge cutoff: 2025-11

License: Proprietary

OpenAI's coding-specialized, self-improving variant of GPT-5.3, tuned for agentic IDE workflows, with state-of-the-art software engineering performance.

💰 $1.25 in / $10.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI · GitHub Copilot

✨ Key Features

Self-improving agentic coding
25% faster than GPT-5.2-Codex
1,000+ tokens/sec generation
First OpenAI model flagged 'high' on cybersecurity framework

📊 Benchmarks

77.3%

Terminal-Bench

SOTA

SWE-Bench Pro

1,000+ tok/s

Speed

82.4%

SWE-bench Verified

96.8%

HumanEval

91.5%

GPQA Diamond

84.2%

LiveCodeBench

79.5%

Aider Polyglot

39.9%

HLE

53.2%

SciCode

86.0%

TAU2-bench

53.0%

TerminalBench-Hard

75.4%

IF-Bench

74.0%

LiveCodeBench Reasoning

🔄 What's new vs previous version

+4pt on SWE-bench Verified vs GPT-5.2 Codex
Native IDE tool-calling at reduced latency
Extended max output to 100K for multi-file patches

API Only

🔍 Full spec sheet → 📢 Announcement

January

Major Release

Kimi K2

Moonshot AI

Released: 2026-01-20

Type: LLM

Size: 1.04T MoE

Architecture: MoE (32B active / ~1T total)

Context: 2M tokens

Knowledge cutoff: 2025-10

License: Modified MIT (open weights)

Moonshot's open-weight frontier MoE with over 1 trillion parameters and strong agentic benchmarks; the first open-weight model to rank #1 on LMSYS Chatbot Arena.

💰 $0.15 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Moonshot API · Hugging Face · Together AI

✨ Key Features

First open-weight model #1 on LMSYS Chatbot Arena
1.04 trillion parameters
K2.5 agent swarms with up to 100 sub-agents
$0.15/M input tokens

📊 Benchmarks

#1

LMSYS Arena

1.04T

Parameters

$0.15/M tokens

Cost

91.3%

MMLU

65.8%

SWE-bench Verified

74.1%

GPQA Diamond

68.9%

LiveCodeBench

🔄 What's new vs previous version

2M token context window (20x vs first Kimi)
Agentic tool-use tuning via MuonClip optimizer
Open weights under modified MIT

Open Source

🔍 Full spec sheet → 📢 Announcement

2025

December

Update

GPT-5.2 Codex

OpenAI

Released: 2025-12-18

Type: Code

Size: ~200B

Architecture: MoE (coding fine-tune)

Context: 256K tokens

Knowledge cutoff: 2025-08

License: Proprietary

Prior-generation coding variant of GPT-5.2, focused on agentic coding and software engineering tasks.

💰 $1.50 in / $12.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

Specialized for software engineering
Enhanced agentic coding
Multi-file refactoring
Advanced debugging capabilities

📊 Benchmarks

SOTA

SWE-Bench

95.1%

HumanEval

72.8%

Terminal-Bench

78.2%

SWE-bench Verified

80.4%

LiveCodeBench

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Mistral Large 3

Mistral

Released: 2025-12-15

Type: LLM

Size: ~123B

Architecture: Dense Transformer

Context: 256K tokens

Knowledge cutoff: 2025-07

License: Mistral Commercial License

Mistral's flagship proprietary model, tuned for European enterprise and competing with GPT-5 class models at a fraction of the cost.

💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Mistral La Plateforme · Azure · AWS Bedrock

✨ Key Features

128K context window
Improved multilingual capabilities
Enhanced function calling
Competitive with GPT-5 class models

📊 Benchmarks

89.4%

MMLU

91.2%

HumanEval

82.1%

MATH

80.7%

MMLU-Pro

68.0%

GPQA Diamond

76.8%

MMMU

46.5%

LiveCodeBench

4.1%

HLE

36.2%

SciCode

24.6%

TAU2-bench

15.9%

TerminalBench-Hard

36.2%

IF-Bench

34.7%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

Update

GPT-5.2

OpenAI

Released: 2025-12-11

Type: LLM

Size: ~200B

Architecture: MoE

Context: 400K tokens

Knowledge cutoff: 2025-08

License: Proprietary

Late-2025 iteration on GPT-5.1 with improved reasoning, steerability, and speed.

💰 $2.00 in / $10.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

Enhanced reasoning capabilities
Improved adaptive reasoning
Better multimodal understanding
Faster inference

📊 Benchmarks

92.8%

MMLU

88.5%

MATH

95.8%

HumanEval

87.4%

MMLU-Pro

72.5%

SWE-bench Verified

90.3%

GPQA Diamond

92.1%

AIME 2025

88.9%

LiveCodeBench

35.4%

HLE

52.1%

SciCode

84.8%

TAU2-bench

47.0%

TerminalBench-Hard

75.4%

IF-Bench

72.7%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

November

Major Release

Claude Opus 4.5

Anthropic

Released: 2025-11-24

Type: LLM

Size: ~500B

Architecture: Dense Transformer (proprietary)

Context: 500K tokens

Knowledge cutoff: 2025-08

License: Proprietary

Anthropic's most capable model for complex research and agents, with breakthrough coding performance and a major price reduction.

💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

First model to break 80.9% on SWE-Bench Verified
67% price reduction vs previous Opus
Extended reasoning capabilities
Advanced coding performance

📊 Benchmarks

80.9%

SWE-bench

92.8%

MMLU

95.0%

HumanEval

89.5%

MMLU-Pro

78.9%

SWE-bench Verified

86.6%

GPQA Diamond

90.5%

AIME 2025

87.1%

LiveCodeBench

28.4%

HLE

49.5%

SciCode

89.5%

TAU2-bench

47.0%

TerminalBench-Hard

58.0%

IF-Bench

74.0%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemini 3 Pro

Google

Released: 2025-11-18

Type: Multimodal

Size: ~1T MoE

Architecture: Sparse MoE

Context: 1M tokens

Knowledge cutoff: 2025-09

License: Proprietary

First Gemini 3 tier release and Google's flagship: strong multimodal and long-context performance, Deep Think mode, and the #1 rank on LMSYS Arena at launch.

💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

1M token context window
Deep Think reasoning mode
Solved 5/6 IMO 2025 problems
#1 on LMSYS Arena

📊 Benchmarks

87.5%

ARC-AGI

93.2%

MMLU

#1

LMSYS Arena

89.4%

MMLU-Pro

82.1%

MMMU

78.5%

GPQA Diamond

68.2%

SWE-bench Verified

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

GPT-5.1

OpenAI

Released: 2025-11-12

Type: LLM

Size: ~200B

Architecture: MoE

Context: 400K tokens

Knowledge cutoff: 2025-06

License: Proprietary

GPT-5 iteration with adaptive reasoning, steerability and latency improvements, and perfect scores on math competitions.

💰 $2.25 in / $11.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

Adaptive reasoning modes
Perfect 100% on AIME 2025
87.5% on ARC-AGI
Enhanced multimodal capabilities

📊 Benchmarks

87.5%

ARC-AGI

100%

AIME 2025

92.5%

MMLU

87.0%

MMLU-Pro

70.1%

SWE-bench Verified

87.3%

GPQA Diamond

86.8%

LiveCodeBench

26.5%

HLE

43.3%

SciCode

81.9%

TAU2-bench

45.5%

TerminalBench-Hard

72.9%

IF-Bench

75.0%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

August

Major Release

GPT-5

OpenAI

Released: 2025-08-15

Type: LLM

Size: ~200B

Architecture: MoE with unified reasoning router

Context: 400K tokens

Knowledge cutoff: 2025-05

License: Proprietary

OpenAI's next-generation flagship: a unified reasoning and chat model with adaptive reasoning, replacing the GPT-4 line.

💰 $2.50 in / $12.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

Adaptive reasoning (routes between quick and deep thinking)
Improved math and coding
Enhanced multimodal reasoning
New safety architecture

📊 Benchmarks

91.0%

MMLU

95.1%

HumanEval

90.1%

MATH

80.6%

MMLU-Pro

67.4%

SWE-bench Verified

67.3%

GPQA Diamond

55.8%

LiveCodeBench

86.1%

MATH-500

36.7%

AIME 2025

5.4%

HLE

38.8%

SciCode

67.0%

TAU2-bench

18.2%

TerminalBench-Hard

45.6%

IF-Bench

25.0%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

July

Update

Claude Opus 4.1

Anthropic

Released: 2025-07-15

Type: LLM

Size: ~500B

Architecture: Dense Transformer (proprietary)

Context: 200K tokens

Knowledge cutoff: 2025-03

License: Proprietary

Mid-2025 iteration on Claude Opus 4, focused on agentic coding reliability and multi-file refactoring.

💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

Improved multi-file refactoring
Enhanced agentic capabilities
Better long-context performance
Reduced hallucinations

📊 Benchmarks

75.2%

SWE-bench

91.2%

MMLU

94.0%

HumanEval

74.5%

SWE-bench Verified

79.1%

GPQA Diamond

API Only

🔍 Full spec sheet → 📢 Announcement

June

Update

Gemini 2.5 Flash

Google

Released: 2025-06-20

Type: Multimodal

Size: ~175B

Architecture: Dense multimodal transformer

Context: 1M tokens

Knowledge cutoff: 2025-01

License: Proprietary

Google's fast, cost-optimized multimodal model with thinking mode and enhanced image capabilities.

💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

Enhanced image editing stabilization
Faster inference
Improved multimodal understanding
Cost-effective deployment

📊 Benchmarks

87.5%

MMLU

2x Gemini 2.0 Flash

Speed

High

Image Quality

83.2%

MMLU-Pro

96.2%

HumanEval

79.0%

GPQA Diamond

79.7%

MMMU

69.5%

LiveCodeBench

98.1%

MATH-500

82.3%

AIME 2025

11.1%

HLE

39.4%

SciCode

31.6%

TAU2-bench

13.6%

TerminalBench-Hard

50.3%

IF-Bench

61.7%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

May

Major Release

Claude Sonnet 4

Anthropic

Released: 2025-05-22

Type: LLM

Size: ~500B

Architecture: Dense Transformer (proprietary)

Context: 200K tokens

Knowledge cutoff: 2025-03

License: Proprietary

Claude 4 mid-tier model with strong coding, long-horizon agentic reliability, and significant performance improvements.

💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

Enhanced reasoning capabilities
Improved safety measures
Advanced multimodal understanding
Extended context window

📊 Benchmarks

88.7%

MMLU

94.5%

HumanEval

76.8%

MATH

72.3%

SWE-bench Verified

74.0%

GPQA Diamond

API Only

🔍 Full spec sheet → 📢 Announcement 🛡 System card

February

Update

Claude Sonnet 3.7

Anthropic

Released: 2025-02-24

Type: LLM

Size: ~300B

Architecture: Dense Transformer (proprietary)

Context: 200K tokens

Knowledge cutoff: 2024-11

License: Proprietary

Extended-thinking update to Claude 3.5 Sonnet, with a visible reasoning toggle and enhanced capabilities.

💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

Improved reasoning
Better code generation
Enhanced safety
Reduced hallucinations

📊 Benchmarks

86.1%

MMLU

93.2%

HumanEval

74.1%

MATH

62.3%

SWE-bench Verified

68.3%

GPQA Diamond

API Only

🔍 Full spec sheet → 📢 Announcement

2024

December

Major Release

DeepSeek-V3

DeepSeek

Released: 2024-12-26

Type: LLM

Size: 671B

Architecture: Sparse MoE (37B active / 671B total)

Context: 128K tokens

Knowledge cutoff: 2024-07

License: DeepSeek License (open weights)

DeepSeek's breakthrough open-weight MoE model, its most advanced at release, rivaling GPT-4-class quality.

💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI

✨ Key Features

Mixture of Experts architecture
Cost-effective training
Open source release
Strong reasoning capabilities

📊 Benchmarks

88.5%

MMLU

90.6%

HumanEval

61.6%

MATH

75.2%

MMLU-Pro

55.7%

GPQA Diamond

35.9%

LiveCodeBench

88.7%

MATH-500

25.3%

AIME 2025

3.6%

HLE

35.4%

SciCode

22.8%

TAU2-bench

6.8%

TerminalBench-Hard

34.8%

IF-Bench

29.0%

LiveCodeBench Reasoning

Open Source

🔍 Full spec sheet → 📢 Announcement 📄 Paper

Major Release

Gemini 2.0 Flash

Google

Released: 2024-12-11

Type: Multimodal

Size: ~175B

Architecture: Dense multimodal transformer

Context: 1M tokens

Knowledge cutoff: 2024-08

License: Proprietary

First Gemini 2 model: fast, cheap, and natively multimodal, with native tool use.

💰 $0.10 in / $0.40 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text, image, audio 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

Native multimodal generation
Real-time API
Agentic capabilities
Enhanced speed

📊 Benchmarks

85.8%

MMLU

90.7%

HumanEval

58.8%

MATH

78.2%

MMLU-Pro

63.6%

GPQA Diamond

70.7%

MMMU

21.0%

LiveCodeBench

91.1%

MATH-500

30.0%

AIME 2025

4.7%

HLE

34.0%

SciCode

29.5%

TAU2-bench

3.8%

TerminalBench-Hard

40.2%

IF-Bench

28.3%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

August

Major Release

Grok-2

xAI

Released: 2024-08-13

Type: LLM

Size: ~314B

Architecture: Transformer

Context: 128K tokens

Knowledge cutoff: Real-time (X feed)

License: Proprietary

xAI's second-generation flagship Grok, with multimodal capabilities and real-time access to web and X/Twitter data.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 xAI API · x.com (Grok)

✨ Key Features

Real-time information access
Multimodal understanding
X platform integration
Conversational AI

📊 Benchmarks

84.0%

MMLU

86.3%

HumanEval

56.0%

MATH

70.9%

MMLU-Pro

51.0%

GPQA Diamond

26.7%

LiveCodeBench

77.8%

MATH-500

13.3%

AIME 2025

3.8%

HLE

28.5%

SciCode

🔄 What's new vs previous version

Vision input
Real-time X data
Improved reasoning

Platform Exclusive

🔍 Full spec sheet → 📢 Announcement

June

Major Release

Claude 3.5 Sonnet

Anthropic

Released: 2024-06-20

Type: LLM

Size: ~175B

Architecture: Dense Transformer (proprietary)

Context: 200K tokens

Knowledge cutoff: 2024-04

License: Proprietary

The mid-2024 Sonnet release that set the state-of-the-art bar for coding and agents, with significantly improved capabilities.

💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

200K context window
Improved coding capabilities
Enhanced reasoning
Vision capabilities

📊 Benchmarks

88.7%

MMLU

89.9%

HumanEval

71.1%

MATH

75.1%

MMLU-Pro

49.0%

SWE-bench Verified

56.0%

GPQA Diamond

38.1%

LiveCodeBench

69.5%

MATH-500

9.7%

AIME 2025

3.7%

HLE

31.6%

SciCode

API Only

🔍 Full spec sheet → 📢 Announcement

March

Major Release

Claude 3 Opus

Anthropic

Released: 2024-03-04

Type: LLM

Size: ~175B

Architecture: Transformer

Context: 200K tokens

Knowledge cutoff: 2023-08

License: Proprietary

Most capable model in the Claude 3 family and Anthropic's most powerful pre-Claude 4 model; tops GPT-4 on reasoning, with near-human performance on complex tasks.

💰 $18.75 in / $75.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

200K context window
Advanced reasoning
Multimodal capabilities
Constitutional AI training

📊 Benchmarks

86.8%

MMLU

84.8%

HumanEval

60.1%

MATH

69.6%

MMLU-Pro

48.9%

GPQA Diamond

95.0%

GSM8K

27.9%

LiveCodeBench

64.1%

MATH-500

3.3%

AIME 2025

3.1%

HLE

23.3%

SciCode

🔄 What's new vs previous version

200K context
Vision input
+15% MMLU vs Claude 2
Tool use

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Claude 3 Sonnet

Anthropic

Released: 2024-03-04

Type: LLM

Size: ~175B

Architecture: Transformer

Context: 200K tokens

Knowledge cutoff: 2023-08

License: Proprietary

Balanced Claude 3 variant: strong performance, faster response times, and the best price/performance in the family.

💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

200K context window
Balanced capability and speed
Multimodal input
Strong reasoning

📊 Benchmarks

79.0%

MMLU

71.3%

HumanEval

40.5%

MATH

57.9%

MMLU-Pro

40.0%

GPQA Diamond

17.5%

LiveCodeBench

41.4%

MATH-500

4.7%

AIME 2025

3.8%

HLE

22.9%

SciCode

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Claude 3 Haiku

Anthropic

Released: 2024-03-04

Type: LLM

Size: ~25B

Architecture: Transformer

Context: 200K tokens

Knowledge cutoff: 2023-08

License: Proprietary

Fastest, cheapest, and most compact model in the Claude 3 family, with sub-second latency at $0.25 per 1M input tokens.

💰 $0.25 in / $1.25 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

200K context window
Fastest response times
Multimodal input
Cost-effective

📊 Benchmarks

75.2%

MMLU

75.7%

HumanEval

38.9%

MATH

37.4%

GPQA Diamond

15.4%

LiveCodeBench

39.4%

MATH-500

1.0%

AIME 2025

3.9%

HLE

18.6%

SciCode

21.1%

TAU2-bench

0.8%

TerminalBench-Hard

36.1%

IF-Bench

21.0%

LiveCodeBench Reasoning

API Only

🔍 Full spec sheet → 📢 Announcement

February

Major Release

Mistral Large

Mistral

Released: 2024-02-26

Type: LLM

Size: ~70B

Context: 32K tokens

Top-tier reasoning model with strong multilingual capabilities

💰 $4.00 in / $12.00 out per 1M tok 🎛 In: text 📤 Out: text

✨ Key Features

32K context window
Multilingual capabilities
Function calling
JSON mode

📊 Benchmarks

81.2%

MMLU

70.6%

HumanEval

89.2%

HellaSwag

51.5%

MMLU-Pro

35.1%

GPQA Diamond

17.8%

LiveCodeBench

52.7%

MATH-500

0.0%

AIME 2025

3.4%

HLE

20.8%

SciCode

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemini 1.5 Pro

Google

Released: 2024-02-15

Type: Multimodal

Size: ~175B

Architecture: MoE Transformer

Context: 1M tokens

Knowledge cutoff: 2023-11

License: Proprietary

Google's first 1M-context model, with breakthrough long-context capabilities and leading multimodal needle-in-a-haystack retrieval.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

1M token context window
Multimodal understanding
Video analysis
Audio processing

📊 Benchmarks

81.9%

MMLU

83.4%

HumanEval

58.5%

MATH

65.7%

MMLU-Pro

37.1%

GPQA Diamond

24.4%

LiveCodeBench

67.3%

MATH-500

8.0%

AIME 2025

3.9%

HLE

27.4%

SciCode

🔄 What's new vs previous version

1M token context
Multi-hour video understanding
MoE architecture

API Only

🔍 Full spec sheet → 📢 Announcement

January

Major Release

text-embedding-3-large

OpenAI

Released: 2024-01-25

Type: Embedding

Size: ~7B

Architecture: Transformer encoder

License: Proprietary

OpenAI's most powerful text embedding model: 3× cheaper than ada-002, with better MTEB scores.

🎛 In: text 📤 Out: embeddings 🌐 OpenAI API · Azure OpenAI

✨ Key Features

3072 embedding dimensions
Improved retrieval performance
Reduced hallucinations
Multi-language support

📊 Benchmarks

64.6%

MTEB Score

3072

Dimensions

100+

Languages

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

GPT-4 Turbo

OpenAI

Released: 2024-01-25

Type: LLM

Size: ~175B

Architecture: Transformer

Context: 128K tokens

Knowledge cutoff: 2023-04

License: Proprietary

GPT-4 iteration with a 128K context window, knowledge through April 2023, improved performance, and 3× lower pricing than GPT-4.

💰 $10.00 in / $30.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

128K context window
Improved instruction following
Enhanced reasoning capabilities
Reduced hallucinations

📊 Benchmarks

86.4%

MMLU

91.8%

HumanEval

95.3%

HellaSwag

69.4%

MMLU-Pro

29.1%

LiveCodeBench

73.7%

MATH-500

15.0%

AIME 2025

3.3%

HLE

31.9%

SciCode

🔄 What's new vs previous version

128K context (16× the 8K base GPT-4 window)
Updated knowledge cutoff
3× cheaper than GPT-4

API Only

🔍 Full spec sheet → 📢 Announcement

2023

December

Major Release

Grok-1

xAI

Released: 2023-12-07

Type: LLM

Size: ~314B

Architecture: MoE Transformer

Context: 8K tokens

Knowledge cutoff: 2023-10-01

License: Apache 2.0

xAI's first major language model: a 314B MoE with real-time internet access, released as open source and the first frontier model to be fully open-sourced.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Self-hosted (HuggingFace)

✨ Key Features

Real-time information
Conversational interface
X platform integration
Uncensored responses

📊 Benchmarks

73.0%

MMLU

63.2%

HumanEval

62.9%

GSM8K

Platform Exclusive

🔍 Full spec sheet → 📢 Announcement

Research

AlphaCode 2

Google DeepMind

Released: 2023-12-06

Type: Code

Size: ~340B

Architecture: Transformer (Gemini-based)

License: Proprietary (research)

DeepMind's advanced code generation system for competitive programming, performing in the top 15% of competitive programmers.

🎛 In: text, code 📤 Out: code

✨ Key Features

Advanced code generation
Competitive programming
Multi-language support
Problem decomposition

📊 Benchmarks

1747

Codeforces Rating

85th percentile

Problem Solving

10+ languages

Language Support

Top 15%

Codeforces percentile

Research Only

🔍 Full spec sheet → 📢 Announcement 📄 Paper

Major Release

Gemini Ultra

Google

Released: 2023-12-06

Type: Multimodal

Size: ~540B

Architecture: MoE Transformer

Knowledge cutoff: 2023-06

License: Proprietary

Google's most capable multimodal model and its first to beat GPT-4 on MMLU, scoring 90%+ with chain-of-thought prompting.

🎛 In: text, image, audio, video 📤 Out: text 🌐 Google One AI Premium · Vertex AI

✨ Key Features

Multimodal reasoning
Text, image, audio, video understanding
Advanced mathematical reasoning
Code generation

📊 Benchmarks

90.0%

MMLU

74.4%

HumanEval

53.2%

MATH

🔄 What's new vs previous version

First model to exceed human expert on MMLU
Native multimodal
32K context

Limited Access

🔍 Full spec sheet → 📢 Announcement

Major Release

Gemini Pro

Google

Released: 2023-12-06

Type: Multimodal

Size: ~175B

Architecture: Transformer

Context: 32K tokens

Knowledge cutoff: 2023-06

License: Proprietary

Google's balanced workhorse Gemini model for a wide range of tasks, with a free tier in Google AI Studio.

🎛 In: text, image 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

Multimodal capabilities
32K context window
Fast inference
Scalable deployment

📊 Benchmarks

79.1%

MMLU

67.7%

HumanEval

32.6%

MATH

API Only

🔍 Full spec sheet → 📢 Announcement

November

Update

Claude 2.1

Anthropic

Released: 2023-11-21

Type: LLM

Size: ~175B

Architecture: Transformer

Context: 200K tokens

Knowledge cutoff: 2023-01

License: Proprietary

Claude 2 update with a 200K context window, reduced hallucinations, and significant improvements in accuracy and honesty.

💰 $0.00 in / $0.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

200K context window
Reduced hallucination rates
Enhanced accuracy
Tool use capabilities

📊 Benchmarks

73.1%

MMLU

15.9%

HumanEval

71.1%

MATH

49.5%

MMLU-Pro

31.9%

GPQA Diamond

19.5%

LiveCodeBench

37.4%

MATH-500

3.3%

AIME 2025

4.2%

HLE

18.4%

SciCode

🔄 What's new vs previous version

200K context (2× Claude 2)
50% fewer hallucinations
Tool use beta

API Only

🔍 Full spec sheet → 📢 Announcement

Major Release

Whisper v3

OpenAI

Released: 2023-11-06

Type: Audio

Size: ~1.55B

Architecture: Transformer encoder-decoder

License: MIT

OpenAI's state-of-the-art, open-weight multilingual speech recognition system, covering 99 languages.

🎛 In: audio 📤 Out: text 🌐 OpenAI API · Self-hosted (HuggingFace) · Groq

✨ Key Features

Multilingual speech recognition
99 language support
Robust noise handling
Real-time transcription

📊 Benchmarks

5.1%

WER English

99 languages

Language Coverage

0.8x

Real-time Factor

~2.7%

WER (English)

Open Source

🔍 Full spec sheet → 📢 Announcement

August

Major Release

Code Llama 34B

✨ Key Features

Code generation
Code completion
Multiple programming languages
Large context window

📊 Benchmarks

48.8%

HumanEval

55.0%

MBPP

45.9%

MultiPL-E

Open Source

July

Major Release

Llama 2 70B

✨ Key Features

Open source
Commercial license
Improved safety
Enhanced performance

📊 Benchmarks

68.9%

MMLU

29.9%

HumanEval

13.5%

MATH

Open Source

Major Release

Claude 2

Anthropic

Released: 2023-07-11

Type: LLM

Size: ~175B

Architecture: Transformer

Context: 100K tokens

Knowledge cutoff: 2023-01

License: Proprietary

Claude's first major leap, with a 100K context window and better instruction following. A significant improvement over Claude 1.

🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

100K context window
Improved safety
Enhanced reasoning
Better code generation

📊 Benchmarks

78.5%

MMLU

71.2%

HumanEval

88.0%

GSM8K

🔄 What's new vs previous version

100K context (10× Claude 1)
Improved reasoning
Reduced refusals

API Only

May

Major Release

PaLM 2

Google

Released: 2023-05-10

Type: LLM

Size: ~340B

Architecture: Transformer

Context: 8K tokens

Knowledge cutoff: 2023-02

License: Proprietary

Google's multilingual flagship, supporting 100+ languages. It powered Bard and other Google services in 2023.

🎛 In: text 📤 Out: text 🌐 Google Cloud Vertex AI · Google AI Studio

✨ Key Features

Multilingual capabilities
Reasoning improvements
Coding abilities
Multiple model sizes

📊 Benchmarks

78.3%

MMLU

77.2%

HumanEval

34.3%

MATH

API Only

March

Major Release

GPT-4

OpenAI

Released: 2023-03-14

Type: LLM

Size: Undisclosed

Architecture: Transformer (reported MoE)

Context: 8K tokens (32K with gpt-4-32k)

Knowledge cutoff: 2021-09

License: Proprietary

GPT-4 set the standard for capable AI. At launch it was OpenAI's most advanced system, producing safer and more useful responses than its predecessors.

💰 $30.00 in / $60.00 out per 1M tok 🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

8K context window
Multimodal capabilities
Enhanced reasoning
Improved factual accuracy

📊 Benchmarks

86.4%

MMLU

67.0%

HumanEval

52.9%

MATH

~90th percentile

Bar exam

🔄 What's new vs previous version

Passed bar exam (top 10%)
Vision input (GPT-4V), making the model multimodal

API Only

Major Release

Claude 1.3

Anthropic

Released: 2023-03-14

Type: LLM

Size: ~52B

Architecture: Transformer

Context: 100K tokens

Knowledge cutoff: 2022-12

License: Proprietary

Anthropic's first public model — 100K context ahead of its time — Anthropic's AI assistant built using Constitutional AI methods

🎛 In: text 📤 Out: text 🌐 Anthropic API

✨ Key Features

Constitutional AI
Helpful and harmless
Long conversations
Improved reasoning

📊 Benchmarks

75.0%

MMLU

56.0%

HumanEval

36.0%

MATH

API Only

2022

November

Major Release

ChatGPT (GPT-3.5 Turbo)

OpenAI

Released: 2022-11-30

Type: LLM

Size: ~175B

Architecture: Transformer (RLHF fine-tune of GPT-3.5)

Context: 16K tokens

Knowledge cutoff: 2021-09

License: Proprietary

The product that launched the AI era — 100M users in 2 months — Conversational AI that sparked mainstream adoption of large language models

🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

Conversational interface
Fine-tuned for chat
RLHF training
Fast response times

📊 Benchmarks

70.0%

MMLU

48.1%

HumanEval

34.1%

MATH

🔄 What's new vs previous version

Conversational interface
RLHF alignment
Faster and cheaper than earlier GPT-3.5 models

Free + Paid Tiers

April

Research

PaLM

Google

Released: 2022-04-04

Type: LLM

Size: 540B

Architecture: Pathways Transformer

License: Proprietary (research)

Google's 540-billion-parameter Pathways language model, the first to demonstrate chain-of-thought reasoning at scale.

🎛 In: text 📤 Out: text

✨ Key Features

540 billion parameters
Few-shot learning
Reasoning capabilities
Code generation

📊 Benchmarks

69.3%

MMLU

26.2%

HumanEval

8.8%

MATH

58.1%

BIG-bench

Research Only

2021

August

Major Release

Codex

OpenAI

Released: 2021-08-10

Type: Code

Size: ~12B

Architecture: Transformer (GPT-3 fine-tune)

Context: 4K tokens

License: Proprietary

GPT-3 trained on code — the engine behind GitHub Copilot v1 — AI system that translates natural language to code

🎛 In: text, code 📤 Out: code 🌐 OpenAI API (deprecated) · GitHub Copilot

✨ Key Features

Code generation
Natural language to code
Multiple programming languages
GitHub Copilot integration

📊 Benchmarks

28.8%

HumanEval

59.6%

MBPP

25.0%

APPS

API Only

2020

June

Major Release

GPT-3

OpenAI

Released: 2020-06-11

Type: LLM

Size: 175B

Architecture: Transformer

Context: 2K tokens

Knowledge cutoff: 2019-10

License: Proprietary (API)

The model that showed scaling works — 175B parameters, few-shot learning pioneer — Breakthrough large language model that demonstrated emergent capabilities

🎛 In: text 📤 Out: text 🌐 OpenAI API

✨ Key Features

175 billion parameters
Few-shot learning
Text generation
Handles many tasks without task-specific fine-tuning

📊 Benchmarks

43.9%

MMLU

0.0%

HumanEval

5.2%

MATH

API Only
