AI Model Release Timeline 2025–2026

Every LLM launch tracked — GPT-5, Claude 4, Gemini 2, Llama 4 and more. Updated weekly with launch dates, benchmarks, and capabilities.

59 model releases tracked | Last updated: 2026-04-20 09:40 UTC | Compare models side by side →

2026

February

Major Release

Gemini 3.1 Pro

Google
Released: 2026-02-19
Type: LLM
Size: ~1T MoE
Architecture: Sparse Mixture-of-Experts (MoE)
Context: 2M tokens
Knowledge cutoff: 2025-12
License: Proprietary
Google's flagship reasoning model with a 2x jump on hard multi-step tasks. — Google's latest flagship model with a major 2x jump in reasoning capabilities
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 2x reasoning improvement
  • ARC-AGI-2 score of 77.1%
  • Enhanced multimodal understanding
  • Deep Think mode

📊 Benchmarks

77.1%
ARC-AGI-2
93.8%
MMLU
89.4%
MATH
93.8%
MMLU-Pro
84.2%
GPQA Diamond
72.3%
SWE-bench Verified
78.9%
LiveCodeBench

🔄 What's new vs previous version

  • 2x reasoning score on ARC-AGI-2 vs Gemini 3 Pro
  • Context window expanded to 2M tokens
  • Deep Think mode enabled by default on the Pro tier
  • Lower latency on first-token despite larger context
Major Release

Claude Sonnet 4.6

Anthropic
Released: 2026-02-17
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-10
License: Proprietary
Near-Opus quality at a fraction of the cost, with Agent Teams orchestration. — Anthropic's latest Sonnet with Agent Teams capability and near-Opus performance at a fraction of the cost
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Agent Teams: orchestrate 2-16 Claude instances
  • Near-Opus performance at 1/5th cost
  • 80.8% SWE-bench Verified
  • Fast mode research preview

📊 Benchmarks

80.8%
SWE-bench
92.1%
MMLU
95.2%
HumanEval
80.8%
SWE-bench Verified
78.4%
GPQA Diamond
88.5%
AIME 2025
71.2%
TAU-bench

🔄 What's new vs previous version

  • Agent Teams: orchestrate 2–16 Claude instances in parallel
  • +8.5pt on SWE-bench Verified vs Sonnet 4
  • 1/5 the cost of Opus 4.5 at ~95% of coding quality
  • Fast mode research preview for lower-latency inference
Update

DeepSeek V3.2

DeepSeek
Released: 2026-02-12
Type: LLM
Size: 671B MoE
Architecture: Sparse MoE (37B active / 671B total)
Context: 1M tokens
Knowledge cutoff: 2025-09
License: DeepSeek License (open weights, commercial OK)
Open-weight MoE with a 1M+ token context window and strong coding. — Major update with 10x context window expansion to over 1 million tokens
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI · Fireworks AI

✨ Key Features

  • 1M+ token context window (10x expansion)
  • Improved reasoning capabilities
  • Open source release
  • Cost-effective inference

📊 Benchmarks

90.1%
MMLU
92.5%
HumanEval
1M+ tokens
Context Window
85.6%
MATH
68.4%
GPQA
72.1%
LiveCodeBench

🔄 What's new vs previous version

  • 10x context window expansion (128K → 1M+ tokens)
  • Sliding-window attention for long-context throughput
  • Improved chain-of-thought reasoning
  • Native FP8 inference support
Major Release

GLM-5

Zhipu AI
Released: 2026-02-11
Type: LLM
Size: 744B
Architecture: Dense Transformer (744B)
Context: 200K tokens
Knowledge cutoff: 2025-11
License: Proprietary (open weights for non-frontier sizes)
First frontier model trained entirely on Huawei Ascend silicon. — First frontier AI model trained entirely without NVIDIA GPUs, using Huawei Ascend chips
💰 $0.11 in / $0.28 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Zhipu BigModel API

✨ Key Features

  • First frontier model trained on Huawei Ascend chips (no NVIDIA)
  • #1 HLE score (50.4%)
  • 1.2% hallucination rate via Slime RL
  • 136x cheaper than Claude Opus 4.5

📊 Benchmarks

50.4%
HLE
1.2%
Hallucination Rate
$0.11/M tokens
Cost
88.7%
MMLU
92.1%
C-Eval
94.8%
GSM8K

🔄 What's new vs previous version

  • Trained entirely on Huawei Ascend 910B clusters (no NVIDIA)
  • Slime RL fine-tuning drops hallucination rate to 1.2%
  • 136x cheaper than Claude Opus 4.5 at comparable quality
Major Release

Seedance 2.0

ByteDance
Released: 2026-02-10
Type: Video
Size: ~10B
Architecture: Diffusion Transformer
Knowledge cutoff: 2025-12
License: Proprietary
ByteDance's video generation flagship — cinematic quality at scale — First commercial video model generating synchronized audio and video in a single pass
🎛 In: text, image 📤 Out: video 🌐 Volcengine API

✨ Key Features

  • Synchronized audio + video generation in one pass
  • Lip-sync in 8+ languages
  • 10-30x cheaper than Sora 2
  • Commercial pricing: $0.10-$0.80/min

📊 Benchmarks

Excellent
Audio Sync
8+
Languages
$0.10-$0.80/min
Cost
Major Release

GPT-5.3 Codex

OpenAI
Released: 2026-02-05
Type: Code
Size: ~200B
Architecture: MoE (coding-specialized fine-tune)
Context: 400K tokens
Knowledge cutoff: 2025-11
License: Proprietary
Coding-specialized variant of GPT-5.3, tuned for agentic IDE workflows. — OpenAI's specialized self-improving coding model with state-of-the-art software engineering performance
💰 $1.25 in / $10.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI · GitHub Copilot

✨ Key Features

  • Self-improving agentic coding
  • 25% faster than GPT-5.2-Codex
  • 1,000+ tokens/sec generation
  • First OpenAI model flagged 'high' on cybersecurity framework

📊 Benchmarks

77.3%
Terminal-Bench
SOTA
SWE-Bench Pro
1,000+ tok/s
Speed
82.4%
SWE-bench Verified
96.8%
HumanEval
84.2%
LiveCodeBench
79.5%
Aider Polyglot

🔄 What's new vs previous version

  • +4pt on SWE-bench Verified vs GPT-5.2 Codex
  • Native IDE tool-calling at reduced latency
  • Extended max output to 100K for multi-file patches

January

Major Release

Kimi K2

Moonshot AI
Released: 2026-01-20
Type: LLM
Size: 1.04T MoE
Architecture: MoE (32B active / ~1T total)
Context: 2M tokens
Knowledge cutoff: 2025-10
License: Modified MIT (open weights)
Moonshot's open-weight frontier MoE with strong agentic benchmarks. — First open-weight model to rank #1 on LMSYS Chatbot Arena with over 1 trillion parameters
💰 $0.15 in / $2.50 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Moonshot API · Hugging Face · Together AI

✨ Key Features

  • First open-weight model #1 on LMSYS Chatbot Arena
  • 1.04 trillion parameters
  • K2.5 agent swarms with up to 100 sub-agents
  • $0.15/M input tokens

📊 Benchmarks

#1
LMSYS Arena
1.04T
Parameters
$0.15/M tokens
Cost
91.3%
MMLU
65.8%
SWE-bench Verified
74.1%
GPQA Diamond
68.9%
LiveCodeBench

🔄 What's new vs previous version

  • 2M token context window (20x vs first Kimi)
  • Agentic tool-use tuning via MuonClip optimizer
  • Open weights under modified MIT

2025

December

Update

GPT-5.2 Codex

OpenAI
Released: 2025-12-18
Type: Code
Size: ~200B
Architecture: MoE (coding fine-tune)
Context: 256K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Prior-gen Codex variant of GPT-5.2 for agentic coding. — Specialized coding variant of GPT-5.2 focused on software engineering tasks
💰 $1.50 in / $12.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Specialized for software engineering
  • Enhanced agentic coding
  • Multi-file refactoring
  • Advanced debugging capabilities

📊 Benchmarks

SOTA
SWE-Bench
95.1%
HumanEval
72.8%
Terminal-Bench
78.2%
SWE-bench Verified
80.4%
LiveCodeBench
Major Release

Mistral Large 3

Mistral
Released: 2025-12-15
Type: LLM
Size: ~123B
Architecture: Dense Transformer
Context: 256K tokens
Knowledge cutoff: 2025-07
License: Mistral Commercial License
Mistral's flagship proprietary model, tuned for European enterprise. — Mistral's flagship model competing with GPT-5 class models at a fraction of the cost
💰 $2.00 in / $6.00 out per 1M tok 🎛 In: text, image 📤 Out: text 🌐 Mistral La Plateforme · Azure · AWS Bedrock

✨ Key Features

  • 128K context window
  • Improved multilingual capabilities
  • Enhanced function calling
  • Competitive with GPT-5 class models

📊 Benchmarks

89.4%
MMLU
91.2%
HumanEval
82.1%
MATH
76.8%
MMMU
Update

GPT-5.2

OpenAI
Released: 2025-12-11
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Late-2025 GPT-5 refresh with improved reasoning and steerability. — Iterative improvement on GPT-5.1 with enhanced reasoning and faster performance
💰 $2.00 in / $10.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved adaptive reasoning
  • Better multimodal understanding
  • Faster inference

📊 Benchmarks

92.8%
MMLU
88.5%
MATH
95.8%
HumanEval
90.8%
MMLU-Pro
80.1%
GPQA Diamond
72.5%
SWE-bench Verified
92.1%
AIME 2025

November

Major Release

Claude Opus 4.5

Anthropic
Released: 2025-11-24
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 500K tokens
Knowledge cutoff: 2025-08
License: Proprietary
Anthropic's top-tier reasoning model for complex research and agents. — Anthropic's most capable model with breakthrough coding performance and major price reduction
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • First model to break 80.9% on SWE-Bench Verified
  • 67% price reduction vs previous Opus
  • Extended reasoning capabilities
  • Advanced coding performance

📊 Benchmarks

80.9%
SWE-bench
92.8%
MMLU
95.0%
HumanEval
78.9%
SWE-bench Verified
82.4%
GPQA Diamond
90.5%
AIME 2025
Major Release

Gemini 3 Pro

Google
Released: 2025-11-18
Type: Multimodal
Size: ~1T MoE
Architecture: Sparse MoE
Context: 1M tokens
Knowledge cutoff: 2025-09
License: Proprietary
First Gemini 3 tier release; strong multimodal + long-context. — Google's flagship model with Deep Think mode, ranked #1 on LMSYS Arena at launch
💰 $2.50 in / $10.00 out per 1M tok 🎛 In: text, image, audio, video, PDF 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • 1M token context window
  • Deep Think reasoning mode
  • Solved 5/6 IMO 2025 problems
  • #1 on LMSYS Arena

📊 Benchmarks

87.5%
ARC-AGI
93.2%
MMLU
#1
LMSYS Arena
89.4%
MMLU-Pro
82.1%
MMMU
78.5%
GPQA Diamond
68.2%
SWE-bench Verified
Major Release

GPT-5.1

OpenAI
Released: 2025-11-12
Type: LLM
Size: ~200B
Architecture: MoE
Context: 400K tokens
Knowledge cutoff: 2025-06
License: Proprietary
Maintenance update to GPT-5 with steerability + latency improvements. — Major GPT-5 iteration with adaptive reasoning and perfect scores on math competitions
💰 $2.25 in / $11.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning modes
  • Perfect 100% on AIME 2025
  • 87.5% on ARC-AGI
  • Enhanced multimodal capabilities

📊 Benchmarks

87.5%
ARC-AGI
100%
AIME 2025
92.5%
MMLU
89.2%
MMLU-Pro
77.8%
GPQA Diamond
70.1%
SWE-bench Verified

August

Major Release

GPT-5

OpenAI
Released: 2025-08-15
Type: LLM
Size: ~200B
Architecture: MoE with unified reasoning router
Context: 400K tokens
Knowledge cutoff: 2025-05
License: Proprietary
OpenAI's flagship unified reasoning + chat model replacing the GPT-4 line. — OpenAI's next-generation flagship model with adaptive reasoning capabilities
💰 $2.50 in / $12.00 out per 1M tok 🎛 In: text, image, audio 📤 Out: text, audio 🌐 OpenAI API · Azure OpenAI · ChatGPT

✨ Key Features

  • Adaptive reasoning (routes between quick and deep thinking)
  • Improved math and coding
  • Enhanced multimodal reasoning
  • New safety architecture

📊 Benchmarks

91.0%
MMLU
93.5%
HumanEval
90.1%
MATH
87.5%
MMLU-Pro
74.2%
GPQA Diamond
67.4%
SWE-bench Verified

July

Update

Claude Opus 4.1

Anthropic
Released: 2025-07-15
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Mid-2025 Opus refresh focused on agentic coding reliability. — Iterative improvement on Claude Opus 4 with enhanced multi-file refactoring
💰 $15.00 in / $75.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved multi-file refactoring
  • Enhanced agentic capabilities
  • Better long-context performance
  • Reduced hallucinations

📊 Benchmarks

75.2%
SWE-bench
91.2%
MMLU
94.0%
HumanEval
74.5%
SWE-bench Verified
79.1%
GPQA Diamond

June

Update

Gemini 2.5 Flash

Google
Released: 2025-06-20
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2025-01
License: Proprietary
Google's cost-optimized multimodal model with thinking mode. — Google's fast and cost-effective model with enhanced image capabilities
💰 $0.30 in / $2.50 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Enhanced image editing stabilization
  • Faster inference
  • Improved multimodal understanding
  • Cost-effective deployment

📊 Benchmarks

87.5%
MMLU
2x Gemini 2.0 Flash
Speed
High
Image Quality
82.8%
MMLU-Pro
79.7%
MMMU
63.5%
LiveCodeBench

May

Major Release

Claude Sonnet 4

Anthropic
Released: 2025-05-22
Type: LLM
Size: ~500B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2025-03
License: Proprietary
Claude 4 mid-tier with strong coding and long-horizon agentic reliability. — Latest generation Claude model with significant performance improvements
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Enhanced reasoning capabilities
  • Improved safety measures
  • Advanced multimodal understanding
  • Extended context window

📊 Benchmarks

88.7%
MMLU
94.5%
HumanEval
76.8%
MATH
72.3%
SWE-bench Verified
74.0%
GPQA Diamond

February

Update

Claude Sonnet 3.7

Anthropic
Released: 2025-02-24
Type: LLM
Size: ~300B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-11
License: Proprietary
Extended-thinking update to Sonnet 3.5 with visible reasoning toggle. — Iterative improvement on Claude 3.5 with enhanced capabilities
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • Improved reasoning
  • Better code generation
  • Enhanced safety
  • Reduced hallucinations

📊 Benchmarks

86.1%
MMLU
93.2%
HumanEval
74.1%
MATH
62.3%
SWE-bench Verified
68.3%
GPQA Diamond

2024

December

Major Release

DeepSeek-V3

DeepSeek
Released: 2024-12-26
Type: LLM
Size: 671B
Architecture: Sparse MoE (37B active / 671B total)
Context: 128K tokens
Knowledge cutoff: 2024-07
License: DeepSeek License (open weights)
DeepSeek's breakthrough open-weight MoE rivaling GPT-4-class quality. — DeepSeek's most advanced open-source model with MoE architecture
💰 $0.27 in / $1.10 out per 1M tok 🎛 In: text 📤 Out: text 🌐 DeepSeek API · Hugging Face · Together AI

✨ Key Features

  • Mixture of Experts architecture
  • Cost-effective training
  • Open source release
  • Strong reasoning capabilities

📊 Benchmarks

88.5%
MMLU
82.6%
HumanEval
61.6%
MATH
Major Release

Gemini 2.0 Flash

Google
Released: 2024-12-11
Type: Multimodal
Size: ~175B
Architecture: Dense multimodal transformer
Context: 1M tokens
Knowledge cutoff: 2024-08
License: Proprietary
First Gemini 2 model — fast, cheap, multimodal, with tool use native. — Google's next-generation model with native multimodal capabilities
💰 $0.10 in / $0.40 out per 1M tok 🎛 In: text, image, audio, video 📤 Out: text, image, audio 🌐 Google AI Studio · Vertex AI · Gemini API

✨ Key Features

  • Native multimodal generation
  • Real-time API
  • Agentic capabilities
  • Enhanced speed

📊 Benchmarks

85.8%
MMLU
71.9%
HumanEval
58.8%
MATH
76.4%
MMLU-Pro
70.7%
MMMU
35.1%
LiveCodeBench

October

Major Release

Moonshot Kimi

Moonshot AI
Released: 2024-10-16
Type: LLM
Size: ~100B
Architecture: Transformer
Context: 1M tokens
Knowledge cutoff: 2024-08
License: Proprietary
China's long-context pioneer — 1M token window at launch — Chinese AI company's large language model optimized for Asian languages
🎛 In: text, file 📤 Out: text 🌐 Moonshot API

✨ Key Features

  • 200K context window
  • Multilingual support
  • Document processing
  • Chinese language optimization

📊 Benchmarks

77.9%
C-Eval
75.2%
CMMLU
65.8%
HumanEval

September

Major Release

GPT-o1-preview

OpenAI
Released: 2024-09-12
Type: LLM
Size: ~175B
Architecture: Transformer with chain-of-thought reasoning
Context: 128K tokens
Knowledge cutoff: 2024-04
License: Proprietary
OpenAI's first public reasoning model — 'thinking before answering' — OpenAI's most advanced reasoning model designed for complex problem-solving
🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Advanced reasoning capabilities
  • Chain-of-thought processing
  • Enhanced problem-solving
  • Improved mathematical reasoning

📊 Benchmarks

83rd percentile
AIME
78%
GPQA
83%
MATH

🔄 What's new vs previous version

  • Explicit reasoning tokens
  • Best-in-class math and science
  • +34% AIME 2024 vs GPT-4o

August

Major Release

Grok-2

xAI
Released: 2024-08-13
Type: LLM
Size: ~314B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: Real-time (X feed)
License: Proprietary
Elon's second-gen Grok — real-time X/Twitter data access — xAI's flagship model with real-time web access and multimodal capabilities
🎛 In: text, image 📤 Out: text 🌐 xAI API · x.com (Grok)

✨ Key Features

  • Real-time information access
  • Multimodal understanding
  • X platform integration
  • Conversational AI

📊 Benchmarks

84.0%
MMLU
74.1%
HumanEval
56.0%
MATH

🔄 What's new vs previous version

  • Vision input
  • Real-time X data
  • Improved reasoning

July

Major Release

GitHub Copilot Chat

Microsoft
Released: 2024-07-25
Type: Code
Size: ~70B
Architecture: GPT-4 based
License: Proprietary (SaaS)
AI pair programmer embedded in VS Code and GitHub — 1M+ enterprise seats — AI-powered coding assistant with conversational interface
🎛 In: text, code 📤 Out: text, code 🌐 GitHub / Microsoft

✨ Key Features

  • Conversational coding
  • IDE integration
  • Code explanation
  • Multi-language support

📊 Benchmarks

84.2%
HumanEval
78.9%
MBPP
71.4%
CodeT5

June

Major Release

Claude 3.5 Sonnet

Anthropic
Released: 2024-06-20
Type: LLM
Size: ~175B
Architecture: Dense Transformer (proprietary)
Context: 200K tokens
Knowledge cutoff: 2024-04
License: Proprietary
The mid-2024 Sonnet release that set the SOTA bar for coding and agents. — Anthropic's most intelligent model with significantly improved capabilities
💰 $3.00 in / $15.00 out per 1M tok 🎛 In: text, image, PDF 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Improved coding capabilities
  • Enhanced reasoning
  • Vision capabilities

📊 Benchmarks

88.7%
MMLU
92.0%
HumanEval
71.1%
MATH
49.0%
SWE-bench Verified
59.4%
GPQA Diamond

March

Update

Grok-1.5

xAI
Released: 2024-03-28
Type: LLM
Size: ~314B
Architecture: Transformer
Context: 128K tokens
License: Proprietary
Grok's reasoning upgrade — first xAI model with strong STEM benchmarks — Enhanced version of Grok with improved performance across benchmarks
🎛 In: text 📤 Out: text 🌐 xAI API

✨ Key Features

  • Improved reasoning
  • Code generation
  • 128K context window
  • Real-time data access

📊 Benchmarks

73.0%
MMLU
63.2%
HumanEval
50.6%
MATH

🔄 What's new vs previous version

  • 128K context (up from 8K)
  • +18% MATH vs Grok-1
Major Release

Claude 3 Opus

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Anthropic's most powerful pre-Claude 4 model — tops GPT-4 on reasoning — Most capable model in the Claude 3 family with near-human performance on complex tasks
🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Advanced reasoning
  • Multimodal capabilities
  • Constitutional AI training

📊 Benchmarks

86.8%
MMLU
84.9%
HumanEval
60.1%
MATH
95.0%
GSM8K

🔄 What's new vs previous version

  • 200K context
  • Vision input
  • +15% MMLU vs Claude 2
  • Tool use
Major Release

Claude 3 Sonnet

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Balanced Claude 3 variant — best price/performance in the family — Balanced model offering strong performance with faster response times
🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Balanced capability and speed
  • Multimodal input
  • Strong reasoning

📊 Benchmarks

79.0%
MMLU
73.0%
HumanEval
40.5%
MATH
Major Release

Claude 3 Haiku

Anthropic
Released: 2024-03-04
Type: LLM
Size: ~25B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-08
License: Proprietary
Fastest and cheapest Claude 3 — sub-second latency at $0.25/M — Fastest and most compact model in the Claude 3 family
🎛 In: text, image 📤 Out: text 🌐 Anthropic API · AWS Bedrock · Google Vertex AI

✨ Key Features

  • 200K context window
  • Fastest response times
  • Multimodal input
  • Cost-effective

📊 Benchmarks

75.2%
MMLU
75.9%
HumanEval
38.9%
MATH

February

Major Release

Mistral Large

Mistral
Released: 2024-02-26
Type: LLM
Size: ~70B
Top-tier reasoning model with strong multilingual capabilities

✨ Key Features

  • 32K context window
  • Multilingual capabilities
  • Function calling
  • JSON mode

📊 Benchmarks

81.2%
MMLU
45%
HumanEval
89.2%
HellaSwag
Major Release

Sora

OpenAI
Released: 2024-02-15
Type: Video
Size: ~1B
Architecture: Diffusion Transformer
License: Proprietary
OpenAI's text-to-video model — up to 60-second photorealistic clips — OpenAI's text-to-video AI model capable of generating minute-long videos
🎛 In: text, image 📤 Out: video 🌐 OpenAI (sora.com)

✨ Key Features

  • Text-to-video generation
  • 60-second video clips
  • Complex scene generation
  • Realistic motion physics

📊 Benchmarks

High
Video Quality
Excellent
Motion Coherence
95%
Text Adherence
Major Release

Gemini 1.5 Pro

Google
Released: 2024-02-15
Type: Multimodal
Size: ~175B
Architecture: MoE Transformer
Context: 1M tokens
Knowledge cutoff: 2023-11
License: Proprietary
Google's first 1M-context model — multimodal needle-in-haystack champion — Google's next-generation model with breakthrough long context capabilities
🎛 In: text, image, audio, video 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • 1M token context window
  • Multimodal understanding
  • Video analysis
  • Audio processing

📊 Benchmarks

81.9%
MMLU
71.9%
HumanEval
58.5%
MATH

🔄 What's new vs previous version

  • 1M token context
  • Multi-hour video understanding
  • MoE architecture

January

Major Release

text-embedding-3-large

OpenAI
Released: 2024-01-25
Type: Embedding
Size: ~7B
Architecture: Transformer encoder
License: Proprietary
OpenAI's best embedding model — 3× cheaper than ada-002 with better MTEB — OpenAI's most powerful text embedding model
🎛 In: text 📤 Out: embeddings 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 3072 embedding dimensions
  • Improved retrieval performance
  • Reduced hallucinations
  • Multi-language support

📊 Benchmarks

64.6%
MTEB Score
3072
Dimensions
100+
Languages
64.6%
MTEB avg
Major Release

GPT-4 Turbo

OpenAI
Released: 2024-01-25
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 128K tokens
Knowledge cutoff: 2023-04
License: Proprietary
GPT-4 with 128K context and knowledge through April 2023 — 3× cheaper than GPT-4 — Latest iteration of GPT-4 with improved performance and longer context window
🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 128K context window
  • Improved instruction following
  • Enhanced reasoning capabilities
  • Reduced hallucinations

📊 Benchmarks

86.4%
MMLU
67%
HumanEval
95.3%
HellaSwag

🔄 What's new vs previous version

  • 128K context (8× increase)
  • Updated knowledge cutoff
  • 3× cheaper than GPT-4

2023

December

Major Release

Midjourney v6

Midjourney
Released: 2023-12-21
Type: Vision
Size: ~5B
Architecture: Diffusion
License: Proprietary
Midjourney's photorealism leap — text rendering finally works — Midjourney's latest high-quality image generation model
🎛 In: text, image 📤 Out: image 🌐 Midjourney (Discord / Web)

✨ Key Features

  • Photorealistic image generation
  • Improved prompt understanding
  • Better human anatomy
  • Text rendering

📊 Benchmarks

Very High
Image Quality
92%
Prompt Adherence
Excellent
Style Versatility
Major Release

Grok-1

xAI
Released: 2023-12-07
Type: LLM
Size: ~314B
Architecture: MoE Transformer
Context: 8K tokens
License: Apache 2.0
xAI's open-source release — 314B MoE, first frontier model fully open-sourced — xAI's first major language model with real-time internet access
🎛 In: text 📤 Out: text 🌐 Self-hosted (HuggingFace)

✨ Key Features

  • Real-time information
  • Conversational interface
  • X platform integration
  • Uncensored responses

📊 Benchmarks

73.0%
MMLU
63.2%
HumanEval
62.9%
GSM8K
Research

AlphaCode 2

Google DeepMind
Released: 2023-12-06
Type: Code
Size: ~340B
Architecture: Transformer (Gemini-based)
License: Proprietary (research)
DeepMind's coding specialist — top 15% of competitive programmers — DeepMind's advanced code generation system for competitive programming
🎛 In: text, code 📤 Out: code

✨ Key Features

  • Advanced code generation
  • Competitive programming
  • Multi-language support
  • Problem decomposition

📊 Benchmarks

1747
Codeforces Rating
85th percentile
Problem Solving
10+ languages
Language Support
Top 15%
Codeforces percentile
Major Release

Gemini Ultra

Google
Released: 2023-12-06
Type: Multimodal
Size: ~540B
Architecture: MoE Transformer
Knowledge cutoff: 2023-06
License: Proprietary
Google's first model to beat GPT-4 on MMLU — 90%+ with CoT — Google's most capable multimodal AI model
🎛 In: text, image, audio, video 📤 Out: text 🌐 Google One AI Premium · Vertex AI

✨ Key Features

  • Multimodal reasoning
  • Text, image, audio, video understanding
  • Advanced mathematical reasoning
  • Code generation

📊 Benchmarks

90.0%
MMLU
74.4%
HumanEval
53.2%
MATH

🔄 What's new vs previous version

  • First model to exceed human expert on MMLU
  • Native multimodal
  • 32K context
Major Release

Gemini Pro

Google
Released: 2023-12-06
Type: Multimodal
Size: ~175B
Architecture: Transformer
Context: 32K tokens
Knowledge cutoff: 2023-06
License: Proprietary
Google's workhorse Gemini model — free tier in Google AI Studio — Google's balanced model for wide range of tasks
🎛 In: text, image 📤 Out: text 🌐 Google AI Studio · Vertex AI

✨ Key Features

  • Multimodal capabilities
  • 32K context window
  • Fast inference
  • Scalable deployment

📊 Benchmarks

79.1%
MMLU
67.7%
HumanEval
32.6%
MATH

November

Update

Claude 2.1

Anthropic
Released: 2023-11-21
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 200K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude 2 update — 200K context and reduced hallucinations — Significant improvements in accuracy and honesty over Claude 2
🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 200K context window
  • Reduced hallucination rates
  • Enhanced accuracy
  • Tool use capabilities

📊 Benchmarks

73.1%
MMLU
70.0%
HumanEval
71.1%
MATH

🔄 What's new vs previous version

  • 200K context (2× Claude 2)
  • 50% fewer hallucinations
  • Tool use beta
Major Release

Whisper v3

OpenAI
Released: 2023-11-06
Type: Audio
Size: ~1.55B
Architecture: Transformer encoder-decoder
License: MIT
State-of-the-art open speech recognition — 99 languages, open weights — OpenAI's multilingual speech recognition system
🎛 In: audio 📤 Out: text 🌐 OpenAI API · Self-hosted (HuggingFace) · Groq

✨ Key Features

  • Multilingual speech recognition
  • 99 language support
  • Robust noise handling
  • Real-time transcription

📊 Benchmarks

5.1%
WER English
99 languages
Language Coverage
0.8x
Real-time Factor
~2.7%
WER (English)

October

Major Release

DALL-E 3

OpenAI
Released: 2023-10-04
Type: Vision
Size: ~6B
Architecture: Diffusion
License: Proprietary
OpenAI's best image model — coherent text in images, integrated into ChatGPT — Latest iteration of DALL-E with significantly improved image generation
🎛 In: text 📤 Out: image 🌐 OpenAI API · ChatGPT Plus · Azure OpenAI

✨ Key Features

  • Improved prompt adherence
  • Higher resolution outputs
  • Better text rendering
  • Enhanced artistic styles

📊 Benchmarks

High
Image Quality
90%
Prompt Adherence
99%
Safety Filtering

September

Major Release

GPT-4V (Vision)

OpenAI
Released: 2023-09-25
Type: Multimodal
Size: ~175B
Architecture: Transformer + vision encoder
Context: 128K tokens
Knowledge cutoff: 2023-04
License: Proprietary
GPT-4 gains eyes — the model that made multimodal mainstream — GPT-4 enhanced with vision capabilities for image understanding
🎛 In: text, image 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Image understanding
  • Visual reasoning
  • Multimodal capabilities
  • Enhanced context understanding

📊 Benchmarks

86.4%
MMLU
49.9%
MathVista
78.2%
AI2D

August

Major Release

Code Llama 34B

Meta
Released: 2023-08-24
Type: Code
Size: 34B
Architecture: Transformer (Llama 2 fine-tune)
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's code-specialized open model — top open-source coding at launch — Specialized model for code generation built on Llama 2
🎛 In: text, code 📤 Out: code, text 🌐 Self-hosted · Together AI · Fireworks AI

✨ Key Features

  • Code generation
  • Code completion
  • Multiple programming languages
  • Large context window

📊 Benchmarks

48.8%
HumanEval
55.0%
MBPP
45.9%
MultiPL-E

July

Research

RT-2 (Robotics Transformer)

Google
Released: 2023-07-28
Type: Robotics
Size: ~55B
Architecture: Transformer (PaLI-X / PaLM-E based)
License: Proprietary (research)
Google's vision-language-action model — robots that reason from web knowledge — Google's vision-language-action model for robotic control
🎛 In: text, image 📤 Out: robot actions

✨ Key Features

  • Vision-language-action model
  • Robotic control
  • Web-scale training
  • Emergent skills

📊 Benchmarks

62%
Success Rate
3x improvement
Novel Tasks
High
Emergent Skills
Major Release

Stable Diffusion XL

Stability AI
Released: 2023-07-26
Type: Vision
Size: ~3.5B
Architecture: Latent Diffusion
License: CreativeML Open RAIL+M
Open-weight image generation flagship — photorealistic, runs locally — High-resolution text-to-image model with improved quality
🎛 In: text, image 📤 Out: image 🌐 Self-hosted · Stability AI API · AWS Bedrock · Replicate

✨ Key Features

  • 1024x1024 native resolution
  • Improved image quality
  • Better text rendering
  • Fine-tunable

📊 Benchmarks

1024x1024
Resolution
23.4
FID Score
31.8
CLIP Score
Major Release

Llama 2 70B

Meta
Released: 2023-07-18
Type: LLM
Size: 70B
Architecture: Transformer
Context: 4K tokens
Knowledge cutoff: 2023-01
License: Llama 2 Community License
Meta's open-weight landmark — 70B that matched GPT-3.5 and ignited open AI — Meta's open-source large language model with commercial license
🎛 In: text 📤 Out: text 🌐 Self-hosted · Together AI · AWS Bedrock · Azure AI

✨ Key Features

  • Open source
  • Commercial license
  • Improved safety
  • Enhanced performance

📊 Benchmarks

68.9%
MMLU
29.9%
HumanEval
13.5%
MATH
Major Release

Claude 2

Anthropic
Released: 2023-07-11
Type: LLM
Size: ~175B
Architecture: Transformer
Context: 100K tokens
Knowledge cutoff: 2023-01
License: Proprietary
Claude's first major leap — 100K context and better at instructions — Significant improvement over Claude 1 with enhanced capabilities
🎛 In: text 📤 Out: text 🌐 Anthropic API · AWS Bedrock

✨ Key Features

  • 100K context window
  • Improved safety
  • Enhanced reasoning
  • Better code generation

📊 Benchmarks

78.5%
MMLU
71.2%
HumanEval
88.0%
MATH

🔄 What's new vs previous version

  • 100K context (10× Claude 1)
  • Improved reasoning
  • Reduced refusals

May

Major Release

PaLM 2

Google
Released: 2023-05-10
Type: LLM
Size: ~340B
Architecture: Transformer
Knowledge cutoff: 2023-02
License: Proprietary
Google's multilingual flagship — powers Bard 2023, 100+ languages — Google's improved large language model powering Bard and other services
🎛 In: text 📤 Out: text 🌐 Google Cloud Vertex AI · Google AI Studio

✨ Key Features

  • Multilingual capabilities
  • Reasoning improvements
  • Coding abilities
  • Multiple model sizes

📊 Benchmarks

78.3%
MMLU
77.2%
HumanEval
34.3%
MATH

March

Major Release

GPT-4

OpenAI
Released: 2023-03-14
Type: LLM
Size: ~175B
Architecture: Transformer (reported MoE)
Context: 8K tokens (32K with gpt-4-32k)
Knowledge cutoff: 2021-09
License: Proprietary
The model that changed everything — GPT-4 set the standard for capable AI — OpenAI's most advanced system producing safer and more useful responses
🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • 8K context window
  • Multimodal capabilities
  • Enhanced reasoning
  • Improved factual accuracy

📊 Benchmarks

86.4%
MMLU
67.0%
HumanEval
52.9%
MATH
~90th percentile
Bar exam

🔄 What's new vs previous version

  • Passed bar exam (top 10%)
  • Vision input (GPT-4V)
  • Multimodal
Major Release

Claude 1.3

Anthropic
Released: 2023-03-14
Type: LLM
Size: ~52B
Architecture: Transformer
Context: 100K tokens
Knowledge cutoff: 2022-12
License: Proprietary
Anthropic's first public model — 100K context ahead of its time — Anthropic's AI assistant built using Constitutional AI methods
🎛 In: text 📤 Out: text 🌐 Anthropic API

✨ Key Features

  • Constitutional AI
  • Helpful and harmless
  • Long conversations
  • Improved reasoning

📊 Benchmarks

75.0%
MMLU
56.0%
HumanEval
36.0%
MATH

January

Research

MusicLM

Google
Released: 2023-01-26
Type: Audio
Size: ~1.2B
Architecture: Hierarchical sequence-to-sequence
License: Proprietary (research)
Google's text-to-music model — high-fidelity, 24kHz stereo from text prompts — Google's text-to-music generation system
🎛 In: text, audio (humming) 📤 Out: audio

✨ Key Features

  • Text-to-music generation
  • Hierarchical sequence modeling
  • Long-form music generation
  • Style control

📊 Benchmarks

24 kHz
Audio Quality
Several minutes
Generation Length
High
Style Adherence

2022

November

Major Release

ChatGPT (GPT-3.5 Turbo)

OpenAI
Released: 2022-11-30
Type: LLM
Size: ~175B
Architecture: Transformer (RLHF fine-tune of GPT-3.5)
Context: 16K tokens
Knowledge cutoff: 2021-09
License: Proprietary
The product that launched the AI era — 100M users in 2 months — Conversational AI that sparked mainstream adoption of large language models
🎛 In: text 📤 Out: text 🌐 OpenAI API · Azure OpenAI

✨ Key Features

  • Conversational interface
  • Fine-tuned for chat
  • RLHF training
  • Fast response times

📊 Benchmarks

70.0%
MMLU
48.1%
HumanEval
34.1%
MATH

🔄 What's new vs previous version

  • Conversational interface
  • RLHF alignment
  • Faster and cheaper than GPT-4

April

Major Release

DALL-E 2

OpenAI
Released: 2022-04-06
Type: Vision
Size: ~3.5B
Architecture: Diffusion with CLIP
License: Proprietary
OpenAI's image-generation breakthrough — photorealism and inpainting — AI system that creates realistic images and art from natural language descriptions
🎛 In: text, image 📤 Out: image 🌐 OpenAI API · Labs.openai.com

✨ Key Features

  • Text-to-image generation
  • High-resolution outputs
  • Style transfer
  • Inpainting capabilities

📊 Benchmarks

7.27
FID
35.7
CLIP Score
22.7
IS
Research

PaLM

Google
Released: 2022-04-04
Type: LLM
Size: 540B
Architecture: Pathways Transformer
License: Proprietary (research)
Google's 540B pathways model — first to demonstrate chain-of-thought at scale — Google's 540-billion parameter language model demonstrating breakthrough capabilities
🎛 In: text 📤 Out: text

✨ Key Features

  • Large parameter count
  • Few-shot learning
  • Reasoning capabilities
  • Code generation

📊 Benchmarks

69.3%
MMLU
26.2%
HumanEval
8.8%
MATH
58.1%
BIG-bench

January

Research

InstructGPT

OpenAI
Released: 2022-01-27
Type: LLM
Size: ~175B
Architecture: Transformer (GPT-3 + RLHF)
License: Proprietary
RLHF alignment applied to GPT-3 — the precursor to ChatGPT — GPT-3 fine-tuned to follow instructions using human feedback
🎛 In: text 📤 Out: text 🌐 OpenAI API

✨ Key Features

  • RLHF training
  • Instruction following
  • Reduced harmful outputs
  • Better alignment

📊 Benchmarks

58.0%
TruthfulQA
0.3%
RealToxicityPrompts
85.0%
Helpfulness

2021

August

Major Release

Codex

OpenAI
Released: 2021-08-10
Type: Code
Size: ~12B
Architecture: Transformer (GPT-3 fine-tune)
Context: 4K tokens
License: Proprietary
GPT-3 trained on code — the engine behind GitHub Copilot v1 — AI system that translates natural language to code
🎛 In: text, code 📤 Out: code 🌐 OpenAI API (deprecated) · GitHub Copilot

✨ Key Features

  • Code generation
  • Natural language to code
  • Multiple programming languages
  • GitHub Copilot integration

📊 Benchmarks

28.8%
HumanEval
59.6%
MBPP
25.0%
APPS

January

Research

DALL-E

OpenAI
Released: 2021-01-05
Type: Vision
Size: ~12B
Architecture: Transformer (discrete VAE + GPT-3)
License: Proprietary
The original image-generating AI — multimodal GPT-3 that shocked the world — Neural network that creates images from text captions
🎛 In: text 📤 Out: image

✨ Key Features

  • Text-to-image generation
  • Creative image synthesis
  • Concept combination
  • Style control

📊 Benchmarks

17.89
FID
17.9
IS
32.5
CLIP Score

2020

June

Major Release

GPT-3

OpenAI
Released: 2020-06-11
Type: LLM
Size: 175B
Architecture: Transformer
Context: 2K tokens
Knowledge cutoff: 2019-10
License: Proprietary (API)
The model that showed scaling works — 175B parameters, few-shot learning pioneer — Breakthrough large language model that demonstrated emergent capabilities
🎛 In: text 📤 Out: text 🌐 OpenAI API

✨ Key Features

  • 175 billion parameters
  • Few-shot learning
  • Text generation
  • Multiple capabilities

📊 Benchmarks

43.9%
MMLU
0.0%
HumanEval
5.2%
MATH