First release in the Gemini 3 tier, with strong multimodal and long-context performance.
Google's flagship model, with a Deep Think mode; ranked #1 on LMSYS Arena at launch.

| Benchmark | Gemini 3 Pro | Gemini 2.5 Flash | Δ (pp) |
|---|---|---|---|
| ARC-AGI | 87.5% | — | — |
| MMLU | 93.2% | 87.5% | +5.7 |
| LMSYS Arena | #1 | — | — |
| MMLU-Pro | 89.4% | 82.8% | +6.6 |
| MMMU | 82.1% | 79.7% | +2.4 |
| GPQA Diamond | 78.5% | — | — |
| SWE-bench Verified | 68.2% | — | — |
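
The Δ column is the Gemini 3 Pro score minus the Gemini 2.5 Flash score, in percentage points, and it is only defined where both scores are reported. A minimal Python sketch of that arithmetic, using values copied from the table (the names `scores` and `delta_pp` are illustrative assumptions, not part of any tool or source):

```python
# Scores copied from the table above as (Gemini 3 Pro, Gemini 2.5 Flash).
# None marks a cell the table leaves blank.
scores = {
    "MMLU": (93.2, 87.5),
    "MMLU-Pro": (89.4, 82.8),
    "MMMU": (82.1, 79.7),
    "GPQA Diamond": (78.5, None),
}


def delta_pp(new, old):
    """Return new - old in percentage points, or None if either score is missing."""
    if new is None or old is None:
        return None
    # Round to one decimal to match the table and hide float noise.
    return round(new - old, 1)


deltas = {name: delta_pp(new, old) for name, (new, old) in scores.items()}

for name, d in deltas.items():
    label = "n/a" if d is None else f"{d:+.1f}"
    print(f"{name}: {label}")
```

Rows with a missing comparison score stay undefined rather than being treated as zero, which matches the blank Δ cells in the table.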