Google's latest flagship reasoning model, with a 2x jump in reasoning capabilities on hard multi-step tasks.

| Benchmark | Gemini 3.1 Pro | Gemini 3 Pro | Δ (pts) |
|---|---|---|---|
| ARC-AGI-2 | 77.1% | — | — |
| MMLU | 93.8% | 93.2% | +0.6 |
| MATH | 89.4% | — | — |
| MMLU-Pro | 93.8% | 89.4% | +4.4 |
| GPQA Diamond | 84.2% | 78.5% | +5.7 |
| SWE-bench Verified | 72.3% | 68.2% | +4.1 |
| LiveCodeBench | 78.9% | — | — |
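
The Δ column appears to be the plain percentage-point difference between the two models' scores, left blank where no Gemini 3 Pro score is listed. A minimal sketch of that arithmetic follows, assuming this reading of the column; the names `scores`, `delta_points`, and `format_delta` are illustrative only and not part of any published tooling.

```python
# Scores copied from the table, in percent.
# None marks an entry the table leaves blank.
scores = {
    "ARC-AGI-2": (77.1, None),
    "MMLU": (93.8, 93.2),
    "MATH": (89.4, None),
    "MMLU-Pro": (93.8, 89.4),
    "GPQA Diamond": (84.2, 78.5),
    "SWE-bench Verified": (72.3, 68.2),
    "LiveCodeBench": (78.9, None),
}


def delta_points(new, old):
    """Return the percentage-point difference, or None if either score is missing."""
    if new is None or old is None:
        return None
    # Round to one decimal place to match the table's precision
    # and to hide floating-point noise such as 0.5999999999999943.
    return round(new - old, 1)


def format_delta(delta):
    """Render a delta with an explicit sign, or 'n/a' when it cannot be computed."""
    return "n/a" if delta is None else f"{delta:+.1f}"


if __name__ == "__main__":
    for name, (new, old) in scores.items():
        print(f"{name}: {format_delta(delta_points(new, old))}")
```

These are absolute differences in points, not relative improvements: a move from 78.5% to 84.2% is +5.7 points, which is a different quantity from a multiplicative gain.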