Granite 4.1 30B vs Mistral Medium 3.5: Benchmarks, Pricing & Capabilities Compared
TL;DR: Granite 4.1 30B wins on cost · Mistral Medium 3.5 wins on reasoning and long context, and scores higher on every benchmark listed below.
Granite 4.1 30B (IBM)
- Released: 2026-04-29
- Context window: 131K tokens
- Input price: $0.00 / Mtok
- Output price: $0.00 / Mtok
Mistral Medium 3.5 (Mistral)
- Released: 2026-04-29
- Context window: 256K tokens
- Input price: $1.50 / Mtok
- Output price: $7.50 / Mtok
Benchmark comparison
| Benchmark | Granite 4.1 30B | Mistral Medium 3.5 |
|---|---|---|
| AA Intelligence Index | 14.7 | 39.2 ✓ |
| GPQA Diamond | 48.1% | 74.8% ✓ |
| HLE | 4.2% | 12.8% ✓ |
| IF-Bench | 44.4% | 68.8% ✓ |
| LiveCodeBench Reasoning | 18.7% | 61.0% ✓ |
| SciCode | 25.8% | 39.6% ✓ |
| TAU2-bench | 42.1% | 94.2% ✓ |
| TerminalBench-Hard | 2.3% | 33.3% ✓ |
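To see how wide each lead is, the scores can be subtracted row by row. Below is a minimal Python sketch. The numbers are copied from the table above, and `SCORES` and `score_gaps` are illustrative names. Gaps are in percentage points, except for the AA Intelligence Index, which uses its own index scale, so gaps are not directly comparable across rows.

```python
# Scores from the benchmark table above as (Granite 4.1 30B, Mistral Medium 3.5).
# All are percentages except the AA Intelligence Index, which is an index score.
SCORES = {
    "AA Intelligence Index": (14.7, 39.2),
    "GPQA Diamond": (48.1, 74.8),
    "HLE": (4.2, 12.8),
    "IF-Bench": (44.4, 68.8),
    "LiveCodeBench Reasoning": (18.7, 61.0),
    "SciCode": (25.8, 39.6),
    "TAU2-bench": (42.1, 94.2),
    "TerminalBench-Hard": (2.3, 33.3),
}


def score_gaps(scores):
    """Return (benchmark, gap, winner) rows in table order.

    gap = Mistral Medium 3.5 score minus Granite 4.1 30B score, rounded to
    one decimal place to hide floating-point noise.
    """
    rows = []
    for name, (granite, mistral) in scores.items():
        gap = round(mistral - granite, 1)
        if gap > 0:
            winner = "Mistral Medium 3.5"
        elif gap < 0:
            winner = "Granite 4.1 30B"
        else:
            winner = "tie"
        rows.append((name, gap, winner))
    return rows


for name, gap, winner in score_gaps(SCORES):
    print(f"{name:<24} {gap:+5.1f}  {winner}")
```

Every gap is positive. They range from +8.6 (HLE) to +52.1 (TAU2-bench).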
Pricing comparison
| Metric | Granite 4.1 30B | Mistral Medium 3.5 |
|---|---|---|
| Input ($/Mtok) | $0.00 | $1.50 |
| Output ($/Mtok) | $0.00 | $7.50 |
| Cached input ($/Mtok) | — | — |
| Cost per 1M-token roundtrip (1M in + 1M out) | $0.00 | $9.00 |
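The roundtrip row is price × tokens ÷ 1,000,000, computed for each direction and summed: 1M × $1.50 + 1M × $7.50 = $9.00 for Mistral Medium 3.5. The minimal Python sketch below applies the same arithmetic to any token mix. It assumes list prices with no caching or volume discounts. `PRICES` and `request_cost` are illustrative names, and `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

# Listed prices in USD per million tokens (Mtok), from the pricing table above.
# Assumption: list prices only, with no caching or volume discounts applied.
PRICES = {
    "Granite 4.1 30B": {"input": Decimal("0.00"), "output": Decimal("0.00")},
    "Mistral Medium 3.5": {"input": Decimal("1.50"), "output": Decimal("7.50")},
}

MTOK = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request at the listed per-Mtok prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / MTOK


# The table's "roundtrip" row: 1M input tokens plus 1M output tokens.
print(request_cost("Mistral Medium 3.5", 1_000_000, 1_000_000))  # prints 9.00
```

For example, a request with 20,000 input tokens and 2,000 output tokens costs $0.045 on Mistral Medium 3.5 ($0.03 input + $0.015 output). Output tokens cost five times as much as input tokens on that model, so output-heavy workloads widen the price gap.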
Context window & modalities
| Attribute | Granite 4.1 30B | Mistral Medium 3.5 |
|---|---|---|
| Context window | 131K tokens | 256K tokens |
| Input modalities | text | text, image |
| Output modalities | text | text |
| Knowledge cutoff | — | — |
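Whether a request fits depends on the prompt tokens plus the tokens reserved for the reply. The minimal sketch below makes two assumptions. First, input and output share a single context window, which is common for chat models but should be confirmed per provider. Second, the rounded 131K and 256K figures are treated as 131,000 and 256,000 tokens, although the exact limits may differ slightly. `CONTEXT_WINDOW` and `fits` are illustrative names.

```python
# Context windows as listed above. Assumption: the rounded "131K" and "256K"
# figures are treated as 131,000 and 256,000 tokens; exact limits may differ.
CONTEXT_WINDOW = {
    "Granite 4.1 30B": 131_000,
    "Mistral Medium 3.5": 256_000,
}


def fits(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved output fits the context window.

    Assumption: input and output tokens share one window.
    """
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW[model]


# A 150K-token document with 4K tokens reserved for the answer.
for model in CONTEXT_WINDOW:
    print(model, fits(model, 150_000, 4_000))
```

Under these assumptions, a 150,000-token document with 4,000 tokens reserved for the answer fits Mistral Medium 3.5 but not Granite 4.1 30B. Granite would need the input chunked or summarized first.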
Verdict by use case
Coding
Insufficient data
Basis: SWE-bench
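SWE-bench results are not available for both models. The coding-related benchmarks in the table above (LiveCodeBench Reasoning, SciCode, TerminalBench-Hard) all favor Mistral Medium 3.5.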
Reasoning
→ Mistral Medium 3.5
Basis: GPQA Diamond
Granite 4.1 30B 48.1% vs Mistral Medium 3.5 74.8% on GPQA Diamond.
Math
Insufficient data
Basis: MATH / AIME
No shared math benchmark.
Long context
→ Mistral Medium 3.5
Basis: Context window
Granite 4.1 30B 131K tokens vs Mistral Medium 3.5 256K tokens.
Cost
→ Granite 4.1 30B
Basis: Input $/Mtok
Granite 4.1 30B $0.00/Mtok vs Mistral Medium 3.5 $1.50/Mtok (input).
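The verdicts above appear to follow a simple rule. Each use case is decided on one basis metric, with the higher value winning (or the lower price, for cost). If either model lacks that metric, the result is "Insufficient data". The page does not state its method, so the minimal sketch below is a reconstruction under that assumption. `METRICS` and `verdict` are hypothetical names, and `None` stands for a metric that is not reported.

```python
from typing import Optional

# Basis metrics per model, from the tables above; None = not reported.
# Assumption: this mirrors how the page derives its verdicts.
METRICS = {
    "Granite 4.1 30B": {"GPQA Diamond": 48.1, "context_tokens": 131_000,
                        "input_usd_per_mtok": 0.00, "SWE-bench": None},
    "Mistral Medium 3.5": {"GPQA Diamond": 74.8, "context_tokens": 256_000,
                           "input_usd_per_mtok": 1.50, "SWE-bench": None},
}


def verdict(basis: str, higher_is_better: bool = True) -> Optional[str]:
    """Return the winning model on one basis metric, or None if data is missing."""
    values = {model: metrics.get(basis) for model, metrics in METRICS.items()}
    if any(v is None for v in values.values()):
        return None  # shown on the page as "Insufficient data"
    pick = max if higher_is_better else min
    return pick(values, key=values.get)


print(verdict("GPQA Diamond"))                                 # Mistral Medium 3.5
print(verdict("context_tokens"))                               # Mistral Medium 3.5
print(verdict("input_usd_per_mtok", higher_is_better=False))   # Granite 4.1 30B
print(verdict("SWE-bench"))                                    # None
```

A metric missing from `METRICS` entirely, such as MATH, also yields `None`. On an exact tie, `max` and `min` return the first model listed, so a real implementation would need an explicit tie rule.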
Changelog & releases
Granite 4.1 30B
Released 2026-04-29
Mistral Medium 3.5
Released 2026-04-29