Claude Fable 5 vs Gemma 4 12B: Benchmarks, Pricing & Capabilities Compared
TL;DR: Claude Fable 5 wins for reasoning and long context · Gemma 4 12B wins on cost.
Claude Fable 5 (Anthropic)
- Released: 2026-06-09
- Context window: 1M tokens
- Input price: $10.00 / Mtok
- Output price: $50.00 / Mtok
Gemma 4 12B (Google)
- Released: 2026-06-03
- Context window: 131K tokens
- Input price: $0.00 / Mtok
- Output price: $0.00 / Mtok
Benchmark comparison
| Benchmark | Claude Fable 5 | Gemma 4 12B |
|---|---|---|
| AA Intelligence Index | 64.9 ✓ | 29.2 |
| GPQA Diamond | 92.6% ✓ | 75.3% |
| HLE | 53.3% ✓ | 14.8% |
| IF-Bench | 63.5% | 73.5% ✓ |
| LiveCodeBench Reasoning | 70.0% ✓ | 55.3% |
| SciCode | 60.2% ✓ | 38.2% |
| TAU2-bench | 98.5% ✓ | 36.3% |
| TerminalBench-Hard | 62.9% ✓ | 18.2% |
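The checkmarks in the table can be reproduced with a short tally. The following is a minimal sketch: the scores are copied from the table above, while the `SCORES` dictionary and the `summarize` helper are illustrative names, not part of any published tooling. It assumes higher is better on every benchmark listed.

```python
# Scores copied from the benchmark table above, as (Claude Fable 5, Gemma 4 12B).
# All values are percentages except the AA Intelligence Index.
SCORES = {
    "AA Intelligence Index": (64.9, 29.2),
    "GPQA Diamond": (92.6, 75.3),
    "HLE": (53.3, 14.8),
    "IF-Bench": (63.5, 73.5),
    "LiveCodeBench Reasoning": (70.0, 55.3),
    "SciCode": (60.2, 38.2),
    "TAU2-bench": (98.5, 36.3),
    "TerminalBench-Hard": (62.9, 18.2),
}


def summarize(scores):
    """Return (wins_a, wins_b, margins); margins maps benchmark -> a minus b."""
    # Assumption: a higher score is better on every benchmark in the table.
    margins = {name: round(a - b, 1) for name, (a, b) in scores.items()}
    wins_a = sum(1 for m in margins.values() if m > 0)
    wins_b = sum(1 for m in margins.values() if m < 0)
    return wins_a, wins_b, margins


wins_fable, wins_gemma, margins = summarize(SCORES)
print(f"Claude Fable 5 leads on {wins_fable}, Gemma 4 12B leads on {wins_gemma}")
# List benchmarks from the largest Claude Fable 5 lead to the largest Gemma lead.
for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {margin:+.1f}")
```

A raw win count treats every benchmark as equally important, so it is a summary of the table rather than a ranking of the models.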
Pricing comparison
| Metric | Claude Fable 5 | Gemma 4 12B |
|---|---|---|
| Input ($/Mtok) | $10.00 | $0.00 |
| Output ($/Mtok) | $50.00 | $0.00 |
| Cached input ($/Mtok) | — | — |
| Cost per 1M-token roundtrip (1M in + 1M out) | $60.00 | $0.00 |
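The roundtrip figure is the input price plus the output price, each scaled by the number of tokens in millions. A minimal sketch of that arithmetic for an arbitrary request follows. The prices come from the table above; the `PRICES` dictionary and `request_cost` function are hypothetical names introduced for illustration. Cached-input pricing is not listed, so it is not modeled.

```python
from decimal import Decimal

# List prices in USD per million tokens, copied from the pricing table above.
PRICES = {
    "Claude Fable 5": {"input": Decimal("10.00"), "output": Decimal("50.00")},
    "Gemma 4 12B": {"input": Decimal("0.00"), "output": Decimal("0.00")},
}
TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-Mtok prices."""
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / TOKENS_PER_MTOK


# The table's roundtrip row: 1M tokens in plus 1M tokens out.
for model in PRICES:
    print(model, request_cost(model, 1_000_000, 1_000_000))

# A smaller, hypothetical request: 20,000 input tokens and 1,500 output tokens.
print(request_cost("Claude Fable 5", 20_000, 1_500))
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise. The result reflects only the listed per-token prices.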
Context window & modalities
| Attribute | Claude Fable 5 | Gemma 4 12B |
|---|---|---|
| Context window | 1M tokens | 131K tokens |
| Input modalities | text, image | text, image, video |
| Output modalities | text | text |
| Knowledge cutoff | — | — |
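In practice, the context window figures decide whether a given prompt can be sent to a model at all. The sketch below checks that under two labeled assumptions: "1M" and "131K" are read as 1,000,000 and 131,000 tokens (the exact limits may differ), and the reserved output tokens count against the same window. `CONTEXT_WINDOW` and `fits_in_context` are illustrative names.

```python
# Context windows from the table above.
# Assumption: "1M" means 1,000,000 tokens and "131K" means 131,000 tokens.
CONTEXT_WINDOW = {
    "Claude Fable 5": 1_000_000,
    "Gemma 4 12B": 131_000,
}


def fits_in_context(model, prompt_tokens, reserved_output_tokens=0):
    """True if the prompt plus reserved output room fits in the model's window."""
    # Assumption: output tokens share the same window as the prompt.
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW[model]


# A hypothetical 300,000-token prompt with 4,000 tokens reserved for the reply.
for model in CONTEXT_WINDOW:
    print(model, fits_in_context(model, prompt_tokens=300_000, reserved_output_tokens=4_000))
```

Token counts depend on each model's tokenizer, so the same text may produce a different `prompt_tokens` value for each model.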
Verdict by use case
Coding
Insufficient data
Basis: SWE-bench
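No shared SWE-bench score. On the coding-related benchmarks both models do share, Claude Fable 5 leads: LiveCodeBench Reasoning 70.0% vs 55.3%, SciCode 60.2% vs 38.2%, TerminalBench-Hard 62.9% vs 18.2%.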
Reasoning
→ Claude Fable 5
Basis: GPQA Diamond
Claude Fable 5 92.6% vs Gemma 4 12B 75.3% on GPQA Diamond.
Math
Insufficient data
Basis: MATH / AIME
No shared MATH or AIME score.
Long context
→ Claude Fable 5
Basis: Context window
Claude Fable 5 1M tokens vs Gemma 4 12B 131K tokens.
Cost
→ Gemma 4 12B
Basis: Input $/Mtok
Claude Fable 5 $10/Mtok vs Gemma 4 12B $0/Mtok input.
Changelog & releases
Claude Fable 5
Released 2026-06-09
Gemma 4 12B
Released 2026-06-03