AI Flash Report

Claude Sonnet 4.6 vs GPT-5.3 Codex: Benchmarks, Pricing & Capabilities Compared

TL;DR: Claude Sonnet 4.6 wins on long context; GPT-5.3 Codex wins on coding and cost.

Claude Sonnet 4.6 (Anthropic)
Released: 2026-02-17
Context window: 500K tokens
Input price: $3.00 / Mtok
Output price: $15.00 / Mtok
Key features:
  • Agent Teams: orchestrate 2–16 Claude instances
  • Near-Opus performance at 1/5 the cost
  • 80.8% SWE-bench Verified

GPT-5.3 Codex (OpenAI)
Released: 2026-02-05
Context window: 400K tokens
Input price: $1.25 / Mtok
Output price: $10.00 / Mtok
Key features:
  • Self-improving agentic coding
  • 25% faster than GPT-5.2 Codex
  • 1,000+ tokens/sec generation

Benchmark comparison

Benchmark | Claude Sonnet 4.6 | GPT-5.3 Codex
HumanEval | 95.2% | 96.8%
SWE-bench Verified | 80.8% | 82.4%

Pricing comparison

Metric | Claude Sonnet 4.6 | GPT-5.3 Codex
Input ($/Mtok) | $3.00 | $1.25
Output ($/Mtok) | $15.00 | $10.00
Cached input ($/Mtok) | $0.30 | $0.13
Cost per 1M-token roundtrip (1M in + 1M out) | $18.00 | $11.25
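
The roundtrip row is the input price plus the output price for one million tokens each. A minimal Python sketch of that arithmetic follows, using only the prices listed in the table. The helper name `request_cost` and its `cached_tokens` parameter are hypothetical, and the sketch assumes cached tokens are simply billed at the cached-input rate in place of the standard input rate; actual provider billing may include details not shown on this page.

```python
# Prices in USD per million tokens (Mtok), copied from the pricing table.
PRICES = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00, "cached_input": 0.30},
    "GPT-5.3 Codex": {"input": 1.25, "output": 10.00, "cached_input": 0.13},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    cached_tokens is the part of input_tokens assumed to be billed
    at the cached-input rate instead of the standard input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cached_input"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Reproduce the "1M in + 1M out" roundtrip row.
    for model in PRICES:
        cost = request_cost(model, 1_000_000, 1_000_000)
        print(f"{model}: ${cost:.2f}")
```

Run as a script, this prints $18.00 for Claude Sonnet 4.6 and $11.25 for GPT-5.3 Codex, matching the table. Real workloads rarely have equal input and output volumes, so passing actual token counts gives a more useful comparison than the 1:1 roundtrip figure.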

Context window & modalities

Attribute | Claude Sonnet 4.6 | GPT-5.3 Codex
Context window | 500K tokens | 400K tokens
Input modalities | text, image, PDF | text, image
Output modalities | text | text
Knowledge cutoff | 2025-10 | 2025-11

Verdict by use case

Coding
→ GPT-5.3 Codex
Basis: SWE-bench Verified

Claude Sonnet 4.6 80.8% vs GPT-5.3 Codex 82.4% on SWE-bench Verified.

Reasoning
Insufficient data
Basis: GPQA / MMLU

No shared reasoning benchmark.

Math
Insufficient data
Basis: MATH / AIME

No shared math benchmark.

Long context
→ Claude Sonnet 4.6
Basis: Context window

Claude Sonnet 4.6 500K tokens vs GPT-5.3 Codex 400K tokens.

Cost
→ GPT-5.3 Codex
Basis: Input $/Mtok

Claude Sonnet 4.6 $3.00/Mtok vs GPT-5.3 Codex $1.25/Mtok input; output is also cheaper on GPT-5.3 Codex ($10.00 vs $15.00/Mtok).
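
The page does not state how its verdicts are computed, but they appear to follow a simple rule: compare one basis metric per use case, and report "Insufficient data" when either model lacks a value for it. The Python sketch below illustrates that assumed rule. The function `pick_winner`, its parameters, and the "Tie" outcome are hypothetical and do not appear on the page.

```python
from typing import Optional


def pick_winner(
    name_a: str,
    value_a: Optional[float],
    name_b: str,
    value_b: Optional[float],
    higher_is_better: bool = True,
) -> str:
    """Return the winning model name for one basis metric (assumed rule).

    A value of None means the model has no published score for the metric.
    """
    if value_a is None or value_b is None:
        return "Insufficient data"
    if value_a == value_b:
        return "Tie"  # assumption: the page shows no tied case
    a_wins = value_a > value_b if higher_is_better else value_a < value_b
    return name_a if a_wins else name_b


if __name__ == "__main__":
    a, b = "Claude Sonnet 4.6", "GPT-5.3 Codex"
    # Coding: SWE-bench Verified, higher is better.
    print("Coding:", pick_winner(a, 80.8, b, 82.4))
    # Reasoning: no shared benchmark on the page.
    print("Reasoning:", pick_winner(a, None, b, None))
    # Long context: context window in tokens, higher is better.
    print("Long context:", pick_winner(a, 500_000, b, 400_000))
    # Cost: input price in $/Mtok, lower is better.
    print("Cost:", pick_winner(a, 3.00, b, 1.25, higher_is_better=False))
```

With the figures from this page, the sketch reproduces the four verdict types listed above. A single-metric rule is easy to audit but coarse: a 1.6-point benchmark gap and a 2.4x price gap are both reduced to a bare winner, so the underlying numbers still matter when choosing a model.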

Changelog & releases

Claude Sonnet 4.6
Released: 2026-02-17
  • Agent Teams: orchestrate 2–16 Claude instances in parallel
  • +8.5 points on SWE-bench Verified vs Sonnet 4
  • 1/5 the cost of Opus 4.5 at ~95% of its coding quality
  • Fast mode research preview for lower-latency inference

GPT-5.3 Codex
Released: 2026-02-05
Predecessor: GPT-5.2 Codex
  • +4 points on SWE-bench Verified vs GPT-5.2 Codex
  • Native IDE tool-calling at reduced latency
  • Extended max output to 100K tokens for multi-file patches
