Updated May 24, 2026
llmleaderboard.in
Largest Context Window LLM in 2026
The context window is the maximum number of tokens a model accepts per request, which is critical for long documents, full codebases, and agent traces. Effective usable context may be lower than advertised.
Llama 4 Scout advertises 10M tokens — the largest on our leaderboard. Gemini 3 Pro offers 2M; many GPT-5 and Claude variants support 1M tokens for enterprise RAG and repo-wide analysis.
| # | Model | Provider | Context (tokens) | GPQA | Cost |
|---|---|---|---|---|---|
| 1 | Llama 4 Scout | Meta | 10M | 76.5% | Open |
| 2 | Gemini 3 Pro | Google | 2M | 92.1% | $3.5 / $10.5 |
| 3 | Claude Mythos Preview | Anthropic | 1M | 94.6% | Limited |
| 4 | Claude Opus 4.6 | Anthropic | 1M | 91.2% | $5 / $25 |
| 5 | Claude Sonnet 4.6 | Anthropic | 1M | 88.5% | $3 / $15 |
| 6 | GPT-5.5 | OpenAI | 1M | 93.6% | $5 / $30 |
| 7 | GPT-5.5 Pro | OpenAI | 1M | 94.2% | $30 / $180 |
| 8 | GPT-5.4 | OpenAI | 1M | 92.8% | $5 / $30 |
| 9 | GPT-5.4 Pro | OpenAI | 1M | 94.5% | $30 / $180 |
| 10 | GPT-5.4 Mini | OpenAI | 1M | 78.1% | $0.75 / $3 |
| 11 | GPT-4.1 | OpenAI | 1M | 82.4% | $2 / $8 |
| 12 | GPT-4.1 mini | OpenAI | 1M | 71.2% | $0.4 / $1.6 |
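
Since effective usable context may be lower than advertised, it can help to shortlist models with some headroom rather than by the headline number alone. The sketch below is a minimal illustration: the context sizes are copied from a few rows of the table, while `models_that_fit` and the `usable_fraction` default of 0.5 are hypothetical choices for this example, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_tokens: int  # advertised window, from the table above


# A subset of the leaderboard rows, for illustration.
MODELS = [
    Model("Llama 4 Scout", 10_000_000),
    Model("Gemini 3 Pro", 2_000_000),
    Model("Claude Opus 4.6", 1_000_000),
    Model("GPT-5.5", 1_000_000),
    Model("GPT-4.1 mini", 1_000_000),
]


def models_that_fit(prompt_tokens: int, usable_fraction: float = 0.5) -> list[str]:
    """Return models whose discounted window can hold the prompt.

    usable_fraction is an assumed safety margin, not a benchmark result:
    0.5 means we only trust half of the advertised window.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return [
        m.name
        for m in MODELS
        if prompt_tokens <= m.context_tokens * usable_fraction
    ]


if __name__ == "__main__":
    # An 800k-token prompt, trusting only half of each advertised window.
    print(models_that_fit(800_000))
```

Setting `usable_fraction=1.0` reproduces a plain comparison against the advertised window; the right margin for a given model is something to measure on your own workload.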
When context size matters
Large context helps ingest entire PDFs, monorepos, or long chat histories in one shot. For many apps, RAG with a smaller window plus retrieval is cheaper and more reliable than stuffing everything into the prompt.
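
A rough way to see the cost gap is to compare input spend for the two approaches. This sketch assumes the first figure in the Cost column is USD per 1M input tokens (the table does not state the unit), and it ignores output tokens and the cost of running retrieval itself. The function names and the example token counts are hypothetical.

```python
def input_cost_usd(tokens: int, usd_per_million_input: float) -> float:
    """Input cost of one request, assuming per-million-token pricing."""
    return tokens / 1_000_000 * usd_per_million_input


def compare_input_cost(
    corpus_tokens: int,
    retrieved_tokens: int,
    usd_per_million_input: float,
    requests: int,
) -> tuple[float, float]:
    """Return (full_context_cost, rag_cost) in USD over all requests.

    full_context: the whole corpus is sent with every request.
    rag: only the retrieved chunks are sent with every request.
    """
    full = input_cost_usd(corpus_tokens, usd_per_million_input) * requests
    rag = input_cost_usd(retrieved_tokens, usd_per_million_input) * requests
    return full, rag


if __name__ == "__main__":
    # Hypothetical workload: 800k-token corpus, 8k tokens retrieved per
    # query, 100 queries, priced at the GPT-4.1 row's first figure ($2).
    full, rag = compare_input_cost(800_000, 8_000, 2.0, 100)
    print(f"full context: ${full:.2f}, RAG: ${rag:.2f}")
```

The ratio between the two figures is simply `corpus_tokens / retrieved_tokens`, so the saving grows with corpus size; whether retrieval is also more reliable depends on retrieval quality, which this arithmetic does not capture.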
The full LLM leaderboard lists all 45 models with live benchmarks, speed, and pricing.