Updated May 24, 2026
llmleaderboard.in
Best Multilingual LLMs in 2026
Top models for translation, cross-language reasoning, and regional language support. Ranked by GPQA and multilingual benchmark performance.
GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro lead benchmark performance while Sarvam 105B offers India-focused multilingual support with open weights. Use this guide when your app needs strong cross-language reasoning or regional language coverage.
| # | Model | Provider | GPQA | Context | Cost (input / output, USD per 1M tokens) |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | Anthropic | 94.6% | 1M | Limited |
| 2 | GPT-5.4 Pro | OpenAI | 94.5% | 1M | $30 / $180 |
| 3 | Gemini 3.1 Pro | Google | 94.3% | 1M | $2 / $12 |
| 4 | Claude Opus 4.7 | Anthropic | 94.2% | 200K | $5 / $25 |
| 5 | GPT-5.5 Pro | OpenAI | 94.2% | 1M | $30 / $180 |
| 6 | GPT-5.5 | OpenAI | 93.6% | 1M | $5 / $30 |
| 7 | GPT-5.4 | OpenAI | 92.8% | 1M | $5 / $30 |
| 8 | Gemini 3 Pro | Google | 92.1% | 2M | $3.5 / $10.5 |
| 9 | Claude Opus 4.6 | Anthropic | 91.2% | 1M | $5 / $25 |
| 10 | Kimi K2.6 | Moonshot | 91.1% | 256K | $0.75 / $3.50 |
| 11 | DeepSeek R2 | DeepSeek | 89.3% | 128K | $0.55 / $2.19 |
| 12 | Claude Sonnet 4.6 | Anthropic | 88.5% | 1M | $3 / $15 |
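To compare the Cost column for a concrete workload, the arithmetic can be sketched as below. This is a rough estimate that assumes the two figures are USD per 1M input tokens and per 1M output tokens; the table does not state its units. The `estimate_cost` helper and the `PRICES_USD` dictionary are hypothetical names, and only a few rows are copied in.

```python
# Prices copied from the table above. Reading them as
# (USD per 1M input tokens, USD per 1M output tokens) is an ASSUMPTION;
# the source table does not state its units.
PRICES_USD = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Kimi K2.6": (0.75, 3.50),
    "DeepSeek R2": (0.55, 2.19),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    in_price, out_price = PRICES_USD[model]
    # Scale each per-million price by the number of tokens actually used.
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: a 2,000-token source document translated into
    # roughly 2,500 output tokens.
    for name in PRICES_USD:
        print(f"{name}: ${estimate_cost(name, 2_000, 2_500):.4f}")
```

Translation output is often about as long as the input, so the output price tends to dominate the bill for translation workloads.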
How to choose a multilingual model
Choose frontier models for the best reasoning and translation accuracy across many languages. Choose India-focused or open-weight models like Sarvam when you need stronger local language support or a lower-cost open stack.
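One way to turn this guidance into a repeatable shortlist is to filter the table by hard constraints and then sort by score. The sketch below is illustrative only: the `Model` class and `shortlist` function are hypothetical, context sizes such as "1M" are assumed to mean 1,000,000 tokens, and prices are assumed to be USD per 1M output tokens. Sarvam 105B is left out because the table gives no figures for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    gpqa: float          # GPQA score in percent, from the table
    context_tokens: int  # ASSUMPTION: "1M" = 1_000_000, "200K" = 200_000
    output_price: float  # ASSUMPTION: USD per 1M output tokens


# A subset of rows copied from the table above.
MODELS = [
    Model("Gemini 3.1 Pro", 94.3, 1_000_000, 12.00),
    Model("Claude Opus 4.7", 94.2, 200_000, 25.00),
    Model("GPT-5.5", 93.6, 1_000_000, 30.00),
    Model("Kimi K2.6", 91.1, 256_000, 3.50),
    Model("DeepSeek R2", 89.3, 128_000, 2.19),
]


def shortlist(models, min_context: int, max_output_price: float):
    """Keep models that satisfy both constraints, best GPQA score first."""
    fits = [
        m for m in models
        if m.context_tokens >= min_context and m.output_price <= max_output_price
    ]
    return sorted(fits, key=lambda m: m.gpqa, reverse=True)


if __name__ == "__main__":
    for m in shortlist(MODELS, min_context=200_000, max_output_price=15.0):
        print(m.name, m.gpqa)
```

GPQA is the only score the table provides, so the sketch sorts by it. Where a per-language evaluation set exists for your target languages, that score would be a better sort key.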