Top 25 AI Models — Complete Leaderboard
Compare the top 25 AI language models ranked by coding, reasoning, and tool-use benchmarks. All pricing and specifications verified against official provider documentation.
Need the cheapest strong model?
Start with MiniMax M2.5 or Grok 4.1 Fast if cost matters more than bragging rights.
Need self-hosted privacy?
Two models in this leaderboard support open-weights or local-first workflows.
Want the full tradeoff?
Compare open weights vs proprietary models before you commit to an API provider.
1
Claude Opus 4.6
Anthropic • 200K context
9.7
Coding
9.8
Reasoning
9.5
Tool Use
Complex reasoning • Critical decisions • Long-form analysis
$5
Input $/M tokens
$25
Output $/M tokens
9.69
Weighted Score
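Prices on each card are quoted in US dollars per million tokens, separately for input and output. As a rough illustration of how those two figures turn into a per-request cost, here is a minimal sketch. The function name and the token counts are hypothetical; only the $5 and $25 prices come from the Claude Opus 4.6 card above.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request from per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Prices from the Claude Opus 4.6 card: $5 input, $25 output per million tokens.
# The 10,000 / 2,000 token counts are made up for illustration.
cost = request_cost_usd(10_000, 2_000, 5.0, 25.0)
print(f"Estimated cost: ${cost:.2f}")
```

Output tokens usually dominate the bill for generation-heavy workloads, so it is worth comparing models on both columns rather than on input price alone.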
2
GPT-5.4
OpenAI • 1.05M context
9.8
Coding
9.5
Reasoning
9.7
Tool Use
Coding • Agents • Tool integration
$2.5
Input $/M tokens
$15
Output $/M tokens
9.67
Weighted Score
3
Gemini 3.1 Pro
Google • 1M context
9.5
Coding
9.5
Reasoning
9.3
Tool Use
Multimodal tasks • Long context • Search integration
$1.25
Input $/M tokens
$5
Output $/M tokens
9.45
Weighted Score
4
Claude Sonnet 4.6
Anthropic • 200K context
9.4
Coding
9.3
Reasoning
9.1
Tool Use
Balanced performance • Production workloads • Cost-efficient
$3
Input $/M tokens
$15
Output $/M tokens
9.29
Weighted Score
5
GPT-5.3 Codex
OpenAI • 200K context
9.7
Coding
9.3
Reasoning
9.4
Tool Use
Coding-focused tasks • Type inference • Agentic coding
$3
Input $/M tokens
$15
Output $/M tokens
9.48
Weighted Score
6
GLM-5
Zhipu AI • 205K context
9.2
Coding
9.3
Reasoning
9
Tool Use
Bilingual (CN/EN) • Value-focused • Enterprise
$0.5
Input $/M tokens
$2
Output $/M tokens
9.18
Weighted Score
7
Llama 4 (405B)
Meta • 128K context
9
Coding
9.1
Reasoning
8.7
Tool Use
Self-hosted • Open source • Customizable
$2
Input $/M tokens
$8
Output $/M tokens
8.96
Weighted Score
8
DeepSeek V3
DeepSeek • 128K context
8.8
Coding
8.9
Reasoning
8.5
Tool Use
Budget coding • High-volume • Cost-sensitive
$0.27
Input $/M tokens
$1.1
Output $/M tokens
8.76
Weighted Score
9
GPT-5.2
OpenAI • 128K context
9.3
Coding
9.2
Reasoning
9
Tool Use
General-purpose • Balanced tasks
$1.75
Input $/M tokens
$14
Output $/M tokens
9.19
Weighted Score
10
Mistral Large 3
Mistral • 128K context
8.9
Coding
9
Reasoning
8.6
Tool Use
European compliance • Multilingual • Enterprise
$2
Input $/M tokens
$6
Output $/M tokens
8.86
Weighted Score
11
Kimi K2.5
Moonshot AI • 256K context
9.4
Coding
9.3
Reasoning
9.2
Tool Use
Visual coding • Long context • Agent workflows
$0.6
Input $/M tokens
$2.5
Output $/M tokens
9.32
Weighted Score
12
MiniMax M2.5
MiniMax • 196K context
9.1
Coding
9.2
Reasoning
8.9
Tool Use
Real-world productivity • Cost-sensitive • High-volume
$0.3
Input $/M tokens
$1.2
Output $/M tokens
9.08
Weighted Score
13
Grok 4.1 Fast
xAI • 2M context
9
Coding
9.1
Reasoning
8.8
Tool Use
Long context • Web search • X platform data
$0.2
Input $/M tokens
$0.5
Output $/M tokens
8.98
Weighted Score
14
Qwen 3 Max
Alibaba • 262K context
9.2
Coding
9.1
Reasoning
8.9
Tool Use
Multilingual • Enterprise • Chinese language
$1.2
Input $/M tokens
$6
Output $/M tokens
9.09
Weighted Score
15
GPT-OSS-120B
OpenAI • 128K context
9.3
Coding
9.2
Reasoning
9
Tool Use
Self-hosted • Privacy • Customization
$0
Input $/M tokens
$0
Output $/M tokens
9.19
Weighted Score
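The leaderboard does not say how the Weighted Score is derived from the three benchmark scores. The sketch below assumes a simple weighted average with weights of 0.40 (Coding), 0.35 (Reasoning), and 0.25 (Tool Use). These weights are an inference, not a published formula: they reproduce the listed scores to within rounding, but the site may compute the figure differently.

```python
# Assumed weights, inferred from the listed numbers; not published by the source.
WEIGHTS = {"coding": 0.40, "reasoning": 0.35, "tool_use": 0.25}


def weighted_score(coding, reasoning, tool_use):
    """Weighted average of the three benchmark scores under the assumed weights."""
    return (
        WEIGHTS["coding"] * coding
        + WEIGHTS["reasoning"] * reasoning
        + WEIGHTS["tool_use"] * tool_use
    )


# (coding, reasoning, tool use, listed weighted score) copied from the cards above.
listed = {
    "Claude Opus 4.6": (9.7, 9.8, 9.5, 9.69),
    "GPT-5.4": (9.8, 9.5, 9.7, 9.67),
    "Gemini 3.1 Pro": (9.5, 9.5, 9.3, 9.45),
    "Llama 4 (405B)": (9.0, 9.1, 8.7, 8.96),
}

for name, (coding, reasoning, tool_use, published) in listed.items():
    computed = weighted_score(coding, reasoning, tool_use)
    print(f"{name}: computed {computed:.3f}, listed {published}")
```

Several entries land exactly on a half-cent boundary under these weights (for example 9.685 for Claude Opus 4.6), so the last digit of the listed score depends on the rounding rule the site uses.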