Top 25 AI Models — Complete Leaderboard
Compare the top 25 AI language models ranked by coding, reasoning, and tool-use benchmarks. All pricing and specifications verified against official provider documentation.
1
Claude Opus 4.6 (Adaptive)
Anthropic • 200K context
9.6
Coding
9.7
Reasoning
9.4
Tool Use
Highest intelligence tasks • Complex reasoning • Critical decisions
$18.75
Input $/M tokens
$93.75
Output $/M tokens
9.58
Weighted Score
2
GPT-5.2 (xhigh)
OpenAI • 400K context
9.6
Coding
9.5
Reasoning
9.5
Tool Use
Coding • Agents • Complex tasks
$1.75
Input $/M tokens
$14
Output $/M tokens
9.54
Weighted Score
3
Claude Opus 4.5
Anthropic • 200K context
9.4
Coding
9.5
Reasoning
9.2
Tool Use
Complex reasoning • Long-form analysis • High-stakes drafting
$15
Input $/M tokens
$75
Output $/M tokens
9.38
Weighted Score
4
GLM-5
Zhipu AI • 128K context
9
Coding
9.1
Reasoning
8.8
Tool Use
Chinese + English workflows • Value-focused deployment • Enterprise usage
$0.75
Input $/M tokens
$3
Output $/M tokens
8.98
Weighted Score
5
Gemini 3 Pro
Google • 1M context
9.3
Coding
9.3
Reasoning
9.1
Tool Use
Multimodal understanding • Agentic tasks • Vibe coding
$2
Input $/M tokens
$12
Output $/M tokens
9.25
Weighted Score
6
Gemini 2.5 Pro
Google • 1M context
9.2
Coding
9.2
Reasoning
9
Tool Use
Large context tasks • Multimodal workflows • Research synthesis
$1.25
Input $/M tokens
$10
Output $/M tokens
9.15
Weighted Score
7
Claude Sonnet 4
Anthropic • 200K context
9.1
Coding
9
Reasoning
8.9
Tool Use
Balanced performance • Production workloads • General-purpose tasks
$3
Input $/M tokens
$15
Output $/M tokens
9.02
Weighted Score
8
DeepSeek-R1
DeepSeek • 128K context
8.9
Coding
9.1
Reasoning
8.6
Tool Use
Budget-conscious reasoning • Math-heavy tasks • Cost-sensitive coding
$0.55
Input $/M tokens
$2.19
Output $/M tokens
8.89
Weighted Score
9
DeepSeek-V3
DeepSeek • 128K context
8.8
Coding
8.9
Reasoning
8.5
Tool Use
Cost-effective coding • General tasks • High-volume usage
$0.27
Input $/M tokens
$1.1
Output $/M tokens
8.76
Weighted Score
10
Grok 4.1
xAI • 128K context
8.8
Coding
8.8
Reasoning
8.6
Tool Use
Real-time information • Witty responses • Current events
$2
Input $/M tokens
$10
Output $/M tokens
8.75
Weighted Score
11
Grok 4.1 Fast
xAI • 2M context
8.5
Coding
8.5
Reasoning
8.3
Tool Use
Fast responses • Large context • Real-time data
$1
Input $/M tokens
$5
Output $/M tokens
8.45
Weighted Score
12
Llama 4 Scout
Meta • 10M context
8.6
Coding
8.7
Reasoning
8.4
Tool Use
Extremely long context • Document processing • Research
$0.1
Input $/M tokens
$0.3
Output $/M tokens
8.58
Weighted Score
13
Llama 4 Maverick
Meta • 1M context
8.7
Coding
8.6
Reasoning
8.3
Tool Use
Balanced open-source • General tasks • Self-hosting
$0.15
Input $/M tokens
$0.5
Output $/M tokens
8.57
Weighted Score
14
Qwen 2.5 Max
Alibaba • 128K context
8.8
Coding
8.8
Reasoning
8.5
Tool Use
Chinese language • Math reasoning • Coding
$0.5
Input $/M tokens
$2
Output $/M tokens
8.73
Weighted Score
15
Mistral Large 2
Mistral AI • 128K context
8.7
Coding
8.7
Reasoning
8.4
Tool Use
European compliance • Multilingual tasks • Enterprise
$2
Input $/M tokens
$6
Output $/M tokens
8.63
Weighted Score
16
Claude Sonnet 3.7
Anthropic • 200K context
8.9
Coding
8.9
Reasoning
8.8
Tool Use
Extended thinking • Complex analysis • Coding assistance
$3
Input $/M tokens
$15
Output $/M tokens
8.88
Weighted Score
17
GPT-5 mini
OpenAI • 128K context
8.5
Coding
8.4
Reasoning
8.3
Tool Use
Fast tasks • High-volume usage • Cost optimization
$0.25
Input $/M tokens
$2
Output $/M tokens
8.41
Weighted Score
18
Gemini 2.5 Flash
Google • 1M context
8.7
Coding
8.7
Reasoning
8.5
Tool Use
Fast processing • Low-latency tasks • High throughput
$0.3
Input $/M tokens
$2.5
Output $/M tokens
8.65
Weighted Score
19
Gemini 2.5 Flash-Lite
Google • 1M context
8.2
Coding
8.1
Reasoning
7.9
Tool Use
Cost-sensitive tasks • High-volume processing • Simple queries
$0.1
Input $/M tokens
$0.4
Output $/M tokens
8.09
Weighted Score
20
Claude Haiku 3.5
Anthropic • 200K context
8.3
Coding
8.2
Reasoning
8.1
Tool Use
Fast responses • Simple tasks • Cost-conscious usage
$0.8
Input $/M tokens
$4
Output $/M tokens
8.21
Weighted Score
21
Nova Pro
Amazon • 300K context
8.5
Coding
8.5
Reasoning
8.3
Tool Use
AWS integration • Enterprise workloads • Multimodal
$0.8
Input $/M tokens
$3.2
Output $/M tokens
8.45
Weighted Score
22
Nova Micro
Amazon • 128K context
7.8
Coding
7.7
Reasoning
7.5
Tool Use
Lowest cost • Simple tasks • High volume
$0.035
Input $/M tokens
$0.14
Output $/M tokens
7.69
Weighted Score
23
Qwen 2.5 72B
Alibaba • 128K context
8.6
Coding
8.5
Reasoning
8.2
Tool Use
Open-source alternative • Self-hosting • Custom fine-tuning
$0.35
Input $/M tokens
$1.4
Output $/M tokens
8.46
Weighted Score
24
Mistral Small 3
Mistral AI • 128K context
8.2
Coding
8.1
Reasoning
7.9
Tool Use
Fast processing • Cost-effective • Simple tasks
$0.2
Input $/M tokens
$0.6
Output $/M tokens
8.09
Weighted Score
25
Cohere Command R+
Cohere • 128K context
8.3
Coding
8.4
Reasoning
8.6
Tool Use
RAG applications • Enterprise search • Tool use
$2.5
Input $/M tokens
$10
Output $/M tokens
8.41
Weighted Score
26
Reka Core
Reka • 128K context
8.4
Coding
8.5
Reasoning
8.2
Tool Use
Multimodal tasks • Video understanding • Long context
$1
Input $/M tokens
$4
Output $/M tokens
8.38
Weighted Score
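
The page does not state how the Weighted Score is computed or how per-token prices translate into a request cost, so the following is only a sketch. It assumes weights of 0.40 for Coding, 0.35 for Reasoning, and 0.25 for Tool Use; these are inferred from the listed numbers (they reproduce the listed scores to within rounding) and are not documented by the source. The `Model` class, `weighted_score`, and `request_cost` are hypothetical names introduced for illustration, and the three sample rows are copied from the entries above.

```python
from dataclasses import dataclass

# ASSUMPTION: weights inferred from the listed scores, not stated by the page.
WEIGHTS = {"coding": 0.40, "reasoning": 0.35, "tool_use": 0.25}


@dataclass(frozen=True)
class Model:
    """One leaderboard entry (hypothetical structure for illustration)."""
    name: str
    coding: float
    reasoning: float
    tool_use: float
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


def weighted_score(m: Model) -> float:
    """Combine the three benchmark scores using the assumed weights."""
    return (WEIGHTS["coding"] * m.coding
            + WEIGHTS["reasoning"] * m.reasoning
            + WEIGHTS["tool_use"] * m.tool_use)


def request_cost(m: Model, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from the $/M token prices."""
    total = input_tokens * m.input_per_m + output_tokens * m.output_per_m
    return total / 1_000_000


# Sample rows copied from the leaderboard entries above.
MODELS = [
    Model("GPT-5.2 (xhigh)", 9.6, 9.5, 9.5, 1.75, 14.0),
    Model("Gemini 3 Pro", 9.3, 9.3, 9.1, 2.0, 12.0),
    Model("DeepSeek-V3", 8.8, 8.9, 8.5, 0.27, 1.10),
]

if __name__ == "__main__":
    # Print each sample model with its score and the cost of a request
    # that uses 10,000 input tokens and 2,000 output tokens.
    for m in sorted(MODELS, key=weighted_score, reverse=True):
        cost = request_cost(m, 10_000, 2_000)
        print(f"{m.name}: score {weighted_score(m):.2f}, cost ${cost:.4f}")
```

Because the weights are an inference, treat the computed scores as a consistency check against the listed values rather than as the provider's method. The cost function ignores caching discounts, batch pricing, and tiered long-context rates, none of which appear on this page.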