Home Models Compare Scorecards Evals Methodology FAQ

AI Model Comparison

Compare leading AI models side-by-side with provider-verified model names and pricing references.

26 models compared Verified 2026-02-16

Quick Reference: Pricing & Context

Input/output pricing per million tokens from provider docs. Notes are included where pricing tiers apply.

Model Pricing ($/M tokens) Context Best For
Claude Opus 4.6 (Adaptive)
Anthropic
$18.75 / $93.75
With prompt caching discounts available.
200K
Highest intelligence tasksComplex reasoningCritical decisions
GPT-5.2 (xhigh)
OpenAI
$1.75 / $14
400K
CodingAgentsComplex tasks
Claude Opus 4.5
Anthropic
$15 / $75
200K
Complex reasoningLong-form analysisHigh-stakes drafting
GLM-5
Zhipu AI
$0.75 / $3
128K
Chinese + English workflowsValue-focused deploymentEnterprise usage
Gemini 3 Pro
Google
$2 / $12
Paid tier pricing.
1M
Multimodal understandingAgentic tasksVibe coding
Gemini 2.5 Pro
Google
$1.25 / $10
For prompts up to 200K tokens.
1M
Large context tasksMultimodal workflowsResearch synthesis
Claude Sonnet 4
Anthropic
$3 / $15
200K
Balanced performanceProduction workloadsGeneral-purpose tasks
DeepSeek-R1
DeepSeek
$0.55 / $2.19
128K
Budget-conscious reasoningMath-heavy tasksCost-sensitive coding
DeepSeek-V3
DeepSeek
$0.27 / $1.1
128K
Cost-effective codingGeneral tasksHigh-volume usage
Grok 4.1
xAI
$2 / $10
Via xAI API.
128K
Real-time informationWitty responsesCurrent events
Grok 4.1 Fast
xAI
$1 / $5
Via xAI API.
2M
Fast responsesLarge contextReal-time data
Llama 4 Scout
Meta
$0.1 / $0.3
Via API providers, pricing varies.
10M
Extremely long contextDocument processingResearch
Llama 4 Maverick
Meta
$0.15 / $0.5
Via API providers, pricing varies.
1M
Balanced open-sourceGeneral tasksSelf-hosting
Qwen 2.5 Max
Alibaba
$0.5 / $2
Via Alibaba Cloud.
128K
Chinese languageMath reasoningCoding
Mistral Large 2
Mistral AI
$2 / $6
128K
European complianceMultilingual tasksEnterprise
Claude Sonnet 3.7
Anthropic
$3 / $15
200K
Extended thinkingComplex analysisCoding assistance
GPT-5 mini
OpenAI
$0.25 / $2
128K
Fast tasksHigh-volume usageCost optimization
Gemini 2.5 Flash
Google
$0.3 / $2.5
1M
Fast processingLow-latency tasksHigh throughput
Gemini 2.5 Flash-Lite
Google
$0.1 / $0.4
1M
Cost-sensitive tasksHigh-volume processingSimple queries
Claude Haiku 3.5
Anthropic
$0.8 / $4
200K
Fast responsesSimple tasksCost-conscious usage
Nova Pro
Amazon
$0.8 / $3.2
Via AWS Bedrock.
300K
AWS integrationEnterprise workloadsMultimodal
Nova Micro
Amazon
$0.035 / $0.14
Via AWS Bedrock.
128K
Lowest costSimple tasksHigh volume
Qwen 2.5 72B
Alibaba
$0.35 / $1.4
Via API providers.
128K
Open-source alternativeSelf-hostingCustom fine-tuning
Mistral Small 3
Mistral AI
$0.2 / $0.6
128K
Fast processingCost-effectiveSimple tasks
Cohere Command R+
Cohere
$2.5 / $10
128K
RAG applicationsEnterprise searchTool use
Reka Core
Reka
$1 / $4
Via API providers.
128K
Multimodal tasksVideo understandingLong context

Performance Scores

Internal task-level evaluations across coding, reasoning, and tool-use (scale: 1-10).

Model Coding Reasoning Tool-use Key Strengths
Claude Opus 4.6 (Adaptive)
Anthropic
9.6
9.7
9.4
  • Top-tier intelligence
  • Adaptive thinking
  • Best for complex reasoning
GPT-5.2 (xhigh)
OpenAI
9.6
9.5
9.5
  • Best coding performance
  • Strong agentic capabilities
  • Excellent tool integration
Claude Opus 4.5
Anthropic
9.4
9.5
9.2
  • Excellent reasoning depth
  • Strong instruction fidelity
  • Consistent long-context behavior
GLM-5
Zhipu AI
9
9.1
8.8
  • Top-tier intelligence
  • Low API cost
  • Strong bilingual support
Gemini 3 Pro
Google
9.3
9.3
9.1
  • Best multimodal model
  • Superior search integration
  • Powerful agentic capabilities
Gemini 2.5 Pro
Google
9.2
9.2
9
  • Very large context
  • Strong multimodal support
  • Competitive long-context pricing
Claude Sonnet 4
Anthropic
9.1
9
8.9
  • Great value proposition
  • Consistent quality
  • Fast response times
DeepSeek-R1
DeepSeek
8.9
9.1
8.6
  • Strong price/performance
  • Reasoning-focused behavior
  • Low token cost
DeepSeek-V3
DeepSeek
8.8
8.9
8.5
  • Excellent value
  • Good coding performance
  • Low latency
Grok 4.1
xAI
8.8
8.8
8.6
  • Real-time web access
  • Unique personality
  • Good reasoning
Grok 4.1 Fast
xAI
8.5
8.5
8.3
  • Very fast
  • Large context window
  • Real-time capabilities
Llama 4 Scout
Meta
8.6
8.7
8.4
  • Largest context window (10M)
  • Open source
  • Good for long documents
Llama 4 Maverick
Meta
8.7
8.6
8.3
  • Good performance
  • Open weights
  • Flexible deployment
Qwen 2.5 Max
Alibaba
8.8
8.8
8.5
  • Strong math skills
  • Good Chinese support
  • Competitive pricing
Mistral Large 2
Mistral AI
8.7
8.7
8.4
  • Strong multilingual
  • GDPR compliant
  • Good coding
Claude Sonnet 3.7
Anthropic
8.9
8.9
8.8
  • Strong extended thinking
  • Good coding support
  • Reliable performance
GPT-5 mini
OpenAI
8.5
8.4
8.3
  • Very low cost
  • Fast response times
  • Good for simple tasks
Gemini 2.5 Flash
Google
8.7
8.7
8.5
  • Excellent speed
  • Large context window
  • Hybrid reasoning support
Gemini 2.5 Flash-Lite
Google
8.2
8.1
7.9
  • Lowest cost option
  • Fastest output speed (499 t/s)
  • Good for simple tasks
Claude Haiku 3.5
Anthropic
8.3
8.2
8.1
  • Fast and affordable
  • Good for simple tasks
  • Reliable quality
Nova Pro
Amazon
8.5
8.5
8.3
  • AWS ecosystem
  • Good multimodal
  • Enterprise features
Nova Micro
Amazon
7.8
7.7
7.5
  • Very low cost
  • Fast inference
  • AWS integration
Qwen 2.5 72B
Alibaba
8.6
8.5
8.2
  • Open weights
  • Good performance
  • Flexible deployment
Mistral Small 3
Mistral AI
8.2
8.1
7.9
  • Very affordable
  • Fast inference
  • Good for simple tasks
Cohere Command R+
Cohere
8.3
8.4
8.6
  • Excellent RAG
  • Strong tool use
  • Enterprise focused
Reka Core
Reka
8.4
8.5
8.2
  • Strong multimodal
  • Good video understanding
  • Competitive pricing

Quick Pick Recommendations

Not sure which model? Here's our picks by use case.

💻

Best for Coding

GPT-5.2 (xhigh)

Top coding performance with excellent agentic capabilities and tool integration.

Alternative: Claude Opus 4.6 for code review quality

💰

Best on a Budget

DeepSeek-R1

Very strong reasoning with one of the lowest published API prices in this set.

Alternative: Gemini 2.5 Flash-Lite for high-volume simple tasks

📚

Best for Long Context

Llama 4 Scout

Up to 10M context window - the largest available. Ideal for massive document processing.

Alternative: Gemini 2.5 Pro for 1M context with better reasoning

🧠

Best for Reasoning

Claude Opus 4.6 (Adaptive)

Highest intelligence model with exceptional reasoning depth and adaptive thinking.

Alternative: Claude Opus 4.5 for consistent long-context behavior

🏆

Best All-Rounder

Claude Opus 4.6 (Adaptive)

Top-tier intelligence across coding, reasoning, and tool-use with adaptive thinking.

Alternative: GPT-5.2 (xhigh) for agentic tasks

🔧

Best for Tool-Use

GPT-5.2 (xhigh)

Reliable multi-step API and tooling behavior in operator-style tasks.

Alternative: Claude Sonnet 4 for balanced tool use at lower cost

Verification Sources

Official model and pricing references checked on 2026-02-16.

Model Official Sources
Claude Opus 4.6 (Adaptive)
Anthropic
GPT-5.2 (xhigh)
OpenAI
Claude Opus 4.5
Anthropic
GLM-5
Zhipu AI
Gemini 3 Pro
Google
Gemini 2.5 Pro
Google
Claude Sonnet 4
Anthropic
DeepSeek-R1
DeepSeek
DeepSeek-V3
DeepSeek
Grok 4.1
xAI
Grok 4.1 Fast
xAI
Llama 4 Scout
Meta
Llama 4 Maverick
Meta
Qwen 2.5 Max
Alibaba
Mistral Large 2
Mistral AI
Claude Sonnet 3.7
Anthropic
GPT-5 mini
OpenAI
Gemini 2.5 Flash
Google
Gemini 2.5 Flash-Lite
Google
Claude Haiku 3.5
Anthropic
Nova Pro
Amazon
Nova Micro
Amazon
Qwen 2.5 72B
Alibaba
Mistral Small 3
Mistral AI
Cohere Command R+
Cohere
Reka Core
Reka

Dive Deeper into Model Performance

See detailed daily scorecards with task-level breakdowns, failure cases, and cost analysis.

View Daily Scorecards Our Methodology