Quick Reference: Pricing & Context
Input and output pricing per million tokens, taken from provider documentation. Notes are included where pricing tiers or access routes apply.
| Model | Provider | Input / Output ($/M tokens) | Context (tokens) | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 (Adaptive) | Anthropic | $18.75 / $93.75 (prompt caching discounts available) | 200K | Highest-intelligence tasks, complex reasoning, critical decisions |
| GPT-5.2 (xhigh) | OpenAI | $1.75 / $14.00 | 400K | Coding, agents, complex tasks |
| Claude Opus 4.5 | Anthropic | $15.00 / $75.00 | 200K | Complex reasoning, long-form analysis, high-stakes drafting |
| GLM-5 | Zhipu AI | $0.75 / $3.00 | 128K | Chinese + English workflows, value-focused deployment, enterprise usage |
| Gemini 3 Pro | Google | $2.00 / $12.00 (paid tier) | 1M | Multimodal understanding, agentic tasks, vibe coding |
| Gemini 2.5 Pro | Google | $1.25 / $10.00 (prompts up to 200K tokens) | 1M | Large-context tasks, multimodal workflows, research synthesis |
| Claude Sonnet 4 | Anthropic | $3.00 / $15.00 | 200K | Balanced performance, production workloads, general-purpose tasks |
| DeepSeek-R1 | DeepSeek | $0.55 / $2.19 | 128K | Budget-conscious reasoning, math-heavy tasks, cost-sensitive coding |
| DeepSeek-V3 | DeepSeek | $0.27 / $1.10 | 128K | Cost-effective coding, general tasks, high-volume usage |
| Grok 4.1 | xAI | $2.00 / $10.00 (via xAI API) | 128K | Real-time information, witty responses, current events |
| Grok 4.1 Fast | xAI | $1.00 / $5.00 (via xAI API) | 2M | Fast responses, large context, real-time data |
| Llama 4 Scout | Meta | $0.10 / $0.30 (via API providers; pricing varies) | 10M | Extremely long context, document processing, research |
| Llama 4 Maverick | Meta | $0.15 / $0.50 (via API providers; pricing varies) | 1M | Balanced open source, general tasks, self-hosting |
| Qwen 2.5 Max | Alibaba | $0.50 / $2.00 (via Alibaba Cloud) | 128K | Chinese language, math reasoning, coding |
| Mistral Large 2 | Mistral AI | $2.00 / $6.00 | 128K | European compliance, multilingual tasks, enterprise |
| Claude Sonnet 3.7 | Anthropic | $3.00 / $15.00 | 200K | Extended thinking, complex analysis, coding assistance |
| GPT-5 mini | OpenAI | $0.25 / $2.00 | 128K | Fast tasks, high-volume usage, cost optimization |
| Gemini 2.5 Flash | Google | $0.30 / $2.50 | 1M | Fast processing, low-latency tasks, high throughput |
| Gemini 2.5 Flash-Lite | Google | $0.10 / $0.40 | 1M | Cost-sensitive tasks, high-volume processing, simple queries |
| Claude Haiku 3.5 | Anthropic | $0.80 / $4.00 | 200K | Fast responses, simple tasks, cost-conscious usage |
| Nova Pro | Amazon | $0.80 / $3.20 (via AWS Bedrock) | 300K | AWS integration, enterprise workloads, multimodal |
| Nova Micro | Amazon | $0.035 / $0.14 (via AWS Bedrock) | 128K | Lowest cost, simple tasks, high volume |
| Qwen 2.5 72B | Alibaba | $0.35 / $1.40 (via API providers) | 128K | Open-source alternative, self-hosting, custom fine-tuning |
| Mistral Small 3 | Mistral AI | $0.20 / $0.60 | 128K | Fast processing, cost-effective, simple tasks |
| Cohere Command R+ | Cohere | $2.50 / $10.00 | 128K | RAG applications, enterprise search, tool use |
| Reka Core | Reka | $1.00 / $4.00 (via API providers) | 128K | Multimodal tasks, video understanding, long context |
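As a sanity check on the pricing column, the per-request cost of any model follows directly from its per-million-token rates. A minimal sketch (the `request_cost` helper and the token counts are illustrative; the $15 / $75 rates are Claude Opus 4.5's from the table above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Estimate one request's cost from per-million-token input/output rates."""
    return (input_tokens / 1_000_000) * in_per_m + (output_tokens / 1_000_000) * out_per_m

# Claude Opus 4.5 rates from the table: $15 input / $75 output per M tokens.
cost = request_cost(input_tokens=4_000, output_tokens=1_000,
                    in_per_m=15.0, out_per_m=75.0)
print(f"${cost:.3f}")  # 4K in + 1K out -> $0.135
```

Multiply by expected request volume to compare monthly spend across models; note that prompt caching, batch tiers, and per-provider variations can shift these numbers substantially.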
Performance Scores
Internal task-level evaluations across coding, reasoning, and tool-use (scale: 1-10).
| Model | Provider | Coding | Reasoning | Tool-use |
|---|---|---|---|---|
| Claude Opus 4.6 (Adaptive) | Anthropic | 9.6 | 9.7 | 9.4 |
| GPT-5.2 (xhigh) | OpenAI | 9.6 | 9.5 | 9.5 |
| Claude Opus 4.5 | Anthropic | 9.4 | 9.5 | 9.2 |
| GLM-5 | Zhipu AI | 9.0 | 9.1 | 8.8 |
| Gemini 3 Pro | Google | 9.3 | 9.3 | 9.1 |
| Gemini 2.5 Pro | Google | 9.2 | 9.2 | 9.0 |
| Claude Sonnet 4 | Anthropic | 9.1 | 9.0 | 8.9 |
| DeepSeek-R1 | DeepSeek | 8.9 | 9.1 | 8.6 |
| DeepSeek-V3 | DeepSeek | 8.8 | 8.9 | 8.5 |
| Grok 4.1 | xAI | 8.8 | 8.8 | 8.6 |
| Grok 4.1 Fast | xAI | 8.5 | 8.5 | 8.3 |
| Llama 4 Scout | Meta | 8.6 | 8.7 | 8.4 |
| Llama 4 Maverick | Meta | 8.7 | 8.6 | 8.3 |
| Qwen 2.5 Max | Alibaba | 8.8 | 8.8 | 8.5 |
| Mistral Large 2 | Mistral AI | 8.7 | 8.7 | 8.4 |
| Claude Sonnet 3.7 | Anthropic | 8.9 | 8.9 | 8.8 |
| GPT-5 mini | OpenAI | 8.5 | 8.4 | 8.3 |
| Gemini 2.5 Flash | Google | 8.7 | 8.7 | 8.5 |
| Gemini 2.5 Flash-Lite | Google | 8.2 | 8.1 | 7.9 |
| Claude Haiku 3.5 | Anthropic | 8.3 | 8.2 | 8.1 |
| Nova Pro | Amazon | 8.5 | 8.5 | 8.3 |
| Nova Micro | Amazon | 7.8 | 7.7 | 7.5 |
| Qwen 2.5 72B | Alibaba | 8.6 | 8.5 | 8.2 |
| Mistral Small 3 | Mistral AI | 8.2 | 8.1 | 7.9 |
| Cohere Command R+ | Cohere | 8.3 | 8.4 | 8.6 |
| Reka Core | Reka | 8.4 | 8.5 | 8.2 |
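One way to turn these per-axis scores into a single pick for a given workload is a weighted mean, with the weights reflecting how much each axis matters. A minimal sketch over a subset of the table; the weights are illustrative, not part of our methodology:

```python
# Subset of the scores above: (coding, reasoning, tool-use) on a 1-10 scale.
scores = {
    "Claude Opus 4.6 (Adaptive)": (9.6, 9.7, 9.4),
    "GPT-5.2 (xhigh)": (9.6, 9.5, 9.5),
    "DeepSeek-R1": (8.9, 9.1, 8.6),
}

def rank(weights: tuple[float, float, float]) -> list[tuple[str, float]]:
    """Rank models by the weighted mean of their per-axis scores."""
    total = sum(weights)
    combined = {
        name: sum(w * s for w, s in zip(weights, axes)) / total
        for name, axes in scores.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Agentic workload: emphasize tool-use over the other two axes.
for name, score in rank((0.2, 0.2, 0.6)):
    print(f"{name}: {score:.2f}")
```

With tool-use weighted heavily, GPT-5.2 (xhigh) edges out Claude Opus 4.6 despite the latter's higher reasoning score; equal weights reverse the order. Tuning weights to the workload is usually more informative than reading any single column.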
Quick Pick Recommendations
Not sure which model? Here are our picks by use case.
Best for Coding: GPT-5.2 (xhigh)
Top coding performance with excellent agentic capabilities and tool integration.
Alternative: Claude Opus 4.6 for code review quality
Best on a Budget: DeepSeek-R1
Very strong reasoning with one of the lowest published API prices in this set.
Alternative: Gemini 2.5 Flash-Lite for high-volume simple tasks
Best for Long Context: Llama 4 Scout
Up to a 10M-token context window, the largest in this set. Ideal for massive document processing.
Alternative: Gemini 2.5 Pro for 1M context with better reasoning
Best for Reasoning: Claude Opus 4.6 (Adaptive)
Highest-intelligence model here, with exceptional reasoning depth and adaptive thinking.
Alternative: Claude Opus 4.5 for consistent long-context behavior
Best All-Rounder: Claude Opus 4.6 (Adaptive)
Top-tier scores across coding, reasoning, and tool-use, with adaptive thinking.
Alternative: GPT-5.2 (xhigh) for agentic tasks
Best for Tool-Use: GPT-5.2 (xhigh)
Reliable multi-step API and tooling behavior in operator-style tasks.
Alternative: Claude Sonnet 4 for balanced tool use at lower cost
Verification Sources
Official model and pricing references checked on 2026-02-16.
Dive Deeper into Model Performance
See detailed daily scorecards with task-level breakdowns, failure cases, and cost analysis.