Quick Reference: Pricing & Context
Input/output pricing per million tokens from provider docs. Notes are included where pricing tiers apply.
| Model | Provider | Pricing ($/M tokens, in / out) | Context | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | $5 / $25 | 200K | Complex reasoning, critical decisions, long-form analysis |
| GPT-5.4 | OpenAI | $2.50 / $15 | 1.05M | Coding, agents, tool integration |
| Gemini 3.1 Pro | Google | $1.25 / $5 | 1M | Multimodal tasks, long context, search integration |
| Claude Sonnet 4.6 | Anthropic | $3 / $15 | 200K | Balanced performance, production workloads, cost-efficient |
| GPT-5.3 Codex | OpenAI | $3 / $15 | 200K | Coding-focused tasks, type inference, agentic coding |
| GLM-5 | Zhipu AI | $0.50 / $2 | 205K | Bilingual (CN/EN), value-focused, enterprise |
| Llama 4 (405B) | Meta | $2 / $8 (varies by host: Together, Fireworks, etc.) | 128K | Self-hosted, open source, customizable |
| DeepSeek V3 | DeepSeek | $0.27 / $1.10 | 128K | Budget coding, high-volume, cost-sensitive |
| GPT-5.2 | OpenAI | $1.75 / $14 | 128K | General-purpose, balanced tasks |
| Mistral Large 3 | Mistral | $2 / $6 | 128K | European compliance, multilingual, enterprise |
| Kimi K2.5 | Moonshot AI | $0.60 / $2.50 | 256K | Visual coding, long context, agent workflows |
| MiniMax M2.5 | MiniMax | $0.30 / $1.20 | 196K | Real-world productivity, cost-sensitive, high-volume |
| Grok 4.1 Fast | xAI | $0.20 / $0.50 | 2M | Long context, web search, X platform data |
| Qwen 3 Max | Alibaba | $1.20 / $6 | 262K | Multilingual, enterprise, Chinese language |
| GPT-OSS-120B | OpenAI | $0 / $0 (free: open weights, self-hosted) | 128K | Self-hosted, privacy, customization |
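To turn the per-million-token prices above into a per-request cost, multiply each token count by the matching rate: input_tokens / 1M × input price + output_tokens / 1M × output price. A minimal Python sketch (the token counts are made-up example values; prices are copied from the table):

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one request, given $/M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: Claude Sonnet 4.6 at $3 / $15 per million tokens,
# with a 12,000-token prompt and a 2,000-token completion.
cost = request_cost(12_000, 2_000, 3.0, 15.0)
print(f"${cost:.3f}")  # 12K x $3/M + 2K x $15/M = $0.036 + $0.030 = $0.066
```

Multiply by your expected daily request volume to compare monthly bills across models.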
Performance Scores
Internal task-level evaluations across coding, reasoning, and tool-use (scale: 1-10).
| Model | Provider | Coding | Reasoning | Tool-use |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 9.7 | 9.8 | 9.5 |
| GPT-5.4 | OpenAI | 9.8 | 9.5 | 9.7 |
| Gemini 3.1 Pro | Google | 9.5 | 9.5 | 9.3 |
| Claude Sonnet 4.6 | Anthropic | 9.4 | 9.3 | 9.1 |
| GPT-5.3 Codex | OpenAI | 9.7 | 9.3 | 9.4 |
| GLM-5 | Zhipu AI | 9.2 | 9.3 | 9.0 |
| Llama 4 (405B) | Meta | 9.0 | 9.1 | 8.7 |
| DeepSeek V3 | DeepSeek | 8.8 | 8.9 | 8.5 |
| GPT-5.2 | OpenAI | 9.3 | 9.2 | 9.0 |
| Mistral Large 3 | Mistral | 8.9 | 9.0 | 8.6 |
| Kimi K2.5 | Moonshot AI | 9.4 | 9.3 | 9.2 |
| MiniMax M2.5 | MiniMax | 9.1 | 9.2 | 8.9 |
| Grok 4.1 Fast | xAI | 9.0 | 9.1 | 8.8 |
| Qwen 3 Max | Alibaba | 9.2 | 9.1 | 8.9 |
| GPT-OSS-120B | OpenAI | 9.3 | 9.2 | 9.0 |
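If you want a single ranking tuned to your own workload, one simple approach is a weighted average of the three axes. The sketch below copies a few scores from the table above; the weights are arbitrary example values (not our methodology) that you would adjust to match your use case:

```python
# (coding, reasoning, tool-use) scores, copied from the table above.
scores = {
    "GPT-5.4": (9.8, 9.5, 9.7),
    "Claude Opus 4.6": (9.7, 9.8, 9.5),
    "GPT-5.3 Codex": (9.7, 9.3, 9.4),
    "Gemini 3.1 Pro": (9.5, 9.5, 9.3),
}

# Example weights for a coding-heavy agent workload (must sum to 1).
weights = (0.5, 0.2, 0.3)

def weighted(model_scores, w):
    """Weighted average of per-axis scores."""
    return sum(s * wi for s, wi in zip(model_scores, w))

ranked = sorted(scores, key=lambda m: weighted(scores[m], weights), reverse=True)
for m in ranked:
    print(f"{m}: {weighted(scores[m], weights):.2f}")
```

Shifting weight toward reasoning instead would move Claude Opus 4.6 up the list, which is why a single "best model" rarely exists.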
Quick Pick Recommendations
Not sure which model? Here are our picks by use case.
Best for Coding
Elite coding scores, strong multimodal workflows, and long-context support at a sane price.
Alternative: GPT-OSS-120B if you want strong local/self-hosted coding
Best on a Budget
Best price/performance ratio in the current frontier set for real-world productivity.
Alternative: Grok 4.1 Fast for ultra-cheap long-context workloads
Best for Long Context
2M context makes it the obvious pick for giant docs, repos, and agent memory workloads.
Alternative: Qwen 3 Max for multilingual enterprise context needs
Best for Reasoning
Strong reasoning plus practical tool-use makes it one of the most useful frontier picks right now.
Alternative: Qwen 3 Max for bilingual enterprise reasoning
Best All-Rounder
One of the best blends of coding, reasoning, tool-use, context, and pricing on the board.
Alternative: MiniMax M2.5 if cost matters more than absolute peak quality
Best for Local / Open Weights
Best open-weights option here if you care about privacy, self-hosting, and no per-token API bills.
Alternative: Llama 4 or Qwen3-Coder for lighter local deployments
Need a Faster Decision?
Choose an API model
Use this if you care about fastest setup, no infrastructure, and provider-managed uptime.
Browse ranked API models →

Run models locally
Use open-weights models if privacy, customization, or long-term cost control matters most.
See local model guide →

Understand open vs proprietary
Still not sure? Start with the tradeoffs around privacy, cost, customization, and performance.
Read the comparison →

Verification Sources
Official model and pricing references checked on 2026-03-10.
Dive Deeper into Model Performance
See detailed daily scorecards with task-level breakdowns, failure cases, and cost analysis.
View Daily Scorecards →
Our Methodology