Daily AI Model Scorecards

Browse today's winners, task leaders, and cost/latency tradeoffs across models with verified naming.

Live · Source log verified on Feb 16, 2026

Eval Categories

Coding Scorecards

Bug fixes, diffs, refactors, and API patches tested across leading models.

Reasoning Scorecards

Decision quality under pressure, tradeoffs, and strategy calls.

Tool-use Scorecards

Docs-driven tasks, CLI accuracy, and security hygiene.

RAG / Research Scorecards

Retrieval accuracy, synthesis quality, and citation reliability.

Agent Reliability Scorecards

Multi-step task success, tool failures, and recovery behavior.

All Recent Scorecards
