Chatbot Arena ELO Ratings Explained
Human-preference leaderboards are useful, but current rankings should be read at the official source.
How ELO Ratings Work
Chatbot Arena uses the ELO rating system — the same method used in chess — to rank language models based on pairwise comparisons. Users chat with two anonymous models and vote for the better response.
- Large-scale public voting collected since launch
- Randomized, anonymous battles make the results harder to game
- Confidence intervals show uncertainty
- Open methodology — read the paper
ELO ratings measure human preference, not objective correctness. A model with higher ELO may still fail at specific tasks.
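To make the mechanics concrete, here is a minimal sketch of the textbook chess-style ELO update applied to a single vote. The step size `k=32` and the starting rating of 1000 are illustrative assumptions, and the function names are hypothetical; the leaderboard's actual fitting procedure may differ, so treat this as an illustration of the idea and not as the official computation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard ELO logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one battle.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k is an assumed step size, not a value taken from the leaderboard.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Two models start level; one voter prefers model A.
a, b = update(1000.0, 1000.0, score_a=1.0)
print(a, b)  # 1016.0 984.0
```

An upset (a low-rated model beating a high-rated one) moves both ratings further than an expected result, because the update is proportional to how surprising the vote was.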
Current Leaderboard
For current model ranks, ELO scores, vote counts, and confidence intervals, use the official LM Arena leaderboard. This page explains how to interpret those numbers without republishing a static copy that can drift.
What These Numbers Mean
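- ELO Score: Relative strength. A 100-point gap means the higher-rated model wins ~64% of battles.
- 95% CI (Confidence Interval): Uncertainty range. A model at 1500 ±5 is reliably better than one at 1480 ±10, because the ranges (1495 to 1505 and 1470 to 1490) do not overlap. Against a model at 1490 ±10 (1480 to 1500) the ranges overlap, so the ordering is not settled.
- Votes: Sample size. More votes = narrower confidence interval = more reliable rating.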
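The two readings above can be checked with a few lines of Python. This is a sketch under two assumptions: scores sit on the usual 400-point logistic ELO scale, and "reliably better" is approximated by the rule of thumb that two confidence intervals do not overlap. The function names are hypothetical.

```python
def win_probability(gap: float) -> float:
    """Expected win rate of the higher-rated model for a given rating gap."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


def intervals_overlap(rating_a: float, half_width_a: float,
                      rating_b: float, half_width_b: float) -> bool:
    """True if the two 'rating ± half-width' ranges share any point."""
    low_a, high_a = rating_a - half_width_a, rating_a + half_width_a
    low_b, high_b = rating_b - half_width_b, rating_b + half_width_b
    return low_a <= high_b and low_b <= high_a


print(round(win_probability(100), 2))        # 0.64
print(intervals_overlap(1500, 5, 1480, 10))  # False: ranges are separate
print(intervals_overlap(1500, 5, 1498, 10))  # True: ordering is not settled
```

When the ranges overlap, the safer reading is that the two models are statistically tied at the current vote count.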
Limitations
- Human preference ≠ truth. Models may sound confident while being wrong.
- Bias toward verbose responses. Longer answers often score higher.
- One overall score hides task differences. A single rating pools coding, math, and reasoning prompts into one number.
- New models need time. Fresh releases may have wider confidence intervals until enough battles accumulate.
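Why vote count matters can be sketched with a deliberately simplified model: a single head-to-head matchup with no ties, a normal-approximation 95% interval on the win rate, and the ELO curve inverted to express that interval as a rating gap. The 64% win rate and the function names are assumptions for illustration; the official leaderboard estimates its intervals across many models at once, so its numbers will not match this toy calculation.

```python
import math


def win_rate_interval(wins: int, battles: int, z: float = 1.96):
    """Normal-approximation 95% interval for a head-to-head win rate."""
    p = wins / battles
    half_width = z * math.sqrt(p * (1.0 - p) / battles)
    return p - half_width, p + half_width


def to_elo_gap(p: float) -> float:
    """Invert the ELO curve: win rate (0 < p < 1) to rating gap in points."""
    return 400.0 * math.log10(p / (1.0 - p))


def gap_interval(wins: int, battles: int):
    """Interval on the rating gap implied by the win-rate interval."""
    low, high = win_rate_interval(wins, battles)
    return to_elo_gap(low), to_elo_gap(high)


# Same 64% win rate, increasing number of votes.
for battles in (100, 1_000, 10_000):
    wins = battles * 64 // 100
    low, high = gap_interval(wins, battles)
    print(f"{battles:>6} votes: gap between {low:.0f} and {high:.0f} points")
```

The estimated gap stays centered near 100 points, but the range around it shrinks as votes accumulate, which is why a freshly released model can move several places before its rating settles.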
For task-specific benchmarks, see our coding benchmark and reasoning benchmark scorecards.
Other Leaderboards
ELO measures human preference. For objective benchmarks, see:
- SWE-Bench — Real bug fixes
- HumanEval — Code generation
- Artificial Analysis — Performance + price
How to Cite
If you use these ratings in your work, cite the official source:
LMSYS Organization. "Chatbot Arena: An Open Platform for
Evaluating LLMs by Human Preference."
arXiv:2403.04132 (2024).
https://lmsys.org/blog/2023-05-03-arena/