Document Arena

View overall rankings across AI models in document analysis and long-content reasoning.

Jul 1, 2026
288,779 votes
32 models
| Rank | Rank Spread | Organization · License | Score | Votes | Price | Context |
|---|---|---|---|---|---|---|
| 1 | 1-5 | Anthropic · Proprietary | 1505±7 | 23,132 | $5 / $25 | 1M |
| 2 | 1-5 | Anthropic · Proprietary | 1505±6 | 35,514 | $5 / $25 | 1M |
| 3 | 1-6 | Anthropic · Proprietary | 1500±7 | 17,103 | $5 / $25 | 1M |
| 4 | 1-8 | Anthropic · Proprietary | 1500±7 | 16,879 | $5 / $25 | 1M |
| 5 | 1-10 | Anthropic · Proprietary | 1497±12 | 2,545 | $10 / $50 | 1M |
| 6 | 4-11 | Anthropic · Proprietary | 1488±6 | 52,833 | $3 / $15 | 1M |
| 7 | 4-12 | OpenAI · Proprietary | 1487±7 | 14,781 | $5 / $30 | 1.1M |
| 8 | 5-13 | OpenAI · Proprietary | 1481±7 | 15,246 | $5 / $30 | 1.1M |
| 9 | 3-14 | Google · Proprietary | 1481±16 | 1,333 | $1.50 / $9 | 1M |
| 10 | 6-14 | Anthropic · Proprietary | 1477±8 | 6,376 | $5 / $25 | 1M |
| 11 | 5-16 | Anthropic · Proprietary | 1476±17 | 1,267 | $2 / $10 | 1M |
| 12 | 7-14 | OpenAI · Proprietary | 1475±6 | 27,648 | $2.50 / $15 | 1.1M |
| 13 | 8-16 | Anthropic · Proprietary | 1468±9 | 6,117 | $5 / $25 | 1M |
| 14 | 9-18 | Anthropic · Proprietary | 1461±10 | 7,987 | $5 / $25 | 200K |
| 15 | 12-18 | Moonshot · Modified MIT | 1454±8 | 10,427 | $0.95 / $4 | 262.1K |
| 16 | 14-21 | Anthropic · Proprietary | 1447±6 | 26,859 | $3 / $15 | 200K |
| 17 | 12-26 | Meta · Proprietary | 1444±18 | 1,089 | N/A | N/A |
| 18 | 16-22 | Google · Proprietary | 1441±5 | 42,482 | $2 / $12 | 1M |
| 19 | 14-25 | Alibaba · Proprietary | 1439±13 | 2,051 | $0.32 / $1.28 | 1M |
| 20 | 16-24 | MiniMax · MiniMax Community License | 1435±8 | 6,367 | $0.60 / $2.40 | N/A |
| 21 | 16-26 | Google · Proprietary | 1433±9 | 10,752 | $2 / $12 | 1M |
| 22 | 17-27 | Moonshot · Modified MIT | 1429±7 | 18,646 | $0.60 / $3 | N/A |
| 23 | 18-28 | Google · Apache 2.0 | 1423±8 | 9,512 | N/A | N/A |
| 24 | 18-28 | Google · Proprietary | 1421±6 | 25,094 | $1.25 / $10 | 1M |
| 25 | 19-28 | Anthropic · Proprietary | 1421±6 | 29,105 | $1 / $5 | 200K |
| 26 | 22-31 | | 1416±7 | 16,701 | $2 / $6 | 2M |
| 27 | 20-32 | Z.ai · Proprietary | 1416±11 | 3,334 | $1.20 / $4 | 202.8K |
| 28 | 23-32 | Google · Proprietary | 1413±9 | 7,181 | $0.50 / $3 | 1M |
| 29 | 26-32 | OpenAI · Proprietary | 1405±9 | 7,091 | $1.75 / $14 | 400K |
| 30 | 26-32 | OpenAI · Proprietary | 1403±8 | 8,536 | $5 / $30 | 1.1M |
| 31 | 27-32 | OpenAI · Proprietary | 1401±6 | 28,283 | $1.75 / $14 | 400K |
| 32 | 26-32 | OpenAI · Proprietary | 1401±9 | 8,247 | $1.25 / $10 | 400K |

Remove Style Control Leaderboard Plots

[Plot: Fraction of Model A Wins for All Non-tied A vs. B Battles]

[Plot: Confidence Intervals on Model Strength (via Bootstrapping)]

[Plot: Battle Count for Each Combination of Models (without Ties)]

[Plot: Average Win Rate Against All Other Models (Uniform Sampling and No Ties)]
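
The page does not say how the "±" intervals or the bootstrapped confidence intervals on model strength are computed. As a rough illustration only, the sketch below shows one common way to obtain such intervals from pairwise battles: fit a Bradley-Terry model to non-tied outcomes, convert the strengths to an Elo-like scale, and repeat the fit on resampled battles. Everything here is an assumption made for illustration: the function names, the model names, the simulated battles, the 400-point scale, the base of 1000, and the small prior count. It is not the leaderboard's actual pipeline.

```python
import math
import random
import statistics


def fit_bradley_terry(battles, models, iters=200, prior=0.1):
    """Fit Bradley-Terry strengths with the standard MM update.

    battles: list of (winner, loser) pairs, ties already removed.
    prior:   assumed pseudo-count added to every ordered pair so that a
             resample with zero wins for some model still has finite strengths.
    """
    wins = {a: {b: prior for b in models if b != a} for a in models}
    for winner, loser in battles:
        wins[winner][loser] += 1
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for a in models:
            total_wins = sum(wins[a].values())
            denom = sum((wins[a][b] + wins[b][a]) / (p[a] + p[b]) for b in wins[a])
            new_p[a] = total_wins / denom
        # Normalise so the geometric mean of the strengths is 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(models)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return p


def to_scores(p, scale=400.0, base=1000.0):
    """Map strengths to an Elo-like scale (scale and base are assumptions)."""
    return {m: base + scale * math.log10(v) for m, v in p.items()}


def bootstrap_intervals(battles, rounds=200, seed=0):
    """Return {model: (low, median, high)} from a percentile bootstrap."""
    rng = random.Random(seed)
    models = sorted({m for pair in battles for m in pair})
    samples = {m: [] for m in models}
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))  # sample with replacement
        scores = to_scores(fit_bradley_terry(resample, models))
        for m in models:
            samples[m].append(scores[m])
    result = {}
    for m, vals in samples.items():
        vals.sort()
        low = vals[int(0.025 * (rounds - 1))]
        high = vals[int(0.975 * (rounds - 1))]
        result[m] = (low, statistics.median(vals), high)
    return result


def simulate(true_scores, n, seed=1):
    """Generate n synthetic (winner, loser) battles from hypothetical scores."""
    rng = random.Random(seed)
    names = list(true_scores)
    battles = []
    for _ in range(n):
        a, b = rng.sample(names, 2)
        p_a = 1 / (1 + 10 ** ((true_scores[b] - true_scores[a]) / 400))
        battles.append((a, b) if rng.random() < p_a else (b, a))
    return battles


if __name__ == "__main__":
    # Hypothetical models and scores, not taken from the leaderboard.
    demo = simulate({"model_a": 1100, "model_b": 1000, "model_c": 900}, 600)
    for name, (low, mid, high) in bootstrap_intervals(demo).items():
        print(f"{name}: {mid:.0f} (95% interval {low:.0f} to {high:.0f})")
```

In a scheme like this, models with fewer battles get wider intervals. That would be consistent with the table, where entries with roughly 1,000 to 2,500 votes carry the largest "±" values.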