LLM ranking

Compare the performance of leading large language models across key benchmarks

Sorted by: GPQA
#
Model
Release Date
GPQA
MMLU
Math
HumanEval
1
o3
3 months ago
87.7%
2
Claude 3.7 Sonnet
1 month ago
84.8%
3
Grok-3
1 month ago
84.6%
4
Grok-3 Mini
1 month ago
84.6%
5
Gemini Pro 2.5 Experimental NEW
5 days ago
84.0%
6
o3-mini
1 month ago
79.7%
86.9%
97.9%
7
o1-pro
3 months ago
79.0%
8
o1
3 months ago
78.0%
91.8%
96.4%
88.1%
9
Gemini 2.0 Flash Thinking
2 months ago
74.2%
10
o1-preview
6 months ago
73.3%
90.8%
85.5%
11
DeepSeek-R1
2 months ago
71.5%
90.8%
12
GPT-4.5
1 month ago
71.4%
13
Claude 3.5 Sonnet
5 months ago
67.2%
90.4%
78.3%
93.7%
14
QwQ-32B-Preview
4 months ago
65.2%
15
Gemini 2.0 Flash
3 months ago
62.1%
89.7%
16
o1-mini
6 months ago
60.0%
85.2%
92.4%
17
DeepSeek-V3
3 months ago
59.1%
88.5%
61.6%
18
Gemini 1.5 Pro
11 months ago
59.1%
85.9%
86.5%
84.1%
19
Phi-4
3 months ago
56.1%
84.8%
80.4%
82.6%
20
Grok-2
7 months ago
56.0%
87.5%
76.1%
88.4%
21
GPT-4o
7 months ago
53.6%
88.0%
76.6%
90.2%
22
Gemini 1.5 Flash
11 months ago
51.0%
78.9%
77.9%
74.3%
23
Grok-2 mini
7 months ago
51.0%
86.2%
73.0%
85.7%
24
Llama 3.1 405B Instruct
8 months ago
50.7%
87.3%
73.8%
89.0%
25
Llama 3.3 70B Instruct
3 months ago
50.5%
86.0%
77.0%
88.4%
26
Claude 3 Opus
1 year ago
50.4%
86.8%
60.1%
84.9%
27
Qwen2.5 32B Instruct
6 months ago
49.5%
83.3%
83.1%
88.4%
28
Qwen2.5 72B Instruct
6 months ago
49.0%
83.1%
86.6%
29
GPT-4 Turbo
11 months ago
48.0%
86.5%
72.6%
87.1%
30
Nova Pro
4 months ago
46.9%
85.9%
76.6%
89.0%
31
Llama 3.2 90B Instruct
6 months ago
46.7%
86.0%
68.0%
32
Qwen2.5 14B Instruct
6 months ago
45.5%
79.7%
80.0%
83.5%
33
Mistral Small 3
1 month ago
45.3%
70.6%
84.8%
34
Qwen2 72B Instruct
8 months ago
42.4%
82.3%
59.7%
86.0%
35
Nova Lite
4 months ago
42.0%
80.5%
73.3%
85.4%
36
Llama 3.1 70B Instruct
8 months ago
41.7%
83.6%
80.5%
37
Claude 3.5 Haiku
5 months ago
41.6%
69.4%
88.1%
38
Claude 3 Sonnet
1 year ago
40.4%
79.0%
43.1%
73.0%
39
GPT-4o mini
8 months ago
40.2%
82.0%
70.2%
87.2%
40
Nova Micro
4 months ago
40.0%
77.6%
69.3%
81.1%
41
Gemini 1.5 Flash 8B
1 year ago
38.4%
58.7%
42
Jamba 1.5 Large
7 months ago
36.9%
81.2%
43
Phi-3.5-MoE-instruct
7 months ago
36.8%
78.9%
59.5%
70.7%
44
Qwen2.5 7B Instruct
6 months ago
36.4%
75.5%
84.8%
45
Grok-1.5
1 year ago
35.9%
81.3%
50.6%
74.1%
46
GPT-4
1 year ago
35.7%
86.4%
42.0%
67.0%
47
Claude 3 Haiku
1 year ago
33.3%
75.2%
38.9%
75.9%
48
Llama 3.2 11B Instruct
6 months ago
32.8%
73.0%
51.9%
49
Llama 3.2 3B Instruct
6 months ago
32.8%
63.4%
48.0%
50
Jamba 1.5 Mini
7 months ago
32.3%
69.7%
51
GPT-3.5 Turbo
2 years ago
30.8%
69.8%
43.1%
68.0%
52
Llama 3.1 8B Instruct
8 months ago
30.4%
69.4%
72.6%
53
Phi-3.5-mini-instruct
7 months ago
30.4%
69.0%
48.5%
62.8%
54
Gemini 1.0 Pro
1 year ago
27.9%
71.8%
32.6%
55
Qwen2 7B Instruct
8 months ago
25.3%
70.5%
49.6%
79.9%
56
Claude 3.5 Sonnet
9 months ago
57
Codestral-22B
10 months ago
81.1%
58
Command A NEW
2 weeks ago
84.0%
78.0%
59
Command R+
7 months ago
75.7%
60
DeepSeek-V2.5
10 months ago
80.4%
74.7%
89.0%
61
Gemma 2 27B
9 months ago
75.2%
42.3%
51.8%
62
Gemma 2 9B
9 months ago
71.3%
36.6%
40.2%
63
Gemma 3 27B NEW
2 weeks ago
76.9%
89.0%
87.8%
64
GPT-4o
10 months ago
65
Grok-1.5V
11 months ago
66
Jamba 1.6 Large NEW
2 weeks ago
67
Jamba 1.6 Mini NEW
2 weeks ago
68
Kimi-k1.5
2 months ago
87.4%
69
Llama 3.1 Nemotron 70B Instruct
6 months ago
80.2%
70
Ministral 8B Instruct
5 months ago
65.0%
54.5%
34.8%
71
Mistral Large 2
8 months ago
84.0%
92.0%
72
Mistral NeMo Instruct
8 months ago
68.0%
73
Mistral Small
6 months ago
74
Mistral Small 3.1 24B NEW
1 week ago
80.6%
69.3%
88.4%
75
Olmo 2 32B NEW
2 weeks ago
74.9%
49.7%
76
Phi-3.5-vision-instruct
7 months ago
77
Pixtral Large
4 months ago
78
Pixtral-12B
6 months ago
69.2%
48.1%
72.0%
79
QvQ-72B-Preview
3 months ago
80
Qwen2-VL-72B-Instruct
7 months ago
81
Qwen2.5-Coder 32B Instruct
6 months ago
75.1%
57.2%
92.7%
82
Qwen2.5-Coder 7B Instruct
6 months ago
67.6%
46.6%
88.4%
83
QwQ 32B NEW
3 weeks ago
Showing 83 of 83 models
