
Grok-2

xAI

Grok-2 is xAI's frontier language model, with state-of-the-art reasoning that makes it strong at chat, coding, and logical problem-solving. It posts leading results on vision-language tasks such as visual math reasoning (MathVista) and document question answering (DocVQA), and performs well across a wide range of academic benchmarks covering reasoning, reading comprehension, mathematics, and science.

Model Specifications

Technical details and capabilities of Grok-2

Core Specifications

128K / 8K

Input / Output tokens
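The 128K-token input limit means long documents may need trimming before they reach the model. A minimal sketch, assuming a rough heuristic of ~4 characters per token (use a real tokenizer for exact counts):

```python
# Rough guard against exceeding Grok-2's 128K-token input window.
# The ~4 chars/token ratio is a common English-text heuristic, not exact.
CHARS_PER_TOKEN = 4
MAX_INPUT_TOKENS = 128_000

def truncate_to_budget(text: str, reserved_tokens: int = 1_000) -> str:
    """Trim text so the estimated prompt size stays under the input limit,
    keeping `reserved_tokens` free for system instructions."""
    budget_chars = (MAX_INPUT_TOKENS - reserved_tokens) * CHARS_PER_TOKEN
    return text[:budget_chars]

doc = "x" * 600_000                      # ~150K estimated tokens: over budget
print(len(truncate_to_budget(doc)))      # trimmed to ~127K estimated tokens
```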

July 31, 2024

Knowledge cutoff date

August 12, 2024

Release date

Capabilities & License

Multimodal Support: Supported
Web Hydrated: Yes
License: Proprietary

Resources

API Reference
https://x.ai/api
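The official API reference is linked above. As an illustrative sketch only, a chat request can be assembled as an OpenAI-style JSON payload; the endpoint URL and the model name `grok-2` here are assumptions to verify against the official documentation:

```python
import json

# Assumed endpoint; confirm against https://x.ai/api before use.
API_URL = "https://api.x.ai/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "grok-2") -> dict:
    """Assemble an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize this document.")
print(json.dumps(payload, indent=2))
```

Sending it would be an HTTP POST with an `Authorization: Bearer <key>` header; the payload mirrors the OpenAI chat-completions format that the xAI API advertises compatibility with.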

Performance Insights

Check out how Grok-2 handles various AI tasks through comprehensive benchmark results.

Benchmark    Score
DocVQA       93.6
HumanEval    88.4
MMLU         87.5
MATH         76.1
MMLU-Pro     75.5
MathVista    69.0
MMMU         66.1
GPQA         56.0

Model Comparison

See how Grok-2 stacks up against other leading models across key performance metrics.

Benchmark    Grok-2   GPT-4o   Llama 3.1 405B Instruct   Grok-2 mini   Claude 3.5 Sonnet   Llama 3.3 70B Instruct
GPQA         56.0     53.6     50.7                      51.0          59.4                50.5
MMLU         87.5     88.7     87.3                      86.2          90.4                86.0
MMLU-Pro     75.5     72.6     73.3                      72.0          76.1                68.9
MATH         76.1     76.6     73.8                      73.0          71.1                77.0
HumanEval    88.4     90.2     89.0                      85.7          92.0                88.4
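A quick way to see where Grok-2 sits relative to this group is to average each benchmark across the compared models. A small sketch over three of the benchmarks above (note these averages cover only the six models listed here, so they differ from the page's own "Avg" figures, which span more models):

```python
# Benchmark scores transcribed from the comparison above.
scores = {
    "GPQA":      {"Grok-2": 56.0, "GPT-4o": 53.6, "Llama 3.1 405B Instruct": 50.7,
                  "Grok-2 mini": 51.0, "Claude 3.5 Sonnet": 59.4, "Llama 3.3 70B Instruct": 50.5},
    "MMLU":      {"Grok-2": 87.5, "GPT-4o": 88.7, "Llama 3.1 405B Instruct": 87.3,
                  "Grok-2 mini": 86.2, "Claude 3.5 Sonnet": 90.4, "Llama 3.3 70B Instruct": 86.0},
    "HumanEval": {"Grok-2": 88.4, "GPT-4o": 90.2, "Llama 3.1 405B Instruct": 89.0,
                  "Grok-2 mini": 85.7, "Claude 3.5 Sonnet": 92.0, "Llama 3.3 70B Instruct": 88.4},
}

for bench, by_model in scores.items():
    avg = sum(by_model.values()) / len(by_model)
    delta = by_model["Grok-2"] - avg
    print(f"{bench}: Grok-2 {by_model['Grok-2']:.1f}, group avg {avg:.1f} ({delta:+.1f})")
```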

Detailed Benchmarks

Dive deeper into Grok-2's performance across specific task categories, with the average score of compared models shown for context.

Knowledge

GPQA: 56.0 (avg across compared models: 56.1%)
MMLU: 87.5 (avg: 86.0%)
MATH: 76.1 (avg: 73.9%)

Non-categorized

MMLU-Pro: 75.5 (avg: 72.4%)
MMMU: 66.1 (avg: 61.9%)
MathVista: 69.0 (avg: 62.3%)
DocVQA: 93.6 (avg: 92.1%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Grok-2. Compare costs across platforms to find the best pricing for your use case.

