
Grok-2

xAI

Grok-2 is xAI's frontier language model, with state-of-the-art reasoning that makes it strong at chat, coding, and logical problem-solving. It posts leading results on vision-language tasks such as visual math reasoning (MathVista) and document question answering (DocVQA), and performs well across a wide range of academic benchmarks covering reasoning, reading comprehension, mathematics, and science.

Model Specifications

Technical details and capabilities of Grok-2

Core Specifications

128K / 8K

Input / Output tokens
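The 128K-token input limit means long documents may need trimming before they reach the model. A minimal sketch, assuming a rough heuristic of ~4 characters per token (use a real tokenizer for exact counts):

```python
# Rough guard against exceeding Grok-2's 128K-token input window.
# The ~4 chars/token ratio is a common English-text heuristic, not exact.
CHARS_PER_TOKEN = 4
MAX_INPUT_TOKENS = 128_000

def truncate_to_budget(text: str, reserved_tokens: int = 1_000) -> str:
    """Trim text so the estimated prompt size stays under the input limit,
    keeping `reserved_tokens` free for system instructions."""
    budget_chars = (MAX_INPUT_TOKENS - reserved_tokens) * CHARS_PER_TOKEN
    return text[:budget_chars]

doc = "x" * 600_000                      # ~150K estimated tokens: over budget
print(len(truncate_to_budget(doc)))      # trimmed to ~127K estimated tokens
```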

July 31, 2024

Knowledge cutoff date

August 12, 2024

Release date

Capabilities & License

Multimodal Support: Supported
Web Hydrated: Yes
License: Proprietary

Resources

API Reference
https://x.ai/api
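The official API reference is linked above. As an illustrative sketch only, a chat request can be assembled as an OpenAI-style JSON payload; the endpoint URL and the model name `grok-2` here are assumptions to verify against the official documentation:

```python
import json

# Assumed endpoint; confirm against https://x.ai/api before use.
API_URL = "https://api.x.ai/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "grok-2") -> dict:
    """Assemble an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }

payload = build_chat_request("Summarize this document.")
print(json.dumps(payload, indent=2))
```

Sending it would be an HTTP POST with an `Authorization: Bearer <key>` header; the payload mirrors the OpenAI chat-completions format that the xAI API advertises compatibility with.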

Performance Insights

Check out how Grok-2 handles various AI tasks through comprehensive benchmark results.

Benchmark    Score
DocVQA       93.6
HumanEval    88.4
MMLU         87.5
MATH         76.1
MMLU-Pro     75.5
MathVista    69.0
MMMU         66.1
GPQA         56.0

Model Comparison

See how Grok-2 stacks up against other leading models across key performance metrics.

Benchmark    Grok-2   GPT-4o   Llama 3.1 405B Instruct   Grok-2 mini   Claude 3.5 Sonnet   Llama 3.3 70B Instruct
GPQA         56.0     53.6     50.7                      51.0          59.4                50.5
MMLU         87.5     88.7     87.3                      86.2          90.4                86.0
MMLU-Pro     75.5     72.6     73.3                      72.0          76.1                68.9
MATH         76.1     76.6     73.8                      73.0          71.1                77.0
HumanEval    88.4     90.2     89.0                      85.7          92.0                88.4
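A quick way to see where Grok-2 sits relative to this group is to average each benchmark across the compared models. A small sketch over three of the benchmarks above (note these averages cover only the six models listed here, so they differ from the page's own "Avg" figures, which span more models):

```python
# Benchmark scores transcribed from the comparison above.
scores = {
    "GPQA":      {"Grok-2": 56.0, "GPT-4o": 53.6, "Llama 3.1 405B Instruct": 50.7,
                  "Grok-2 mini": 51.0, "Claude 3.5 Sonnet": 59.4, "Llama 3.3 70B Instruct": 50.5},
    "MMLU":      {"Grok-2": 87.5, "GPT-4o": 88.7, "Llama 3.1 405B Instruct": 87.3,
                  "Grok-2 mini": 86.2, "Claude 3.5 Sonnet": 90.4, "Llama 3.3 70B Instruct": 86.0},
    "HumanEval": {"Grok-2": 88.4, "GPT-4o": 90.2, "Llama 3.1 405B Instruct": 89.0,
                  "Grok-2 mini": 85.7, "Claude 3.5 Sonnet": 92.0, "Llama 3.3 70B Instruct": 88.4},
}

for bench, by_model in scores.items():
    avg = sum(by_model.values()) / len(by_model)
    delta = by_model["Grok-2"] - avg
    print(f"{bench}: Grok-2 {by_model['Grok-2']:.1f}, group avg {avg:.1f} ({delta:+.1f})")
```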

Detailed Benchmarks

Dive deeper into Grok-2's performance across specific task categories, with the average score of compared models shown for context.

Knowledge

GPQA: 56.0 (avg across compared models: 56.1%)
MMLU: 87.5 (avg: 86.0%)
MATH: 76.1 (avg: 73.9%)

Non-categorized

MMLU-Pro: 75.5 (avg: 72.4%)
MMMU: 66.1 (avg: 61.9%)
MathVista: 69.0 (avg: 62.3%)
DocVQA: 93.6 (avg: 92.1%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Grok-2. Compare costs across platforms to find the best pricing for your use case.

