
Grok-3

xAI

Grok 3, launched by xAI in February 2025, represents a decisive leap in AI reasoning. Trained with 10× more compute than its predecessor on the Colossus supercluster (≈200,000 GPUs), Grok 3 excels at tasks that demand deep logical processing, long-context memory, and reflective thinking. The flagship model, Grok 3 (Think), uses large-scale reinforcement learning to perform extended chain-of-thought reasoning, yielding standout results: 93.3% on AIME 2025, 84.6% on GPQA, and 79.4% on LiveCodeBench. These scores place it ahead of top-tier competitors such as Gemini 2.0, DeepSeek-R1, and OpenAI's o3-mini in mathematical and scientific reasoning. Its Elo score of 1402 on Chatbot Arena, a human-preference benchmark, suggests that users find Grok 3's responses not just correct but helpful and trustworthy.

What makes Grok 3 particularly compelling is its combination of transparency, agency, and scale. With a 1-million-token context window and models that "think" for seconds to minutes, Grok 3 not only retrieves and integrates vast amounts of information but also shows its work: users can inspect the full reasoning trace. The smaller Grok 3 mini also delivers, scoring 95.8% on AIME 2024 and 80.4% on LiveCodeBench, setting a new bar for cost-efficient STEM reasoning. Early agent features such as DeepSearch, which performs real-time fact synthesis across the web, signal xAI's ambition to build autonomous, truth-seeking systems. In a field dominated by black-box generalists, Grok 3 asserts itself as a transparent reasoning engine, one that performs, explains, and improves with every step.

Model Specifications

Technical details and capabilities of Grok-3

Core Specifications

128K / 8K

Input / Output tokens

January 31, 2025

Knowledge cutoff date

February 16, 2025

Release date

Capabilities & License

Multimodal Support
Supported
Web Hydrated
Yes
License
Proprietary

Resources

API Reference
https://x.ai/api
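
The API linked above is OpenAI-compatible. As a minimal sketch, the snippet below assembles the JSON body for a chat-completions request; the `https://api.x.ai/v1` base URL and the `grok-3` model name are assumptions here, so check the official API reference for the exact values before use.

```python
import json

XAI_BASE_URL = "https://api.x.ai/v1"  # assumed base URL; verify against the API docs


def build_chat_request(prompt: str, model: str = "grok-3") -> dict:
    """Assemble the JSON body for a chat-completions request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


if __name__ == "__main__":
    body = build_chat_request("Prove that the square root of 2 is irrational.")
    # A real call would POST this body to f"{XAI_BASE_URL}/chat/completions"
    # with an Authorization: Bearer <API key> header; here we only print it.
    print(json.dumps(body, indent=2))
```

Because the body is plain JSON, the same payload works with any OpenAI-compatible client library pointed at the xAI base URL.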

Performance Insights

Check out how Grok-3 handles various AI tasks through comprehensive benchmark results.

Benchmark scores (0–100 scale):

AIME 2024: 93 (93%)
AIME 2025: 93 (93%)
GPQA: 84.6 (85%)
LiveCodeBench: 79 (79%)
MMMU: 78 (78%)
SimpleQA: 43.6 (44%)

Detailed Benchmarks

Dive deeper into Grok-3's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Math

AIME 2024 (average across compared models: 80.8%)
AIME 2025 (average across compared models: 79.3%)

Coding

LiveCodeBench (average across compared models: 68.8%)

Knowledge

GPQA (average across compared models: 76.4%)

Uncategorized

MMMU (average across compared models: 75.4%)
SimpleQA (average across compared models: 40.4%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Grok-3. Compare costs across platforms to find the best pricing for your use case.

