
Grok-3

xAI

Grok 3, launched by xAI in February 2025, represents a decisive leap in AI reasoning. Trained with 10× more compute than its predecessor on the Colossus supercluster (≈200,000 GPUs), Grok 3 excels at tasks that demand deep logical processing, long-context memory, and reflective thinking. The flagship model, Grok 3 (Think), uses large-scale reinforcement learning to perform extended chain-of-thought reasoning, yielding standout results: 93.3% on AIME 2025, 84.6% on GPQA, and 79.4% on LiveCodeBench. These scores place it ahead of top-tier competitors such as Gemini 2.0, DeepSeek-R1, and OpenAI's o3-mini in mathematical and scientific reasoning. Its Elo score of 1402 on Chatbot Arena, a human-preference benchmark, suggests that users find Grok 3's responses not just correct but helpful and trustworthy.

What makes Grok 3 particularly compelling is its combination of transparency, agency, and scale. With a 1-million-token context window and models that "think" for seconds to minutes, Grok 3 not only retrieves and integrates vast amounts of information but also shows its work: users can inspect the full reasoning trace. The smaller Grok 3 mini also delivers, scoring 95.8% on AIME 2024 and 80.4% on LiveCodeBench, setting a new bar for cost-efficient STEM reasoning. Early agent features such as DeepSearch, which performs real-time fact synthesis across the web, signal xAI's ambition to build autonomous, truth-seeking systems. In a field dominated by black-box generalists, Grok 3 asserts itself as a transparent reasoning engine, one that performs, explains, and improves with every step.

Model Specifications

Technical details and capabilities of Grok-3

Core Specifications

128K / 8K

Input / Output tokens

January 31, 2025

Knowledge cutoff date

February 16, 2025

Release date

Capabilities & License

Multimodal Support
Supported
Web Hydrated
Yes
License
Proprietary

Resources

API Reference
https://x.ai/api
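
The API linked above is OpenAI-compatible. As a minimal sketch, the snippet below assembles the JSON body for a chat-completions request; the `https://api.x.ai/v1` base URL and the `grok-3` model name are assumptions here, so check the official API reference for the exact values before use.

```python
import json

XAI_BASE_URL = "https://api.x.ai/v1"  # assumed base URL; verify against the API docs


def build_chat_request(prompt: str, model: str = "grok-3") -> dict:
    """Assemble the JSON body for a chat-completions request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


if __name__ == "__main__":
    body = build_chat_request("Prove that the square root of 2 is irrational.")
    # A real call would POST this body to f"{XAI_BASE_URL}/chat/completions"
    # with an Authorization: Bearer <API key> header; here we only print it.
    print(json.dumps(body, indent=2))
```

Because the body is plain JSON, the same payload works with any OpenAI-compatible client library pointed at the xAI base URL.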

Performance Insights

Check out how Grok-3 handles various AI tasks through comprehensive benchmark results.

Benchmark scores (0–100 scale):

AIME 2024: 93 (93%)
AIME 2025: 93 (93%)
GPQA: 84.6 (85%)
LiveCodeBench: 79 (79%)
MMMU: 78 (78%)
SimpleQA: 43.6 (44%)

Detailed Benchmarks

Dive deeper into Grok-3's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Math

AIME 2024 (average across compared models: 80.8%)
AIME 2025 (average across compared models: 79.3%)

Coding

LiveCodeBench (average across compared models: 68.8%)

Knowledge

GPQA (average across compared models: 76.4%)

Uncategorized

MMMU (average across compared models: 75.4%)
SimpleQA (average across compared models: 40.4%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Grok-3. Compare costs across platforms to find the best pricing for your use case.

