
Grok-3
xAI
Grok 3, launched by xAI in February 2025, represents a decisive leap in AI reasoning. Trained with 10× more compute than its predecessor on the Colossus supercluster (≈200,000 GPUs), Grok 3 excels at tasks that demand deep logical processing, long-context memory, and reflective thinking. The flagship model, Grok 3 (Think), leverages large-scale reinforcement learning to perform extended chain-of-thought reasoning, yielding standout results: 93.3% on AIME’25, 84.6% on GPQA, and 79.4% on LiveCodeBench. These scores place it ahead of top-tier competitors like Gemini 2.0, DeepSeek-R1, and OpenAI’s o3 mini in mathematical and scientific reasoning. Meanwhile, its Elo score of 1402 in Chatbot Arena, a human preference benchmark, indicates that users tend to prefer Grok 3’s responses in head-to-head comparisons, not only on correctness but on perceived helpfulness.
What makes Grok 3 particularly compelling is its combination of transparency, agency, and scale. With a 1 million token context window and models that “think” for seconds to minutes, Grok 3 not only retrieves and integrates vast amounts of information but also shows its work: users can inspect the full reasoning trace. The smaller Grok 3 mini also delivers, with 95.8% on AIME’24 and 80.4% on LiveCodeBench, setting a new bar for cost-efficient STEM reasoning. Early agent features like DeepSearch, which performs real-time fact synthesis across the web, signal xAI’s ambition to build autonomous, truth-seeking systems. In a field dominated by black-box generalists, Grok 3 asserts itself as a transparent reasoning engine, one that performs, explains, and improves with every step.
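To give a rough sense of what an Elo-style score such as 1402 means in practice, here is a minimal sketch using the standard Elo expected-score formula on the usual 400-point logistic scale. It assumes Chatbot Arena scores can be read on that scale; the leaderboard's actual fitting procedure may differ. The rival rating of 1302 is a made-up value for illustration and does not refer to any real model.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


grok3_rating = 1402.0        # Arena score quoted in the description
hypothetical_rival = 1302.0  # assumption: invented opponent, 100 points lower

# Probability that a human rater prefers the higher-rated model's answer
p = expected_score(grok3_rating, hypothetical_rival)
print(f"{p:.3f}")  # prints 0.640
```

Under these assumptions, a 100-point gap corresponds to the higher-rated model being preferred in roughly 64% of pairwise comparisons, so rating differences of a few points between top models translate into near coin-flip preferences.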
Model Specifications
Technical details and capabilities of Grok-3
Performance Insights
Check out how Grok-3 handles various AI tasks through comprehensive benchmark results.
Detailed Benchmarks
Dive deeper into Grok-3's performance across specific task categories.
Math: AIME 2024, AIME 2025
Coding: LiveCodeBench
Knowledge: GPQA
Uncategorized: MMMU, SimpleQA