Grok-3 Mini

xAI

Grok 3 Mini, the lightweight sibling of xAI’s flagship model, proves that “small” doesn’t have to mean “shallow.” Designed with a leaner architecture and optimized for speed, Mini still demonstrates impressive reasoning capabilities—particularly when enhanced with test-time inference via the Think setting. On AIME 2024, a benchmark of symbolic reasoning under pressure, Grok 3 Mini (Think) achieves a striking 95.8%, outperforming larger rivals like Gemini 2.0 Flash Thinking (73.3%) and even Grok 3 Beta (93.3%) itself. It also leads in code generation with 80.4% on LiveCodeBench, edging out Grok 3 Beta (79.4%) and surpassing Claude 3.5 Sonnet (66.3%) and GPT-4o (32.3%) by wide margins. Its 84.0% on GPQA confirms strong performance on graduate-level, adversarial reasoning tasks. This positions Mini not as a fallback, but as a front-line option for STEM-heavy tasks where speed, cost, and accuracy matter—like on-device tutoring, fast coding agents, or high-throughput evaluation pipelines. Critically, Grok 3 Mini (Think) benefits from large-scale RL fine-tuning, allowing it to engage in multi-step reasoning with error correction and backtracking—something previously reserved for frontier models. While it lacks the full contextual breadth of Grok 3 (1M token context, advanced multimodal fluency), Mini still posts competitive scores on MMLU-Pro (78.9%) and MMMU (69.4%), reflecting robust general knowledge and image understanding. In short, Grok 3 Mini isn't a "cut-down" model—it's a recalibrated one. For researchers and builders focused on compute efficiency without compromising reasoning depth, Grok 3 Mini (Think) offers an unusually capable tradeoff.

Model Specifications

Technical details and capabilities of Grok-3 Mini

Core Specifications

128.0K / 8.0K

Input / Output tokens

January 31, 2025

Knowledge cutoff date

February 16, 2025

Release date

Capabilities & License

Multimodal Support

Supported

Web Hydrated

Yes

License

Proprietary

Resources

API Reference

https://x.ai/api

Performance Insights

Check out how Grok-3 Mini handles various AI tasks through comprehensive benchmark results.

100

95.8

AIME 2024

95.8

(96%)

90.3

AIME 2025

90.3

(90%)

84.6

GPQA

84.6

(85%)

LiveCodeBench

(80%)

AIME 2024

AIME 2025

GPQA

LiveCodeBench

Model Comparison

See how Grok-3 Mini stacks up against other leading models across key performance metrics.

100

84.6

GPQA - Grok-3 Mini

84.6

(85%)

84.6

GPQA - Grok-3

84.6

(85%)

GPQA - Gemini Pro 2.5 Experimental

(84%)

79.7

GPQA - o3-mini

79.7

(80%)

71.5

GPQA - DeepSeek-R1

71.5

(72%)

LiveCodeBench - Grok-3 Mini

(80%)

LiveCodeBench - Grok-3

(79%)

70.4

LiveCodeBench - Gemini Pro 2.5 Experimental

70.4

(70%)

74.1

LiveCodeBench - o3-mini

74.1

(74%)

65.9

LiveCodeBench - DeepSeek-R1

65.9

(66%)

95.8

AIME 2024 - Grok-3 Mini

95.8

(96%)

AIME 2024 - Grok-3

(93%)

AIME 2024 - Gemini Pro 2.5 Experimental

(92%)

87.3

AIME 2024 - o3-mini

87.3

(87%)

79.8

AIME 2024 - DeepSeek-R1

79.8

(80%)

90.3

AIME 2025 - Grok-3 Mini

90.3

(90%)

AIME 2025 - Grok-3

(93%)

86.7

AIME 2025 - Gemini Pro 2.5 Experimental

86.7

(87%)

86.5

AIME 2025 - o3-mini

86.5

(87%)

AIME 2025 - DeepSeek-R1

(70%)

GPQA

LiveCodeBench

AIME 2024

AIME 2025

Grok-3 Mini

Grok-3

Gemini Pro 2.5 Experimental

o3-mini

DeepSeek-R1

Detailed Benchmarks

Dive deeper into Grok-3 Mini's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Math

AIME 2024

96.7%

Grok-3 Mini

95.8%

Grok-3

93.0%

Gemini Pro 2.5 Experimental

92.0%

o3-mini

87.3%

o1-pro

86.0%

83.3%

Claude 3.7 Sonnet

80.0%

Current model

Other models

Avg (89.3%)

AIME 2025

Grok-3

93.0%

Grok-3 Mini

90.3%

Gemini Pro 2.5 Experimental

86.7%

o3-mini

86.5%

DeepSeek-R1

70.0%

Claude 3.7 Sonnet

49.5%

Current model

Other models

Avg (79.3%)

Coding

LiveCodeBench

Grok-3 Mini

80.0%

Grok-3

79.0%

o3-mini

74.1%

Gemini Pro 2.5 Experimental

70.4%

65.9%

63.4%

62.5%

55.5%

Current model

Other models

Avg (68.8%)

Knowledge

GPQA

87.7%

Claude 3.7 Sonnet

84.8%

Grok-3

84.6%

Grok-3 Mini

84.6%

Gemini Pro 2.5 Experimental

84.0%

o3-mini

79.7%

o1-pro

79.0%

78.0%

Qwen2 7B Instruct

25.3%

Current model

Other models

Avg (76.4%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Grok-3 Mini. Compare costs across platforms to find the best pricing for your use case.

OpenAI

Anthropic

Google

Mistral AI

Cohere

Share your feedback

Hi, I'm Charlie Palars, the founder of Deepranking.ai. I'm always looking for ways to improve the site and make it more useful for you. You can write me through this form or directly through X at @palarsio.

Your feedback helps us improve our service

Grok-3 Mini

Model Specifications

Core Specifications

Capabilities & License

Resources

Performance Insights

Model Comparison

Detailed Benchmarks

Math

AIME 2024

AIME 2025

Coding

LiveCodeBench

Knowledge

GPQA

Providers Pricing Coming Soon

Share your feedback

Stay Ahead with AI Updates