
Mistral Small 3

Mistral AI

Mistral Small 3 raises the bar within the "small" LLM category (<70B parameters), packing impressive capabilities into a compact 24B-parameter footprint. It posts strong results across reasoning, instruction-following, and coding benchmarks, including a pass@1 score of 84.8% on HumanEval, close behind significantly larger models such as Llama-3.3-70B (85.4%) and Qwen-2.5-32B (90.9%). Human evaluation reinforces these numbers: evaluators preferred Mistral Small 3 over models such as Gemma-2-27B and Qwen-2.5-32B in over 50% of head-to-head comparisons.

Efficiency is the model's standout quality. A 32k-token context window supports expansive, context-rich applications, and latency optimization delivers roughly 150 tokens per second on mainstream hardware, making it well suited to real-time conversational workloads such as responsive virtual assistants and low-latency function-calling agents. With quantization, it can run on accessible hardware like a single RTX 4090 GPU or a 32GB-RAM MacBook, opening high-caliber LLM performance to local and privacy-sensitive use cases.

In sum, Mistral Small 3 isn't just punching above its weight; it sets a new efficiency-performance benchmark. It is a particularly compelling open-source alternative to proprietary models like GPT-4o-mini, offering comparable real-world effectiveness at dramatically lower computational cost.
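The claim that quantization lets a 24B-parameter model fit on a single RTX 4090 (24 GB VRAM) or a 32GB-RAM MacBook follows from simple arithmetic on weight storage. A back-of-envelope sketch (weights only; KV cache, activations, and runtime overhead are ignored, so real usage will be somewhat higher):

```python
# Approximate weight-storage cost of a 24B-parameter model at
# different precisions. Illustrative arithmetic only: actual memory
# use also includes the KV cache, activations, and runtime overhead.

PARAMS = 24.0e9  # 24B parameters

def weight_memory_gib(bits_per_param: float) -> float:
    """Weight storage in GiB for a given precision (bits per parameter)."""
    return PARAMS * bits_per_param / 8 / (1024 ** 3)

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{weight_memory_gib(bits):.1f} GiB")
```

At fp16 the weights alone need ~44.7 GiB, which rules out a 24 GB GPU; 8-bit brings them to ~22.4 GiB (tight) and 4-bit to ~11.2 GiB, which is why 4-bit quantization is the usual route to single-RTX-4090 or 32GB-MacBook deployment.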

Model Specifications

Technical details and capabilities of Mistral Small 3

Core Specifications

24.0B Parameters

Model size and complexity

32.0K / 32.0K

Input / Output tokens

January 29, 2025

Release date

Capabilities & License

Multimodal Support
Not Supported
Web Hydrated
No
License
Apache-2.0

Resources

API Reference
https://docs.mistral.ai/api/
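For orientation, here is a minimal sketch of a chat-completions request body in the shape documented at the API reference above. The model alias "mistral-small-latest" and the exact field names are assumptions to verify against docs.mistral.ai; the sketch only builds and serializes the payload rather than sending it:

```python
# Sketch of a chat-completions request payload for Mistral's API.
# Assumptions: model alias "mistral-small-latest" resolves to
# Mistral Small 3, and the schema matches https://docs.mistral.ai/api/.
import json

payload = {
    "model": "mistral-small-latest",  # assumed alias; check the docs
    "messages": [
        {"role": "user", "content": "Summarize Apache-2.0 in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
# Send with an HTTP POST to https://api.mistral.ai/v1/chat/completions,
# passing an "Authorization: Bearer <API_KEY>" header.
print(body)
```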

Performance Insights

Check out how Mistral Small 3 handles various AI tasks through comprehensive benchmark results.

Benchmark scores (relative percentage as reported in parentheses):

Arena Hard: 87.6 (97%)
HumanEval: 84.8 (94%)
MT-Bench: 83.5 (93%)
IFEval: 82.9 (92%)
MATH: 70.6 (78%)
MMLU-Pro: 66.3 (74%)
Wildbench: 52.2 (58%)
GPQA: 45.3 (50%)

Detailed Benchmarks

Dive deeper into Mistral Small 3's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Knowledge

GPQA: 45.3 (average of compared models: 47.1%)

MATH: 70.6 (average of compared models: 70.0%)
Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Mistral Small 3. Compare costs across platforms to find the best pricing for your use case.

