
Mistral Small 3

Mistral AI

Mistral Small 3 raises the bar within the "small" LLM category (<70B parameters), packing impressive capabilities into a compact 24B-parameter footprint. It posts strong results across reasoning, instruction-following, and coding benchmarks, including a pass@1 score of 84.8% on HumanEval, close behind significantly larger models such as Llama-3.3-70B (85.4%) and Qwen-2.5-32B (90.9%). Human evaluation reinforces these numbers: evaluators preferred Mistral Small 3 over models such as Gemma-2-27B and Qwen-2.5-32B in over 50% of head-to-head comparisons.

Efficiency is the model's standout quality. A 32k-token context window supports expansive, context-rich applications, and latency optimization delivers roughly 150 tokens per second on mainstream hardware, making it well suited to real-time conversational workloads such as responsive virtual assistants and low-latency function-calling agents. With quantization, it can run on accessible hardware like a single RTX 4090 GPU or a 32GB-RAM MacBook, opening high-caliber LLM performance to local and privacy-sensitive use cases.

In sum, Mistral Small 3 isn't just punching above its weight; it sets a new efficiency-performance benchmark. It is a particularly compelling open-source alternative to proprietary models like GPT-4o-mini, offering comparable real-world effectiveness at dramatically lower computational cost.
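The claim that quantization lets a 24B-parameter model fit on a single RTX 4090 (24 GB VRAM) or a 32GB-RAM MacBook follows from simple arithmetic on weight storage. A back-of-envelope sketch (weights only; KV cache, activations, and runtime overhead are ignored, so real usage will be somewhat higher):

```python
# Approximate weight-storage cost of a 24B-parameter model at
# different precisions. Illustrative arithmetic only: actual memory
# use also includes the KV cache, activations, and runtime overhead.

PARAMS = 24.0e9  # 24B parameters

def weight_memory_gib(bits_per_param: float) -> float:
    """Weight storage in GiB for a given precision (bits per parameter)."""
    return PARAMS * bits_per_param / 8 / (1024 ** 3)

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: ~{weight_memory_gib(bits):.1f} GiB")
```

At fp16 the weights alone need ~44.7 GiB, which rules out a 24 GB GPU; 8-bit brings them to ~22.4 GiB (tight) and 4-bit to ~11.2 GiB, which is why 4-bit quantization is the usual route to single-RTX-4090 or 32GB-MacBook deployment.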

Model Specifications

Technical details and capabilities of Mistral Small 3

Core Specifications

24.0B Parameters

Model size and complexity

32.0K / 32.0K

Input / Output tokens

January 29, 2025

Release date

Capabilities & License

Multimodal Support
Not Supported
Web Hydrated
No
License
Apache-2.0

Resources

API Reference
https://docs.mistral.ai/api/
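For orientation, here is a minimal sketch of a chat-completions request body in the shape documented at the API reference above. The model alias "mistral-small-latest" and the exact field names are assumptions to verify against docs.mistral.ai; the sketch only builds and serializes the payload rather than sending it:

```python
# Sketch of a chat-completions request payload for Mistral's API.
# Assumptions: model alias "mistral-small-latest" resolves to
# Mistral Small 3, and the schema matches https://docs.mistral.ai/api/.
import json

payload = {
    "model": "mistral-small-latest",  # assumed alias; check the docs
    "messages": [
        {"role": "user", "content": "Summarize Apache-2.0 in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)
# Send with an HTTP POST to https://api.mistral.ai/v1/chat/completions,
# passing an "Authorization: Bearer <API_KEY>" header.
print(body)
```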

Performance Insights

Check out how Mistral Small 3 handles various AI tasks through comprehensive benchmark results.

Benchmark scores (relative percentage as reported in parentheses):

Arena Hard: 87.6 (97%)
HumanEval: 84.8 (94%)
MT-Bench: 83.5 (93%)
IFEval: 82.9 (92%)
MATH: 70.6 (78%)
MMLU-Pro: 66.3 (74%)
Wildbench: 52.2 (58%)
GPQA: 45.3 (50%)

Detailed Benchmarks

Dive deeper into Mistral Small 3's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Knowledge

GPQA: 45.3 (average of compared models: 47.1%)

MATH: 70.6 (average of compared models: 70.0%)
Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Mistral Small 3. Compare costs across platforms to find the best pricing for your use case.

