
Mistral Small 3.1 24B


Mistral AI

Mistral Small 3.1 24B Instruct, an upgraded variant of Mistral Small 3 (2501), combines 24 billion parameters, native image-text understanding, and a 128k-token context window. It delivers state-of-the-art performance for its size across reasoning (MMLU 80.62%, GPQA Diamond 45.96%), coding (HumanEval 88.41%, MBPP 74.71%), math (MATH 69.3%), and multilingual benchmarks (71.18% average across 24 languages), and it leads on vision tasks such as MathVista (68.91%), DocVQA (94.08%), and AI2D (93.72%), outperforming comparable models like Gemma 3 27B and GPT-4o mini in most categories. Released under an Apache 2.0 license and runnable on a single RTX 4090 or a 32 GB MacBook, it is well positioned for fast, private, and open deployment in applications ranging from conversational agents and function calling to document analysis and edge-device multimodal inference.
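Because the weights are open, one common deployment path is serving them locally and querying them through an OpenAI-compatible endpoint. The sketch below assumes vLLM is installed, the server was started with `vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer-mode mistral`, and it listens on localhost:8000; the Hugging Face model ID, flags, and port are assumptions to verify against the official model card, not confirmed details.

```python
# Minimal local-inference sketch: query a locally served Mistral Small 3.1
# through vLLM's OpenAI-compatible API. Model ID, port, and sampling settings
# are assumptions; check the official model card before use.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[
        {"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```

Since everything runs on local hardware (a single RTX 4090 or a 32 GB MacBook, per the overview), prompts and documents never leave the machine, which is the private-deployment point made above.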

Model Specifications

Technical details and capabilities of Mistral Small 3.1 24B

Core Specifications

Parameters: 24.0B (model size and complexity)

Context window: 128.0K input / 128.0K output tokens

Release date: March 16, 2025

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Apache 2.0
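Multimodal support means a single request can interleave text and images. A minimal sketch against the hosted API, assuming the mistralai Python client (v1.x) and the mistral-small-latest model alias (both assumptions; see the API reference below), with a hypothetical image URL:

```python
# Mixed image + text prompt via the hosted Mistral API.
# Assumes the mistralai v1.x client and the "mistral-small-latest" alias;
# verify both against the official API reference. The image URL is hypothetical.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url", "image_url": "https://example.com/chart.png"},
        ],
    }],
)
print(response.choices[0].message.content)
```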

Resources

API Reference: https://docs.mistral.ai/api/
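The overview also highlights function calling, which the API reference above documents as a tool-use flow. A minimal sketch of the first half of that round trip (declaring a tool and reading the model's tool call), again assuming the mistralai v1.x client; get_weather is a hypothetical tool used only for illustration:

```python
# Function-calling sketch: declare a tool, let the model decide to call it.
# Assumes the mistralai v1.x client and the "mistral-small-latest" alias.
import json
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

# tool_calls can be empty if the model answers directly; arguments arrive as JSON.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```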

Performance Insights

Check out how Mistral Small 3.1 24B handles various AI tasks through comprehensive benchmark results.

Benchmark scores (all values are percentages):

DocVQA: 94.1
RULER 32k: 94.0
AI2D: 93.7
HumanEval: 88.4
ChartQA: 86.2
RULER 128k: 81.2
MMLU: 80.6
TriviaQA: 80.5
Multilingual European: 75.3
MM-MT-Bench: 73
Multilingual Average: 71.2
MATH: 69.3
Multilingual East Asian: 69.2
Multilingual Middle Eastern: 69.1
MathVista: 68.9
MMLU Pro: 66.8
MMMU: 62.8
MMMU-Pro: 49.3
GPQA Diamond: 46.0
GPQA Main: 44.4
LongBench v2: 37.2
SimpleQA: 10.4

Model Comparison

See how Mistral Small 3.1 24B stacks up against other leading models across key performance metrics.

All scores are percentages.

| Benchmark | Mistral Small 3.1 24B | Grok-2 mini | Grok-2 | Claude 3.5 Sonnet | GPT-4o mini | GPT-4o |
|---|---|---|---|---|---|---|
| MMLU | 80.6 | 86.2 | 87.5 | 90.4 | 82 | 88.7 |
| HumanEval | 88.4 | 85.7 | 88.4 | 92 | 87.2 | 90.2 |
| MATH | 69.3 | 73 | 76.1 | 71.1 | 70.2 | 76.6 |
| MathVista | 68.9 | 68.1 | 69 | 67.7 | 56.7 | 63.8 |
| MMMU | 62.8 | 63.2 | 66.1 | 68.3 | 59.4 | 69.1 |

Detailed Benchmarks

Dive deeper into Mistral Small 3.1 24B's performance across specific task categories. Each benchmark is listed with the model's score and the average across compared models for context.

Knowledge

SimpleQA: 10.4 (average across compared models: 26.4%)
TriviaQA: 80.5 (average: 78.0%)

Uncategorized

MMMU-Pro: 49.3 (average: 43.4%)
MathVista: 68.9 (average: 62.9%)
MM-MT-Bench: 73 (average: 73.5%)
DocVQA: 94.1 (average: 92.9%)
AI2D: 93.7 (average: 90.5%)
LongBench v2: 37.2 (average: 42.9%)

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for Mistral Small 3.1 24B. Compare costs across platforms to find the best pricing for your use case.

OpenAI
Anthropic
Google
Mistral AI
Cohere
