Phi-3.5-mini-instruct

Microsoft

Phi-3.5-mini-instruct is a compact 3.8-billion-parameter language model from Microsoft designed for efficiency. It handles long contexts of up to 128,000 tokens and supports more than 20 languages. Post-trained with supervised fine-tuning and preference optimization for instruction adherence and safety, it performs strongly on instruction following, logical reasoning, mathematical problem solving, and code generation. Released under the permissive MIT license, it is particularly well suited to applications with limited memory or strict latency requirements.
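For orientation, here is a minimal inference sketch using the Hugging Face transformers library and the model ID from the resources below. It is an illustrative setup, not an official recipe: the dtype, device mapping, and generation settings are assumptions to adjust for your hardware, and a recent transformers release with native Phi-3 support is assumed.

```python
# Minimal chat-inference sketch for Phi-3.5-mini-instruct (illustrative, not official).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: 16-bit weights (~7 GiB); see estimate below
    device_map="auto",           # assumption: place layers automatically on available devices
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the Pythagorean theorem in one sentence."},
]

# The chat template formats the message list into the model's expected prompt layout.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```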

Model Specifications

Technical details and capabilities of Phi-3.5-mini-instruct

Core Specifications

Parameters: 3.8B (model size and complexity)
Training tokens: 3.4T (amount of data used in training)
Input / output context: 128K / 128K tokens
Knowledge cutoff: September 30, 2023
Release date: August 22, 2024
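Because memory-constrained deployment is the headline use case, it helps to translate the parameter count into a weight-memory estimate. The sketch below is back-of-the-envelope arithmetic assuming dense weights at a uniform precision; KV cache and activations, which grow with the 128K context, are not included.

```python
# Back-of-the-envelope weight memory for a 3.8B-parameter dense model.
# Assumption: every parameter is stored at the same precision; KV cache excluded.
PARAMS = 3.8e9

for label, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{label:>10}: {gib:5.1f} GiB")

# Approximate output:
#      fp32:  14.2 GiB
# bf16/fp16:   7.1 GiB
#      int8:   3.5 GiB
#      int4:   1.8 GiB
```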

Capabilities & License

Multimodal support: Not supported
Web hydrated: No
License: MIT

Resources

Research paper: https://arxiv.org/abs/2404.14219
Model card (Hugging Face): https://huggingface.co/microsoft/Phi-3.5-mini-instruct

Performance Insights

Check out how Phi-3.5-mini-instruct handles various AI tasks through comprehensive benchmark results.

Benchmark                Score
GSM8K                    86.2
ARC Challenge            84.6
RULER                    84.1
PIQA                     81
OpenBookQA               79.2
BoolQ                    78
RepoQA                   77
Social IQA               74.7
MEGA XStoryCloze         73.5
MBPP                     69.6
HellaSwag                69.4
BigBench Hard CoT        69
MMLU                     69
WinoGrande               68.5
TruthfulQA               64
MEGA XCOPA               63.1
HumanEval                62.8
MEGA TyDi QA             62.2
MEGA MLQA                61.7
Multilingual MMLU        55.4
MATH                     48.5
MGSM                     47.9
MMLU-Pro                 47.4
MEGA UDPOS               46.5
Qasper                   41.9
Arena Hard               37
Multilingual MMLU-Pro    30.9
GPQA                     30.4
GovReport                25.9
SQuALITY                 24.3
QMSum                    21.3
SummScreenFD             16

Model Comparison

See how Phi-3.5-mini-instruct stacks up against other leading models across key performance metrics.

Model                    MMLU    GPQA    MATH    HumanEval
Phi-3.5-mini-instruct    69      30.4    48.5    62.8
GPT-3.5 Turbo            69.8    30.8    43.1    68
Claude 3 Haiku           75.2    33.3    38.9    75.9
Grok-1.5                 81.3    35.9    50.6    74.1
Phi-3.5-MoE-instruct     78.9    36.8    59.5    70.7
Claude 3 Sonnet          79      40.4    43.1    73
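As a quick way to read the table, the sketch below loads the same numbers into pandas and ranks the models by their unweighted mean across the four benchmarks. Equal weighting is our assumption here, not something the source data prescribes.

```python
# Rank the compared models by mean score across MMLU, GPQA, MATH, and HumanEval.
# Values are taken verbatim from the comparison table above; equal weighting assumed.
import pandas as pd

scores = pd.DataFrame(
    {
        "MMLU":      [69.0, 69.8, 75.2, 81.3, 78.9, 79.0],
        "GPQA":      [30.4, 30.8, 33.3, 35.9, 36.8, 40.4],
        "MATH":      [48.5, 43.1, 38.9, 50.6, 59.5, 43.1],
        "HumanEval": [62.8, 68.0, 75.9, 74.1, 70.7, 73.0],
    },
    index=[
        "Phi-3.5-mini-instruct",
        "GPT-3.5 Turbo",
        "Claude 3 Haiku",
        "Grok-1.5",
        "Phi-3.5-MoE-instruct",
        "Claude 3 Sonnet",
    ],
)

print(scores.mean(axis=1).sort_values(ascending=False).round(1))
# Phi-3.5-MoE-instruct     61.5
# Grok-1.5                 60.5
# Claude 3 Sonnet          58.9
# Claude 3 Haiku           55.8
# GPT-3.5 Turbo            52.9
# Phi-3.5-mini-instruct    52.7
```

On this equal-weight view, Phi-3.5-mini-instruct trails the larger models overall while staying competitive with GPT-3.5 Turbo at a fraction of the serving cost implied by its 3.8B parameters.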

Detailed Benchmarks

Dive deeper into Phi-3.5-mini-instruct's performance across specific task categories. Each benchmark below is listed with the average score across the compared models, as a reference point for the scores reported above.

Coding

HumanEval: 62.3%

Non categorized

Multilingual MMLU: 62.7%
Multilingual MMLU-Pro: 38.1%
MGSM: 65.0%
MEGA MLQA: 63.5%
MEGA TyDi QA: 64.7%
MEGA UDPOS: 53.4%
MEGA XCOPA: 69.8%
MEGA XStoryCloze: 78.1%
GovReport: 26.2%
QMSum: 20.6%
Qasper: 40.9%
SQuALITY: 21.2%
SummScreenFD: 16.4%
Arena Hard: 48.3%
BigBench Hard CoT: 74.0%
BoolQ: 82.9%
OpenBookQA: 76.5%
PIQA: 83.6%
Social IQA: 76.4%
WinoGrande: 77.4%
RULER: 85.6%
RepoQA: 81.0%
