Phi-3.5-vision-instruct

Microsoft

Phi-3.5-vision-instruct is an open-source multimodal model boasting 4.2 billion parameters and a large 128K context window. Designed with a focus on advanced image understanding and logical reasoning, it excels in both single-image tasks and complex multi-image analysis, including comparison, summarization, and video processing. Enhanced with safety-focused post-training, the model demonstrates improved instruction-following, alignment, and resilience when processing diverse visual and textual data. It is available for use under the permissive MIT license.
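To illustrate what the multi-image analysis described above looks like in practice, here is a minimal usage sketch following the Hugging Face Transformers pattern shown on the model card. It assumes a recent transformers release, a CUDA GPU, and trust_remote_code enabled; the image URLs are placeholders you would replace with your own.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

# Load the model and processor; the model card uses trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    _attn_implementation="eager",  # "flash_attention_2" if available
)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    num_crops=4,  # model card suggests 4 for multi-frame, 16 for single-image tasks
)

# Placeholder URLs: swap in the images you want the model to compare.
image_urls = [
    "https://example.com/slide1.png",
    "https://example.com/slide2.png",
]
images = [Image.open(requests.get(url, stream=True).raw) for url in image_urls]

# Each image is referenced in the prompt by an <|image_i|> placeholder.
placeholders = "".join(f"<|image_{i + 1}|>\n" for i in range(len(images)))
messages = [
    {
        "role": "user",
        "content": placeholders + "Compare these two slides and summarize the key differences.",
    }
]

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to("cuda")

generate_ids = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=False,
    eos_token_id=processor.tokenizer.eos_token_id,
)

# Strip the prompt tokens before decoding the model's answer.
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]
response = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(response)
```

Treat the generation settings above as illustrative defaults rather than tuned values; the num_crops choice in particular should follow the single-image versus multi-frame guidance on the model card.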

Model Specifications

Technical details and capabilities of Phi-3.5-vision-instruct

Core Specifications

4.2B Parameters

Model size and complexity

500B Training Tokens

Amount of data used in training

128K / 128K

Input / Output tokens

September 30, 2023

Knowledge cutoff date

August 22, 2024

Release date

Capabilities & License

Multimodal Support
Supported
Web Hydrated
No
License
MIT

Resources

Research Paper
https://arxiv.org/abs/2404.14219
API Reference
https://huggingface.co/microsoft/Phi-3.5-vision-instruct

Performance Insights

Check out how Phi-3.5-vision-instruct handles various AI tasks through comprehensive benchmark results.

Benchmark scores (0-100 scale):

ScienceQA: 91.3
POPE: 86.1
MMBench: 81.9
ChartQA: 81.8
AI2D: 78.1
TextVQA: 72.0
MathVista: 43.9
MMMU: 43.0
InterGPS: 36.3

Detailed Benchmarks

Dive deeper into Phi-3.5-vision-instruct's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

MMMU: 43.0 (average across compared models: 44.4%)
MathVista: 43.9 (average across compared models: 45.4%)
AI2D: 78.1 (average across compared models: 90.5%)
ChartQA: 81.8 (average across compared models: 83.9%)
TextVQA: 72.0 (average across compared models: 75.0%)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Phi-3.5-vision-instruct. Compare costs across platforms to find the best pricing for your use case.

