GPT-4.5

OpenAI

GPT-4.5 is OpenAI’s most capable general-purpose model yet, advancing the unsupervised-learning axis with scaled pretraining, improved alignment, and deeper world knowledge. It leads on factuality benchmarks such as SimpleQA with 62.5% accuracy, a 24-point gain over GPT-4o, and nearly halves the hallucination rate (37.1% vs. 61.8%). In human evaluations it is preferred over GPT-4o for creative (56.8%), professional (63.2%), and everyday (57.0%) use cases, suggesting a stronger grasp of nuance, tone, and user intent.

While it doesn’t reason explicitly like OpenAI’s o-series models (e.g., o1, o3-mini), it holds its own on STEM tasks, lifting AIME 2024 performance to 36.7% (up from GPT-4o’s 9.3%) and SWE-bench Verified to 38.0%, though it still trails o3-mini’s 61.0%. What distinguishes GPT-4.5 isn’t raw logic but conversational feel: it is more succinct, more emotionally intelligent, and better at picking up implicit cues. Its responses sound less scripted and more human, more willing to ask, empathize, or suggest without overexplaining.

GPT-4.5 supports image inputs and structured outputs, making it a strong fit for tasks like tutoring, design critique, and multi-step agentic workflows. In short, it doesn’t “think out loud,” but its scaled intuition and alignment make it OpenAI’s most reliable and collaborative assistant to date, especially for users who value factual grounding wrapped in conversational warmth.
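
The image-input and structured-output support mentioned above maps onto the standard Chat Completions API. Below is a minimal sketch of a design-critique call that combines both; the `gpt-4.5-preview` model ID and the JSON Schema fields are illustrative assumptions rather than details taken from this page, so check the API reference before relying on them.

```python
# Minimal sketch: image input + structured (JSON Schema) output via the
# OpenAI Chat Completions API. Model ID and schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model ID; verify against the API reference
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Critique the layout of this design mockup."},
                {"type": "image_url", "image_url": {"url": "https://example.com/mockup.png"}},
            ],
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "design_critique",
            "schema": {
                "type": "object",
                "properties": {
                    "strengths": {"type": "array", "items": {"type": "string"}},
                    "issues": {"type": "array", "items": {"type": "string"}},
                    "overall_score": {"type": "integer", "minimum": 1, "maximum": 10},
                },
                "required": ["strengths", "issues", "overall_score"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

print(response.choices[0].message.content)  # JSON string matching the schema
```

With strict structured outputs, the response content is a JSON string constrained to the schema, which is what makes the model practical inside multi-step agentic workflows without brittle output parsing.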

Model Specifications

Technical details and capabilities of GPT-4.5

Core Specifications

Input / output tokens: 128.0K / 16.4K
Release date: February 26, 2025
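
A small sketch of what those limits imply on the client side: counting prompt tokens before sending a request and capping generation at the output limit. The o200k_base tokenizer, the 16,384-token reading of "16.4K", and the `gpt-4.5-preview` model ID are assumptions for illustration, not details confirmed on this page.

```python
# Sketch: budget a request against the 128K-input / 16.4K-output limits listed above.
# Assumptions: o200k_base tokenizer, 16,384-token output cap, "gpt-4.5-preview" model ID.
import tiktoken
from openai import OpenAI

MAX_INPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 16_384

enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for GPT-4.5
prompt = "Summarize the following meeting transcript: ..."  # placeholder prompt

n_prompt_tokens = len(enc.encode(prompt))
if n_prompt_tokens > MAX_INPUT_TOKENS:
    raise ValueError(f"Prompt is {n_prompt_tokens} tokens; the input limit is {MAX_INPUT_TOKENS}.")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=MAX_OUTPUT_TOKENS,  # never request more than the output cap
)
print(response.choices[0].message.content)
```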

Capabilities & License

Multimodal support: Supported
Web hydrated: No
License: Proprietary

Resources

API reference: https://platform.openai.com/docs/api-reference
Playground: https://chat.openai.com/

Performance Insights

Check out how GPT-4.5 handles various AI tasks through comprehensive benchmark results.

Benchmark scores:

MMMLU: 85.1
MMMU: 74.4
GPQA: 71.4
SimpleQA: 62.5
MRCR: 48.8
Aider Polyglot: 44.9
SWE-bench Verified: 38.0
AIME 2024: 36.7
SWE-Lancer: 32.6
Humanity's Last Exam: 6.4

Detailed Benchmarks

Dive deeper into GPT-4.5's performance across specific task categories, with each score compared against the average of the other evaluated models.

Math

AIME 2024: GPT-4.5 36.7 vs. model average 62.3%

Coding

SWE-bench Verified: GPT-4.5 38.0 vs. model average 47.6%
Aider Polyglot: GPT-4.5 44.9 vs. model average 51.3%

Knowledge

GPQA: GPT-4.5 71.4 vs. model average 67.6%

Uncategorized

MMMU: GPT-4.5 74.4 vs. model average 67.3%
MMMLU: GPT-4.5 85.1 vs. model average 85.6%
Humanity's Last Exam: GPT-4.5 6.4 vs. model average 11.3%
SimpleQA: GPT-4.5 62.5 vs. model average 45.1%
MRCR: GPT-4.5 48.8 vs. model average 63.8%

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for GPT-4.5. Compare costs across platforms to find the best pricing for your use case.


