GPT-4.5

OpenAI

GPT-4.5 is OpenAI’s most capable general-purpose model yet, advancing the unsupervised-learning axis with scaled pretraining, improved alignment, and deeper world knowledge. It leads on factuality benchmarks such as SimpleQA with 62.5% accuracy, a 24-point gain over GPT-4o, and nearly halves the hallucination rate (37.1% vs. 61.8%). In human evaluations it is preferred over GPT-4o for creative (56.8%), professional (63.2%), and everyday (57.0%) use cases, suggesting a stronger grasp of nuance, tone, and user intent.

While it doesn’t reason explicitly like OpenAI’s o-series models (e.g., o1, o3-mini), it holds its own on STEM tasks, lifting AIME 2024 performance to 36.7% (up from GPT-4o’s 9.3%) and SWE-bench Verified to 38.0%, though it still trails o3-mini’s 61.0%. What distinguishes GPT-4.5 isn’t raw logic but conversational feel: it is more succinct, more emotionally intelligent, and better at picking up implicit cues. Its responses sound less scripted and more human, more willing to ask, empathize, or suggest without overexplaining.

GPT-4.5 supports image inputs and structured outputs, making it a strong fit for tasks like tutoring, design critique, and multi-step agentic workflows. In short, it doesn’t “think out loud,” but its scaled intuition and alignment make it OpenAI’s most reliable and collaborative assistant to date, especially for users who value factual grounding wrapped in conversational warmth.
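
The image-input and structured-output support mentioned above maps onto the standard Chat Completions API. Below is a minimal sketch of a design-critique call that combines both; the `gpt-4.5-preview` model ID and the JSON Schema fields are illustrative assumptions rather than details taken from this page, so check the API reference before relying on them.

```python
# Minimal sketch: image input + structured (JSON Schema) output via the
# OpenAI Chat Completions API. Model ID and schema are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model ID; verify against the API reference
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Critique the layout of this design mockup."},
                {"type": "image_url", "image_url": {"url": "https://example.com/mockup.png"}},
            ],
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "design_critique",
            "schema": {
                "type": "object",
                "properties": {
                    "strengths": {"type": "array", "items": {"type": "string"}},
                    "issues": {"type": "array", "items": {"type": "string"}},
                    "overall_score": {"type": "integer", "minimum": 1, "maximum": 10},
                },
                "required": ["strengths", "issues", "overall_score"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

print(response.choices[0].message.content)  # JSON string matching the schema
```

With strict structured outputs, the response content is a JSON string constrained to the schema, which is what makes the model practical inside multi-step agentic workflows without brittle output parsing.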

Model Specifications

Technical details and capabilities of GPT-4.5

Core Specifications

Input / output tokens: 128.0K / 16.4K
Release date: February 26, 2025
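
A small sketch of what those limits imply on the client side: counting prompt tokens before sending a request and capping generation at the output limit. The o200k_base tokenizer, the 16,384-token reading of "16.4K", and the `gpt-4.5-preview` model ID are assumptions for illustration, not details confirmed on this page.

```python
# Sketch: budget a request against the 128K-input / 16.4K-output limits listed above.
# Assumptions: o200k_base tokenizer, 16,384-token output cap, "gpt-4.5-preview" model ID.
import tiktoken
from openai import OpenAI

MAX_INPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 16_384

enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for GPT-4.5
prompt = "Summarize the following meeting transcript: ..."  # placeholder prompt

n_prompt_tokens = len(enc.encode(prompt))
if n_prompt_tokens > MAX_INPUT_TOKENS:
    raise ValueError(f"Prompt is {n_prompt_tokens} tokens; the input limit is {MAX_INPUT_TOKENS}.")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4.5-preview",
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=MAX_OUTPUT_TOKENS,  # never request more than the output cap
)
print(response.choices[0].message.content)
```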

Capabilities & License

Multimodal support: Supported
Web hydrated: No
License: Proprietary

Resources

API reference: https://platform.openai.com/docs/api-reference
Playground: https://chat.openai.com/

Performance Insights

Check out how GPT-4.5 handles various AI tasks through comprehensive benchmark results.

Benchmark scores:

MMMLU: 85.1
MMMU: 74.4
GPQA: 71.4
SimpleQA: 62.5
MRCR: 48.8
Aider Polyglot: 44.9
SWE-bench Verified: 38.0
AIME 2024: 36.7
SWE-Lancer: 32.6
Humanity's Last Exam: 6.4

Detailed Benchmarks

Dive deeper into GPT-4.5's performance across specific task categories, with each score compared against the average of the other evaluated models.

Math

AIME 2024: GPT-4.5 36.7 vs. model average 62.3%

Coding

SWE-bench Verified: GPT-4.5 38.0 vs. model average 47.6%
Aider Polyglot: GPT-4.5 44.9 vs. model average 51.3%

Knowledge

GPQA: GPT-4.5 71.4 vs. model average 67.6%

Uncategorized

MMMU: GPT-4.5 74.4 vs. model average 67.3%
MMMLU: GPT-4.5 85.1 vs. model average 85.6%
Humanity's Last Exam: GPT-4.5 6.4 vs. model average 11.3%
SimpleQA: GPT-4.5 62.5 vs. model average 45.1%
MRCR: GPT-4.5 48.8 vs. model average 63.8%

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for GPT-4.5. Compare costs across platforms to find the best pricing for your use case.


