GPT-4

OpenAI

GPT-4 is a large multimodal model that accepts image and text inputs and produces text outputs. Its performance on a range of professional and academic benchmarks is comparable to that of human test-takers.
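
For a concrete sense of what image-plus-text input looks like in practice, here is a minimal sketch against the OpenAI Chat Completions API using the official openai Python SDK. Note the assumptions: in the API, image input requires a vision-capable GPT-4 variant, and the model name (gpt-4-turbo), image URL, and prompt below are illustrative placeholders rather than values taken from this page.

# Minimal sketch: sending an image + text prompt to a vision-capable GPT-4 variant.
# Assumes the official `openai` Python SDK (v1+) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption: a vision-capable GPT-4 variant is needed for image input
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/figure.png"}},  # placeholder URL
            ],
        }
    ],
)

print(response.choices[0].message.content)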

Model Specifications

Technical details and capabilities of GPT-4

Core Specifications

Input / Output tokens: 32.8K / 32.8K
Knowledge cutoff date: December 30, 2022
Release date: June 12, 2023

Capabilities & License

Multimodal Support: Supported
Web Hydrated: Yes
License: Proprietary

Resources

Research Paper: https://arxiv.org/abs/2303.08774
API Reference: https://platform.openai.com/docs/api-reference/chat
Playground: https://platform.openai.com/playground
Code Repository: https://github.com/openai/gpt-4
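
As a quick companion to the API Reference and Playground linked above, here is a minimal sketch of a text-only GPT-4 call with the official openai Python SDK (v1+). The prompt and max_tokens value are illustrative, and an OPENAI_API_KEY environment variable is assumed.

# Minimal sketch: a text-only GPT-4 call via the Chat Completions endpoint.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain the difference between MMLU and GPQA in two sentences."},
    ],
    max_tokens=150,  # illustrative cap on output length
)

print(response.choices[0].message.content)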

Performance Insights

See how GPT-4 handles a range of AI tasks, based on comprehensive benchmark results.

Benchmark                          Score
AI2 Reasoning Challenge (ARC)      96.3
HellaSwag                          95.3
Uniform Bar Exam                   90
SAT Math                           89
LSAT                               88
Winogrande                         87.5
MMLU                               86.4
DROP                               80.9
MGSM                               74.5
HumanEval                          67
MATH                               42
GPQA                               35.7

Model Comparison

See how GPT-4 stacks up against other leading models across key performance metrics.

Scores by benchmark:

Model                    MMLU    HumanEval   GPQA    MATH    MGSM
GPT-4                    86.4    67          35.7    42      74.5
Claude 3 Sonnet          79      73          40.4    43.1    83.5
Claude 3 Haiku           75.2    75.9        33.3    38.9    75.1
Phi-3.5-MoE-instruct     78.9    70.7        36.8    59.5    58.7
GPT-3.5 Turbo            69.8    68          30.8    43.1    56.3
Phi-3.5-mini-instruct    69      62.8        30.4    48.5    47.9

Detailed Benchmarks

Dive deeper into GPT-4's performance across specific task categories, with the average score of the compared models listed for each benchmark.

Coding

HumanEval (average across compared models: 63.4%)

Reasoning

HellaSwag (average across compared models: 89.2%)
DROP (average across compared models: 80.7%)

Knowledge

MMLU (average across compared models: 84.7%)
GPQA (average across compared models: 39.4%)
MATH (average across compared models: 47.0%)

Uncategorized

Winogrande (average across compared models: 82.2%)

Provider Pricing (Coming Soon)

We're gathering comprehensive pricing data for GPT-4 from all major providers, so you can compare costs across platforms and find the best option for your use case.

OpenAI
Anthropic
Google
Mistral AI
Cohere
