o1

OpenAI

o1 is OpenAI's reasoning model, built to excel at math and logic. It is particularly strong on tasks that require extended step-by-step reasoning, such as solving competition math problems and generating code, and it delivers improved formal reasoning while still performing well across general tasks.

Model Specifications

Technical details and capabilities of o1

Core Specifications

Input / Output tokens: 200K / 100K
Knowledge cutoff date: December 31, 2023
Release date: December 16, 2024
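
The input window and output budget are enforced separately, so it can help to estimate a prompt's token count before sending it. The sketch below assumes tiktoken's o200k_base encoding approximates o1's tokenizer; treat the result as an estimate rather than an exact quota.

```python
# Rough pre-flight check that a prompt fits o1's advertised 200K-token
# input window. Assumes tiktoken's "o200k_base" encoding approximates
# o1's tokenizer (an assumption; OpenAI has not published the exact one).
import tiktoken

INPUT_LIMIT = 200_000   # input tokens, per the specs above
OUTPUT_LIMIT = 100_000  # output tokens, which include hidden reasoning tokens

def fits_input_window(prompt: str, reserve: int = 1_000) -> bool:
    """Return True if the prompt leaves `reserve` tokens of headroom."""
    enc = tiktoken.get_encoding("o200k_base")
    n_tokens = len(enc.encode(prompt))
    return n_tokens + reserve <= INPUT_LIMIT

print(fits_input_window("Prove that sqrt(2) is irrational."))  # True
```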

Capabilities & License

Multimodal Support: Not Supported
Web Hydrated: No
License: Proprietary

Resources

Research Paper: https://cdn.openai.com/o1-system-card-20240917.pdf
API Reference: https://platform.openai.com/docs/models
Announcement: https://openai.com/index/o1-and-new-tools-for-developers/
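
For orientation, here is a minimal sketch of calling o1 through the OpenAI Python SDK, following the API reference above. Parameter names reflect the documented chat completions interface for o-series models at the time of writing; consult the API reference for current values.

```python
# Minimal sketch of a chat completion request to o1 via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",
    messages=[
        {"role": "user", "content": "How many primes are there below 100?"}
    ],
    # o-series models budget output (including hidden reasoning tokens)
    # with max_completion_tokens rather than the older max_tokens field.
    max_completion_tokens=4_096,
)

print(response.choices[0].message.content)
```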

Performance Insights

See how o1 handles a variety of AI tasks, based on published benchmark results.

GSM8K: 97.1 (97%)
MATH: 96.4 (96%)
GPQA Physics: 92.8 (93%)
MMLU: 91.8 (92%)
MGSM: 89.3 (89%)
HumanEval: 88.1 (88%)
AIME 2024: 83.3 (83%)
GPQA: 78.0 (78%)
MMMU: 77.3 (77%)
TAU-bench Retail: 73.5 (74%)
MathVista: 71.0 (71%)
GPQA Biology: 69.2 (69%)
LiveBench: 67.0 (67%)
GPQA Chemistry: 64.7 (65%)
Aider Polyglot: 61.7 (62%)
TAU-bench Airline: 54.2 (54%)
SWE-bench Verified: 48.9 (49%)
Codeforces: 47.0 (47%)
SimpleQA: 42.6 (43%)
FrontierMath: 5.5 (6%)
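
The parenthesized figures above are simply the raw scores rounded to whole percents. The short sketch below (names are illustrative, not from the source) rebuilds the ranking from the published numbers.

```python
# Rebuild the benchmark listing from the published o1 scores.
# Dict name and structure are illustrative, not from the source page.
O1_SCORES = {
    "GSM8K": 97.1, "MATH": 96.4, "GPQA Physics": 92.8, "MMLU": 91.8,
    "MGSM": 89.3, "HumanEval": 88.1, "AIME 2024": 83.3, "GPQA": 78.0,
    "MMMU": 77.3, "TAU-bench Retail": 73.5, "MathVista": 71.0,
    "GPQA Biology": 69.2, "LiveBench": 67.0, "GPQA Chemistry": 64.7,
    "Aider Polyglot": 61.7, "TAU-bench Airline": 54.2,
    "SWE-bench Verified": 48.9, "Codeforces": 47.0, "SimpleQA": 42.6,
    "FrontierMath": 5.5,
}

# Print benchmarks from strongest to weakest, with the rounded percent
# shown in parentheses, matching the listing above.
for name, score in sorted(O1_SCORES.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score} ({round(score)}%)")
```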

Model Comparison

See how o1 stacks up against other leading models across key performance metrics.

All scores are percentages (higher is better).

Model                      MATH   MMLU   GSM8K   HumanEval   GPQA
o1                         96.4   91.8   97.1    88.1        78.0
Gemini 1.5 Pro             86.5   85.9   90.8    84.1        59.1
Claude 3.5 Sonnet          71.1   90.4   96.4    92.0        59.4
Qwen2.5 32B Instruct       83.1   83.3   95.9    88.4        49.5
Llama 3.1 405B Instruct    73.8   87.3   96.8    89.0        50.7
Nova Pro                   76.6   85.9   94.8    89.0        46.9
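
To see which model leads each benchmark, the table can be queried directly. The sketch below copies the scores from the table above; the variable and helper names are illustrative.

```python
# Report the leading model on each benchmark in the comparison table.
# Scores are copied verbatim from the table above.
COMPARISON = {
    "MATH":      {"o1": 96.4, "Gemini 1.5 Pro": 86.5, "Claude 3.5 Sonnet": 71.1,
                  "Qwen2.5 32B Instruct": 83.1, "Llama 3.1 405B Instruct": 73.8,
                  "Nova Pro": 76.6},
    "MMLU":      {"o1": 91.8, "Gemini 1.5 Pro": 85.9, "Claude 3.5 Sonnet": 90.4,
                  "Qwen2.5 32B Instruct": 83.3, "Llama 3.1 405B Instruct": 87.3,
                  "Nova Pro": 85.9},
    "GSM8K":     {"o1": 97.1, "Gemini 1.5 Pro": 90.8, "Claude 3.5 Sonnet": 96.4,
                  "Qwen2.5 32B Instruct": 95.9, "Llama 3.1 405B Instruct": 96.8,
                  "Nova Pro": 94.8},
    "HumanEval": {"o1": 88.1, "Gemini 1.5 Pro": 84.1, "Claude 3.5 Sonnet": 92.0,
                  "Qwen2.5 32B Instruct": 88.4, "Llama 3.1 405B Instruct": 89.0,
                  "Nova Pro": 89.0},
    "GPQA":      {"o1": 78.0, "Gemini 1.5 Pro": 59.1, "Claude 3.5 Sonnet": 59.4,
                  "Qwen2.5 32B Instruct": 49.5, "Llama 3.1 405B Instruct": 50.7,
                  "Nova Pro": 46.9},
}

for bench, scores in COMPARISON.items():
    leader = max(scores, key=scores.get)
    print(f"{bench}: {leader} leads at {scores[leader]}")
```

Run as written, this reports o1 ahead on four of the five benchmarks, with Claude 3.5 Sonnet leading HumanEval at 92.0.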

Detailed Benchmarks

Dive deeper into o1's performance across specific task categories. Each entry shows o1's score alongside the average of the compared models on that benchmark.

Math

GSM8K: 97.1% (average across compared models: 96.2%)
AIME 2024: 83.3% (average: 78.9%)

Coding

HumanEval: 88.1% (average: 83.2%)
Codeforces: 47.0% (average: 57.0%; compared models range from 11.0% to 90.0%)
SWE-bench Verified: 48.9% (average: 50.1%)
Aider Polyglot: 61.7% (average: 53.7%)

Knowledge

MATH: 96.4% (average: 88.9%)
MMLU: 91.8% (average: 89.9%)
GPQA: 78.0% (average: 73.7%)

Uncategorized

MMMU: 77.3% (average: 67.0%)
MathVista: 71.0% (average: 62.3%)
LiveBench: 67.0% (average: 54.7%)
MGSM: 89.3% (average: 85.5%)
SimpleQA: 42.6% (average: 40.4%)
TAU-bench Retail: 73.5% (average: 68.7%)
TAU-bench Airline: 54.2% (average: 45.3%)
FrontierMath: 5.5% (average: 7.3%)

Providers Pricing Coming Soon

We're gathering comprehensive pricing data for o1 from major providers so you can compare costs across platforms and find the best pricing for your use case.


