o3-mini

OpenAI

o3-mini is a streamlined version of OpenAI's o3, built for advanced reasoning with a lighter compute footprint. It is text-only (multimodal input is listed as not supported in the specifications below) and still performs strongly on standard math, coding, and knowledge benchmarks.

Model Specifications

Technical details and capabilities of o3-mini

Core Specifications

Input / Output tokens: 200.0K / 200.0K
Knowledge cutoff date: May 31, 2024
Release date: January 29, 2025

Capabilities & License

Multimodal Support: Not Supported
Web Hydrated: No
License: Proprietary

Resources

Research Paper: https://cdn.openai.com/o3-mini-system-card.pdf
API Reference: https://platform.openai.com/docs/models
Code Repository: https://github.com/openai

Performance Insights

Check out how o3-mini handles various AI tasks through comprehensive benchmark results.

Benchmark scores (%), highest first:

MATH: 97.9
MGSM: 92.0
AIME 2024: 87.3
MMLU: 86.9
AIME 2025: 86.5
Livebench: 84.6
GPQA: 79.7
Codeforces: 79.0
LiveCodeBench: 74.1
Aider Polyglot: 60.4
SWE-bench Verified: 49.3
MRCR: 36.3
Humanity's Last Exam: 14.0
SimpleQA: 13.8
FrontierMath: 9.2

Detailed Benchmarks

Detailed results for o3-mini by task category, compared against other models on each benchmark.

Math

AIME 2024

Unnamed model: 96.7%
Unnamed model: 95.8%
Unnamed model: 93.0%
Gemini Pro 2.5 Experimental: 92.0%
o3-mini (current model): 87.3%
Unnamed model: 86.0%
Unnamed model: 83.3%
Claude 3.7 Sonnet: 80.0%
Unnamed model: 13.4%
Average: 80.8%
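The 80.8% average shown for AIME 2024 appears to be the plain arithmetic mean of the nine scores listed in that chart. The short check below is a sketch under that assumption; the variable names are illustrative and the scores are copied from the chart as-is.

```python
from statistics import mean, median

# Scores (percent) copied from the AIME 2024 chart above, highest first.
# 87.3 is the o3-mini score reported in the overview.
aime_2024_scores = [96.7, 95.8, 93.0, 92.0, 87.3, 86.0, 83.3, 80.0, 13.4]

# Plain arithmetic mean, rounded to one decimal like the chart label.
average = round(mean(aime_2024_scores), 1)

# The median is less sensitive to the single 13.4% outlier.
middle = median(aime_2024_scores)

print(f"AIME 2024 mean: {average}%, median: {middle}%")
# prints "AIME 2024 mean: 80.8%, median: 87.3%"
```

Because one very low score (13.4%) pulls the mean down, the median may be a fairer reference point here; o3-mini's 87.3% sits exactly at it.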

AIME 2025

Unnamed model: 93.0%
Unnamed model: 90.3%
Gemini Pro 2.5 Experimental: 86.7%
o3-mini (current model): 86.5%
Unnamed model: 70.0%
Claude 3.7 Sonnet: 49.5%
Average: 79.3%

Coding

Codeforces

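Unnamed model: 94.0%
Unnamed model: 90.0%
o3-mini (current model): 79.0%
Unnamed model: 68.0%
Unnamed model: 51.6%
Unnamed model: 47.0%
Unnamed model: 41.3%
Unnamed model: 31.4%
Unnamed model: 11.0%
Average: 57.0%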

SWE-bench Verified

Claude 3.7 Sonnet: 70.3%
Gemini Pro 2.5 Experimental: 63.8%
o3-mini (current model): 49.3%
Unnamed model: 49.2%
Claude 3.5 Sonnet: 49.0%
Unnamed model: 48.9%
Unnamed model: 42.0%
Claude 3.5 Haiku: 40.6%
Unnamed model: 38.0%
Average: 50.1%

Aider Polyglot

Gemini Pro 2.5 Experimental: 74.0%
Gemini Pro 2.5 Experimental: 72.9%
Claude 3.7 Sonnet: 64.9%
Unnamed model: 61.7%
o3-mini (current model): 60.4%
Unnamed model: 56.9%
Unnamed model: 44.9%
Unnamed model: 27.1%
Unnamed model: 20.9%
Average: 53.7%

LiveCodeBench

Unnamed model: 80.0%
Unnamed model: 79.0%
o3-mini (current model): 74.1%
Gemini Pro 2.5 Experimental: 70.4%
Unnamed model: 65.9%
Unnamed model: 63.4%
Unnamed model: 62.5%
Qwen2.5 72B Instruct: 55.5%
Qwen2.5-Coder 7B Instruct: 18.2%
Average: 63.2%

Knowledge

GPQA

Unnamed model: 87.7%
Claude 3.7 Sonnet: 84.8%
Unnamed model: 84.6%
Unnamed model: 84.6%
Gemini Pro 2.5 Experimental: 84.0%
o3-mini (current model): 79.7%
Unnamed model: 79.0%
Unnamed model: 78.0%
Gemini 2.0 Flash Thinking: 74.2%
Qwen2 7B Instruct: 25.3%
Average: 76.2%

MMLU

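Unnamed model: 91.8%
Unnamed model: 88.0%
Unnamed model: 87.5%
Unnamed model: 87.4%
Llama 3.1 405B Instruct: 87.3%
o3-mini (current model): 86.9%
Unnamed model: 86.8%
Unnamed model: 86.5%
Unnamed model: 86.4%
Llama 3.2 3B Instruct: 63.4%
Average: 85.2%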

MATH

o3-mini (current model): 97.9%
Unnamed model: 96.4%
Gemini 2.0 Flash: 89.7%
Unnamed model: 89.0%
Unnamed model: 86.5%
Unnamed model: 85.5%
Qwen2.5 32B Instruct: 83.1%
Qwen2.5 72B Instruct: 83.1%
Average: 88.9%

Non categorized

FrontierMath

o3-mini (current model): 9.2%
Unnamed model: 5.5%
Average: 7.3%

MGSM

o3-mini (current model): 92.0%
Claude 3.5 Sonnet: 91.6%
Claude 3.5 Sonnet: 91.6%
Llama 3.3 70B Instruct: 91.1%
Unnamed model: 90.8%
Unnamed model: 90.7%
Unnamed model: 90.5%
Unnamed model: 89.3%
Average: 91.0%

SimpleQA

Unnamed model: 62.5%
Unnamed model: 42.6%
Unnamed model: 42.4%
Unnamed model: 30.1%
Unnamed model: 24.9%
o3-mini (current model): 13.8%
Mistral Small 3.1 24B: 10.4%
Unnamed model: 3.0%
Average: 28.7%

Humanity's Last Exam

Gemini Pro 2.5 Experimental: 18.8%
o3-mini (current model): 14.0%
Claude 3.7 Sonnet: 8.9%
Unnamed model: 8.6%
Unnamed model: 6.4%
Average: 11.3%

MRCR

Gemini Pro 2.5 Experimental: 83.1%
Unnamed model: 82.6%
Gemini 1.5 Flash: 71.9%
Gemini 2.0 Flash: 69.2%
Gemini 1.5 Flash 8B: 54.7%
Unnamed model: 48.8%
o3-mini (current model): 36.3%
Average: 63.8%
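To see at a glance where o3-mini leads or trails the field, one option is to subtract each chart's average from the o3-mini score on that benchmark. The sketch below does this with the figures from the charts above; the `results` table and the helper `gaps_vs_average` are illustrative names, not part of any published tooling, and Livebench is omitted because no detailed chart is shown for it.

```python
# (o3-mini score, chart average) in percent, copied from the charts above.
results = {
    "AIME 2024": (87.3, 80.8),
    "AIME 2025": (86.5, 79.3),
    "Codeforces": (79.0, 57.0),
    "SWE-bench Verified": (49.3, 50.1),
    "Aider Polyglot": (60.4, 53.7),
    "LiveCodeBench": (74.1, 63.2),
    "GPQA": (79.7, 76.2),
    "MMLU": (86.9, 85.2),
    "MATH": (97.9, 88.9),
    "FrontierMath": (9.2, 7.3),
    "MGSM": (92.0, 91.0),
    "SimpleQA": (13.8, 28.7),
    "Humanity's Last Exam": (14.0, 11.3),
    "MRCR": (36.3, 63.8),
}


def gaps_vs_average(table):
    """Return (benchmark, gap) pairs sorted from largest deficit to largest lead."""
    gaps = [(name, round(score - avg, 1)) for name, (score, avg) in table.items()]
    return sorted(gaps, key=lambda pair: pair[1])


gaps = gaps_vs_average(results)
below_average = [name for name, gap in gaps if gap < 0]

# Print gaps in percentage points, with an explicit sign.
for name, gap in gaps:
    print(f"{name:<22}{gap:+.1f}")
```

On these figures o3-mini falls below the chart average on three benchmarks (MRCR, SimpleQA, and SWE-bench Verified) and shows its largest lead on Codeforces. The averages cover a different set of models on each chart, so the gaps are only a rough guide.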


Share your feedback

Hi, I'm Charlie Palars, the founder of Deepranking.ai. I'm always looking for ways to improve the site and make it more useful for you. You can write me through this form or directly through X at @palarsio.
