
DeepSeek-R1

by DeepSeek

DeepSeek-R1 is a cutting-edge reasoning model developed using DeepSeek-V3 as its foundation (671B total parameters, 37B activated per token). This first-generation model leverages extensive reinforcement learning (RL) to significantly improve its chain-of-thought processes and overall reasoning abilities. As a result, DeepSeek-R1 excels in complex tasks involving mathematics, coding, and multi-step reasoning.
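For a quick sense of how the model is typically queried, here is a minimal sketch using DeepSeek's OpenAI-compatible chat API. It assumes the openai Python package, an API key in the DEEPSEEK_API_KEY environment variable, and the "deepseek-reasoner" model name; treat the exact field names (such as reasoning_content for the chain of thought) as assumptions to verify against the API reference linked below.

# Minimal sketch: querying DeepSeek-R1 through DeepSeek's OpenAI-compatible API.
# Assumptions: the `openai` package is installed, DEEPSEEK_API_KEY is set,
# and "deepseek-reasoner" is the R1 endpoint (see the API reference below).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes lie between 1 and 50?"}],
)

message = response.choices[0].message
# The reasoning trace is exposed separately from the final answer (assumed field name).
print("Reasoning:", getattr(message, "reasoning_content", "<not returned>"))
print("Answer:", message.content)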

Model Specifications

Technical details and capabilities of DeepSeek-R1

Core Specifications

Parameters: 671B (model size and complexity)
Training tokens: 14.8T (data used in training)
Context window: 131.1K input / 131.1K output tokens
Release date: January 19, 2025

Capabilities & License

Multimodal support: Not supported
Web hydrated: No
License: MIT License

Resources

Research Paper: https://arxiv.org/abs/2501.12948
API Reference: https://api-docs.deepseek.com/news/news250120
Playground: https://chat.deepseek.com
Code Repository: https://github.com/deepseek-ai/DeepSeek-R1
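
Since the weights are released under the MIT License through the repository above, they can also be fetched for self-hosting. A minimal sketch, assuming the huggingface_hub package and the Hugging Face repo id deepseek-ai/DeepSeek-R1 (not listed on this page):

# Minimal sketch: downloading the open DeepSeek-R1 checkpoint for local serving.
# Assumption: the weights are mirrored under the Hugging Face repo id
# "deepseek-ai/DeepSeek-R1"; at 671B total parameters the download is very large.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1",
    local_dir="./deepseek-r1",
)
print("Checkpoint downloaded to:", local_dir)

Serving the full 671B-parameter checkpoint requires a multi-GPU inference stack; the code repository documents the recommended serving options.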

Performance Insights

Check out how DeepSeek-R1 handles various AI tasks through comprehensive benchmark results.

Benchmark scores (%):

MATH-500: 97.3
MMLU-Redux: 92.9
CLUEWSC: 92.8
ArenaHard: 92.3
DROP: 92.2
C-Eval: 91.8
MMLU: 90.8
AlpacaEval2.0: 87.6
MMLU-Pro: 84
IFEval: 83.3
FRAMES: 82.5
AIME 2024: 79.8
CNMO 2024: 78.8
GPQA: 71.5
AIME 2025: 70
LiveCodeBench: 65.9
C-SimpleQA: 63.7
Aider Polyglot: 56.9
Aider-Polyglot: 53.3
SWE-bench Verified: 49.2
SimpleQA: 30.1
Humanity's Last Exam: 8.6

Model Comparison

See how DeepSeek-R1 stacks up against other leading models across key performance metrics.

Scores (%) by benchmark and model:

MMLU: DeepSeek-R1 90.8, DeepSeek-V3 88.5, GPT-4o 88.7, Llama 3.1 405B Instruct 87.3, Phi-4 84.8, Llama 3.1 70B Instruct 83.6
MMLU-Pro: DeepSeek-R1 84, DeepSeek-V3 75.9, GPT-4o 72.6, Llama 3.1 405B Instruct 73.3, Phi-4 70.4, Llama 3.1 70B Instruct 66.4
DROP: DeepSeek-R1 92.2, DeepSeek-V3 91.6, GPT-4o 83.4, Llama 3.1 405B Instruct 84.8, Phi-4 75.5, Llama 3.1 70B Instruct 79.6
IFEval: DeepSeek-R1 83.3, DeepSeek-V3 86.1, GPT-4o 84, Llama 3.1 405B Instruct 88.6, Phi-4 63, Llama 3.1 70B Instruct 87.5
GPQA: DeepSeek-R1 71.5, DeepSeek-V3 59.1, GPT-4o 53.6, Llama 3.1 405B Instruct 50.7, Phi-4 56.1, Llama 3.1 70B Instruct 41.7

Detailed Benchmarks

DeepSeek-R1's benchmarks grouped by task category, with the average score across compared models shown for each benchmark.

Math

AIME 2024 (average across compared models: 68.3%)
MATH-500 (average: 93.4%)
AIME 2025 (average: 79.3%)

Coding

LiveCodeBench (average: 63.2%)
SWE-bench Verified (average: 50.1%)
Aider-Polyglot (average: 51.4%)
Aider Polyglot (average: 53.7%)

Reasoning

DROP (average: 87.2%)

Knowledge

MMLU (average: 89.9%)
GPQA (average: 69.3%)

Uncategorized

CLUEWSC (average: 91.7%)
MMLU-Pro (average: 77.0%)
IFEval (average: 81.6%)
SimpleQA (average: 32.6%)
FRAMES (average: 77.9%)
ArenaHard (average: 81.3%)
C-Eval (average: 84.8%)
C-SimpleQA (average: 64.3%)
Humanity's Last Exam (average: 11.3%)

Provider Pricing: Coming Soon

We're working on gathering comprehensive pricing data from all major providers for DeepSeek-R1. Compare costs across platforms to find the best pricing for your use case.

