
DeepSeek-V3

DeepSeek

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It adopts Multi-head Latent Attention (MLA) for efficient inference and the DeepSeekMoE architecture for cost-effective training, and it pairs an auxiliary-loss-free load-balancing strategy with a multi-token prediction training objective. Pre-trained on 14.8 trillion tokens, the model excels at complex reasoning, mathematical problem-solving, and code generation. The auxiliary-loss-free strategy keeps expert load balanced without the extra balancing loss that can interfere with training and degrade accuracy, while multi-token prediction densifies the training signal and can accelerate inference through speculative decoding. DeepSeek-V3's performance has been validated on a wide range of benchmarks, where it surpasses other open-source models with scores of 88.5 on MMLU, 75.9 on MMLU-Pro, and 90.2 on the MATH-500 mathematical reasoning task. It reached this level of capability at a comparatively low training cost of 2.788 million H800 GPU hours, roughly $5.576 million at an assumed rate of $2 per GPU hour.
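To make the auxiliary-loss-free load balancing concrete, below is a small, hypothetical Python/NumPy sketch of bias-adjusted top-k expert routing. It is not DeepSeek's implementation: the function names (`route_tokens`, `update_bias`), the `bias_update_rate` value, and the random affinity scores are made up for illustration, and the real model applies this per MoE layer on learned token-to-expert affinities with additional constraints such as node-limited routing.

```python
import numpy as np

# Illustrative sketch of auxiliary-loss-free load balancing:
# each expert carries a bias that is added to its routing score
# only when selecting the top-k experts; the gating weights that
# scale expert outputs stay bias-free, and the bias is nudged
# after each batch so overloaded experts become less likely picks.

def route_tokens(scores, bias, k):
    """Pick top-k experts per token using biased scores.

    scores: (num_tokens, num_experts) affinity scores
    bias:   (num_experts,) balancing bias, used only for selection
    k:      experts activated per token
    """
    biased = scores + bias                             # selection uses the bias
    topk = np.argsort(-biased, axis=1)[:, :k]          # chosen expert indices
    gates = np.take_along_axis(scores, topk, axis=1)   # weights use raw scores
    gates = gates / gates.sum(axis=1, keepdims=True)   # normalize per token
    return topk, gates

def update_bias(bias, topk, num_experts, bias_update_rate=0.001):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    load = np.bincount(topk.ravel(), minlength=num_experts)
    return bias - bias_update_rate * np.sign(load - load.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_tokens, num_experts, k = 1024, 16, 2
    bias = np.zeros(num_experts)
    for _ in range(100):
        scores = rng.random((num_tokens, num_experts))
        topk, gates = route_tokens(scores, bias, k)
        bias = update_bias(bias, topk, num_experts)
    print("per-expert load after balancing:",
          np.bincount(topk.ravel(), minlength=num_experts))
```

The design choice this sketch highlights is that the bias only influences which experts are selected; the gating weights come from the unbiased affinities, so balancing adds no auxiliary loss term whose gradients could interfere with the main objective.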

Model Specifications

Technical details and capabilities of DeepSeek-V3

Core Specifications

Parameters: 671B (37B activated per token)

Training tokens: 14.8T

Context window: 131.1K input / 131.1K output tokens

Release date: December 24, 2024

Capabilities & License

Multimodal support: Not supported
Web hydrated: No
License: MIT + Model License (commercial use allowed)

Resources

Research Paper: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
API Reference: https://platform.deepseek.com
Playground: https://chat.deepseek.com
Code Repository: https://github.com/deepseek-ai/DeepSeek-V3
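As a quick orientation alongside the API Reference above, here is a minimal example of querying DeepSeek-V3 through the platform's OpenAI-compatible chat completions endpoint in Python. The base URL, the `deepseek-chat` model name, and the `DEEPSEEK_API_KEY` environment variable reflect DeepSeek's public documentation at the time of writing; consult the API Reference for current values.

```python
import os
from openai import OpenAI  # pip install openai

# DeepSeek's platform exposes an OpenAI-compatible API, so the standard
# OpenAI client can be pointed at it with a custom base URL.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # key issued at platform.deepseek.com
    base_url="https://api.deepseek.com",      # DeepSeek endpoint, not OpenAI's
)

response = client.chat.completions.create(
    model="deepseek-chat",                    # serves DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```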

Performance Insights

Check out how DeepSeek-V3 handles various AI tasks through comprehensive benchmark results.

DeepSeek-V3 benchmark scores (higher is better):

DROP: 91.6
MBPPPlus: 91
CLUEWSC: 90.9
MATH-500: 90.2
MMLU-Redux: 89.1
MMLU: 88.5
C-Eval: 86.5
IFEval: 86.1
RepoQA 32k: 86
HumanEval-Mul: 82.6
Aider-Edit: 79.7
MMLU-Pro: 75.9
FRAMES: 73.3
C-SimpleQA: 64.8
MATH: 61.6
BFCL: 60
GPQA: 59.1
SQL: 58.0
Taubench Retail: 55.0
Codeforces: 51.6
Aider-Polyglot: 49.6
LongBench v2: 48.7
CNMO-2024: 43.2
SWE-bench Verified: 42
LiveCodeBench: 40.5
AIME-2024: 39.2
LiveCodeBench: 37.6
Taubench Airline: 30
SimpleQA: 24.9

Model Comparison

See how DeepSeek-V3 stacks up against other leading models across key performance metrics.

Benchmark   DeepSeek-V3   Claude 3.5 Sonnet   Claude 3 Opus   Llama 3.1 405B Instruct   GPT-4o   Phi-4
MMLU        88.5          90.4                86.8            87.3                      88.7     84.8
MMLU-Pro    75.9          76.1                68.5            73.3                      72.6     70.4
DROP        91.6          87.1                83.1            84.8                      83.4     75.5
GPQA        59.1          59.4                50.4            50.7                      53.6     56.1
MATH        61.6          71.1                60.1            73.8                      76.6     80.4

Detailed Benchmarks

Dive deeper into DeepSeek-V3's performance across specific task categories.

Math: MATH-500 (Avg 93.4%)

Coding: LiveCodeBench (Avg 46.0%), Codeforces (Avg 57.0%), SWE-bench Verified (Avg 48.4%), Aider-Polyglot (Avg 51.4%)

Reasoning: DROP (Avg 87.2%)

Knowledge: MMLU (Avg 86.7%), GPQA (Avg 59.0%)

Non categorized: MMLU-Pro (Avg 72.9%), IFEval (Avg 84.7%), SimpleQA (Avg 30.4%), FRAMES (Avg 77.9%), LongBench v2 (Avg 42.9%), CLUEWSC (Avg 91.7%), C-Eval (Avg 84.8%), C-SimpleQA (Avg 64.3%), Taubench Retail (Avg 58.3%), Taubench Airline (Avg 38.0%), BFCL (Avg 66.6%), MBPPPlus (Avg 89.3%), SQL (Avg 66.3%), RepoQA 32k (Avg 89.0%)

