
DeepSeek-V3
by DeepSeek
DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It combines Multi-head Latent Attention (MLA), which compresses the key-value cache for efficient inference, with the DeepSeekMoE architecture, which keeps training economical. Its load-balancing strategy distributes tokens across experts without the auxiliary losses used in conventional MoE balancing, avoiding the gradient interference those losses can introduce. A multi-token prediction training objective improves data efficiency and also enables faster inference via speculative decoding.

Pre-trained on 14.8 trillion tokens, the model excels at complex reasoning, mathematical problem-solving, and code generation. On standard benchmarks it surpasses other open-source models, scoring 88.5 on MMLU, 75.9 on MMLU-Pro, and 90.2 on the MATH-500 mathematical reasoning task. Notably, it reached this level at a comparatively low training cost of $5.576 million, corresponding to 2.788 million H800 GPU hours.
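The auxiliary-loss-free balancing described above can be sketched as follows: each token's top-k experts are selected by gating score plus a per-expert bias, the bias is excluded from the output weighting, and after each batch the bias is nudged up for underloaded experts and down for overloaded ones. This is a minimal illustrative sketch, not DeepSeek-V3's implementation: the expert count, top-k value, and the update step GAMMA are toy values chosen here, and all function names are hypothetical.

```python
import random

NUM_EXPERTS = 8   # toy size; DeepSeek-V3 uses far more routed experts per layer
TOP_K = 2         # toy value; the real model activates more experts per token
GAMMA = 0.01      # bias update step (hypothetical value)

bias = [0.0] * NUM_EXPERTS   # per-expert bias, adjusted online for balance
load = [0] * NUM_EXPERTS     # tokens routed to each expert in the current batch

def route(scores):
    """Select top-k experts by (score + bias); weight outputs by raw score only."""
    ranked = sorted(range(NUM_EXPERTS),
                    key=lambda e: scores[e] + bias[e], reverse=True)
    chosen = ranked[:TOP_K]
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}  # normalized output weights

def update_bias():
    """After a batch: raise bias for underloaded experts, lower it for overloaded."""
    mean_load = sum(load) / NUM_EXPERTS
    for e in range(NUM_EXPERTS):
        bias[e] += GAMMA if load[e] < mean_load else -GAMMA

# Simulate routing one batch of tokens, then rebalance.
random.seed(0)
for _ in range(1000):
    scores = [random.random() for _ in range(NUM_EXPERTS)]
    for expert in route(scores):
        load[expert] += 1
update_bias()
```

Because the bias only affects which experts are selected, not how their outputs are mixed, balancing does not distort the gradient the way an auxiliary loss term would.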
Model Specifications
Technical details and capabilities of DeepSeek-V3
Core Specifications
671B Parameters
Model size and complexity
14.8T Training Tokens
Amount of data used in training
131.1K / 131.1K
Maximum input / output tokens
December 24, 2024
Release date
Capabilities & License
Performance Insights
Check out how DeepSeek-V3 handles various AI tasks through comprehensive benchmark results.
Model Comparison
See how DeepSeek-V3 stacks up against other leading models across key performance metrics.
Detailed Benchmarks
Dive deeper into DeepSeek-V3's performance across specific task categories. Expand each section to see detailed metrics and comparisons.
Math
MATH-500
Coding
LiveCodeBench
Codeforces
SWE-bench Verified
Reasoning
DROP
Knowledge
MMLU
GPQA
MATH
Uncategorized
MMLU-Redux
MMLU-Pro
IFEval
SimpleQA
C-Eval
BFCL