
Qwen2.5 14B Instruct

Qwen

Qwen2.5-14B-Instruct is a 14.7-billion-parameter instruction-tuned language model from the Qwen2.5 series. It brings substantial improvements in instruction following, long-form generation (over 8,000 tokens), understanding of structured data, and generation of structured outputs such as JSON. With a 128K-token context window, it supports more than 29 languages, including Chinese, English, French, and Spanish.
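For local inference, the weights are published on Hugging Face under the id Qwen/Qwen2.5-14B-Instruct. Below is a minimal quickstart sketch using the transformers library; the prompt is illustrative, and loading 14.7B parameters in 16-bit precision needs roughly 30 GB of GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-14B-Instruct"

# Load the tokenizer and model; device_map="auto" places weights on available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt with the model's chat template and generate a reply.
# The system/user messages exercise the structured (JSON) output capability.
messages = [
    {"role": "system", "content": "You are a helpful assistant that replies in JSON."},
    {"role": "user", "content": "List three facts about the Qwen2.5 series as a JSON array."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```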

Model Specifications

Technical details and capabilities of Qwen2.5 14B Instruct

Core Specifications

14.7B Parameters

Model size and complexity

18 Trillion (18,000B) Training Tokens

Amount of data used in training

131.1K Input / 8.2K Output Tokens

Maximum context window and generation length

September 18, 2024

Release date
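
The 131.1K input figure corresponds to the 131,072-token context window, with up to 8,192 generated tokens per request. The Qwen2.5 model card documents enabling contexts beyond 32,768 tokens through YaRN rope scaling; the sketch below assumes that passing the documented rope_scaling settings as a modified transformers config is equivalent to editing config.json.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-14B-Instruct"

# The released config covers 32,768 positions; a YaRN factor of 4.0 extends the
# usable context to roughly 131,072 tokens, matching the 131.1K figure above.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```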

Capabilities & License

Multimodal Support
Not Supported
License
apache-2.0

Resources

Research Paper
https://arxiv.org/abs/2407.10671
API Reference
https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api
Code Repository
https://github.com/QwenLM/Qwen2.5
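
The API Reference above covers calling Qwen models through Alibaba Cloud Model Studio, which exposes an OpenAI-compatible endpoint. A minimal sketch with the openai Python SDK follows; the base URL and the qwen2.5-14b-instruct model id are assumptions to verify against the linked documentation.

```python
import os

from openai import OpenAI

# Assumed international Model Studio endpoint; mainland-China accounts use a
# different base URL (see the API reference linked above).
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-14b-instruct",  # assumed model id on Model Studio
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Qwen2.5 series in two sentences."},
    ],
)
print(response.choices[0].message.content)
```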

Performance Insights

Check out how Qwen2.5 14B Instruct handles various AI tasks through comprehensive benchmark results.

Benchmark scores (out of 100):

GSM8K: 94.8
HumanEval: 83.5
MBPP: 82
MMLU-Redux: 80
MATH: 80
MMLU: 79.7
BBH: 78.2
MMLU-STEM: 76.4
MultiPL-E: 72.8
ARC-C: 67.3
MMLU-Pro: 63.7
MBPP+: 63.2
TruthfulQA: 58.4
HumanEval+: 51.2
GPQA: 45.5
TheoremQA: 43

Detailed Benchmarks

Dive deeper into Qwen2.5 14B Instruct's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Math

GSM8K: 94.8 (average across compared models: 91.9%)

Coding

HumanEval: 83.5 (average: 79.5%)
HumanEval+: 51.2 (average: 62.1%)
MBPP+: 63.2 (average: 65.2%)

Non-categorized

BBH: 78.2 (average: 81.1%)
ARC-C: 67.3 (average: 82.3%)
TheoremQA: 43 (average: 41.1%)
MMLU-STEM: 76.4 (average: 78.6%)
MultiPL-E: 72.8 (average: 70.3%)

Provider Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Qwen2.5 14B Instruct. Compare costs across platforms to find the best pricing for your use case.


