
Claude 3 Sonnet

Anthropic

Claude 3 Sonnet is designed to balance intelligence and speed, making it well suited to demanding business tasks. It delivers strong performance at a lower price point than comparable models and is built to run reliably in large-scale AI deployments.

Model Specifications

Technical details and capabilities of Claude 3 Sonnet

Core Specifications

Context window (input): 200K tokens
Maximum output: 4K tokens
Release date: February 28, 2024

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Proprietary

Resources

Research Paper
https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
API Reference
https://www.anthropic.com/claude
Playground
https://claude.ai
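As a quick illustration alongside the resources above (not part of the original model card), here is a minimal sketch of calling Claude 3 Sonnet through Anthropic's Messages API. It assumes the public model ID `claude-3-sonnet-20240229` and an `ANTHROPIC_API_KEY` environment variable; the payload is built by a separate helper so it can be inspected without any network access.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
MODEL_ID = "claude-3-sonnet-20240229"  # Claude 3 Sonnet's public model ID


def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a Messages API payload for Claude 3 Sonnet."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


def send(payload: dict) -> dict:
    """POST the payload to the Messages API (requires ANTHROPIC_API_KEY)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_request("Summarize the Claude 3 model card in one sentence.")
print(payload["model"])  # prints "claude-3-sonnet-20240229"
```

Only `build_request` runs here; `send` is the network step and is left uncalled so the sketch stays runnable offline.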

Performance Insights

The benchmark results below show how Claude 3 Sonnet performs across a range of AI tasks.

Benchmark        Score
ARC Challenge    93.2
GSM8K            92.3
HellaSwag        89.0
MGSM             83.5
BIG-Bench-Hard   82.9
MMLU             79.0
DROP             78.9
HumanEval        73.0
MMLU-Pro         56.8
MATH             43.1
GPQA             40.4

Model Comparison

See how Claude 3 Sonnet stacks up against other leading models across key performance metrics.

Scores by benchmark (higher is better):

Benchmark   Claude 3 Sonnet  Grok-1.5  Phi-3.5-MoE-instruct  Phi-3.5-mini-instruct  Qwen2 72B Instruct  Claude 3 Opus
MMLU        79.0             81.3      78.9                  69.0                   82.3                86.8
GPQA        40.4             35.9      36.8                  30.4                   42.4                50.4
MATH        43.1             50.6      59.5                  48.5                   59.7                60.1
HumanEval   73.0             74.1      70.7                  62.8                   86.0                84.9
MMLU-Pro    56.8             51.0      54.3                  47.4                   64.4                68.5
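To make the comparison easier to query programmatically, here is a small sketch that transcribes the scores above into a dictionary and finds the leading model per benchmark; the structure and the `leader` helper are illustrative, not part of the original page.

```python
# Benchmark scores transcribed from the comparison table above.
SCORES = {
    "MMLU":      {"Claude 3 Sonnet": 79.0, "Grok-1.5": 81.3, "Phi-3.5-MoE-instruct": 78.9,
                  "Phi-3.5-mini-instruct": 69.0, "Qwen2 72B Instruct": 82.3, "Claude 3 Opus": 86.8},
    "GPQA":      {"Claude 3 Sonnet": 40.4, "Grok-1.5": 35.9, "Phi-3.5-MoE-instruct": 36.8,
                  "Phi-3.5-mini-instruct": 30.4, "Qwen2 72B Instruct": 42.4, "Claude 3 Opus": 50.4},
    "MATH":      {"Claude 3 Sonnet": 43.1, "Grok-1.5": 50.6, "Phi-3.5-MoE-instruct": 59.5,
                  "Phi-3.5-mini-instruct": 48.5, "Qwen2 72B Instruct": 59.7, "Claude 3 Opus": 60.1},
    "HumanEval": {"Claude 3 Sonnet": 73.0, "Grok-1.5": 74.1, "Phi-3.5-MoE-instruct": 70.7,
                  "Phi-3.5-mini-instruct": 62.8, "Qwen2 72B Instruct": 86.0, "Claude 3 Opus": 84.9},
    "MMLU-Pro":  {"Claude 3 Sonnet": 56.8, "Grok-1.5": 51.0, "Phi-3.5-MoE-instruct": 54.3,
                  "Phi-3.5-mini-instruct": 47.4, "Qwen2 72B Instruct": 64.4, "Claude 3 Opus": 68.5},
}


def leader(benchmark: str) -> str:
    """Return the model with the highest score on the given benchmark."""
    models = SCORES[benchmark]
    return max(models, key=models.get)


for bench in SCORES:
    print(f"{bench}: {leader(bench)} ({SCORES[bench][leader(bench)]})")
```

Running it shows Claude 3 Opus leading on every benchmark except HumanEval, where Qwen2 72B Instruct scores highest.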

Detailed Benchmarks

Dive deeper into Claude 3 Sonnet's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Reasoning

DROP: 78.9 (average across compared models: 77.8)
HellaSwag: 89.0 (average across compared models: 87.0)

Knowledge

Uncategorized

MGSM: 83.5 (average across compared models: 80.9)
BIG-Bench-Hard: 82.9 (average across compared models: 81.1)

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for Claude 3 Sonnet. Compare costs across platforms to find the best pricing for your use case.

OpenAI
Anthropic
Google
Mistral AI
Cohere

Share your feedback

Hi, I'm Charlie Palars, the founder of Deepranking.ai. I'm always looking for ways to improve the site and make it more useful for you. You can write me through this form or directly through X at @palarsio.

Your feedback helps us improve our service
