
Claude 3.5 Sonnet

Anthropic

Claude 3.5 Sonnet is a top-tier AI model that demonstrates exceptional software engineering abilities. It shows marked improvements in coding, strategic planning, and complex problem-solving, particularly in tasks requiring autonomous coding and the use of external tools. Currently in public beta, its "computer use" feature allows it to engage with computer interfaces in a manner analogous to a human operator.
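As an illustration of the public-beta "computer use" capability mentioned above, a request enables a computer tool alongside the user message. This is a hedged sketch only: the `computer_20241022` tool type and the `computer-use-2024-10-22` beta flag follow Anthropic's October 2024 beta naming, and the display dimensions and prompt are arbitrary example values; confirm the exact fields against Anthropic's documentation.

```python
import json

# Sketch of a Messages API request body with the computer-use beta tool
# enabled. In a real call this JSON is POSTed to
# https://api.anthropic.com/v1/messages with an "anthropic-beta:
# computer-use-2024-10-22" header; auth headers are omitted here.
request_body = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [
        {
            "type": "computer_20241022",  # beta tool type (assumed naming)
            "name": "computer",
            "display_width_px": 1280,     # illustrative screen size
            "display_height_px": 800,
        }
    ],
    "messages": [
        {"role": "user", "content": "Open the browser and check today's weather."}
    ],
}

# Print a compact form of the payload to verify it serializes cleanly.
print(json.dumps(request_body, indent=2))
```

The model responds with tool-use blocks (for example, click or screenshot actions) that a host-side agent loop executes and feeds back as tool results.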

Model Specifications

Technical details and capabilities of Claude 3.5 Sonnet

Core Specifications

Input / output tokens: 200K context / 8,192 max output

Release date: October 21, 2024

Capabilities & License

Multimodal support: Supported
Web hydrated: No
License: Proprietary

Resources

Research paper: https://www-cdn.anthropic.com/fed9cc193a14b84131812372d8d5857f8f304c52/Model_Card_Claude_3_Addendum.pdf
API reference: https://docs.anthropic.com/en/docs/intro-to-claude#claude-3-5-family
Playground: https://claude.ai
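As a companion to the API reference linked above, here is a minimal sketch of a request body for Anthropic's Messages API (POST https://api.anthropic.com/v1/messages). The dated model ID follows Anthropic's naming for this release; the prompt is an arbitrary example, and authentication (the `x-api-key` header) is omitted.

```python
import json

# Minimal Messages API request body for Claude 3.5 Sonnet.
# "max_tokens" caps the generated output length.
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain the difference between a list and a tuple."}
    ],
}

# Serialize to confirm the body is valid JSON before sending.
print(json.dumps(payload, indent=2))
```

The same structure works through Anthropic's official SDKs, which wrap this payload in a `messages.create(...)` call.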

Performance Insights

Check out how Claude 3.5 Sonnet handles various AI tasks through comprehensive benchmark results.

Benchmark scores (higher is better; where two values appear, the chart reported two scores for that benchmark):

GSM8K: 96.4
DocVQA: 95.2
AI2D: 94.7
HumanEval: 93.7 / 92
BIG-Bench Hard: 93.1
MGSM: 91.6
ChartQA: 90.8
MMLU: 90.4
DROP: 87.1
MATH: 78.3 / 71.1
MMLU-Pro: 77.6 / 76.1
TAU-bench Retail: 69.2
MMMU: 68.3
MathVista: 67.7
GPQA: 67.2 / 59.4
SWE-bench Verified: 49
TAU-bench Airline: 46
OSWorld Extended: 22
OSWorld Screenshot-only: 14.9

Detailed Benchmarks

Dive deeper into Claude 3.5 Sonnet's performance across specific task categories. Each benchmark is listed with the average score across compared models.

Math
GSM8K: avg 93.1%

Coding
HumanEval: avg 91.4%
SWE-bench Verified: avg 50.1%

Reasoning
DROP: avg 84.1%

Knowledge
GPQA: avg 65.8%
MMLU: avg 87.0%

Uncategorized
MGSM: avg 91.0%
BIG-Bench Hard: avg 93.1%
MMLU-Pro: avg 77.0%
TAU-bench Retail: avg 68.7%
TAU-bench Airline: avg 45.3%
MMMU: avg 62.6%
MathVista: avg 61.0%
AI2D: avg 90.5%
DocVQA: avg 92.9%

Provider Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Claude 3.5 Sonnet. Compare costs across platforms to find the best pricing for your use case.

