
Claude 3.5 Sonnet

Anthropic

Claude 3.5 Sonnet is a top-tier AI model that demonstrates exceptional software engineering abilities. It shows marked improvements in coding, strategic planning, and complex problem-solving, particularly in tasks requiring autonomous coding and the use of external tools. Currently in public beta, its "computer use" feature allows it to engage with computer interfaces in a manner analogous to a human operator.
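As an illustration of the public-beta "computer use" capability mentioned above, a request enables a computer tool alongside the user message. This is a hedged sketch only: the `computer_20241022` tool type and the `computer-use-2024-10-22` beta flag follow Anthropic's October 2024 beta naming, and the display dimensions and prompt are arbitrary example values; confirm the exact fields against Anthropic's documentation.

```python
import json

# Sketch of a Messages API request body with the computer-use beta tool
# enabled. In a real call this JSON is POSTed to
# https://api.anthropic.com/v1/messages with an "anthropic-beta:
# computer-use-2024-10-22" header; auth headers are omitted here.
request_body = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [
        {
            "type": "computer_20241022",  # beta tool type (assumed naming)
            "name": "computer",
            "display_width_px": 1280,     # illustrative screen size
            "display_height_px": 800,
        }
    ],
    "messages": [
        {"role": "user", "content": "Open the browser and check today's weather."}
    ],
}

# Print a compact form of the payload to verify it serializes cleanly.
print(json.dumps(request_body, indent=2))
```

The model responds with tool-use blocks (for example, click or screenshot actions) that a host-side agent loop executes and feeds back as tool results.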

Model Specifications

Technical details and capabilities of Claude 3.5 Sonnet

Core Specifications

Input / output tokens: 200K context / 8,192 max output

Release date: October 21, 2024

Capabilities & License

Multimodal support: Supported
Web hydrated: No
License: Proprietary

Resources

Research paper: https://www-cdn.anthropic.com/fed9cc193a14b84131812372d8d5857f8f304c52/Model_Card_Claude_3_Addendum.pdf
API reference: https://docs.anthropic.com/en/docs/intro-to-claude#claude-3-5-family
Playground: https://claude.ai
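As a companion to the API reference linked above, here is a minimal sketch of a request body for Anthropic's Messages API (POST https://api.anthropic.com/v1/messages). The dated model ID follows Anthropic's naming for this release; the prompt is an arbitrary example, and authentication (the `x-api-key` header) is omitted.

```python
import json

# Minimal Messages API request body for Claude 3.5 Sonnet.
# "max_tokens" caps the generated output length.
payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain the difference between a list and a tuple."}
    ],
}

# Serialize to confirm the body is valid JSON before sending.
print(json.dumps(payload, indent=2))
```

The same structure works through Anthropic's official SDKs, which wrap this payload in a `messages.create(...)` call.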

Performance Insights

Check out how Claude 3.5 Sonnet handles various AI tasks through comprehensive benchmark results.

Benchmark scores (higher is better; where two values appear, the chart reported two scores for that benchmark):

GSM8K: 96.4
DocVQA: 95.2
AI2D: 94.7
HumanEval: 93.7 / 92
BIG-Bench Hard: 93.1
MGSM: 91.6
ChartQA: 90.8
MMLU: 90.4
DROP: 87.1
MATH: 78.3 / 71.1
MMLU-Pro: 77.6 / 76.1
TAU-bench Retail: 69.2
MMMU: 68.3
MathVista: 67.7
GPQA: 67.2 / 59.4
SWE-bench Verified: 49
TAU-bench Airline: 46
OSWorld Extended: 22
OSWorld Screenshot-only: 14.9

Detailed Benchmarks

Dive deeper into Claude 3.5 Sonnet's performance across specific task categories. Each benchmark is listed with the average score across compared models.

Math
GSM8K: avg 93.1%

Coding
HumanEval: avg 91.4%
SWE-bench Verified: avg 50.1%

Reasoning
DROP: avg 84.1%

Knowledge
GPQA: avg 65.8%
MMLU: avg 87.0%

Uncategorized
MGSM: avg 91.0%
BIG-Bench Hard: avg 93.1%
MMLU-Pro: avg 77.0%
TAU-bench Retail: avg 68.7%
TAU-bench Airline: avg 45.3%
MMMU: avg 62.6%
MathVista: avg 61.0%
AI2D: avg 90.5%
DocVQA: avg 92.9%

Provider Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Claude 3.5 Sonnet. Compare costs across platforms to find the best pricing for your use case.

