
Claude 3.7 Sonnet

Anthropic

Claude 3.7 Sonnet is Anthropic's most advanced model to date and the first hybrid reasoning model on the market, blending near-instant responses with a new "extended thinking" mode in which the model reasons step by step, visibly to the user, before answering. This dual-mode design lets users trade speed for depth by adjusting a thinking token budget (up to 128K output tokens), yielding stronger results in math, physics, instruction following, and software planning. Unlike prior models that silo reasoning into separate modes or products, Claude 3.7 Sonnet unifies both behaviors under a single model and API, creating a seamless experience for developers and end users alike.

The model is especially strong in enterprise-facing use cases, where Claude's long context window, low hallucination rate, and agentic planning make it well suited to content analysis, customer support, and data extraction.

Where Claude 3.7 Sonnet truly stands out is software engineering. It achieves **state-of-the-art results on SWE-bench Verified**, scoring 63.7% pass@1 with minimal scaffolding and 70.3% with additional test-time compute, ahead of o3-mini, DeepSeek-R1, and GPT-4.5 (see the comparison below). It also leads on TAU-bench, a framework for real-world agent tasks, reflecting Claude's strength in multi-step tool use and planning. Claude Code, a new CLI-based coding agent, extends this further, completing end-to-end development tasks in a single pass and outperforming rivals on complex codebases in testing by Cursor, Cognition, Replit, and Canva. With deeply integrated GitHub tooling and strong design intuition, Claude 3.7 Sonnet is not only the best Claude model to date; it is arguably the most practically useful LLM for coding and real-world reasoning workflows available today.
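The thinking budget is set per request through the standard Messages API. Below is a minimal sketch using the Anthropic Python SDK; the model ID, budget values, and prompt are illustrative, and the exact parameter shape should be confirmed against the API reference linked under Resources.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking is enabled per request by allocating a thinking token
# budget; budget_tokens must be smaller than max_tokens.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # illustrative model ID
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# The response interleaves visible reasoning ("thinking" blocks)
# with the final answer ("text" blocks).
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```

Omitting the `thinking` parameter falls back to the near-instant mode, so the same endpoint serves both fast and deep responses.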

Model Specifications

Technical details and capabilities of Claude 3.7 Sonnet

Core Specifications

Input / Output tokens: 200K / 128K
Knowledge cutoff date: September 30, 2024
Release date: February 23, 2025

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Proprietary

Resources

API Reference: https://docs.anthropic.com/en/docs/about-claude/models/all-models
Playground: https://claude.ai
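The 128K output ceiling listed under Core Specifications is, at the time of writing, gated behind a beta flag, and long generations are best consumed as a stream to avoid timeouts. A hedged sketch follows; the beta header name and model ID are assumptions to verify against the API reference above.

```python
import anthropic

client = anthropic.Anthropic()

# Long outputs (up to 128K tokens) are assumed here to require a beta
# header; streaming avoids HTTP timeouts on long generations.
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",  # illustrative model ID
    max_tokens=128000,
    extra_headers={"anthropic-beta": "output-128k-2025-02-19"},  # assumed beta flag
    messages=[{"role": "user", "content": "Write a detailed design doc for a URL shortener."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```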

Performance Insights

Check out how Claude 3.7 Sonnet handles various AI tasks through comprehensive benchmark results.

MATH-500: 96.2
IFEval: 93.2
MMMLU: 86.1
GPQA: 84.8
TAU-bench Retail: 81.2
AIME 2024: 80.0
MMMU: 75.0
SWE-bench Verified: 70.3
Aider Polyglot: 64.9
TAU-bench Airline: 58.4
AIME 2025: 49.5
Humanity's Last Exam: 8.9

(All scores are percentages.)
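The coding scores above are reported as pass@1. For readers comparing their own evaluation runs, here is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) commonly used for such benchmarks; with a single sample per problem, pass@1 reduces to the fraction of problems solved.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: samples that pass the tests
    k: evaluation budget (k=1 for the pass@1 figures above)
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 7 passing -> pass@1 = 0.7
print(pass_at_k(10, 7, 1))
```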

Model Comparison

See how Claude 3.7 Sonnet stacks up against other leading models across key performance metrics.

SWE-bench Verified: Claude 3.7 Sonnet 70.3; Gemini Pro 2.5 Experimental 63.8; o3-mini 49.3; DeepSeek-R1 49.2; GPT-4.5 38.0
GPQA: Claude 3.7 Sonnet 84.8; Gemini Pro 2.5 Experimental 84.0; o3-mini 79.7; DeepSeek-R1 71.5; GPT-4.5 71.4
AIME 2024: Claude 3.7 Sonnet 80.0; Gemini Pro 2.5 Experimental 92.0; o3-mini 87.3; DeepSeek-R1 79.8; GPT-4.5 36.7
Aider Polyglot: Claude 3.7 Sonnet 64.9; Gemini Pro 2.5 Experimental 74.0; o3-mini 60.4; DeepSeek-R1 56.9; GPT-4.5 44.9
Humanity's Last Exam: Claude 3.7 Sonnet 8.9; Gemini Pro 2.5 Experimental 18.8; o3-mini 14.0; DeepSeek-R1 8.6; GPT-4.5 6.4

(All scores are percentages.)

Detailed Benchmarks

Dive deeper into Claude 3.7 Sonnet's performance across specific task categories. Each benchmark lists Claude 3.7 Sonnet's score alongside the average across the models we track.

Math

MATH-500: 96.2 (average across models: 93.4)
AIME 2024: 80.0 (average: 73.8)
AIME 2025: 49.5 (average: 79.3)

Coding

SWE-bench Verified: 70.3 (average: 51.6)
Aider Polyglot: 64.9 (average: 53.7)

Knowledge

GPQA: 84.8 (average: 82.8)

Other

TAU-bench Retail: 81.2 (average: 68.7)
TAU-bench Airline: 58.4 (average: 45.3)
MMMLU: 86.1 (average: 85.6)
MMMU: 75.0 (average: 67.0)
IFEval: 93.2 (average: 90.5)
Humanity's Last Exam: 8.9 (average: 11.3)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Claude 3.7 Sonnet. Compare costs across platforms to find the best pricing for your use case.

