
Claude 3.7 Sonnet

Anthropic

Claude 3.7 Sonnet is Anthropic's most advanced model to date and the first hybrid reasoning model on the market, blending near-instant responses with a new "extended thinking" mode in which the model reasons step by step, visibly to the user, before answering. This dual-mode design lets users trade speed for depth by adjusting a thinking token budget (up to 128K output tokens), yielding stronger results in math, physics, instruction following, and software planning. Unlike prior models that silo reasoning into separate modes or products, Claude 3.7 Sonnet unifies both behaviors under a single model and API, creating a seamless experience for developers and end users alike.

The model is especially strong in enterprise-facing use cases, where Claude's long context window, low hallucination rate, and agentic planning make it well suited to content analysis, customer support, and data extraction.

Where Claude 3.7 Sonnet truly stands out is software engineering. It achieves **state-of-the-art results on SWE-bench Verified**, scoring 63.7% pass@1 with minimal scaffolding and 70.3% with additional test-time compute, ahead of o3-mini, DeepSeek-R1, and GPT-4.5 (see the comparison below). It also leads on TAU-bench, a framework for real-world agent tasks, reflecting Claude's strength in multi-step tool use and planning. Claude Code, a new CLI-based coding agent, extends this further, completing end-to-end development tasks in a single pass and outperforming rivals on complex codebases in testing by Cursor, Cognition, Replit, and Canva. With deeply integrated GitHub tooling and strong design intuition, Claude 3.7 Sonnet is not only the best Claude model to date; it is arguably the most practically useful LLM for coding and real-world reasoning workflows available today.
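The thinking budget is set per request through the standard Messages API. Below is a minimal sketch using the Anthropic Python SDK; the model ID, budget values, and prompt are illustrative, and the exact parameter shape should be confirmed against the API reference linked under Resources.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Extended thinking is enabled per request by allocating a thinking token
# budget; budget_tokens must be smaller than max_tokens.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # illustrative model ID
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# The response interleaves visible reasoning ("thinking" blocks)
# with the final answer ("text" blocks).
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```

Omitting the `thinking` parameter falls back to the near-instant mode, so the same endpoint serves both fast and deep responses.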

Model Specifications

Technical details and capabilities of Claude 3.7 Sonnet

Core Specifications

Input / Output tokens: 200K / 128K
Knowledge cutoff date: September 30, 2024
Release date: February 23, 2025

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Proprietary

Resources

API Reference: https://docs.anthropic.com/en/docs/about-claude/models/all-models
Playground: https://claude.ai
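The 128K output ceiling listed under Core Specifications is, at the time of writing, gated behind a beta flag, and long generations are best consumed as a stream to avoid timeouts. A hedged sketch follows; the beta header name and model ID are assumptions to verify against the API reference above.

```python
import anthropic

client = anthropic.Anthropic()

# Long outputs (up to 128K tokens) are assumed here to require a beta
# header; streaming avoids HTTP timeouts on long generations.
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",  # illustrative model ID
    max_tokens=128000,
    extra_headers={"anthropic-beta": "output-128k-2025-02-19"},  # assumed beta flag
    messages=[{"role": "user", "content": "Write a detailed design doc for a URL shortener."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```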

Performance Insights

Check out how Claude 3.7 Sonnet handles various AI tasks through comprehensive benchmark results.

MATH-500: 96.2
IFEval: 93.2
MMMLU: 86.1
GPQA: 84.8
TAU-bench Retail: 81.2
AIME 2024: 80.0
MMMU: 75.0
SWE-bench Verified: 70.3
Aider Polyglot: 64.9
TAU-bench Airline: 58.4
AIME 2025: 49.5
Humanity's Last Exam: 8.9

(All scores are percentages.)
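The coding scores above are reported as pass@1. For readers comparing their own evaluation runs, here is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) commonly used for such benchmarks; with a single sample per problem, pass@1 reduces to the fraction of problems solved.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: samples that pass the tests
    k: evaluation budget (k=1 for the pass@1 figures above)
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 7 passing -> pass@1 = 0.7
print(pass_at_k(10, 7, 1))
```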

Model Comparison

See how Claude 3.7 Sonnet stacks up against other leading models across key performance metrics.

SWE-bench Verified: Claude 3.7 Sonnet 70.3; Gemini Pro 2.5 Experimental 63.8; o3-mini 49.3; DeepSeek-R1 49.2; GPT-4.5 38.0
GPQA: Claude 3.7 Sonnet 84.8; Gemini Pro 2.5 Experimental 84.0; o3-mini 79.7; DeepSeek-R1 71.5; GPT-4.5 71.4
AIME 2024: Claude 3.7 Sonnet 80.0; Gemini Pro 2.5 Experimental 92.0; o3-mini 87.3; DeepSeek-R1 79.8; GPT-4.5 36.7
Aider Polyglot: Claude 3.7 Sonnet 64.9; Gemini Pro 2.5 Experimental 74.0; o3-mini 60.4; DeepSeek-R1 56.9; GPT-4.5 44.9
Humanity's Last Exam: Claude 3.7 Sonnet 8.9; Gemini Pro 2.5 Experimental 18.8; o3-mini 14.0; DeepSeek-R1 8.6; GPT-4.5 6.4

(All scores are percentages.)

Detailed Benchmarks

Dive deeper into Claude 3.7 Sonnet's performance across specific task categories. Each benchmark lists Claude 3.7 Sonnet's score alongside the average across the models we track.

Math

MATH-500: 96.2 (average across models: 93.4)
AIME 2024: 80.0 (average: 73.8)
AIME 2025: 49.5 (average: 79.3)

Coding

SWE-bench Verified: 70.3 (average: 51.6)
Aider Polyglot: 64.9 (average: 53.7)

Knowledge

GPQA: 84.8 (average: 82.8)

Other

TAU-bench Retail: 81.2 (average: 68.7)
TAU-bench Airline: 58.4 (average: 45.3)
MMMLU: 86.1 (average: 85.6)
MMMU: 75.0 (average: 67.0)
IFEval: 93.2 (average: 90.5)
Humanity's Last Exam: 8.9 (average: 11.3)

Providers Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Claude 3.7 Sonnet. Compare costs across platforms to find the best pricing for your use case.

