
Gemini Pro 2.5 Experimental


Google

Gemini 2.5 Pro Experimental, Google DeepMind's newest "thinking model," isn't just optimized for benchmarks: it's designed to tackle complex reasoning, sophisticated coding tasks, and nuanced multimodal interactions (text, audio, images, and video). Released March 25, 2025, it delivers impressive first-attempt accuracy ("pass@1") on challenging tasks: 86.7% on the notoriously difficult AIME 2025 math problems and 18.8% on Humanity's Last Exam, an interdisciplinary reasoning test that pushes AI to its cognitive limits. Notably, Gemini 2.5 Pro also debuted at #1 on LMArena, a human-preference leaderboard, by a substantial margin, meaning real users consistently find its responses more insightful, coherent, and helpful than those of competitors like GPT-4.5 or Claude 3.7 Sonnet. With strong agentic coding capabilities, demonstrated by a 63.8% score on SWE-bench Verified, and a massive 1-million-token context window (soon expanding to 2 million), Gemini 2.5 Pro isn't just performing better numerically; it's becoming the model people actually prefer using in practice.
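The pass@1 numbers above count a problem as solved only when the model's first sampled answer is correct. For reference, the standard unbiased pass@k estimator (not necessarily the exact procedure behind these scores) reduces to c/n when k=1; a minimal sketch:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: probability that at least one of k samples,
        drawn from n generations of which c are correct, is correct."""
        if n - c < k:
            return 1.0  # every size-k draw must contain a correct sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Hypothetical example: 200 samples per problem, 150 of them correct.
    print(pass_at_k(n=200, c=150, k=1))  # 0.75, i.e. c/n for k=1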

Model Specifications

Technical details and capabilities of Gemini Pro 2.5 Experimental

Core Specifications

Input / Output tokens: 1.0M / 1.0M
Knowledge cutoff date: December 31, 2024
Release date: March 25, 2025

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Proprietary

Resources

API Reference: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-pro-exp-03-25
Playground: https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-pro-exp-03-25
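
Below is a minimal sketch of calling this checkpoint from Python with the google-generativeai SDK. The model id comes from the URLs above; the API key (assumed here to be exported as GOOGLE_API_KEY) is obtainable from Google AI Studio.

    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Model id exactly as it appears in the API reference and playground URLs.
    model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

    prompt = "Explain the trade-offs between quicksort and mergesort."

    # The 1M-token window applies to input; count tokens first if your
    # prompt (e.g. a whole codebase) might approach the limit.
    print(model.count_tokens(prompt).total_tokens)

    response = model.generate_content(prompt)
    print(response.text)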

Performance Insights

See how Gemini Pro 2.5 Experimental performs on a range of AI tasks, based on comprehensive benchmark results.

Benchmark               Score
AIME 2024               92.0%
MRCR (128k)             91.5%
Global MMLU (Lite)      89.8%
AIME 2025               86.7%
GPQA                    84.0%
MRCR                    83.1%
MMMU                    81.7%
Aider Polyglot          74.0%
Aider Polyglot          72.9%
LiveCodeBench           70.4%
Vibe-Eval               69.4%
Aider Polyglot (diff)   68.6%
SWE-bench Verified      63.8%
SimpleQA                52.9%
Humanity's Last Exam    18.8%

Model Comparison

See how Gemini Pro 2.5 Experimental stacks up against other leading models across key performance metrics.

Benchmark            Gemini 2.5 Pro Exp   Claude 3.7 Sonnet   o3-mini   o1      DeepSeek-R1   GPT-4.5
GPQA                 84.0%                84.8%               79.7%     78.0%   71.5%         71.4%
AIME 2024            92.0%                80.0%               87.3%     83.3%   79.8%         36.7%
Aider Polyglot       74.0%                64.9%               60.4%     61.7%   56.9%         44.9%
SWE-bench Verified   63.8%                70.3%               49.3%     48.9%   49.2%         38.0%
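
To make the head-to-head results easier to query, here is a small sketch with the scores hard-coded from the table above; it prints the leader on each benchmark:

    # Scores (percent) transcribed from the comparison table above.
    scores = {
        "GPQA": {"Gemini 2.5 Pro Exp": 84.0, "Claude 3.7 Sonnet": 84.8,
                 "o3-mini": 79.7, "o1": 78.0, "DeepSeek-R1": 71.5, "GPT-4.5": 71.4},
        "AIME 2024": {"Gemini 2.5 Pro Exp": 92.0, "Claude 3.7 Sonnet": 80.0,
                      "o3-mini": 87.3, "o1": 83.3, "DeepSeek-R1": 79.8, "GPT-4.5": 36.7},
        "Aider Polyglot": {"Gemini 2.5 Pro Exp": 74.0, "Claude 3.7 Sonnet": 64.9,
                           "o3-mini": 60.4, "o1": 61.7, "DeepSeek-R1": 56.9, "GPT-4.5": 44.9},
        "SWE-bench Verified": {"Gemini 2.5 Pro Exp": 63.8, "Claude 3.7 Sonnet": 70.3,
                               "o3-mini": 49.3, "o1": 48.9, "DeepSeek-R1": 49.2, "GPT-4.5": 38.0},
    }

    for benchmark, by_model in scores.items():
        leader, top = max(by_model.items(), key=lambda kv: kv[1])
        print(f"{benchmark}: {leader} leads at {top}%")

Running it shows Gemini 2.5 Pro Experimental ahead on AIME 2024 and Aider Polyglot, with Claude 3.7 Sonnet narrowly ahead on GPQA and clearly ahead on SWE-bench Verified.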

Detailed Benchmarks

Dive deeper into Gemini Pro 2.5 Experimental's performance across specific task categories. Each benchmark is listed with the cross-model average for context.

Math

AIME 2025: 86.7% (cross-model average: 79.3%)
AIME 2024: 92.0% (cross-model average: 80.8%)

Coding

LiveCodeBench: 70.4% (cross-model average: 63.2%)
Aider Polyglot: 74.0% (cross-model average: 57.9%)
SWE-bench Verified: 63.8% (cross-model average: 51.6%)

Knowledge

GPQA: 84.0% (cross-model average: 76.4%)

Uncategorized

Humanity's Last Exam: 18.8% (cross-model average: 11.3%)
SimpleQA: 52.9% (cross-model average: 40.4%)
MMMU: 81.7% (cross-model average: 75.4%)
Vibe-Eval: 69.4% (cross-model average: 53.9%)
MRCR: 83.1% (cross-model average: 63.8%)
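
These per-category averages are most useful as a margin-over-the-field view. A quick sketch, with the model's scores and the page's averages transcribed from above:

    # (model score, cross-model average) in percent, from this page.
    benchmarks = {
        "AIME 2025": (86.7, 79.3),
        "AIME 2024": (92.0, 80.8),
        "LiveCodeBench": (70.4, 63.2),
        "Aider Polyglot": (74.0, 57.9),
        "SWE-bench Verified": (63.8, 51.6),
        "GPQA": (84.0, 76.4),
        "Humanity's Last Exam": (18.8, 11.3),
        "SimpleQA": (52.9, 40.4),
        "MMMU": (81.7, 75.4),
        "Vibe-Eval": (69.4, 53.9),
        "MRCR": (83.1, 63.8),
    }

    # Sort by margin over the average, largest first.
    for name, (score, avg) in sorted(benchmarks.items(),
                                     key=lambda kv: kv[1][0] - kv[1][1],
                                     reverse=True):
        print(f"{name}: {score}% vs {avg}% avg ({score - avg:+.1f} pts)")

The biggest margins show up on long-context retrieval (MRCR) and multi-language coding (Aider Polyglot).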

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for Gemini Pro 2.5 Experimental. Compare costs across platforms to find the best pricing for your use case.

Providers tracked: OpenAI, Anthropic, Google, Mistral AI, Cohere.
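
Once provider prices land here, the comparison reduces to simple per-token arithmetic. A sketch of that calculation follows; the rates below are hypothetical placeholders, not published prices for this model:

    # Hypothetical USD rates per 1M tokens; real provider pricing for
    # gemini-2.5-pro-exp-03-25 is not yet listed on this page.
    providers = {
        "Provider A": {"input": 1.25, "output": 10.00},
        "Provider B": {"input": 2.50, "output": 15.00},
    }

    def monthly_cost(rates, input_tokens, output_tokens):
        """Monthly USD cost for a given token volume."""
        return (input_tokens * rates["input"]
                + output_tokens * rates["output"]) / 1e6

    # Example workload: 50M input and 5M output tokens per month.
    for name, rates in providers.items():
        print(f"{name}: ${monthly_cost(rates, 50e6, 5e6):,.2f}/month")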
