
Gemini Pro 2.5 Experimental


Google

Gemini 2.5 Pro Experimental, Google DeepMind's newest "thinking model," isn't just optimized for benchmarks: it's designed to tackle complex reasoning, sophisticated coding tasks, and nuanced multimodal interactions (text, audio, images, and video). Released March 25, 2025, it delivers impressive first-attempt accuracy ("pass@1") on challenging tasks: 86.7% on the notoriously difficult AIME 2025 math problems and 18.8% on Humanity's Last Exam, an interdisciplinary reasoning test that pushes AI to its cognitive limits. Notably, Gemini 2.5 Pro also debuted at #1 on LMArena, a human-preference leaderboard, by a substantial margin, meaning real users consistently find its responses more insightful, coherent, and helpful than those of competitors like GPT-4.5 or Claude 3.7 Sonnet. With strong agentic coding capabilities, demonstrated by a 63.8% score on SWE-bench Verified, and a massive 1-million-token context window (soon expanding to 2 million), Gemini 2.5 Pro isn't just performing better numerically; it's becoming the model people actually prefer using in practice.
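The pass@1 numbers above count a problem as solved only when the model's first sampled answer is correct. For reference, the standard unbiased pass@k estimator (not necessarily the exact procedure behind these scores) reduces to c/n when k=1; a minimal sketch:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: probability that at least one of k samples,
        drawn from n generations of which c are correct, is correct."""
        if n - c < k:
            return 1.0  # every size-k draw must contain a correct sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Hypothetical example: 200 samples per problem, 150 of them correct.
    print(pass_at_k(n=200, c=150, k=1))  # 0.75, i.e. c/n for k=1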

Model Specifications

Technical details and capabilities of Gemini Pro 2.5 Experimental

Core Specifications

Input / Output tokens: 1.0M / 1.0M
Knowledge cutoff date: December 31, 2024
Release date: March 25, 2025

Capabilities & License

Multimodal Support: Supported
Web Hydrated: No
License: Proprietary

Resources

API Reference: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-pro-exp-03-25
Playground: https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-pro-exp-03-25
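
Below is a minimal sketch of calling this checkpoint from Python with the google-generativeai SDK. The model id comes from the URLs above; the API key (assumed here to be exported as GOOGLE_API_KEY) is obtainable from Google AI Studio.

    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Model id exactly as it appears in the API reference and playground URLs.
    model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

    prompt = "Explain the trade-offs between quicksort and mergesort."

    # The 1M-token window applies to input; count tokens first if your
    # prompt (e.g. a whole codebase) might approach the limit.
    print(model.count_tokens(prompt).total_tokens)

    response = model.generate_content(prompt)
    print(response.text)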

Performance Insights

See how Gemini Pro 2.5 Experimental performs on a range of AI tasks, based on comprehensive benchmark results.

Benchmark               Score
AIME 2024               92.0%
MRCR (128k)             91.5%
Global MMLU (Lite)      89.8%
AIME 2025               86.7%
GPQA                    84.0%
MRCR                    83.1%
MMMU                    81.7%
Aider Polyglot          74.0%
Aider Polyglot          72.9%
LiveCodeBench           70.4%
Vibe-Eval               69.4%
Aider Polyglot (diff)   68.6%
SWE-bench Verified      63.8%
SimpleQA                52.9%
Humanity's Last Exam    18.8%

Model Comparison

See how Gemini Pro 2.5 Experimental stacks up against other leading models across key performance metrics.

Benchmark            Gemini 2.5 Pro Exp   Claude 3.7 Sonnet   o3-mini   o1      DeepSeek-R1   GPT-4.5
GPQA                 84.0%                84.8%               79.7%     78.0%   71.5%         71.4%
AIME 2024            92.0%                80.0%               87.3%     83.3%   79.8%         36.7%
Aider Polyglot       74.0%                64.9%               60.4%     61.7%   56.9%         44.9%
SWE-bench Verified   63.8%                70.3%               49.3%     48.9%   49.2%         38.0%
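
To make the head-to-head results easier to query, here is a small sketch with the scores hard-coded from the table above; it prints the leader on each benchmark:

    # Scores (percent) transcribed from the comparison table above.
    scores = {
        "GPQA": {"Gemini 2.5 Pro Exp": 84.0, "Claude 3.7 Sonnet": 84.8,
                 "o3-mini": 79.7, "o1": 78.0, "DeepSeek-R1": 71.5, "GPT-4.5": 71.4},
        "AIME 2024": {"Gemini 2.5 Pro Exp": 92.0, "Claude 3.7 Sonnet": 80.0,
                      "o3-mini": 87.3, "o1": 83.3, "DeepSeek-R1": 79.8, "GPT-4.5": 36.7},
        "Aider Polyglot": {"Gemini 2.5 Pro Exp": 74.0, "Claude 3.7 Sonnet": 64.9,
                           "o3-mini": 60.4, "o1": 61.7, "DeepSeek-R1": 56.9, "GPT-4.5": 44.9},
        "SWE-bench Verified": {"Gemini 2.5 Pro Exp": 63.8, "Claude 3.7 Sonnet": 70.3,
                               "o3-mini": 49.3, "o1": 48.9, "DeepSeek-R1": 49.2, "GPT-4.5": 38.0},
    }

    for benchmark, by_model in scores.items():
        leader, top = max(by_model.items(), key=lambda kv: kv[1])
        print(f"{benchmark}: {leader} leads at {top}%")

Running it shows Gemini 2.5 Pro Experimental ahead on AIME 2024 and Aider Polyglot, with Claude 3.7 Sonnet narrowly ahead on GPQA and clearly ahead on SWE-bench Verified.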

Detailed Benchmarks

Dive deeper into Gemini Pro 2.5 Experimental's performance across specific task categories. Each benchmark is listed with the cross-model average for context.

Math

AIME 2025: 86.7% (cross-model average: 79.3%)
AIME 2024: 92.0% (cross-model average: 80.8%)

Coding

LiveCodeBench: 70.4% (cross-model average: 63.2%)
Aider Polyglot: 74.0% (cross-model average: 57.9%)
SWE-bench Verified: 63.8% (cross-model average: 51.6%)

Knowledge

GPQA: 84.0% (cross-model average: 76.4%)

Uncategorized

Humanity's Last Exam: 18.8% (cross-model average: 11.3%)
SimpleQA: 52.9% (cross-model average: 40.4%)
MMMU: 81.7% (cross-model average: 75.4%)
Vibe-Eval: 69.4% (cross-model average: 53.9%)
MRCR: 83.1% (cross-model average: 63.8%)
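
These per-category averages are most useful as a margin-over-the-field view. A quick sketch, with the model's scores and the page's averages transcribed from above:

    # (model score, cross-model average) in percent, from this page.
    benchmarks = {
        "AIME 2025": (86.7, 79.3),
        "AIME 2024": (92.0, 80.8),
        "LiveCodeBench": (70.4, 63.2),
        "Aider Polyglot": (74.0, 57.9),
        "SWE-bench Verified": (63.8, 51.6),
        "GPQA": (84.0, 76.4),
        "Humanity's Last Exam": (18.8, 11.3),
        "SimpleQA": (52.9, 40.4),
        "MMMU": (81.7, 75.4),
        "Vibe-Eval": (69.4, 53.9),
        "MRCR": (83.1, 63.8),
    }

    # Sort by margin over the average, largest first.
    for name, (score, avg) in sorted(benchmarks.items(),
                                     key=lambda kv: kv[1][0] - kv[1][1],
                                     reverse=True):
        print(f"{name}: {score}% vs {avg}% avg ({score - avg:+.1f} pts)")

The biggest margins show up on long-context retrieval (MRCR) and multi-language coding (Aider Polyglot).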

Provider Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for Gemini Pro 2.5 Experimental. Compare costs across platforms to find the best pricing for your use case.

Providers tracked: OpenAI, Anthropic, Google, Mistral AI, Cohere.
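
Once provider prices land here, the comparison reduces to simple per-token arithmetic. A sketch of that calculation follows; the rates below are hypothetical placeholders, not published prices for this model:

    # Hypothetical USD rates per 1M tokens; real provider pricing for
    # gemini-2.5-pro-exp-03-25 is not yet listed on this page.
    providers = {
        "Provider A": {"input": 1.25, "output": 10.00},
        "Provider B": {"input": 2.50, "output": 15.00},
    }

    def monthly_cost(rates, input_tokens, output_tokens):
        """Monthly USD cost for a given token volume."""
        return (input_tokens * rates["input"]
                + output_tokens * rates["output"]) / 1e6

    # Example workload: 50M input and 5M output tokens per month.
    for name, rates in providers.items():
        print(f"{name}: ${monthly_cost(rates, 50e6, 5e6):,.2f}/month")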
