GPT-3.5 Turbo

OpenAI

The newest version of the GPT-3.5 Turbo model improves accuracy when adhering to requested output formats. It also fixes a bug that previously caused text-encoding problems in non-English function calls.
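To make the function-calling fix concrete, here is a minimal sketch of a tools-style request containing non-English text, of the kind the encoding bug affected. The `get_weather` tool name and its schema are hypothetical; the snippet only builds the request payload, and actually sending it requires the official `openai` Python SDK and an API key (shown commented out).

```python
# Hypothetical tool definition following the Chat Completions "tools" schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool name, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, any language"},
            },
            "required": ["city"],
        },
    },
}]

# Request payload with a non-English user message ("What's the weather in Tokyo?").
request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "東京の天気は？"}],
    "tools": tools,
}

# To send it with the official SDK (requires OPENAI_API_KEY):
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**request)
# print(resp.choices[0].message.tool_calls[0].function.arguments)
```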

Model Specifications

Technical details and capabilities of GPT-3.5 Turbo

Core Specifications

16.4K / 4.1K

Input / Output tokens

September 29, 2021

Knowledge cutoff date

March 20, 2023

Release date
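The 16.4K / 4.1K figures above correspond to a 16,385-token context window and a 4,096-token output cap. A back-of-the-envelope budget check can be sketched as follows; the token counts passed in are illustrative (real counts would come from a tokenizer such as tiktoken).

```python
CONTEXT_WINDOW = 16_385  # input context window (~16.4K tokens)
MAX_OUTPUT = 4_096       # maximum completion length (~4.1K tokens)

def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if the prompt plus the requested completion fit the model's limits."""
    return (requested_output <= MAX_OUTPUT
            and prompt_tokens + requested_output <= CONTEXT_WINDOW)

print(fits(12_000, 4_000))  # True: 16,000 tokens total fits the window
print(fits(14_000, 4_000))  # False: 18,000 exceeds the 16,385-token window
```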

Capabilities & License

Multimodal Support
Not Supported
Web Access
No
License
Proprietary

Resources

API Reference
https://platform.openai.com/docs/models/gpt-3-5-turbo
Playground
https://platform.openai.com/playground

Performance Insights

See how GPT-3.5 Turbo handles a range of AI tasks, based on standard benchmark results.

Benchmark    Score
DROP         70.2
MMLU         69.8
HumanEval    68.0
MGSM         56.3
MATH         43.1
GPQA         30.8
MMMU         0    (multimodal benchmark; not supported)
MathVista    0    (multimodal benchmark; not supported)

Model Comparison

See how GPT-3.5 Turbo stacks up against other leading models across key performance metrics.

Model                  MMLU   GPQA   MGSM   MATH   HumanEval
GPT-3.5 Turbo          69.8   30.8   56.3   43.1   68.0
Phi-3.5-mini-instruct  69.0   30.4   47.9   48.5   62.8
Phi-3.5-MoE-instruct   78.9   36.8   58.7   59.5   70.7
Claude 3 Haiku         75.2   33.3   75.1   38.9   75.9
GPT-4                  86.4   35.7   74.5   42.0   67.0
Claude 3 Sonnet        79.0   40.4   83.5   43.1   73.0
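One way to read the comparison numbers above is as an unweighted average per model across the five benchmarks. The snippet below reproduces the table as data and computes that average; the averaging method is a simple illustration, not the site's ranking formula.

```python
# Scores transcribed from the comparison table above.
scores = {
    "GPT-3.5 Turbo":         {"MMLU": 69.8, "GPQA": 30.8, "MGSM": 56.3, "MATH": 43.1, "HumanEval": 68.0},
    "Phi-3.5-mini-instruct": {"MMLU": 69.0, "GPQA": 30.4, "MGSM": 47.9, "MATH": 48.5, "HumanEval": 62.8},
    "Phi-3.5-MoE-instruct":  {"MMLU": 78.9, "GPQA": 36.8, "MGSM": 58.7, "MATH": 59.5, "HumanEval": 70.7},
    "Claude 3 Haiku":        {"MMLU": 75.2, "GPQA": 33.3, "MGSM": 75.1, "MATH": 38.9, "HumanEval": 75.9},
    "GPT-4":                 {"MMLU": 86.4, "GPQA": 35.7, "MGSM": 74.5, "MATH": 42.0, "HumanEval": 67.0},
    "Claude 3 Sonnet":       {"MMLU": 79.0, "GPQA": 40.4, "MGSM": 83.5, "MATH": 43.1, "HumanEval": 73.0},
}

# Unweighted mean over the five benchmarks, rounded for display.
averages = {model: round(sum(s.values()) / len(s), 2) for model, s in scores.items()}

for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {avg}")
# GPT-3.5 Turbo averages 53.6 on these five benchmarks,
# slightly ahead of Phi-3.5-mini-instruct (51.72).
```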

Detailed Benchmarks

Dive deeper into GPT-3.5 Turbo's performance across specific task categories. Expand each section to see detailed metrics and comparisons.

Coding

Reasoning

DROP — average across compared models: 73.9%

Non categorized

MGSM — average across compared models: 65.0%
MMMU — average across compared models: 44.4%
MathVista — average across compared models: 45.4%

Provider Pricing Coming Soon

We're gathering comprehensive pricing data for GPT-3.5 Turbo from all major providers. Compare costs across platforms to find the best pricing for your use case.

OpenAI
Anthropic
Google
Mistral AI
Cohere
