
Llama 3.1 70B Instruct

Meta Llama

Llama 3.1 70B Instruct is Meta's instruction-tuned large language model, optimized for multilingual dialogue. On standard industry benchmarks it outperforms many openly available and proprietary chat models.
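For reference, here is a minimal sketch of a multilingual chat turn with the model via Hugging Face transformers. The model id follows Meta's published naming (meta-llama/Llama-3.1-70B-Instruct, a gated repository), and the 70B weights need several high-memory GPUs, so treat this as an illustration rather than a deployment recipe.

```python
# Minimal chat sketch via Hugging Face transformers. Assumes access to the
# gated meta-llama/Llama-3.1-70B-Instruct repo and enough GPU memory to
# shard the 70B weights across devices.
import torch
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # shard layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"},
]

outputs = pipeline(messages, max_new_tokens=128)
# In chat mode the pipeline appends the assistant turn to the message list.
print(outputs[0]["generated_text"][-1]["content"])
```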

Model Specifications

Technical details and capabilities of Llama 3.1 70B Instruct

Core Specifications

Parameters: 70B
Training Tokens: ~15T (15,000B)
Context Window: 128K input / 128K output tokens
Knowledge Cutoff: November 30, 2023
Release Date: July 22, 2024
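The 128K figure is the context window: prompt tokens plus generated tokens must fit inside it. Below is a minimal sketch, assuming the model's Hugging Face tokenizer, of checking a prompt against that budget before requesting a completion.

```python
# Sketch: pre-flight check that a prompt plus the requested completion fits
# the advertised 128K-token context window. Assumes access to the gated
# meta-llama/Llama-3.1-70B-Instruct tokenizer on the Hugging Face Hub.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128 * 1024  # advertised context length, in tokens

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

def fits_context(prompt: str, max_new_tokens: int = 1024) -> bool:
    """True if the prompt leaves room for max_new_tokens of output."""
    prompt_tokens = len(tokenizer(prompt).input_ids)
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
```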

Capabilities & License

Multimodal Support: Not supported
Web Hydrated: No
License: Llama 3.1 Community License

Resources

Research Paper: https://ai.meta.com/research/publications/llama-3-open-foundation-and-fine-tuned-chat-models/
API Reference: https://ai.meta.com/llama/
Code Repository: https://github.com/meta-llama/llama-models

Performance Insights

See how Llama 3.1 70B Instruct handles a range of AI tasks, summarized from published benchmark results.

Benchmark scores (higher is better):

GSM-8K (CoT): 95.1
ARC Challenge: 94.8
API-Bank: 90.0
IFEval: 87.5
Multilingual MGSM (CoT): 86.9
MMLU (CoT): 86.0
MBPP++ (base version): 86.0
BFCL: 84.8
MMLU: 83.6
HumanEval: 80.5
DROP: 79.6
MATH (CoT): 68.0
MMLU-Pro: 66.4
MultiPL-E HumanEval: 65.5
MultiPL-E MBPP: 62.0
Nexus: 56.7
GPQA: 41.7
Gorilla Benchmark API Bench: 29.7
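Several of the strongest scores above come from chain-of-thought (CoT) evaluations such as GSM-8K and MATH, where the model is prompted to reason step by step and only its final answer is graded. The sketch below shows that scoring loop in outline; `generate` is a hypothetical stand-in for any Llama 3.1 inference call, and real harnesses vary in their prompts and answer-extraction rules.

```python
# Sketch of how a chain-of-thought benchmark such as GSM-8K is typically
# scored: prompt for step-by-step reasoning, then compare only the final
# extracted number against the reference answer.
import re

def extract_final_number(completion: str) -> str | None:
    """Take the last number in the completion as the model's answer."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None

def score_gsm8k(examples: list[tuple[str, str]], generate) -> float:
    """Fraction of (question, reference) pairs answered correctly."""
    correct = 0
    for question, reference in examples:
        prompt = f"{question}\nLet's think step by step."
        answer = extract_final_number(generate(prompt))
        correct += answer == reference
    return correct / len(examples)
```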

Detailed Benchmarks

Dive deeper into Llama 3.1 70B Instruct's performance across specific task categories. Each benchmark below is paired with the average score of the other models tracked for comparison.

Coding

HumanEval: 80.5 (average across compared models: 76.2)

Reasoning

DROP: 79.6 (average across compared models: 79.2)

Knowledge

MMLU: 83.6 (average across compared models: 82.4)

Uncategorized

MMLU (CoT): 86.0 (average across compared models: 82.5)
IFEval: 87.5 (average across compared models: 86.0)
MultiPL-E HumanEval: 65.5 (average across compared models: 63.8)
MultiPL-E MBPP: 62.0 (average across compared models: 60.0)
GSM-8K (CoT): 95.1 (average across compared models: 89.8)
MATH (CoT): 68.0 (average across compared models: 60.0)
API-Bank: 90.0 (average across compared models: 88.2)
BFCL: 84.8 (average across compared models: 73.7)
Gorilla Benchmark API Bench: 29.7 (average across compared models: 24.4)
Nexus: 56.7 (average across compared models: 47.0)
Multilingual MGSM (CoT): 86.9 (average across compared models: 77.9)
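API-Bank, BFCL, the Gorilla API Bench, and Nexus all probe tool use: the model receives function schemas and must emit a well-formed call. Here is a minimal sketch of that task shape, assuming an OpenAI-compatible endpoint serving the model (the base_url and model name below are placeholders, not fixed values).

```python
# Sketch of the tool-use pattern that API-Bank, BFCL, and Nexus evaluate.
# Assumes a local OpenAI-compatible server hosting Llama 3.1 70B Instruct;
# base_url and model are placeholders for whatever your provider exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# A capable model responds with a structured call, e.g. get_weather(city="Paris").
print(response.choices[0].message.tool_calls)
```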

Provider Pricing: Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Llama 3.1 70B Instruct. Compare costs across platforms to find the best pricing for your use case.

