
Llama 3.1 70B Instruct

Meta Llama

Llama 3.1 70B Instruct is Meta's instruction-tuned large language model, optimized for multilingual dialogue. On standard industry benchmarks it outperforms many openly available and proprietary chat models.
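For reference, here is a minimal sketch of a multilingual chat turn with the model via Hugging Face transformers. The model id follows Meta's published naming (meta-llama/Llama-3.1-70B-Instruct, a gated repository), and the 70B weights need several high-memory GPUs, so treat this as an illustration rather than a deployment recipe.

```python
# Minimal chat sketch via Hugging Face transformers. Assumes access to the
# gated meta-llama/Llama-3.1-70B-Instruct repo and enough GPU memory to
# shard the 70B weights across devices.
import torch
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # shard layers across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "¿Cuál es la capital de Australia?"},
]

outputs = pipeline(messages, max_new_tokens=128)
# In chat mode the pipeline appends the assistant turn to the message list.
print(outputs[0]["generated_text"][-1]["content"])
```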

Model Specifications

Technical details and capabilities of Llama 3.1 70B Instruct

Core Specifications

Parameters: 70B
Training Tokens: ~15T (15,000B)
Context Window: 128K input / 128K output tokens
Knowledge Cutoff: November 30, 2023
Release Date: July 22, 2024
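The 128K figure is the context window: prompt tokens plus generated tokens must fit inside it. Below is a minimal sketch, assuming the model's Hugging Face tokenizer, of checking a prompt against that budget before requesting a completion.

```python
# Sketch: pre-flight check that a prompt plus the requested completion fits
# the advertised 128K-token context window. Assumes access to the gated
# meta-llama/Llama-3.1-70B-Instruct tokenizer on the Hugging Face Hub.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128 * 1024  # advertised context length, in tokens

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

def fits_context(prompt: str, max_new_tokens: int = 1024) -> bool:
    """True if the prompt leaves room for max_new_tokens of output."""
    prompt_tokens = len(tokenizer(prompt).input_ids)
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW
```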

Capabilities & License

Multimodal Support: Not supported
Web Hydrated: No
License: Llama 3.1 Community License

Resources

Research Paper: https://ai.meta.com/research/publications/llama-3-open-foundation-and-fine-tuned-chat-models/
API Reference: https://ai.meta.com/llama/
Code Repository: https://github.com/meta-llama/llama-models

Performance Insights

See how Llama 3.1 70B Instruct handles a range of AI tasks, summarized from published benchmark results.

Benchmark scores (higher is better):

GSM-8K (CoT): 95.1
ARC Challenge: 94.8
API-Bank: 90.0
IFEval: 87.5
Multilingual MGSM (CoT): 86.9
MMLU (CoT): 86.0
MBPP++ (base version): 86.0
BFCL: 84.8
MMLU: 83.6
HumanEval: 80.5
DROP: 79.6
MATH (CoT): 68.0
MMLU-Pro: 66.4
MultiPL-E HumanEval: 65.5
MultiPL-E MBPP: 62.0
Nexus: 56.7
GPQA: 41.7
Gorilla Benchmark API Bench: 29.7
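Several of the strongest scores above come from chain-of-thought (CoT) evaluations such as GSM-8K and MATH, where the model is prompted to reason step by step and only its final answer is graded. The sketch below shows that scoring loop in outline; `generate` is a hypothetical stand-in for any Llama 3.1 inference call, and real harnesses vary in their prompts and answer-extraction rules.

```python
# Sketch of how a chain-of-thought benchmark such as GSM-8K is typically
# scored: prompt for step-by-step reasoning, then compare only the final
# extracted number against the reference answer.
import re

def extract_final_number(completion: str) -> str | None:
    """Take the last number in the completion as the model's answer."""
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None

def score_gsm8k(examples: list[tuple[str, str]], generate) -> float:
    """Fraction of (question, reference) pairs answered correctly."""
    correct = 0
    for question, reference in examples:
        prompt = f"{question}\nLet's think step by step."
        answer = extract_final_number(generate(prompt))
        correct += answer == reference
    return correct / len(examples)
```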

Detailed Benchmarks

Dive deeper into Llama 3.1 70B Instruct's performance across specific task categories. Each benchmark below is paired with the average score of the other models tracked for comparison.

Coding

HumanEval: 80.5 (average across compared models: 76.2)

Reasoning

DROP: 79.6 (average across compared models: 79.2)

Knowledge

MMLU: 83.6 (average across compared models: 82.4)

Uncategorized

MMLU (CoT): 86.0 (average across compared models: 82.5)
IFEval: 87.5 (average across compared models: 86.0)
MultiPL-E HumanEval: 65.5 (average across compared models: 63.8)
MultiPL-E MBPP: 62.0 (average across compared models: 60.0)
GSM-8K (CoT): 95.1 (average across compared models: 89.8)
MATH (CoT): 68.0 (average across compared models: 60.0)
API-Bank: 90.0 (average across compared models: 88.2)
BFCL: 84.8 (average across compared models: 73.7)
Gorilla Benchmark API Bench: 29.7 (average across compared models: 24.4)
Nexus: 56.7 (average across compared models: 47.0)
Multilingual MGSM (CoT): 86.9 (average across compared models: 77.9)
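API-Bank, BFCL, the Gorilla API Bench, and Nexus all probe tool use: the model receives function schemas and must emit a well-formed call. Here is a minimal sketch of that task shape, assuming an OpenAI-compatible endpoint serving the model (the base_url and model name below are placeholders, not fixed values).

```python
# Sketch of the tool-use pattern that API-Bank, BFCL, and Nexus evaluate.
# Assumes a local OpenAI-compatible server hosting Llama 3.1 70B Instruct;
# base_url and model are placeholders for whatever your provider exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# A capable model responds with a structured call, e.g. get_weather(city="Paris").
print(response.choices[0].message.tool_calls)
```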

Provider Pricing: Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Llama 3.1 70B Instruct. Compare costs across platforms to find the best pricing for your use case.

