
Llama 3.1 8B Instruct

Meta Llama

Llama 3.1 8B Instruct is Meta's multilingual, instruction-tuned large language model for dialogue use cases. It offers a 128K-token context window, strong reasoning and coding performance for its size, and built-in support for tool use (function calling).
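If you want to try the model locally, the sketch below shows a minimal chat completion with the Hugging Face transformers library. The checkpoint ID, dtype, and generation settings are assumptions for illustration, not details taken from this page.

```python
# Minimal local chat-completion sketch for Llama 3.1 8B Instruct.
# Assumes the meta-llama/Llama-3.1-8B-Instruct checkpoint on Hugging Face
# (gated: accept the Llama 3.1 Community License first) and a GPU with
# roughly 16 GB of memory for bfloat16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Explain the difference between MMLU and MMLU-Pro in two sentences."},
]

# The tokenizer's chat template adds the Llama 3.1 role headers and EOT tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```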

Model Specifications

Technical details and capabilities of Llama 3.1 8B Instruct

Core Specifications

8.0B Parameters

Model size and complexity

15T (15,000B) Training Tokens

Amount of data used in training

131,072 / 131,072

Input / Output tokens (128K context window; see the config check below)

December 30, 2023

Knowledge cutoff date

July 22, 2024

Release date
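
As a quick way to confirm the 128K (131,072-token) context window listed above, the model configuration exposes the maximum sequence length. A small sketch, assuming the meta-llama/Llama-3.1-8B-Instruct checkpoint on Hugging Face:

```python
# Read the maximum context length straight from the model config.
# No weights are downloaded; the gated repo still requires an accepted
# license and a Hugging Face auth token.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
print(config.max_position_embeddings)  # expected: 131072
```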

Capabilities & License

Multimodal Support
Not Supported
Web Hydrated
No
License
Llama 3.1 Community License

Resources

API Reference
https://www.llama.com/
Code Repository
https://github.com/meta-llama/llama-models
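
The tool-use support mentioned in the introduction can be exercised through the same chat template: recent versions of transformers accept a `tools` argument and render a function schema into the Llama 3.1 prompt. A hedged sketch, with an illustrative weather function that is not part of this page's data:

```python
# Sketch of Llama 3.1 function calling via the chat template's `tools` argument.
# The checkpoint ID and the weather tool are assumptions for illustration.
from transformers import AutoTokenizer

def get_current_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny, 22 C in {city}"  # stub; a real tool would call a weather API

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# Render a prompt that advertises the tool; the model is expected to answer
# with a JSON tool call that the caller parses, executes, and feeds back as a
# "tool" role message before generating the final reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],  # JSON schema inferred from signature + docstring
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```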

Performance Insights

Check out how Llama 3.1 8B Instruct handles various AI tasks through comprehensive benchmark results.

GSM-8K (CoT): 84.5
ARC Challenge: 83.4
API-Bank: 82.6
IFEval: 80.4
BFCL: 76.1
MMLU (CoT): 73.0
MBPP EvalPlus (base): 72.8
HumanEval: 72.6
MMLU: 69.4
Multilingual MGSM (CoT): 68.9
CRAG: 60.0
DROP: 59.5
Multipl-E MBPP: 52.4
MATH (CoT): 51.9
Multipl-E HumanEval: 50.8
MMLU-Pro: 48.3
Nexus: 38.5
GPQA: 30.4
HELMET LongQA: 29.2
FinanceBench: 28.4
Arena Hard: 28.2
LongBench: 17.7
Gorilla Benchmark API Bench: 8.2

Model Comparison

See how Llama 3.1 8B Instruct stacks up against other leading models across key performance metrics.

Model                     MMLU    MMLU-Pro   GPQA    HumanEval   DROP
Llama 3.1 8B Instruct     69.4    48.3       30.4    72.6        59.5
Claude 3 Sonnet           79.0    56.8       40.4    73.0        78.9
Llama 3.1 70B Instruct    83.6    66.4       41.7    80.5        79.6
Phi-4                     84.8    70.4       56.1    82.6        75.5
Claude 3 Opus             86.8    68.5       50.4    84.9        83.1
Gemini 1.5 Pro            85.9    75.8       59.1    84.1        74.9

Detailed Benchmarks

Dive deeper into Llama 3.1 8B Instruct's performance across specific task categories. Each benchmark below is listed with the average score across the models compared on that benchmark.

Coding

Reasoning

DROP: avg 73.9%

Non categorized

MMLU (CoT): avg 82.5%
IFEval: avg 78.1%
Multipl-E HumanEval: avg 63.8%
Multipl-E MBPP: avg 60.0%
GSM-8K (CoT): avg 89.8%
MATH (CoT): avg 60.0%
API-Bank: avg 88.2%
BFCL: avg 71.8%
Gorilla Benchmark API Bench: avg 24.4%
Nexus: avg 47.0%
Multilingual MGSM (CoT): avg 77.9%
Arena Hard: avg 45.6%
CRAG: avg 59.8%
FinanceBench: avg 31.8%
HELMET LongQA: avg 41.4%
LongBench: avg 25.9%

Providers Pricing (Coming Soon)

We're working on gathering comprehensive pricing data from all major providers for Llama 3.1 8B Instruct. Compare costs across platforms to find the best pricing for your use case.

