
Llama 3.1 405B Instruct

Meta Llama

Llama 3.1 405B Instruct is a powerful language model built for multilingual conversation. It outperforms many open-source and proprietary chat models on standard industry benchmarks, supports 8 languages, and offers a 128K-token context window for long, nuanced dialogue.
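
As a rough illustration of how the model is typically consumed for multilingual chat, here is a minimal sketch using the openai Python client against an OpenAI-compatible endpoint. The base URL, API key, and exact model identifier below are placeholders and vary by hosting provider; this is not an official integration.

```python
# Minimal sketch: multilingual chat with Llama 3.1 405B Instruct through an
# OpenAI-compatible endpoint. base_url, api_key and the model id are
# placeholders -- substitute whatever your provider documents.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-405B-Instruct",  # identifier differs per provider
    messages=[
        {"role": "system", "content": "You are a helpful multilingual assistant."},
        {"role": "user", "content": "Résume les points clés de ce rapport en trois phrases."},
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```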

Model Specifications

Technical details and capabilities of Llama 3.1 405B Instruct

Core Specifications

Parameters: 405B (model size and complexity)
Training tokens: 15T, i.e. 15,000B (amount of data used in training)
Context window: 128K input / 128K output tokens
Knowledge cutoff: November 30, 2023
Release date: July 22, 2024
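
To put the 405B parameter count in perspective, a back-of-the-envelope calculation of the raw weight memory at common precisions looks like the sketch below. These are approximations only (weights, not activations or KV cache), not figures from Meta or any provider.

```python
# Back-of-the-envelope weight memory for a 405B-parameter model.
# Rough approximation: weights only; activations, KV cache and
# framework overhead are excluded.
params = 405e9

for label, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{label:>9}: ~{gib:,.0f} GiB of weights")

# fp16/bf16: ~754 GiB, int8: ~377 GiB, int4: ~189 GiB (approx.)
```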

Capabilities & License

Multimodal support: Not supported
Web hydrated: No
License: Llama 3.1 Community License

Resources

API Reference: https://github.com/meta-llama/llama-models
Playground: https://llama.meta.com/llama-downloads
Code Repository: https://github.com/meta-llama/llama-models
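
Beyond the official download page above, the instruct weights are also distributed through Hugging Face under gated access. The sketch below uses huggingface_hub to fetch a local copy; the repository id is an assumption, so verify the exact name on huggingface.co, and access must be granted first.

```python
# Sketch: fetching the instruct weights from Hugging Face after requesting
# access on the Meta Llama pages. The repo id below is an assumption --
# confirm the exact name before running; the weights are gated.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-3.1-405B-Instruct",  # assumed repo id
    local_dir="./llama-3.1-405b-instruct",
    token="YOUR_HF_TOKEN",  # required because the repository is gated
)
```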

Performance Insights

Check out how Llama 3.1 405B Instruct handles various AI tasks through comprehensive benchmark results.

Benchmark scores (higher is better):

ARC Challenge: 96.9
GSM8K: 96.8
API-Bank: 92.0
Multilingual MGSM: 91.6
HumanEval: 89.0
MMLU (CoT): 88.6
IFEval: 88.6
MBPP EvalPlus: 88.6
BFCL: 88.5
MMLU: 87.3
DROP: 84.8
Multipl-E HumanEval: 75.2
MATH: 73.8
MMLU-Pro: 73.3
Multipl-E MBPP: 65.7
Nexus: 58.7
GPQA: 50.7
Gorilla Benchmark API Bench: 35.3

Model Comparison

See how Llama 3.1 405B Instruct stacks up against other leading models across key performance metrics.

Benchmark   | Llama 3.1 405B Instruct | Grok-2 mini | GPT-4o | Llama 3.3 70B Instruct | Grok-2 | Claude 3.5 Sonnet
MMLU        | 87.3                    | 86.2        | 88.7   | 86.0                   | 87.5   | 90.4
MMLU-Pro    | 73.3                    | 72.0        | 72.6   | 68.9                   | 75.5   | 76.1
GPQA        | 50.7                    | 51.0        | 53.6   | 50.5                   | 56.0   | 59.4
HumanEval   | 89.0                    | 85.7        | 90.2   | 88.4                   | 88.4   | 92.0
MATH        | 73.8                    | 73.0        | 76.6   | 77.0                   | 76.1   | 71.1
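
As a quick way to read the table above, the following sketch hard-codes those scores and reports where Llama 3.1 405B Instruct ranks among the compared models on each benchmark. The values are copied directly from the comparison data; nothing is recomputed from original benchmark runs.

```python
# Rank Llama 3.1 405B Instruct among the compared models on each benchmark,
# using the scores from the comparison table above.
scores = {
    "MMLU":      {"Llama 3.1 405B Instruct": 87.3, "Grok-2 mini": 86.2, "GPT-4o": 88.7,
                  "Llama 3.3 70B Instruct": 86.0, "Grok-2": 87.5, "Claude 3.5 Sonnet": 90.4},
    "MMLU-Pro":  {"Llama 3.1 405B Instruct": 73.3, "Grok-2 mini": 72.0, "GPT-4o": 72.6,
                  "Llama 3.3 70B Instruct": 68.9, "Grok-2": 75.5, "Claude 3.5 Sonnet": 76.1},
    "GPQA":      {"Llama 3.1 405B Instruct": 50.7, "Grok-2 mini": 51.0, "GPT-4o": 53.6,
                  "Llama 3.3 70B Instruct": 50.5, "Grok-2": 56.0, "Claude 3.5 Sonnet": 59.4},
    "HumanEval": {"Llama 3.1 405B Instruct": 89.0, "Grok-2 mini": 85.7, "GPT-4o": 90.2,
                  "Llama 3.3 70B Instruct": 88.4, "Grok-2": 88.4, "Claude 3.5 Sonnet": 92.0},
    "MATH":      {"Llama 3.1 405B Instruct": 73.8, "Grok-2 mini": 73.0, "GPT-4o": 76.6,
                  "Llama 3.3 70B Instruct": 77.0, "Grok-2": 76.1, "Claude 3.5 Sonnet": 71.1},
}

for benchmark, by_model in scores.items():
    ranked = sorted(by_model, key=by_model.get, reverse=True)
    rank = ranked.index("Llama 3.1 405B Instruct") + 1
    score = by_model["Llama 3.1 405B Instruct"]
    print(f"{benchmark}: rank {rank} of {len(ranked)} (score {score})")
```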

Detailed Benchmarks

Dive deeper into Llama 3.1 405B Instruct's performance across specific task categories. Each benchmark below shows the model's score alongside the average across compared models.

Math

GSM8K: 96.8 (average across compared models: 96.2%)

Reasoning

DROP: 84.8 (average: 83.2%)

Knowledge

MMLU: 87.3 (average: 85.4%)
GPQA: 50.7 (average: 52.3%)
MATH: 73.8 (average: 72.6%)

Uncategorized

MMLU (CoT): 88.6 (average: 82.5%)
MMLU-Pro: 73.3 (average: 71.5%)
IFEval: 88.6 (average: 86.6%)
ARC Challenge: 96.9 (average: 92.5%)
MBPP EvalPlus: 88.6 (average: 88.1%)
Multipl-E HumanEval: 75.2 (average: 63.8%)
Multipl-E MBPP: 65.7 (average: 60.0%)
API-Bank: 92.0 (average: 88.2%)
BFCL: 88.5 (average: 73.7%)
Gorilla Benchmark API Bench: 35.3 (average: 24.4%)
Nexus: 58.7 (average: 47.0%)

Provider Pricing Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Llama 3.1 405B Instruct. Compare costs across platforms to find the best pricing for your use case.


