
Llama 3.3 70B Instruct

Meta Llama

Llama 3.3 is a multilingual large language model from Meta, instruction-tuned for dialogue. With 70 billion parameters, it outperforms many openly available and proprietary chat models on common industry benchmarks, and its 128,000-token context window makes it suitable for both commercial applications and research across a range of languages.
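A quick way to try the model is through the Hugging Face transformers library. The sketch below follows the standard chat-pipeline pattern; it assumes you have accepted the Llama 3.3 Community License on Hugging Face and have enough GPU memory for a 70B model (roughly 140 GB in bfloat16, less with quantization).

```python
import torch
from transformers import pipeline

# Chat-style generation with Llama 3.3 70B Instruct.
# Assumes gated-model access on Hugging Face; device_map="auto"
# shards the weights across available GPUs.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise multilingual assistant."},
    {"role": "user", "content": "Greet me in French, Hindi, and Thai."},
]

outputs = pipe(messages, max_new_tokens=256)
# The pipeline appends the assistant turn to the message list.
print(outputs[0]["generated_text"][-1]["content"])
```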

Model Specifications

Technical details and capabilities of Llama 3.3 70B Instruct

Core Specifications

70B Parameters

Model size and complexity

~15T Training Tokens

Amount of data used in training

128K / 128K

Maximum input / output tokens

November 30, 2023

Knowledge cutoff date

December 5, 2024

Release date

Capabilities & License

Multimodal Support
Not Supported
Web Hydrated
No
License
Llama 3.3 Community License Agreement

Resources

Model Page (Hugging Face)
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
Playground
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
Model Card (GitHub)
https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md
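If running the model locally is impractical, the same Hugging Face model id can be queried remotely. A minimal sketch using huggingface_hub's InferenceClient, assuming the model is available to your account through a hosted inference provider:

```python
from huggingface_hub import InferenceClient

# Remote chat completion against the hosted model; requires an HF token
# with access to the gated Llama 3.3 weights (e.g. via the HF_TOKEN env var).
client = InferenceClient("meta-llama/Llama-3.3-70B-Instruct")

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize Llama 3.3 in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```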

Performance Insights

See how Llama 3.3 70B Instruct performs across a range of AI tasks, based on standard benchmark results.

Benchmark          Score
IFEval             92.1
MGSM               91.1
HumanEval          88.4
MBPP EvalPlus      87.6
MMLU               86.0
BFCL v2            77.3
MATH               77.0
MMLU-Pro           68.9
Arena Hard         65.8
CRAG               61.7
HELMET LongQA      52.8
GPQA               50.5
LongBench          21.7
FinanceBench       20.0

Model Comparison

See how Llama 3.3 70B Instruct stacks up against other leading models across key performance metrics.

Model                      MMLU   MMLU-Pro   GPQA   HumanEval   MATH
Llama 3.3 70B Instruct     86.0   68.9       50.5   88.4        77.0
Llama 3.1 405B Instruct    87.3   73.3       50.7   89.0        73.8
Grok-2 mini                86.2   72.0       51.0   85.7        73.0
GPT-4o                     88.7   72.6       53.6   90.2        76.6
Qwen2.5 32B Instruct       83.3   69.0       49.5   88.4        83.1
Grok-2                     87.5   75.5       56.0   88.4        76.1
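The comparison above can also be explored programmatically. In the sketch below, the scores are transcribed from the table, and the snippet simply reports the best-scoring model per benchmark; it illustrates the data, not an official evaluation harness.

```python
# Benchmark scores transcribed from the comparison table above (higher is better).
scores = {
    "Llama 3.3 70B Instruct":  {"MMLU": 86.0, "MMLU-Pro": 68.9, "GPQA": 50.5, "HumanEval": 88.4, "MATH": 77.0},
    "Llama 3.1 405B Instruct": {"MMLU": 87.3, "MMLU-Pro": 73.3, "GPQA": 50.7, "HumanEval": 89.0, "MATH": 73.8},
    "Grok-2 mini":             {"MMLU": 86.2, "MMLU-Pro": 72.0, "GPQA": 51.0, "HumanEval": 85.7, "MATH": 73.0},
    "GPT-4o":                  {"MMLU": 88.7, "MMLU-Pro": 72.6, "GPQA": 53.6, "HumanEval": 90.2, "MATH": 76.6},
    "Qwen2.5 32B Instruct":    {"MMLU": 83.3, "MMLU-Pro": 69.0, "GPQA": 49.5, "HumanEval": 88.4, "MATH": 83.1},
    "Grok-2":                  {"MMLU": 87.5, "MMLU-Pro": 75.5, "GPQA": 56.0, "HumanEval": 88.4, "MATH": 76.1},
}

for bench in ["MMLU", "MMLU-Pro", "GPQA", "HumanEval", "MATH"]:
    best = max(scores, key=lambda model: scores[model][bench])
    print(f"{bench:9s} best: {best} ({scores[best][bench]:.1f})")
```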

Detailed Benchmarks

Dive deeper into Llama 3.3 70B Instruct's performance across specific task categories. Each benchmark below is listed with the average score across the compared models.

Knowledge

MMLU (cross-model average: 84.3%)
MATH (cross-model average: 75.1%)

Uncategorized

IFEval (cross-model average: 87.2%)
MBPP EvalPlus (cross-model average: 88.1%)
MGSM (cross-model average: 86.2%)
CRAG (cross-model average: 59.8%)
FinanceBench (cross-model average: 31.8%)
HELMET LongQA (cross-model average: 41.4%)
LongBench (cross-model average: 25.9%)

Provider Pricing: Coming Soon

We're working on gathering comprehensive pricing data from all major providers for Llama 3.3 70B Instruct. Compare costs across platforms to find the best pricing for your use case.

