Buy hardware | Install | Discord | Join Us

TT-NN is a Python & C++ Neural Network OP library.

API Reference | Model Demos

LLMs

Model	Batch	Hardware	ttft (ms)	t/s/u	Target t/s/u	t/s	TT-Metalium Release	vLLM Tenstorrent Repo Release
Falcon 7B	32	n150	71	18.1	26	579.2	v0.56.0-rc6
Mistral 7B	32	n150		9.9	25	316.8	v0.51.0-rc28
Mamba 2.8B	32	n150	48	12.3	41	393.6	v0.51.0-rc26
Llama 3.1 8B	32	n150	168	24.0	23	768.0	v0.56.0-rc6	b9564bf
Llama 3.2 1B	32	n150	56	59.4	160	1900.8	v0.56.0-rc6	b9564bf
Llama 3.2 3B	32	n150	97	36.5	60	1168.0	v0.56.0-rc6	b9564bf
Llama 3.2 11B Vision (TP=2)	16	n300	2550	15.8	17	252.8	v0.56.0-rc3	0fde628
Falcon 7B (DP=8)	256	QuietBox	88	15.5	26	3968.0	v0.55.0-rc18
Llama 3.1 70B (TP=8)	32	QuietBox	190	15.1	20	483.2	v0.54.0-rc2	9531611
Falcon 40B (TP=8)	32	QuietBox		5.3	36	169.6	v0.55.0-rc20
Mixtral 8x7B (TP=8)	32	QuietBox	227	14.9	33	476.8	v0.56.0-rc6
Falcon 7B (DP=32)	1024	Galaxy	223	4.8	26	4915.2	v0.56.0-rc6
Llama 3.1 70B (DP=4, TP=8)	128	Galaxy	190	14.3	20	1835.5	v0.52.0-rc31
Llama 3.1 70B (TP=32)	32	Galaxy	763	13.5	80	432.0	v0.56.0-rc6	b9564bf
DeepSeek R1 Distill Llama 3.3 70B (TP=8)	32	QuietBox	1113	16.4	33	524.8	main	b9564bf

Last Update: February 10, 2025

Notes:

ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.

TP = Tensor Parallel, DP = Data Parallel; these define the parallelization factors across multiple devices.

The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).

The t/s/u reported is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
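The two relations in the notes above (t/s = t/s/u * batch, and t/s/u = 1 / inter-token latency) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of TT-NN or TT-Metalium, and the 50 ms latency is a made-up input, not a measured value. Latency is taken in milliseconds to match the ttft column.

```python
def tokens_per_second_per_user(inter_token_latency_ms: float) -> float:
    """t/s/u = 1 / inter-token latency, with the latency given in milliseconds."""
    if inter_token_latency_ms <= 0:
        raise ValueError("inter-token latency must be positive")
    return 1000.0 / inter_token_latency_ms


def aggregate_tokens_per_second(t_s_u: float, batch: int) -> float:
    """t/s = t/s/u * batch (all users in the batch decode in parallel)."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    return t_s_u * batch


# Row from the LLM table: Llama 3.1 8B on n150, batch 32, 24.0 t/s/u.
llama_t_s = aggregate_tokens_per_second(24.0, 32)
print(llama_t_s)  # 768.0, matching the t/s column

# Hypothetical input: 50 ms between consecutive tokens for one user.
print(tokens_per_second_per_user(50.0))  # 20.0
```

Some t/s values in the table may differ slightly from this product because the published t/s/u figures are rounded.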

CNNs

Model	Batch	Hardware	fps	Target fps
ResNet-50 (224x224)	20	e150	5,100	10,000
ResNet-50 (224x224)	16	n150	4,700	7,000
ResNet-50 (224x224) (DP=2)	32	n300	9,200	14,000
ResNet-50 (224x224) (DP=8)	128	QuietBox	35,800	56,000
ResNet-50 (224x224) (DP=32)	512	Galaxy	96,800	224,000
ResNet-50 (224x224) (DP=64)	1024	Two Galaxies	145,000	448,000
ViT (224x224)	9	e150	1,360	2,000
ViT (224x224)	8	n150	912	1,600
Stable Diffusion 1.4 (512x512)	1	n150	0.167	0.3
YOLOv4 (320x320)	1	n150	95	300
SegFormer Semantic Segmentation (512x512)	1	n150	90	300
Stable Diffusion 3.5 medium (512x512)	1	n150	0.06	0.3

NLPs

Model	Batch	Hardware	sen/sec	Target sen/sec
BERT-Large	12	e150	370	410
BERT-Large	8	n150	270	400
T5 small		e150	140
Bloom		e150	70

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md

Model Bring-Up and Testing

For information on initial model procedures, please see Model Bring-Up and Testing.

TT-NN Tech Reports

Advanced Performance Optimizations for Models (updated Dec 4th, 2024)
Programming Mesh of Devices (updated Sept 9th, 2024)
ViT Implementation in TT-NN on GS (updated Sept 22nd, 2024)
LLMs Bring up in TT-NN (updated Oct 29th, 2024)
YOLOv4 Implementation in TT-NN on WH (updated November 8th, 2024)
CNN Bring up & Optimization in TT-NN (updated Jan 22nd, 2025)

Benchmarks

Matrix Multiply FLOPS on WH (updated November 13th, 2024)

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Programming Guide | API Reference

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

Matrix Engine (updated Sept 6th, 2024)
Data Formats (updated Sept 7th, 2024)
Reconfiguring Data Formats (updated Oct 17th, 2024)
Handling special floating-point numbers (updated Oct 5th, 2024)
Allocator (updated Dec 19th, 2024)
Tensor Layouts (updated Sept 6th, 2024)
Saturating DRAM Bandwidth (updated Sept 6th, 2024)
Flash Attention on Wormhole (updated Sept 6th, 2024)
CNNs on TT Architectures (updated Sept 6th, 2024)
Ethernet and Multichip Basics (updated Sept 20th, 2024)
Collective Communication Library (CCL) (updated Sept 20th, 2024)
Blackhole Bring-Up Programming Guide (updated Dec 18th, 2024)
Sub-Devices (updated Jan 7th, 2025)


TT-Metalium Programming Examples

Hello World
Add Integers
Simple Tensor Manipulation
DRAM Data Movement
Eltwise
Matmul
