🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.

dgomezTT/tt-metal

 
 

TT-NN is a Python and C++ neural network operator (OP) library.


LLMs

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release | vLLM Tenstorrent Repo Release |
|---|---|---|---|---|---|---|---|---|
| Falcon 7B | 32 | n150 | 71 | 18.1 | 26 | 579.2 | v0.56.0-rc6 | |
| Mistral 7B | 32 | n150 | | 9.9 | 25 | 316.8 | v0.51.0-rc28 | |
| Mamba 2.8B | 32 | n150 | 48 | 12.3 | 41 | 393.6 | v0.51.0-rc26 | |
| Llama 3.1 8B | 32 | n150 | 168 | 24.0 | 23 | 768.0 | v0.56.0-rc6 | b9564bf |
| Llama 3.2 1B | 32 | n150 | 56 | 59.4 | 160 | 1900.8 | v0.56.0-rc6 | b9564bf |
| Llama 3.2 3B | 32 | n150 | 97 | 36.5 | 60 | 1168.0 | v0.56.0-rc6 | b9564bf |
| Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 | v0.56.0-rc3 | 0fde628 |
| Falcon 7B (DP=8) | 256 | QuietBox | 88 | 15.5 | 26 | 3968.0 | v0.55.0-rc18 | |
| Llama 3.1 70B (TP=8) | 32 | QuietBox | 190 | 15.1 | 20 | 483.2 | v0.54.0-rc2 | 9531611 |
| Falcon 40B (TP=8) | 32 | QuietBox | | 5.3 | 36 | 169.6 | v0.55.0-rc20 | |
| Mixtral 8x7B (TP=8) | 32 | QuietBox | 227 | 14.9 | 33 | 476.8 | v0.56.0-rc6 | |
| Falcon 7B (DP=32) | 1024 | Galaxy | 223 | 4.8 | 26 | 4915.2 | v0.56.0-rc6 | |
| Llama 3.1 70B (DP=4, TP=8) | 128 | Galaxy | 190 | 14.3 | 20 | 1835.5 | v0.52.0-rc31 | |
| Llama 3.1 70B (TP=32) | 32 | Galaxy | 763 | 13.5 | 80 | 432.0 | v0.56.0-rc6 | b9564bf |
| DeepSeek R1 Distill Llama 3.3 70B (TP=8) | 32 | QuietBox | 1113 | 16.4 | 33 | 524.8 | main | b9564bf |

Last Update: February 10, 2025

Notes:

  • ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
  • TP = Tensor Parallel, DP = Data Parallel; these define the parallelization factors across multiple devices.
  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
  • The reported t/s/u is the throughput of the first token generated after prefill, i.e., 1 / inter-token latency.
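
The relationships in the notes above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and not part of TT-NN or TT-Metalium, and the sample values are taken from the Falcon 7B (n150) row of the LLM table.

```python
def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """Total tokens/second across the batch: t/s = t/s/u * batch."""
    return tokens_per_sec_per_user * batch


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Since t/s/u = 1 / inter-token latency, latency in ms is 1000 / t/s/u."""
    return 1000.0 / tokens_per_sec_per_user


# Falcon 7B on n150 (values from the table): batch 32, 18.1 t/s/u.
print(round(aggregate_throughput(18.1, 32), 1))  # 579.2, matching the t/s column
print(round(inter_token_latency_ms(18.1), 1))    # 55.2 (ms between tokens, per user)
```

The same arithmetic applies to multi-device rows, where the batch column already reflects the total number of users across all devices.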

CNNs

| Model | Batch | Hardware | fps | Target fps | Release |
|---|---|---|---|---|---|
| ResNet-50 (224x224) | 20 | e150 | 5,100 | 10,000 | |
| ResNet-50 (224x224) | 16 | n150 | 4,700 | 7,000 | |
| ResNet-50 (224x224) (DP=2) | 32 | n300 | 9,200 | 14,000 | |
| ResNet-50 (224x224) (DP=8) | 128 | QuietBox | 35,800 | 56,000 | |
| ResNet-50 (224x224) (DP=32) | 512 | Galaxy | 96,800 | 224,000 | |
| ResNet-50 (224x224) (DP=64) | 1024 | Two Galaxies | 145,000 | 448,000 | |
| ViT (224x224) | 9 | e150 | 1,360 | 2,000 | |
| ViT (224x224) | 8 | n150 | 912 | 1,600 | |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.167 | 0.3 | |
| YOLOv4 (320x320) | 1 | n150 | 95 | 300 | |
| SegFormer Semantic Segmentation (512x512) | 1 | n150 | 90 | 300 | |
| Stable Diffusion 3.5 medium (512x512) | 1 | n150 | 0.06 | 0.3 | |

NLPs

| Model | Batch | Hardware | sen/sec | Target sen/sec | Release |
|---|---|---|---|---|---|
| BERT-Large | 12 | e150 | 370 | 410 | |
| BERT-Large | 8 | n150 | 270 | 400 | |
| T5 small | | e150 | 140 | | |
| Bloom | | e150 | 70 | | |

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md.

Model Bring-Up and Testing

For information on initial model procedures, please see Model Bring-Up and Testing.

TT-NN Tech Reports

Benchmarks


TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

TT-Metalium Programming Examples

Hello World

Add Integers

Simple Tensor Manipulation

DRAM Data Movement

Eltwise

Matmul
