# T6: Tensor ProducT ATTenTion Transformer

T6 (Tensor ProducT ATTenTion Transformer) is a state-of-the-art transformer model that leverages Tensor Product Attention (TPA) mechanisms to enhance performance and reduce KV cache size. This repository provides tools for data preparation, model pretraining, and evaluation to facilitate research and development using the T6 architecture.
This repository contains the official code for the paper "Tensor Product Attention Is All You Need".
Authors: Yifan Zhang*, Yifeng Liu*, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
[Webpage] [Huggingface]
## Table of Contents

- Features
- Installation
- Data Preparation
- Pretraining
- Evaluation
- Usage
- Contributing
- License
- Acknowledgements
## Features

- Tensor Product Attention: factorizes queries, keys, and values into contextual low-rank tensor products, improving model performance while shrinking the KV cache (a minimal sketch follows this list).
- Scalability: Efficient training procedures optimized for large-scale datasets and multi-GPU setups.
- Flexible Data Support: Compatible with popular datasets like Fineweb-Edu-100B and OpenWebText.
- Comprehensive Evaluation: Integrated with lm-evaluation-harness for standardized benchmarking.
- Higher-order TPA: planned (TBD).
- Flash TPA: planned (TBD).
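As a rough illustration of the TPA idea referenced above, here is a minimal sketch of ours, not the repository's implementation: the module name and rank defaults are illustrative, and rotary position embeddings are omitted. Each token's queries, keys, and values are built as averaged sums of rank-1 tensor products between a head-side factor and a feature-side factor, so only the small key/value factors need to be cached at inference time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TensorProductAttentionSketch(nn.Module):
    """Illustrative TPA-style attention: Q, K, V are averaged sums of rank-1
    tensor products of a head-side factor (R^h) and a feature-side factor (R^{d_h})."""

    def __init__(self, d_model, n_heads, d_head, r_q=6, r_k=2, r_v=2):
        super().__init__()
        self.h, self.dh = n_heads, d_head
        self.rq, self.rk, self.rv = r_q, r_k, r_v
        # Linear maps producing the head-side (a) and feature-side (b) factors.
        self.a_q = nn.Linear(d_model, r_q * n_heads)
        self.b_q = nn.Linear(d_model, r_q * d_head)
        self.a_k = nn.Linear(d_model, r_k * n_heads)
        self.b_k = nn.Linear(d_model, r_k * d_head)
        self.a_v = nn.Linear(d_model, r_v * n_heads)
        self.b_v = nn.Linear(d_model, r_v * d_head)
        self.out = nn.Linear(n_heads * d_head, d_model)

    def _factorized(self, x, a_proj, b_proj, rank):
        # x: (B, T, d_model) -> (B, T, h, d_h) via an averaged sum of rank-1
        # tensor products; only these small factors would be cached for K/V.
        B, T, _ = x.shape
        a = a_proj(x).view(B, T, rank, self.h)
        b = b_proj(x).view(B, T, rank, self.dh)
        return torch.einsum('btrh,btrd->bthd', a, b) / rank

    def forward(self, x):
        B, T, _ = x.shape
        q = self._factorized(x, self.a_q, self.b_q, self.rq)
        k = self._factorized(x, self.a_k, self.b_k, self.rk)
        v = self._factorized(x, self.a_v, self.b_v, self.rv)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (B, h, T, d_h)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, self.h * self.dh))

# Toy usage: 8 heads of size 64 on a random batch.
attn = TensorProductAttentionSketch(d_model=512, n_heads=8, d_head=64)
print(attn(torch.randn(2, 128, 512)).shape)  # torch.Size([2, 128, 512])
```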
## Installation

Ensure you have Python 3.10 or higher installed. It's recommended to use a virtual environment to manage dependencies.
- **Clone the Repository**

  ```bash
  git clone https://github.com/tensorgi/T6.git
  cd T6
  ```

- **Create and Activate a Virtual Environment**

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- **Install Required Packages**

  ```bash
  pip install torch==2.4.0 numpy transformers datasets tiktoken wandb tqdm
  ```
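After installing, you can optionally confirm that PyTorch is at the expected version and can see your GPUs (a quick sanity check, not a script shipped with this repository):

```python
import torch

print(torch.__version__)          # expect 2.4.0
print(torch.cuda.is_available())  # True on a CUDA-capable machine
print(torch.cuda.device_count())  # number of GPUs visible to PyTorch
```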
## Data Preparation

Prepare the necessary datasets before pretraining the model. T6 supports both Fineweb-Edu-100B and OpenWebText.
### Fineweb-Edu-100B

Fineweb-Edu-100B is a large-scale educational dataset hosted on Hugging Face.
- **Navigate to the Data Directory**

  ```bash
  cd data/fineweb-edu
  ```

- **Run the Data Preparation Script**

  ```bash
  python fineweb-edu.py
  ```

- **Move the Prepared Data**

  ```bash
  mv fineweb-edu100B ..
  cd ../..
  ```
### OpenWebText

OpenWebText is an open reproduction of OpenAI's WebText dataset.
- **Run the Data Preparation Script**

  ```bash
  python data/openwebtext/prepare.py
  ```
Ensure you have sufficient storage and computational resources, as OpenWebText is sizable.
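If the preparation scripts write flat binary files of `uint16` GPT-2 BPE token IDs (a nanoGPT-style layout; this is an assumption about the repository's output format, and the `train.bin` filename is illustrative), you can spot-check the prepared data like this:

```python
import numpy as np
import tiktoken

# Assumed layout: a flat array of uint16 GPT-2 BPE token IDs.
tokens = np.memmap('data/openwebtext/train.bin', dtype=np.uint16, mode='r')
print(f"{len(tokens):,} training tokens")

# Decode a short window to confirm the data looks like readable text.
enc = tiktoken.get_encoding('gpt2')
print(enc.decode(tokens[:64].tolist()))
```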
## Pretraining

Pretrain the T6 model using the prepared datasets. The provided scripts support distributed training across multiple GPUs.
- **Using the Provided Bash Script**

  Execute the pretraining script, which handles the training process:

  ```bash
  bash pretrain.sh
  ```
- **Manual Execution with `torchrun`**

  For more control or customization, use `torchrun` to launch training. Replace `config/train_T6_medium_adam_80g8.py` with your desired configuration file:

  ```bash
  torchrun --standalone --nproc_per_node=8 \
      train_adam_fw.py \
      config/train_T6_medium_adam_80g8.py
  ```

  `--nproc_per_node=8` specifies the number of processes per node (typically one per GPU). A multi-node variant is sketched below.
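For multi-node training, `torchrun`'s standard rendezvous flags can replace `--standalone`. The sketch below assumes two 8-GPU nodes; `MASTER_ADDR` and the port are placeholders you would set for your own cluster:

```bash
# Run with --node_rank=0 on the first node and --node_rank=1 on the second.
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=8 \
    --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 \
    train_adam_fw.py \
    config/train_T6_medium_adam_80g8.py
```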
## Evaluation

Evaluate the performance of the pretrained T6 model using standardized benchmarks.
- **Navigate to the Evaluation Harness Directory**

  ```bash
  cd lm-evaluation-harness
  ```

- **Follow the Instructions in That Directory**

  Ensure your model is compatible with the evaluation harness requirements. An example invocation is sketched after this list.
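Once the harness is installed (typically `pip install -e .` inside `lm-evaluation-harness`), evaluation is usually launched through its `lm_eval` CLI. The invocation below is a sketch: the checkpoint path is hypothetical, the task list is only an example, and a T6 checkpoint may need to be exported to (or wrapped as) a Hugging Face model before the `hf` backend can load it.

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/your/t6-checkpoint \
    --tasks arc_easy,hellaswag,piqa \
    --device cuda:0 \
    --batch_size 8
```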
## Acknowledgements

- Hugging Face for providing the Fineweb-Edu-100B dataset.
- EleutherAI for the lm-evaluation-harness.
- OpenWebText team for replicating the WebText dataset.
## Citation

If you use Tensor Product Attention (TPA) or the Tensor ProducT ATTenTion Transformer (T6) in your research or application, please consider citing it!
```bibtex
@article{zhang2025tensor,
  title={Tensor Product Attention Is All You Need},
  author={Zhang, Yifan and Liu, Yifeng and Yuan, Huizhuo and Qin, Zhen and Yuan, Yang and Gu, Quanquan and Yao, Andrew Chi-Chih},
  journal={arXiv preprint arXiv:2501.06425},
  year={2025}
}
```