Add colab

facebookresearch · Oct 7, 2024 · dd5d68d
1 parent 7f5bf66
commit dd5d68d

Showing 6 changed files with 15 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -31,10 +31,11 @@ If you find our code useful for your research, please consider citing:
 
 ### 1. Requirements:
 * python 3.9, pytorch >= 2.0
+* install PyTorch with CUDA from https://pytorch.org/get-started/locally/; it is a prerequisite for the fast-hadamard-transform package.
 * pip install -r requirement.txt
 * git clone https://github.com/Dao-AILab/fast-hadamard-transform.git  
-  cd fast-hadamard-transform  
-  pip install .
+* cd fast-hadamard-transform  
+* pip install .
 
 ### 2. Steps to run:
 For the scripts here, set `output_rotation_path`, `output_dir`, `logging_dir`, and `optimized_rotation_path` to your own locations. For gated repos such as meta-llama, set `access_token` to your HF token.
@@ -59,7 +60,8 @@ To obtain ExecuTorch-compatible quantized models, you can use the following scri
 
 * `bash scripts/31_optimize_rotation_executorch.sh $model_name`
 * `bash scripts/32_eval_ptq_executorch.sh $model_name`
-
+
+We also provide an example [Colab notebook](https://colab.research.google.com/gist/zxdmike/abbb2c9b0d1fd1f4ed8cdae8c02180f4) to train and export ExecuTorch-compatible Llama 3.2 models.
 ### Note
 * If using the GPTQ quantization method in Step 2 to quantize both weights and activations, we optimize the rotation matrices with respect to a network where only activations are quantized.   
   e.g. `bash 10_optimize_rotation.sh meta-llama/Llama-2-7b 16 4 4` followed by `bash 2_eval_ptq.sh meta-llama/Llama-2-7b 4 4 4` with the `--optimized_rotation_path` pointing to the rotation optimized for W16A4KV4.

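The requirement steps in the README hunk above can be sketched as a single shell function. This is only an illustration: the function name `install_requirements` is hypothetical, and the CUDA check assumes that a CUDA build of PyTorch has already been installed by following https://pytorch.org/get-started/locally/. Nothing runs until the function is called.

```shell
# Hypothetical helper (not part of the repository): bundles the README's
# requirement steps. Defining it has no side effects; call it to run them.
install_requirements() {
  # Assumption: PyTorch with CUDA is already installed. Stop early if
  # PyTorch cannot see a CUDA device, since fast-hadamard-transform needs it.
  python -c "import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)" || {
    echo "PyTorch cannot see a CUDA device; install a CUDA build first." >&2
    return 1
  }
  # Remaining steps as listed in the README.
  pip install -r requirement.txt &&
    git clone https://github.com/Dao-AILab/fast-hadamard-transform.git &&
    # The subshell keeps the caller's working directory unchanged.
    (cd fast-hadamard-transform && pip install .)
}
```

Calling `install_requirements` from the repository root would run the steps in order and stop at the first failure.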
diff --git a/scripts/10_optimize_rotation.sh b/scripts/10_optimize_rotation.sh
@@ -5,6 +5,8 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
 
+# nnodes sets the number of nodes (machines) to use (usually 1, i.e. a single 8-GPU node)
+# nproc_per_node sets the number of processes launched per node, typically one per GPU.
 torchrun --nnodes=1 --nproc_per_node=8 optimize_rotation.py \
 --input_model $1  \
 --output_rotation_path "your_path" \

diff --git a/scripts/11_optimize_rotation_fsdp.sh b/scripts/11_optimize_rotation_fsdp.sh
@@ -5,6 +5,8 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
 
+# nnodes sets the number of nodes (machines) to use (usually 1, i.e. a single 8-GPU node)
+# nproc_per_node sets the number of processes launched per node, typically one per GPU.
 torchrun --nnodes=1 --nproc_per_node=8 optimize_rotation.py \
 --input_model $1  \
 --output_rotation_path "your_path" \

diff --git a/scripts/2_eval_ptq.sh b/scripts/2_eval_ptq.sh
@@ -5,6 +5,8 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
 
+# nnodes sets the number of nodes (machines) to use (usually 1, i.e. a single 8-GPU node)
+# nproc_per_node sets the number of processes launched per node, typically one per GPU.
 torchrun --nnodes=1 --nproc_per_node=1 ptq.py \
 --input_model $1 \
 --do_train False \

diff --git a/scripts/31_optimize_rotation_executorch.sh b/scripts/31_optimize_rotation_executorch.sh
@@ -5,6 +5,8 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
 
+# nnodes sets the number of nodes (machines) to use (usually 1, i.e. a single 8-GPU node)
+# nproc_per_node sets the number of processes launched per node, typically one per GPU.
 torchrun --nnodes=1 --nproc_per_node=8 optimize_rotation.py \
 --input_model $1  \
 --output_rotation_path "your_path" \

diff --git a/scripts/32_eval_ptq_executorch.sh b/scripts/32_eval_ptq_executorch.sh
@@ -5,6 +5,8 @@
 # This source code is licensed under the license found in the
 # LICENSE file in the root directory of this source tree.
 
+# nnodes sets the number of nodes (machines) to use (usually 1, i.e. a single 8-GPU node)
+# nproc_per_node sets the number of processes launched per node, typically one per GPU.
 torchrun --nnodes=1 --nproc_per_node=1 ptq.py \
 --input_model $1 \
 --do_train False \
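
As a quick check on the two flags documented in these scripts: the total number of worker processes `torchrun` starts is `nnodes` multiplied by `nproc_per_node`. The sketch below uses the values from the scripts in this commit; the helper name `world_size` is hypothetical and only illustrates the arithmetic.

```shell
# Hypothetical helper: total worker processes = nnodes * nproc_per_node.
world_size() {
  echo $(( $1 * $2 ))
}

# Rotation optimization scripts use --nnodes=1 --nproc_per_node=8.
optimize_workers=$(world_size 1 8)
# Evaluation scripts use --nnodes=1 --nproc_per_node=1.
eval_workers=$(world_size 1 1)

echo "optimize: ${optimize_workers} workers, eval: ${eval_workers} worker"
# prints "optimize: 8 workers, eval: 1 worker"
```

On a machine with fewer than 8 GPUs, lowering `--nproc_per_node` to the number of available GPUs is the usual adjustment, though the scripts' other settings may then need retuning.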