Name | Affiliation | Email |
---|---|---|
Heehoon Kim* | Seoul National University | [email protected] |
Junyeol Ryu | Seoul National University | [email protected] |
(* denotes the team lead)
.
├── llama_fast/ # ⚡ LLaMA python package
│ ├── apex_subset/ # - C extensions for kernels
│ ├── teamh_c_helper/ # - C extensions for helper functions
│ ├── build.py # - Package build script
│ ├── example.py # - Main inference script
│ ├── model.py # - LLaMA model components
│ ├── schedule.py # - Batch scheduling module
│ ├── tokenizer.py # - LLaMA tokenizer
│ ├── run.sh # - Docker entry script
├── tools/ # 🛠️ LLaMA tools
│ ├── repartition_ckpt.py # - Model ckpt repartition script
├── Dockerfile # 🐳 LLaMA Docker build script
Prepare <DATA_DIR> with the files from the original LLaMA 30B model checkpoint:
<DATA_DIR>
├── consolidated.00.pth # Model parallel partition 0
├── consolidated.01.pth # Model parallel partition 1
├── consolidated.02.pth # Model parallel partition 2
├── consolidated.03.pth # Model parallel partition 3
├── params.json # Parameter metadata json file
├── tokenizer.model # Tokenizer checkpoint
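Before repartitioning, it can help to verify that <DATA_DIR> has the expected layout. A minimal sketch (the `check_data_dir` helper is hypothetical, not part of this repo):

```python
from pathlib import Path

# Files required from the original LLaMA 30B checkpoint:
# 4 model-parallel partitions plus metadata and tokenizer.
REQUIRED = [f"consolidated.{i:02d}.pth" for i in range(4)] + [
    "params.json",
    "tokenizer.model",
]

def check_data_dir(data_dir: str) -> list[str]:
    """Return the required files missing from the given <DATA_DIR>."""
    root = Path(data_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

An empty return value means the directory is ready for the repartition step below.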
Then, execute the provided script to repartition the model checkpoint:
$ python tools/repartition_ckpt.py --data_dir <DATA_DIR>
If the repartitioning succeeds, <DATA_DIR> will contain the following additional files:
<DATA_DIR>
├── ...
├── 30B_cpu_0.pth # Pipeline parallel partition 0
├── 30B_cpu_1.pth # Pipeline parallel partition 1
├── 30B_cpu_2.pth # Pipeline parallel partition 2
├── 30B_cpu_3.pth # Pipeline parallel partition 3
├── ...
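The actual conversion logic lives in `tools/repartition_ckpt.py`; the sketch below only illustrates the general idea of regrouping model-parallel shards (each holding a slice of every layer) into pipeline-parallel partitions (each holding all slices of a contiguous block of layers). The key format, the 60-layer count, and the use of plain dicts instead of `torch` tensors are assumptions for illustration only.

```python
NUM_LAYERS = 60   # LLaMA 30B has 60 transformer layers (assumption stated above)
NUM_STAGES = 4    # one pipeline-parallel partition per GPU

def stage_of(layer: int) -> int:
    """Map a layer index to its pipeline stage via a contiguous split."""
    per_stage = NUM_LAYERS // NUM_STAGES
    return min(layer // per_stage, NUM_STAGES - 1)

def repartition(shards: list[dict]) -> list[dict]:
    """Regroup model-parallel shards into pipeline-parallel partitions.

    `shards[mp]` maps keys like "layers.12.wq" (simplified here) to that
    shard's slice of the tensor; each output stage collects all slices of
    its layers so it can be loaded on a single pipeline rank.
    """
    stages: list[dict] = [{} for _ in range(NUM_STAGES)]
    for shard in shards:
        for key, tensor in shard.items():
            layer = int(key.split(".")[1])
            stages[stage_of(layer)].setdefault(key, []).append(tensor)
    return stages
```

With four input shards, each output stage ends up with 15 consecutive layers and all four model-parallel slices of each.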
- Build the Docker image:
docker build -t <IMAGE_NAME> .
- Run the Docker container:
docker run --rm -it --ipc=host --gpus all -v <DATA_DIR>:/data --name <CONTAINER_NAME> <IMAGE_NAME>