TACCL experiment codebase containing scripts & synthesized plans for akmu used by TCCL

mcrl/TACCL-experiment-codebase

TACCL Experiment Codebase

Repository Organization

.
├── msccl                           # MSCCL runtime
├── sccl                            # MSCCL tools
├── taccl/taccl                     # TACCL
│   └── examples                    # TACCL inputs
│       ├── topo                    #   - Topologies for akmu
│       └── sketch                  #   - Sketches for akmu
├── scripts                         # Scripts for experiments
├── taccl-exp-synthesis-plans       # Synthesized plans for akmu using TACCL
└── transformers                    # HuggingFace Transformers

Prerequisites

  • Anaconda
  • PyTorch @456ecef
  • Python 3.8
  • gurobipy

Setup

  1. Create a conda environment with Python 3.8. We recommend naming it taccl so that all provided scripts run without modification. All remaining steps are expected to run inside this environment (conda activate taccl).
conda create -n taccl python=3.8
  2. Build MSCCL. Set NVCC_GENCODE to match your GPU architecture.
make -j src.build CUDA_HOME=/path/to/cuda/install NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"
  3. Build PyTorch with MSCCL.
export CUDA_NVCC_EXECUTABLE=/path/to/nvcc
export CUDA_HOME=/path/to/cuda/install
export CUDNN_INCLUDE_DIR=/path/to/cudnn
export CUDNN_LIB_DIR=/path/to/cudnn/lib
export USE_SYSTEM_NCCL=1
export NCCL_INCLUDE_DIRS=/path/to/msccl/include
export NCCL_LIBRARIES=/path/to/msccl/lib
cd pytorch && python setup.py install
  4. Build SCCL.
cd sccl/ && python setup.py install
  5. Build HuggingFace Transformers.
cd transformers && pip install -e . && pip install datasets evaluate accelerate
  6. Build TACCL.
conda config --add channels http://conda.anaconda.org/gurobi
conda install -c conda-forge gurobi -y
<command to add Gurobi license>
cd taccl && pip install .

Getting started

(Optional) To write custom input topology and sketch files, you first need to measure the alpha-beta costs (latency and inverse bandwidth) of your servers. We used the following tools:

  • cuda-samples: For intra-node alpha-beta measurement. Specifically, we used p2pBandwidthLatencyTest.
  • OSU Micro-benchmarks: For inter-node alpha-beta measurement. Specifically, we used osu_nccl_latency and osu_nccl_bw.

Our custom input files are provided under topo/ and sketch/.
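
As a rough illustration of turning raw measurements into alpha-beta values, the sketch below fits the standard cost model T(n) = alpha + beta * n through two (message size, time) points. The helper name fit_alpha_beta and the sample numbers are hypothetical and not part of this repository. How the resulting values are written into the topology files is not shown here, so refer to the files under topo/ for the actual format.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Fits T(n) = alpha + beta * n through two measurements:
#   $1, $3: message sizes in bytes (integers)
#   $2, $4: measured transfer times in microseconds
fit_alpha_beta() {
    # Two distinct message sizes are needed to separate alpha from beta.
    if [ "$1" -eq "$3" ]; then
        echo "fit_alpha_beta: message sizes must differ" >&2
        return 1
    fi
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" 'BEGIN {
        beta  = (t2 - t1) / (n2 - n1)   # microseconds per byte
        alpha = t1 - beta * n1          # fixed latency in microseconds
        printf "alpha_us=%.3f beta_us_per_byte=%.6f\n", alpha, beta
    }'
}

# Made-up sample numbers, for illustration only.
fit_alpha_beta 1024 6.024 1048576 1053.576
```

In practice a small-message latency from p2pBandwidthLatencyTest or osu_nccl_latency approximates alpha directly, and the reciprocal of a large-message bandwidth approximates beta. The two-point fit is just one simple way to combine them.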

To generate synthesis plans from custom input files, we used generate-synthesis.sh. Synthesized plans are in XML format and are stored in taccl-exp-synthesis-plans. You can test the synthesized plans with the provided run_single/two/multinode.sh scripts.
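
Before testing, it can help to see which plans are available. The following is a minimal sketch, assuming the plans carry an .xml extension; list_plans is a hypothetical helper and not one of the provided scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# Lists every XML plan under a directory, sorted by path.
# Assumption: plan files end in ".xml".
list_plans() {
    find "${1:-taccl-exp-synthesis-plans}" -type f -name '*.xml' | sort
}

# Example (run from the repository root):
#   list_plans taccl-exp-synthesis-plans
```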

To run a benchmark with the synthesized plans, the plans must be loaded before the actual benchmark script is launched. We provide a sample script in hf0.sh: it simply adds the MSCCL prefix at the beginning of the benchmark command. In our implementation, the benchmark script must be a Python script. You can test the benchmark using the provided hf.sh.
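
The idea of a prefix can be sketched as a small wrapper that sets environment variables for a single command. This is an illustration only: run_with_plan is a hypothetical name, and the variables MSCCL_XML_FILES and NCCL_ALGO are assumptions taken from the upstream MSCCL documentation. The MSCCL version bundled here may expect different names, so treat hf0.sh as the authoritative reference.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository); see hf0.sh for the real prefix.
# Assumption: the MSCCL runtime picks up a plan via MSCCL_XML_FILES and is
# enabled through NCCL_ALGO, as described in upstream MSCCL documentation.
run_with_plan() {
    plan_xml=$1
    shift
    # Set the variables only for the wrapped command, not for the current shell.
    env MSCCL_XML_FILES="$plan_xml" NCCL_ALGO="MSCCL,RING,TREE" "$@"
}

# Dry run: print what the wrapped command would see.
run_with_plan /path/to/plan.xml sh -c 'echo "$NCCL_ALGO $MSCCL_XML_FILES"'
```

A real invocation would replace the dry-run command with the Python benchmark launch, for example run_with_plan /path/to/plan.xml python benchmark.py, where benchmark.py is a placeholder.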
