TACCL Experiment Codebase

Repository Organization

.                                 
├── msccl                           # MSCCL runtime
├── sccl                            # MSCCL tools
├── taccl/taccl                     # TACCL
│   └── examples                    # TACCL inputs
│       ├── topo                    #   - Topologies for akmu
│       └── sketch                  #   - Sketches for akmu
├── scripts                         # Scripts for the experiments
├── taccl-exp-synthesis-plans       # Synthesized plans for akmu using TACCL
└── transformers                    # HuggingFace Transformers

Prerequisites

Anaconda
PyTorch @456ecef
Python 3.8
gurobipy

Setup

Create a conda environment with Python 3.8. Name it taccl so that all provided scripts run without modification. Every step below assumes this environment is active (conda activate taccl).

conda create -n taccl python=3.8
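Because the provided scripts assume the environment name, it can help to confirm that the right environment is active before building anything. The following is a minimal sketch: require_conda_env is a hypothetical helper that is not part of this repository, and it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation.

```shell
# Hypothetical helper (not part of this repository): succeed only when the
# given conda environment (default: taccl) is the active one.
# CONDA_DEFAULT_ENV is set by `conda activate`.
require_conda_env() {
    expected="${1:-taccl}"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "conda env '$expected' is not active; run: conda activate $expected" >&2
        return 1
    fi
    echo "conda env '$expected' is active"
}
```

A typical use would be to run conda activate taccl followed by require_conda_env at the top of a build session.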

Build MSCCL. Set NVCC_GENCODE to match the compute capability of your GPU; the command below targets compute capability 7.0.

make -j src.build CUDA_HOME=/path/to/cuda/install NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"
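If you are unsure which value to pass, the NVCC_GENCODE string can be derived from the GPU's compute capability. The sketch below assumes a hypothetical helper named gencode_for; it only reformats a capability such as 7.0 into the form used in the make command above.

```shell
# Hypothetical helper: turn a compute capability such as "7.0" into the
# NVCC_GENCODE value expected by the MSCCL build.
gencode_for() {
    cc=$(printf '%s' "$1" | tr -d '.')    # "7.0" -> "70"
    printf '%s\n' "-gencode=arch=compute_${cc},code=sm_${cc}"
}

gencode_for 7.0    # V100; prints -gencode=arch=compute_70,code=sm_70
gencode_for 8.0    # A100; prints -gencode=arch=compute_80,code=sm_80

# Assumption: recent NVIDIA drivers can report the capability directly, e.g.
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# If your driver does not support that query field, look the value up manually.
```

The result can then be passed as NVCC_GENCODE="$(gencode_for 7.0)" in the make invocation.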

Build PyTorch with MSCCL

export CUDA_NVCC_EXECUTABLE=/path/to/nvcc
export CUDA_HOME=/path/to/cuda/install
export CUDNN_INCLUDE_DIR=/path/to/cudnn
export CUDNN_LIB_DIR=/path/to/cudnn/lib
export USE_SYSTEM_NCCL=1
export NCCL_INCLUDE_DIRS=/path/to/msccl/include
export NCCL_LIBRARIES=/path/to/msccl/lib
cd pytorch && python setup.py install
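Since the PyTorch build takes a long time, it may be worth checking that the path variables above are set and point to existing locations before starting it. This is a minimal sketch; check_build_env is a hypothetical helper, not something shipped with PyTorch or this repository.

```shell
# Hypothetical pre-flight check: every path variable used by the PyTorch
# build must be set and must point to an existing file or directory.
check_build_env() {
    rc=0
    for var in CUDA_NVCC_EXECUTABLE CUDA_HOME CUDNN_INCLUDE_DIR \
               CUDNN_LIB_DIR NCCL_INCLUDE_DIRS NCCL_LIBRARIES; do
        eval "value=\${$var:-}"           # read the variable named by $var
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            rc=1
        elif [ ! -e "$value" ]; then
            echo "$var points to a missing path: $value" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running check_build_env && (cd pytorch && python setup.py install) would then start the build only when all paths resolve.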

Build SCCL

cd sccl/ && python setup.py install

Build HuggingFace

cd transformers && pip install -e . && pip install datasets evaluate accelerate

Build TACCL

conda config --add channels http://conda.anaconda.org/gurobi
conda install -c conda-forge gurobi -y
<command to add Gurobi license>
cd taccl && pip install .

Getting started

(Optional) To write custom input topology and sketch files, you first need to measure the alpha-beta costs (link latency and inverse bandwidth) of your servers. We used the following tools:

cuda-samples: For intra-node alpha-beta measurement. Specifically, we used p2pBandwidthLatencyTest.
OSU Micro-benchmarks: For inter-node alpha-beta measurement. Specifically, we used osu_nccl_latency and osu_nccl_bw.
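In the alpha-beta model, sending n MB over a link costs roughly alpha + beta * n, where alpha is the fixed latency and beta is the inverse bandwidth. The sketch below shows one way to turn a measured small-message latency and a measured peak bandwidth into these two parameters. The helper names, the choice of units (microseconds and microseconds per MB), and the sample numbers are all assumptions made for illustration; they are not measurements from our servers, and you should check which units your topology files expect.

```shell
# Hypothetical helper: estimate alpha-beta parameters from two measurements.
#   $1: latency of a very small message, in microseconds  -> alpha
#   $2: peak bandwidth for large messages, in GB/s         -> beta = 1 / bandwidth
# Prints "alpha_us beta_us_per_MB", taking 1 GB = 1000 MB.
alpha_beta() {
    LC_ALL=C awk -v lat="$1" -v bw="$2" \
        'BEGIN { printf "%.2f %.2f\n", lat, 1000 / bw }'
}

# Hypothetical helper: predicted transfer time in microseconds,
# T = alpha + beta * size, with size given in MB.
transfer_time_us() {
    LC_ALL=C awk -v a="$1" -v b="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", a + b * n }'
}

# Illustrative numbers only: 2 us latency, 25 GB/s bandwidth, 4 MB message.
alpha_beta 2.0 25             # prints: 2.00 40.00
transfer_time_us 2.0 40 4     # prints: 162.00
```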

Our custom input files are provided under taccl/taccl/examples/topo and taccl/taccl/examples/sketch.

We generated the synthesis plans from our custom input files with generate-synthesis.sh. The synthesized plans are XML files stored in taccl-exp-synthesis-plans. You can test them with the provided run_single/two/multinode.sh scripts.

To run a benchmark with a synthesized plan, the plan must be loaded before the actual benchmark script is launched. hf0.sh is a sample script that does this by adding the MSCCL prefix in front of the benchmark command. In our implementation, the benchmark script must be a Python script. You can test the benchmark with the provided hf.sh.
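As a rough illustration of what such a prefix can look like, the sketch below wraps an arbitrary command with the MSCCL_XML_FILES and NCCL_ALGO environment variables described in the MSCCL README. Whether hf0.sh sets exactly these variables is an assumption, and with_msccl_plan is a hypothetical helper name.

```shell
# Hypothetical wrapper: run a command with a synthesized MSCCL plan loaded.
#   $1: path to an XML plan; remaining arguments: the command to run.
# MSCCL_XML_FILES and NCCL_ALGO follow the MSCCL README; that hf0.sh uses
# the same variables is an assumption.
with_msccl_plan() {
    plan="$1"
    shift
    if [ ! -f "$plan" ]; then
        echo "plan not found: $plan" >&2
        return 1
    fi
    # The assignments apply only to the launched command, not to this shell.
    MSCCL_XML_FILES="$plan" NCCL_ALGO=MSCCL,RING,TREE "$@"
}
```

A call would take the form with_msccl_plan followed by the path to one of the XML plans in taccl-exp-synthesis-plans and then the python command that starts the benchmark script. Keeping the variables scoped to a single command avoids leaking the plan into unrelated NCCL jobs started from the same shell.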

