Adaptive distributed machine learning.
KungFu requires Python 3, CMake 3, Go 1.11+, and TensorFlow 1.x.
# Install tensorflow CPU
pip3 install tensorflow==1.13.1
# pip3 install tensorflow-gpu==1.13.1 # Using GPUs
# Download the KungFu source code
git clone https://github.com/lsds/KungFu.git
cd KungFu
# Install KungFu
# export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) # Parallel install.
pip3 install .
KungFu provides kungfu-run, a launcher similar to mpirun, for running a TensorFlow program on multiple GPU/CPU devices in a server. Use the following command to build kungfu-run.
# Build kungfu-run in the given GOBIN directory.
GOBIN=$(pwd)/bin go install -v ./srcs/go/cmd/kungfu-run/
# Check if kungfu-run is built
./bin/kungfu-run -help
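kungfu-run can also launch workers across several machines. A sketch of such a launch, assuming the -H host list and -nic flags described in the KungFu documentation; the IP addresses and interface name here are placeholders for your own cluster:

```shell
# Launch 8 workers in total, 4 on each of two machines (placeholder IPs/NIC).
kungfu-run -np 8 \
    -H 192.168.0.1:4,192.168.0.2:4 \
    -nic eth0 \
    python3 examples/mnist_slp.py --n-epochs 10
```

The same command must be started on every machine listed in -H, and the KungFu source tree and dataset must be present at the same path on each.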
For Mac users, the following is required after the install:
export DYLD_LIBRARY_PATH=$(python3 -c "import os; import kungfu; print(os.path.dirname(kungfu.__file__))")
Download the MNIST dataset (using the script below) and run the following training script.
# Download the MNIST dataset in a ./mnist folder in the current directory.
./scripts/download-mnist.sh
# Train a Single Layer Perceptron (SLP) model on the MNIST dataset using 4 CPUs for 10 epochs.
./bin/kungfu-run -np 4 -timeout 1h python3 examples/mnist_slp.py --n-epochs 10
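Each of the 4 processes launched above trains a full model replica; every step, the workers' gradients are averaged so all replicas stay identical. A minimal pure-Python sketch of that synchronous SGD step (illustrative only — the real logic lives inside KungFu's TensorFlow optimizer wrapper, and the gradient values here are made up):

```python
def sync_sgd_step(weights, per_worker_grads, lr=0.1):
    """One synchronous SGD step across np workers: average the
    workers' gradients, then apply the same update everywhere."""
    np_workers = len(per_worker_grads)
    # The average is the all-reduce (sum) result divided by np.
    avg = [sum(g[i] for g in per_worker_grads) / np_workers
           for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]

# Four workers (as with `kungfu-run -np 4`), each holding a local
# gradient for a 2-parameter model. Every replica applies the same
# averaged update, so the model stays in sync.
new_w = sync_sgd_step([1.0, 2.0],
                      [[0.4, 0.0], [0.4, 0.8], [0.0, 0.8], [0.8, 0.0]])
```

Because every worker sees the same averaged gradient, no parameter server is needed.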
# Format the Python source code.
./scripts/clean-code.sh --fmt-py
# Build a .whl package for release.
pip3 wheel -vvv --no-index .
KungFu can use NCCL to leverage GPU-GPU direct communication. However, using NCCL forces KungFu to serialize the execution of all-reduce operations, which can hurt performance.
# uncomment to use your own NCCL
# export NCCL_HOME=$HOME/local/nccl
KUNGFU_USE_NCCL=1 pip3 install .
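The all-reduce operations mentioned above combine every worker's tensor into one sum that all workers receive; NCCL implements this with ring-based algorithms. A toy pure-Python simulation of ring all-reduce over in-memory "workers" (illustrative of the communication pattern only, not KungFu's or NCCL's implementation):

```python
def ring_allreduce(inputs):
    """Sum-all-reduce across n simulated workers via the ring algorithm.

    inputs: list of n equal-length vectors (length divisible by n).
    Returns each worker's final vector; all equal the elementwise sum.
    """
    n = len(inputs)
    size = len(inputs[0])
    assert size % n == 0, "vector length must be divisible by worker count"
    k = size // n  # chunk size
    # chunks[w][c] is worker w's current copy of chunk c.
    chunks = [[v[c * k:(c + 1) * k] for c in range(n)] for v in inputs]

    # Reduce-scatter: after n-1 steps, worker w holds the fully
    # reduced chunk (w + 1) % n.
    for s in range(n - 1):
        sends = [(w, (w - s) % n, list(chunks[w][(w - s) % n]))
                 for w in range(n)]
        for w, c, data in sends:  # each worker sends to its ring neighbour
            dst = (w + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]

    # All-gather: circulate the reduced chunks until every worker has all.
    for s in range(n - 1):
        sends = [(w, (w + 1 - s) % n, list(chunks[w][(w + 1 - s) % n]))
                 for w in range(n)]
        for w, c, data in sends:
            chunks[(w + 1) % n][c] = data

    return [[x for c in range(n) for x in chunks[w][c]] for w in range(n)]

# Three workers, each contributing a 3-element gradient.
results = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each step only exchanges one chunk per worker with a neighbour, which is why the ring pattern uses bandwidth efficiently; it is also why NCCL requires all workers to issue their collectives in the same order, the serialization constraint noted above.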