oneAPI Collective Communications Library (oneCCL) provides an efficient implementation of communication patterns used in deep learning.
oneCCL is integrated into:
- Horovod* (distributed training framework). Refer to Horovod with oneCCL for details.
- PyTorch* (machine learning framework). Refer to PyTorch bindings for oneCCL for details.
Release notes are available by link.
- Ubuntu* 18
- GNU*: C, C++ 4.8.5 or higher
- For SYCL support: Intel(R) oneAPI DPC++ Compiler with Level Zero (L0) v1.0 support
cd oneccl
mkdir build
cd build
cmake ..
make -j install
If a clean build is needed, create a new build directory and invoke cmake within it.
To specify the installation directory, modify the cmake command as follows:
cmake .. -DCMAKE_INSTALL_PREFIX=/path/to/installation/directory
If -DCMAKE_INSTALL_PREFIX is not specified, ccl is installed into the _install subdirectory of the current build directory, e.g. ccl/build/_install.
To specify the compiler, modify the cmake command as follows:
cmake .. -DCMAKE_C_COMPILER=your_c_compiler -DCMAKE_CXX_COMPILER=your_cxx_compiler
If your C++ compiler requires SYCL, the compute runtime can be specified as well (Codeplay ComputeCpp and DPC++ are available for now).
Modify the cmake command as follows, using the first line for ComputeCpp or the second line for DPC++:
cmake .. -DCMAKE_C_COMPILER=your_c_compiler -DCMAKE_CXX_COMPILER=compute++ -DCOMPUTE_RUNTIME=computecpp
cmake .. -DCMAKE_C_COMPILER=your_c_compiler -DCMAKE_CXX_COMPILER=dpcpp -DCOMPUTE_RUNTIME=dpcpp
Additionally, a search path hint for OpenCL can be given through the standard OPENCLROOT environment variable:
OPENCLROOT=your_opencl_location cmake .. -DCMAKE_C_COMPILER=your_c_compiler -DCMAKE_CXX_COMPILER=compute++ -DCOMPUTE_RUNTIME=computecpp
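The pairing of compiler and runtime shown above can be captured in a small helper. This is only a sketch: `sycl_cmake_flags` is a hypothetical function name, not part of oneCCL, and it knows only the two runtimes named in this section.

```shell
# Hypothetical helper (not part of oneCCL): prints the cmake flags
# that select the C++ compiler matching a given SYCL runtime.
sycl_cmake_flags() {
    case $1 in
        computecpp) cxx=compute++ ;;
        dpcpp)      cxx=dpcpp ;;
        *) echo "unsupported COMPUTE_RUNTIME: $1" >&2; return 1 ;;
    esac
    printf '%s %s\n' "-DCMAKE_CXX_COMPILER=$cxx" "-DCOMPUTE_RUNTIME=$1"
}
```

It could be used as `cmake .. -DCMAKE_C_COMPILER=your_c_compiler $(sycl_cmake_flags dpcpp)`.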
To specify the build type, modify the cmake command as follows:
cmake .. -DCMAKE_BUILD_TYPE=[Debug|Release|RelWithDebInfo|MinSizeRel]
Modify the make command as follows to see all parameters used by make during compilation and linkage:
make -j VERBOSE=1
To build with the address sanitizer, modify the cmake command as follows:
cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN=true
Note: the address sanitizer only works in Debug builds.
Make sure that libasan.so exists.
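One way to check for libasan.so before configuring is to ask the compiler where it would find the library. The sketch below assumes a GCC-compatible compiler that supports `-print-file-name`; `has_libasan` is a hypothetical helper name, not something provided by oneCCL.

```shell
# Hypothetical helper: succeeds if the given compiler (default: gcc)
# can locate libasan.so in its library search path.
has_libasan() {
    cc=${1:-gcc}
    path=$("$cc" -print-file-name=libasan.so 2>/dev/null) || return 1
    # When the library is not found, the compiler echoes the bare file
    # name back; a found library is reported with a directory component.
    case $path in
        */*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, `has_libasan gcc || echo "libasan.so not found"` could be run before invoking cmake with -DWITH_ASAN=true.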
Use the following commands to set up the environment and run the benchmark example:
$ source <install_dir>/env/setvars.sh
$ cd <install_dir>/examples
$ mpirun -n 2 ./benchmark/benchmark
There are two ways to set worker thread affinity: explicit and automatic.
Explicit setup:
- Set the environment variable CCL_WORKER_COUNT to the desired number of worker threads
- Set the environment variable CCL_WORKER_AFFINITY to the IDs of the cores the threads should be bound to
Example:
export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=3,4,5,6
With the variables above, CCL will create 4 threads and pin them to cores 3, 4, 5, and 6, respectively.
Automatic setup:
NOTE: automatic pinning only works if the application has been launched using the mpirun provided by the CCL distribution package.
- Set the environment variable CCL_WORKER_COUNT to the desired number of worker threads
- Set the environment variable CCL_WORKER_AFFINITY to the value auto
Example:
export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=auto
With the variables above, CCL will create 4 threads and pin them to the last 4 cores available to the launched process.
The exact IDs of the CPU cores depend on the parameters passed to mpirun.
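The two variables can be sanity-checked before launching. The sketch below is a hypothetical helper, not part of oneCCL, and it rests on an assumption this document does not state: that an explicit CCL_WORKER_AFFINITY list should contain exactly CCL_WORKER_COUNT core IDs, as in the example above.

```shell
# Hypothetical helper (not part of oneCCL): checks that the affinity is
# either "auto" or a list with exactly as many core IDs as workers.
# Assumption: a count/list mismatch is treated as an error here.
check_worker_affinity() {
    count=$1
    affinity=$2
    # The worker count must be a positive integer.
    case $count in
        ''|*[!0-9]*|0*) echo "invalid worker count: $count" >&2; return 1 ;;
    esac
    # "auto" leaves core selection to the library.
    [ "$affinity" = auto ] && return 0
    # Otherwise expect a comma-separated list of non-negative integers.
    case $affinity in
        ''|*[!0-9,]*|,*|*,|*,,*) echo "invalid affinity list: $affinity" >&2; return 1 ;;
    esac
    # Number of IDs is one more than the number of commas.
    ids=$(( $(printf '%s' "$affinity" | tr -cd ',' | wc -c) + 1 ))
    if [ "$ids" -ne "$count" ]; then
        echo "expected $count core IDs, got $ids" >&2
        return 1
    fi
}
```

It could be called as `check_worker_affinity "$CCL_WORKER_COUNT" "$CCL_WORKER_AFFINITY"` right before mpirun.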
In most cases there is no need to remove the current build directory. Just run make to compile and link the changed files. Cleaning the build directory is only warranted if suspicious build errors appear after a significant change in the code (e.g. after a rebase or a change of branch).
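The incremental workflow described above can be sketched as a small wrapper that configures a build directory only once and otherwise just reruns make. `ccl_build` is a hypothetical name, and using the presence of CMakeCache.txt as the signal that the directory is already configured is an assumption of this sketch.

```shell
# Hypothetical helper: configure once, then rebuild incrementally.
# $1 is the source directory (use an absolute path), $2 the build directory.
ccl_build() {
    src_dir=$1
    build_dir=$2
    if [ ! -f "$build_dir/CMakeCache.txt" ]; then
        # No configured build yet: create the directory and run cmake in it.
        mkdir -p "$build_dir"
        (cd "$build_dir" && cmake "$src_dir") || return 1
    fi
    # Incremental step: make recompiles and relinks only what changed.
    (cd "$build_dir" && make -j install)
}
```

For a clean build, pass a new, empty build directory so that cmake runs again.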