FBGEMM_GPU (FBGEMM GPU kernel library) is a collection of high-performance CUDA GPU operators for GPU training.
The library provides efficient table batched embedding bag, data layout transformation, and quantization support.
The tests (in test folder) and benchmarks (in bench folder) are some great examples of using FBGEMM_GPU.
FBGEMM_GPU uses the standard CMake-based build flow and the PyTorch TorchScript extension build flow for custom C++ operators.
FBGEMM_GPU requires nvcc and an NVIDIA GPU with compute capability 3.5 or higher.
For the CUB build-time dependency, if you are using conda, you can install it with
conda install -c bottler nvidiacub
Otherwise, download the CUB library from https://github.com/NVIDIA/cub/releases and unpack it to a folder of your choice. Define the environment variable CUB_DIR before building, pointing it to the directory that contains CUB's CMakeLists.txt. For example, on Linux/Mac:
curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_DIR=$PWD/cub-1.10.0
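Since CUB_DIR must point at the directory containing CUB's top-level CMakeLists.txt, a quick sanity check before configuring can save a failed build. The helper below is our own sketch, not part of FBGEMM_GPU:

```shell
# Sketch: verify that a candidate CUB_DIR contains CUB's top-level
# CMakeLists.txt before exporting it and running cmake.
check_cub_dir() {
  if [ -f "$1/CMakeLists.txt" ]; then
    echo "ok: $1"
  else
    echo "error: no CMakeLists.txt under $1" >&2
    return 1
  fi
}
# Example (path from the commands above):
#   check_cub_dir "$PWD/cub-1.10.0" && export CUB_DIR=$PWD/cub-1.10.0
```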
googletest is required to build and run FBGEMM_GPU's tests; it is not required if you don't want to run them. Building the tests is on by default; turn it off by setting FBGEMMGPU_BUILD_TESTS to OFF.
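As a sketch, the FBGEMMGPU_BUILD_TESTS option named above would be passed to cmake at configure time; the command is only echoed here for illustration:

```shell
# Sketch: disable the googletest-based test build at configure time.
# FBGEMMGPU_BUILD_TESTS is the variable named in the text above.
CMAKE_TEST_FLAG="-DFBGEMMGPU_BUILD_TESTS=OFF"
echo cmake "$CMAKE_TEST_FLAG" ..
```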
PyTorch and Jinja2 are required to build and run the table batched embedding bag operator. Note that this op relies on the latest version of PyTorch (1.8+), so it requires installing PyTorch Nightly:
conda uninstall pytorch
# update with the corresponding CUDA version
conda install pytorch cudatoolkit=9.2 -c pytorch-nightly
conda install jinja2
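To confirm that the installed PyTorch meets the 1.8+ requirement, a small version guard can help. The helper below is our own sketch (not part of FBGEMM_GPU); in practice you would feed it the output of `python -c "import torch; print(torch.__version__)"`:

```shell
# Sketch: returns success when the given version string sorts at or
# above 1.8.0 in version order (handles suffixes like 1.8.0+cu102).
torch_is_new_enough() {
  [ "$(printf '%s\n%s\n' "1.8.0" "$1" | sort -V | head -n 1)" = "1.8.0" ]
}
# Example: torch_is_new_enough "1.8.0" && echo "PyTorch is new enough"
```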
You can download googletest and set GOOGLETEST_SOURCE_DIR for cmake to find it. If the variable is not set, cmake will build the git submodule found in the third_party directory.
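For example (the checkout path below is only an illustration, not a required location), you would point cmake at a pre-downloaded googletest tree like this:

```shell
# Sketch: use an existing googletest checkout instead of the
# third_party submodule; the path is an example, not a convention.
export GOOGLETEST_SOURCE_DIR="$HOME/src/googletest"
echo "cmake will look for googletest under $GOOGLETEST_SOURCE_DIR"
```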
General build instructions are as follows:
git clone --recursive https://github.com/pytorch/FBGEMM.git
cd FBGEMM/fbgemm_gpu
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
# configure the NVCC and CUB path
export CUDACXX=/usr/local/cuda/bin/nvcc
export CUB_DIR=${CUB_DIR}
# in fbgemm_gpu folder
# build the data layout transform op, quantized ops, etc.
mkdir build && cd build
cmake ..
make
# build the table batched embedding bag op
cd ..
python setup.py build develop
To run the tests or benchmarks after building FBGEMM_GPU (if they were built), use the following commands:
# run the tests for the data layout transform op, quantized ops, etc.
cd build && make test
# run the tests and benchmarks of table batched embedding bag op
cd ..
python test/split_table_batched_embeddings_test.py
python bench/split_table_batched_embeddings_benchmark.py
For a high-level overview, design philosophy, and brief descriptions of various parts of FBGEMM_GPU, please see our Wiki (work in progress).
We have used comments extensively in our source files. The best and most up-to-date documentation is available in the source files.
See the CONTRIBUTING file for how to help out.
FBGEMM is BSD licensed, as found in the LICENSE file.