Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead.
Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model.
The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs.
Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.
- Speedup in Inference (ImageNet models)
- Speedup in Training (CIFAR-10 models; batch sizes 32, 64, and 128): training performance comparison on an NVIDIA V100 GPU.
Please refer to the installation instructions to install Nimble from source.
Nimble supports both inference and training of neural networks. The following example shows how to run inference with a Nimble object:
```python
import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
```
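To check the speedup on your own GPU, you can time the prepared object against the original module. The snippet below is a minimal sketch rather than part of Nimble's API; `measure` is a hypothetical helper, and it reuses the `model`, `nimble_model`, and `rand_input` defined above with standard CUDA-event timing.

```python
import torch

def measure(fn, x, warmup=10, iters=100):
    # Hypothetical helper: average latency in milliseconds over `iters` runs
    for _ in range(warmup):  # warm-up runs exclude one-time setup costs
        fn(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

with torch.no_grad():
    pytorch_ms = measure(model, rand_input)        # original PyTorch module
    nimble_ms = measure(nimble_model, rand_input)  # prepared Nimble object
print(f"PyTorch: {pytorch_ms:.3f} ms/iter, Nimble: {nimble_ms:.3f} ms/iter")
```

The warm-up iterations keep one-time costs (e.g., kernel autotuning) from skewing the comparison.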
The following example shows how to train a model with Nimble:

```python
import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
```
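A full training run simply repeats the forward/backward/step sequence above for every batch. The loop below is an illustrative sketch, not part of Nimble's API: it reuses `nimble_model`, `loss_fn`, `optimizer`, and `BATCH` from the example above, and it assumes every batch fed to the prepared object should keep the shape of the dummy input passed to `prepare`, hence `drop_last=True`.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR-10 training data; drop_last keeps every batch at the prepared shape
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=BATCH, shuffle=True, drop_last=True)

for epoch in range(2):
    for images, labels in train_loader:
        images = images.cuda(non_blocking=True)
        labels = labels.cuda(non_blocking=True)

        optimizer.zero_grad()
        output = nimble_model(images)  # forward pass through the prepared schedule
        loss = loss_fn(output, labels)
        loss.backward()                # backward pass
        optimizer.step()               # optimization step
    print(f"epoch {epoch}: last batch loss = {loss.item():.4f}")
```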
Please refer to the evaluation instructions to reproduce the results reported above.
Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, and Byung-Gon Chun (* equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.
```bibtex
@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}
```
Please create an issue for questions and bug reports.
We welcome your contributions to Nimble! We aim to build an open-source project driven by contributions from the open-source community. For general discussions about development, please subscribe to [email protected].