
Fix "cuda: unknown error" on Windows (#21062)
Summary:
Thanks to Jonas1312 for validating this workaround.
Fixes #20635.
However, I don't know exactly why this change is needed. The following are my guesses:
1. It is a CUDA bug. Static linking against `cudart` is now the default, so they may not have run enough tests against dynamically linked builds.
2. It is related to the UCRT. But (1) according to MSDN (https://docs.microsoft.com/en-us/cpp/c-runtime-library/potential-errors-passing-crt-objects-across-dll-boundaries?view=vs-2019), DLLs that use the shared CRT should share the same CRT state; (2) the CUDA-related objects such as `CUdevice` passed to `cudart` are stored on the stack, not the heap; and (3) if this were the cause, it should always fail, not just sometimes.
3. It is a bug on our side. However, I was unable to find it.
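For reference, a minimal sketch of the failure this works around, matching the snippet recorded in the cuda.cmake comment below (on a Windows build that links cudart dynamically; the exact error text varies by driver and toolkit version):

import torch

# On affected Windows builds (with a dynamically linked cudart), running this
# sequence raised "cuda: unknown error" instead of returning a device index.
print(torch.cuda.is_available())
print(torch.cuda.current_device())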
Pull Request resolved: pytorch/pytorch#21062

Differential Revision: D15543557

Pulled By: ezyang

fbshipit-source-id: c23af45ebf582fad93ce5f029af6e1f06cf1d49d
peterjc123 authored and facebook-github-bot committed May 29, 2019
1 parent 157fcfc commit 8fcd80a
Showing 1 changed file with 15 additions and 5 deletions.
20 changes: 15 additions & 5 deletions cmake/public/cuda.cmake
@@ -9,11 +9,21 @@ endif()
 # release (3.11.3) yet. Hence we need our own Modules_CUDA_fix to enable sccache.
 list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_LIST_DIR}/../Modules_CUDA_fix)
 
-# we dont want to statically link cudart, because we rely on it's dynamic linkage in
-# python (follow along torch/cuda/__init__.py and usage of cudaGetErrorName).
-# Technically, we can link cudart here statically, and link libtorch_python.so
-# to a dynamic libcudart.so, but that's just wasteful
-SET(CUDA_USE_STATIC_CUDA_RUNTIME OFF CACHE INTERNAL "")
+# We don't want to statically link cudart, because we rely on it's dynamic linkage in
+# python (follow along torch/cuda/__init__.py and usage of cudaGetErrorName).
+# Technically, we can link cudart here statically, and link libtorch_python.so
+# to a dynamic libcudart.so, but that's just wasteful.
+# However, on Windows, if this one gets switched off, the error "cuda: unknown error"
+# will be raised when running the following code:
+# >>> import torch
+# >>> torch.cuda.is_available()
+# >>> torch.cuda.current_device()
+# More details can be found in the following links.
+# https://github.com/pytorch/pytorch/issues/20635
+# https://github.com/pytorch/pytorch/issues/17108
+if (NOT MSVC)
+  set(CUDA_USE_STATIC_CUDA_RUNTIME OFF CACHE INTERNAL "")
+endif()
 
 # Find CUDA.
 find_package(CUDA)
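The cuda.cmake comment above points at why a shared cudart matters at runtime on non-Windows builds: the Python side (torch/cuda/__init__.py at the time) resolves cudaGetErrorName from the CUDA runtime library to translate error codes. A minimal, illustrative ctypes sketch of that kind of lookup; the library file name below is an assumption and depends on the platform and installed CUDA toolkit version:

import ctypes
import platform

# Illustrative only: the actual cudart file name varies by platform and
# toolkit version (e.g. cudart64_100.dll on Windows, libcudart.so on Linux).
libname = "cudart64_100.dll" if platform.system() == "Windows" else "libcudart.so"
cudart = ctypes.CDLL(libname)

# const char* cudaGetErrorName(cudaError_t error)
cudart.cudaGetErrorName.restype = ctypes.c_char_p
cudart.cudaGetErrorName.argtypes = [ctypes.c_int]

print(cudart.cudaGetErrorName(0))  # b'cudaSuccess'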
