Tags: DavidBert/pytorch
Wrap cub in its own namespace (pytorch#55292) (pytorch#61605)

Summary: Tentative fix for pytorch#55027. Wraps the cub import in its own namespace so that static variables used by cub and thrust don't conflict if they end up in different libraries when torch is built with BUILD_SPLIT_CUDA. cub variables end up in their own namespace while thrust variables are left unwrapped, so they don't clash. This also allows extensions to use cub without wrapping it (thrust will still be problematic). The way to let extensions use thrust is to stop using thrust in pytorch completely.

Since importing cub and importing thrust can no longer coexist, I had to move nonzero to its own file and remove its reliance on thrust functions; nonzero now uses cub only. Also, we cannot selectively import just some of the cub headers; we are forced to include `cub/cub.cuh`, which is not great.

Caffe2 ops using cub are not touched (there are too many), so mixing caffe2 and torch will (can) still result in the same bug. We are moving towards disabling c2 ops, so I think this is fine.

Still, even with this change the compiler (correctly) warns about redefinition of `CUB_NS_PREFIX`, because including `ATen/ATen.h` transitively includes `thrust/complex.h`, which in turn includes the original (empty) definition of `CUB_NS_PREFIX`. We can probably just ignore this warning. Here's an example:

```
In file included from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:9:
/data/users/ngimel/pytorch/aten/src/ATen/cuda/CubUtils.cuh:4: warning: "CUB_NS_PREFIX" redefined
 #define CUB_NS_PREFIX namespace at{ namespace native{
In file included from /home/ngimel/local/cuda/include/thrust/system/cuda/config.h:76,
                 from /home/ngimel/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:33,
                 from /home/ngimel/local/cuda/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /home/ngimel/local/cuda/include/thrust/iterator/iterator_traits.h:111,
                 from /home/ngimel/local/cuda/include/thrust/detail/type_traits/pointer_traits.h:23,
                 from /home/ngimel/local/cuda/include/thrust/type_traits/is_contiguous_iterator.h:27,
                 from /home/ngimel/local/cuda/include/thrust/type_traits/is_trivially_relocatable.h:19,
                 from /home/ngimel/local/cuda/include/thrust/detail/complex/complex.inl:20,
                 from /home/ngimel/local/cuda/include/thrust/complex.h:1031,
                 from /data/users/ngimel/pytorch/c10/util/complex.h:9,
                 from /data/users/ngimel/pytorch/c10/core/ScalarType.h:4,
                 from /data/users/ngimel/pytorch/c10/core/Scalar.h:10,
                 from /data/users/ngimel/pytorch/build/aten/src/ATen/core/TensorBody.h:8,
                 from /data/users/ngimel/pytorch/aten/src/ATen/Tensor.h:3,
                 from /data/users/ngimel/pytorch/aten/src/ATen/Context.h:4,
                 from /data/users/ngimel/pytorch/aten/src/ATen/ATen.h:9,
                 from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:1:
/home/ngimel/local/cuda/include/cub/util_namespace.cuh:43: note: this is the location of the previous definition
 #define CUB_NS_PREFIX
```

We will need a lint rule to prevent people from including `cub/cub.cuh` directly, because that would lead to pytorch#55027 reappearing for some sequence of operations (and would lead to errors with cub code in extensions). Also, for this to work reliably we'll need to make sure that everything calling cub ends up in only one of libtorch_cuda_cu or libtorch_cuda_cpp; otherwise even the namespace won't help (the same symbols will still exist in both libraries).

Upd: libtorch_cuda_cpp and libtorch_cuda_cu still contain the same symbols, which means there exists a sequence of operations that will cause the cache bug to reappear, so this alone is not a solution; we need to adjust the file lists for BUILD_SPLIT_CUDA:

```
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cu.so | grep PerDeviceAttributeCache | c++filt
000000000c6bf808 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600830 u guard variable for cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
00000000018625e0 t at::native::cub::PerDeviceAttributeCache::DevicePayload at::native::cub::PerDeviceAttributeCache::operator()<at::native::cub::PtxVersion(int&)::{lambda(int&)#1}>(at::native::cub::PtxVersion(int&)::{lambda(int&)#1}&&, int)
00000000009ce630 t cub::PerDeviceAttributeCache::DevicePayload cub::PerDeviceAttributeCache::operator()<cub::PtxVersion(int&)::{lambda(int&)#1}>(cub::PtxVersion(int&)::{lambda(int&)#1}&&, int)
000000000c6bf820 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600840 u cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cpp.so | grep PerDeviceAttributeCache | c++filt
0000000000ad2d98 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
0000000000ad2da0 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
```

Upd2: Moved TensorFactories.cu to the torch_cuda_cu sources (see the change to caffe2/CMakeLists.txt), so cub-related symbols now appear only in libtorch_cuda_cu. We'd need a test for that; any suggestions on how best to test it?

cc zasdfgbnm malfet Pull Request resolved: pytorch#55292 Reviewed By: anjali411 Differential Revision: D27576442 Pulled By: ngimel fbshipit-source-id: 1ef29503a342bb214794d34a42a47052092a66c1 Co-authored-by: Natalia Gimelshein <[email protected]>
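For context, a minimal sketch of what such a namespace-wrapping header could look like is shown below. This only illustrates cub's `CUB_NS_PREFIX`/`CUB_NS_POSTFIX` mechanism using the names seen in the warning above (`at::native` and a `CubUtils.cuh`-style header); it is not the actual contents of PyTorch's header.

```
// Hypothetical CubUtils.cuh-style wrapper: a sketch of the technique, not the real header.
// cub reads CUB_NS_PREFIX / CUB_NS_POSTFIX when it opens its own namespace, so defining
// them before including cub/cub.cuh places every cub symbol (including static per-device
// caches such as GetPerDeviceAttributeCache) inside at::native::cub instead of the
// global ::cub, giving each library its own distinct symbols.
#pragma once

#define CUB_NS_PREFIX namespace at { namespace native {
#define CUB_NS_POSTFIX } }

// All cub usage should go through this wrapper rather than including cub/cub.cuh
// directly (hence the proposed lint rule). If a thrust header was included first,
// cub's util_namespace.cuh has already defined an empty CUB_NS_PREFIX, which is what
// produces the benign redefinition warning quoted above.
#include <cub/cub.cuh>
```

Call sites would then refer to `at::native::cub::DeviceSelect`, `at::native::cub::DeviceScan`, and so on, matching the `at::native::cub::...` symbols visible in the `nm` output above.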
[docs] Add torch.package documentation for beta release (pytorch#59886) **Summary** This commit adds documentation for the `torch.package` module to accompany its beta release in 1.9. **Test Plan** Continuous integration.
Fix test_randperm_device_compatibility for 1 GPU (pytorch#59484) (pytorch#59502) Summary: Do not try to create tensors on the 2nd device if device_count() == 1. Fixes #{issue number} Pull Request resolved: pytorch#59484 Reviewed By: ngimel Differential Revision: D28910673 Pulled By: malfet fbshipit-source-id: e3517f31a463dd049ce8a5155409b7b716c8df18
Document factory_kwargs in nn.Quantize + remove Attributes section (pytorch#59025) (pytorch#59045) Summary: The `factory_kwargs` kwarg was previously undocumented in `nn.Quantize`. Further, the `Attributes` section of the docs was improperly filled in, resulting in bad formatting. This section doesn't apply since `nn.Quantize` doesn't have parameters, so it has been removed. Pull Request resolved: pytorch#59025 Reviewed By: anjali411 Differential Revision: D28723889 Pulled By: jbschlosser fbshipit-source-id: ba86429f66d511ac35042ebd9c6cc3da7b6b5805 Co-authored-by: Joel Schlosser <[email protected]>
[release/1.9] Fix issues regarding binary_checkout (pytorch#58495) Signed-off-by: Eli Uriegas <[email protected]>
Perform appropriate CUDA stream synchronization in distributed autograd. (pytorch#53929) (pytorch#54358) Summary: Pull Request resolved: pytorch#53929 The local autograd engine performs appropriate stream synchronization between autograd nodes in the graph to ensure a consumer's stream is synchronized with the producer's stream before executing the consumer. However, in the case of distributed autograd, the SendRpcBackward function receives gradients over the wire, and TensorPipe uses its own pool of streams for this purpose. As a result, the tensors are received on TensorPipe's stream pool, but SendRpcBackward runs on a different stream during the backward pass, and there is no logic to synchronize these streams. To fix this, I've enhanced DistEngine to synchronize these streams appropriately when it receives grads over the wire. ghstack-source-id: 124055277 (Note: this ignores all push blocking failures!) Test Plan: 1) Added a unit test which reproduced the issue. 2) waitforbuildbot. Reviewed By: walterddr, wanchaol Differential Revision: D27025307 fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414 Co-authored-by: Pritam Damania <[email protected]>
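As a rough illustration of the pattern described above (the consumer stream waiting on the stream that received the tensors), here is a hedged sketch using ATen's CUDA event helpers. The helper name and the way the streams are obtained are assumptions made for the example; this is not the actual DistEngine change.

```
// Sketch of producer/consumer stream synchronization with ATen's CUDA helpers.
// Not the actual DistEngine implementation; names below are illustrative only.
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAStream.h>

// Make `consumer` (e.g. the stream SendRpcBackward runs on) wait for all work
// already queued on `producer` (e.g. the TensorPipe stream that received the
// gradient tensors), without blocking the host.
void waitForProducerStream(
    const c10::cuda::CUDAStream& producer,
    const c10::cuda::CUDAStream& consumer) {
  at::cuda::CUDAEvent gradients_ready;
  gradients_ready.record(producer); // mark the point where the received grads are valid
  gradients_ready.block(consumer);  // consumer stream waits for that point on-device
}
```

In the backward pass, the consumer would typically be the current stream on the device (`at::cuda::getCurrentCUDAStream()`), so that kernels launched by SendRpcBackward and downstream autograd nodes only run once the received gradients are fully materialized.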
third_party: Update kineto to fix libtorch builds (pytorch#54205) Signed-off-by: Eli Uriegas <[email protected]>