Limit on input Tensor size using cunn #144
I'm also running into this.
The grid used to launch the kernel uses the batch size as its first dimension: https://github.com/torch/cunn/blob/master/LogSoftMax.cu#L238. There is a limit on the size of this grid dimension, which depends on the compute capability of the GPU you're building for: 65535 for compute capability <= 2.x and 2^31-1 for newer versions (see https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications).
Aha, I got that far, and I couldn't see why it wasn't working out of the box on a K40.
I use a GeForce 750 Ti. The problem is solved by changing CUDA_NVCC_FLAGS from "-arch=sm_20" to "-arch=sm_50" in CMakeLists.txt.
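As a sketch, the change amounts to something like the following in CMakeLists.txt (the flag name is from the comment above; the exact line and its location vary by cunn version):

```cmake
# Before: targets compute capability 2.0, whose grid x-dimension
# is capped at 65535 blocks.
# LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_20")

# After: target the Maxwell architecture of a GeForce 750 Ti.
LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_50")
```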
I have the same issue with SpatialSoftMax on the GPU. I am using a Titan X, and cunn was compiled with CUDA 7.5 and compute capability 5.2. EDIT: I think I have fixed this by explicitly using cudnn.SpatialSoftMax.
Code to reproduce the error:
The error:
Large batch sizes work fine on CPU. Smaller batch sizes (<80k) work fine on GPU.
smth chntla on the issue: