Test failed with th -lcunn -e "nn.testcuda()" #117
I have similar test failures. I just updated my torch distro today. Here is the relevant output of nn.testcuda() and cutorch.test():
th> nn.testcuda()
Threshold_transposed
th> cutorch.test()
pow2 largeNoncontiguous abs1
Did you try with a different GPU? I switched to a better GPU and it worked. Make sure you have >8 GB of global memory.
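As a quick sanity check, something like the following prints the current GPU's free and total global memory from the `th` prompt. This is only a minimal sketch; it assumes cutorch's getMemoryUsage returns (free, total) in bytes, which matches the versions I've seen:

```lua
-- Minimal sketch: report the current GPU's free/total global memory.
-- Assumes cutorch.getMemoryUsage returns values in bytes; double-check
-- against your cutorch version.
require 'cutorch'

local dev = cutorch.getDevice()
local free, total = cutorch.getMemoryUsage(dev)
print(string.format('GPU %d: %.2f GB free / %.2f GB total',
                    dev, free / 2^30, total / 2^30))
```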
I'm having the same problem. Would it be possible to tweak the test parameters so that it uses a bit less memory? It would be nice if it worked on the Amazon GPU instances...
You can safely ignore the out-of-memory errors.
I wish I could tell Ansible that 😕. I'm automatically provisioning servers in the cloud and would like to run the smoke test to verify that everything worked. The current workaround is to comment out the entire test... I'm also seeing the errors reported by the original poster just before the out-of-memory issue occurs.
We have to increase LogSoftMax's threshold slightly, and SpatialSubSampling actually looks like it has a corner-case bug. Regarding the out-of-memory errors, the tests could be modified to keep tensor sizes within the memory limits reported by cutorch.getDeviceProperties, as sketched below. PRs for any of these are appreciated; if not, I'll fix them at my own pace.
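A rough sketch of that idea, not the actual test code: the totalGlobalMem field mirrors cudaDeviceProp, and the 25% budget and element count are arbitrary illustrative choices.

```lua
-- Sketch: size a test allocation against the memory reported by
-- cutorch.getDeviceProperties. Field name totalGlobalMem follows
-- cudaDeviceProp; budget and sizes here are illustrative only.
require 'cutorch'

local props  = cutorch.getDeviceProperties(cutorch.getDevice())
local budget = math.floor(props.totalGlobalMem * 0.25)  -- stay well under the limit

local wanted   = 64 * 1024 * 1024                -- hypothetical test size (elements)
local maxElems = math.floor(budget / 4)          -- 4 bytes per float element
local n        = math.min(wanted, maxElems)

local input = torch.CudaTensor(n):uniform()      -- allocation now fits the budget
print(('allocated %d elements (~%.2f GB)'):format(n, n * 4 / 2^30))
```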
I have a similar problem when running ./test.sh; has this been resolved? (I use Ubuntu 14.04 and CUDA 7.0 on AWS.) The failing test is LogSoftMax_forward_batch.
When I run test.sh, th -lcunn -e "nn.testcuda()" is unstable and occasionally fails. The error message differs from run to run; some examples are:
SpatialSubSampling_backward
error on state (backward)
LT(<) violation val=1.2421855926514, condition=0.01
/root/torch/install/share/lua/5.1/torch/Tester.lua:26: in function 'assertlt'
/root/torch/install/share/lua/5.1/cunn/test.lua:1391: in function 'v'
LogSoftMax_forward_batch
error on state (forward)
LT(<) violation val=0.0010080337524414, condition=0.001
/root/torch/install/share/lua/5.1/torch/Tester.lua:26: in function 'assertlt'
/root/torch/install/share/lua/5.1/cunn/test.lua:2364: in function 'v'
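For context, these failures come from tolerance assertions of roughly the following shape. This is only a sketch, not the actual cunn/test.lua code: the module choice, batch shape, and precision_forward value are illustrative, but the assertlt call is what produces the "LT(<) violation" format shown above.

```lua
-- Sketch of the precision check: the CUDA forward result is compared to the
-- float CPU result, and the maximum absolute difference must stay below a
-- fixed tolerance. Borderline values (e.g. 0.001008 vs 0.001) make it flaky.
require 'cunn'

local mytester = torch.Tester()
local precision_forward = 1e-3   -- illustrative tolerance

local tests = {}
function tests.LogSoftMax_forward_batch_example()
   local input = torch.randn(32, 100):float()
   local cpu = nn.LogSoftMax():float():forward(input)
   local gpu = nn.LogSoftMax():cuda():forward(input:cuda())
   local err = (gpu:float() - cpu):abs():max()
   mytester:assertlt(err, precision_forward, 'error on state (forward)')
end

mytester:add(tests)
mytester:run()
```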
I guess a similar issue was raised in #50 and solved (maybe?).
I updated to the latest torch and packages.
I use an Ubuntu 14.04 Docker image and CUDA 7.0.
Thanks.