error with train.py #39

Open
chituma110 opened this issue Mar 21, 2019 · 7 comments

Comments

@chituma110

command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -c=configs/m2det512_vgg.py --ngpu 8 -t True

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    loss_l, loss_c = criterion(out, priors, targets)
  File "/home/xxx/anaconda2/envs/M2Det/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/xxx/Object_Detection/M2Det/layers/modules/multibox_loss.py", line 106, in forward
    conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
RuntimeError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 11.92 GiB total capacity; 8.33 GiB already allocated; 2.69 GiB free; 502.63 MiB cached)
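
For readers comparing their own numbers against this report, the figures in the error message can be worked through directly. The snippet below is only a sketch: `parse_oom` and `to_gib` are hypothetical helpers written for this example, and the regular expressions assume the exact message wording shown in this thread (other PyTorch versions word the message differently).

```python
import re

# Error text copied from the report in this thread.
msg = ("CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 11.92 GiB total capacity; "
       "8.33 GiB already allocated; 2.69 GiB free; 502.63 MiB cached)")

UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def to_gib(value, unit):
    """Convert a (number, unit) pair from the message into GiB."""
    return float(value) * UNIT_TO_GIB[unit]


def parse_oom(message):
    """Hypothetical helper: pull the sizes out of the message, all in GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (GiB|MiB)",
        "total": r"([\d.]+) (GiB|MiB) total capacity",
        "allocated": r"([\d.]+) (GiB|MiB) already allocated",
        "free": r"([\d.]+) (GiB|MiB) free",
        "cached": r"([\d.]+) (GiB|MiB) cached",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        sizes[name] = to_gib(*match.groups())
    return sizes


sizes = parse_oom(msg)
# A single allocation asks for more memory than GPU 0 has free.
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} GiB, free {sizes['free']:.2f} GiB, "
      f"shortfall {shortfall:.2f} GiB")
```

Read this way, the one allocation made at line 106 of multibox_loss.py requests 3.80 GiB while only 2.69 GiB is free on GPU 0, so that step is roughly 1.11 GiB short on that device.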

@dshahrokhian

Try reducing the batch size in the config file; that solved it for me.

@chituma110
Author

I reduced the batch size from 16 to 8, but got the same error.

@MenGuangwen-CN-0411

MenGuangwen-CN-0411 commented Mar 22, 2019

@chituma110 Maybe a large num_workers adds extra overhead that varies from machine to machine. Set num_workers=0 and try again, then let me know whether it works.
With the default settings for the 320x320 VGG config I got OOM, and with batch size 2 it was still OOM. After I set num_workers=0, training ran fine. I have one GTX 1080 and use Windows 10 with PyTorch 1.0.
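
As an illustration of the num_workers=0 suggestion, here is a minimal sketch using PyTorch's `torch.utils.data.DataLoader`. The toy `TensorDataset`, its tensor shapes, and the batch size are assumptions made up for this example; they are not M2Det's dataset or its config values, and the place where M2Det reads num_workers is not shown in this thread.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data (assumption): 8 small fake images with integer labels.
images = torch.zeros(8, 3, 32, 32)
labels = torch.zeros(8, dtype=torch.long)
dataset = TensorDataset(images, labels)

# num_workers=0 loads every batch in the main process,
# so no extra worker processes are started.
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, batch_labels.shape)
```

Worker processes mostly cost host memory and process start-up overhead, so whether this setting also clears a GPU out-of-memory error probably depends on the machine, as the comment above says.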

@MenGuangwen-CN-0411

@dshahrokhian Sir, I would like to know whether you obtained the results reported in the paper on COCO 2014 or on the VOC dataset. The paper is "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network".

@DayChan

DayChan commented Apr 1, 2019

> @dshahrokhian Sir, I would like to know whether you obtained the results reported in the paper on COCO 2014 or on the VOC dataset. The paper is "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network".

Did you get the result of vgg16+m2det320 in the paper? I just can't reproduce it.

@TekiLi

TekiLi commented Jun 21, 2019

> I reduced the batch size from 16 to 8, but got the same error.

You may be using PyTorch 0.3. Try changing to PyTorch 0.4 or 1.0.

@primary-studyer

primary-studyer commented Jul 20, 2019

> I reduced the batch size from 16 to 8, but got the same error.

Setting the batch size a bit smaller will fix it. Training will just be very slow, and epoch_size will become very large.
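
The link between batch size and epoch_size can be made concrete with a little arithmetic. This sketch assumes that epoch_size means the number of iterations per epoch, computed as the dataset size divided by the batch size (integer division); the helper name `iterations_per_epoch` and the figure of 10,000 images are hypothetical and do not come from the M2Det code or this thread.

```python
def iterations_per_epoch(num_images, batch_size):
    """Assumed definition: full batches needed to see the dataset once."""
    return num_images // batch_size


num_images = 10_000  # hypothetical dataset size, for illustration only

epoch_sizes = {bs: iterations_per_epoch(num_images, bs) for bs in (16, 8, 2)}
for batch_size, size in epoch_sizes.items():
    print(f"batch_size={batch_size:>2} -> {size} iterations per epoch")
```

Under that assumption, halving the batch size doubles the number of iterations in each epoch, which is why a very small batch fits in memory but trains slowly.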
