error with train.py #39

chituma110 · 2019-03-21T15:03:22Z

command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train.py -c=configs/m2det512_vgg.py --ngpu 8 -t True

Traceback (most recent call last):
  File "train.py", line 88, in <module>
    loss_l, loss_c = criterion(out, priors, targets)
  File "/home/xxx/anaconda2/envs/M2Det/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/xxx/Object_Detection/M2Det/layers/modules/multibox_loss.py", line 106, in forward
    conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
RuntimeError: CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 11.92 GiB total capacity; 8.33 GiB already allocated; 2.69 GiB free; 502.63 MiB cached)
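
The numbers in that error message can be read off directly: the requested allocation is larger than what is free on GPU 0. A minimal sketch of the arithmetic, assuming the message format shown above (the `parse_oom` helper is hypothetical and not part of PyTorch or this repo):

```python
import re

# Error text copied from the traceback above.
message = (
    "CUDA out of memory. Tried to allocate 3.80 GiB (GPU 0; 11.92 GiB total capacity; "
    "8.33 GiB already allocated; 2.69 GiB free; 502.63 MiB cached)"
)

UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def parse_oom(text):
    """Hypothetical helper: return the sizes in the message, converted to GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (GiB|MiB)",
        "total": r"([\d.]+) (GiB|MiB) total capacity",
        "allocated": r"([\d.]+) (GiB|MiB) already allocated",
        "free": r"([\d.]+) (GiB|MiB) free",
        "cached": r"([\d.]+) (GiB|MiB) cached",
    }
    sizes = {}
    for label, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            sizes[label] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return sizes


sizes = parse_oom(message)
# How much more memory the failed allocation needed than was free.
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} GiB, free {sizes['free']:.2f} GiB, "
      f"short by {shortfall:.2f} GiB")
```

The single 3.80 GiB request comes from the indexing step in multibox_loss.py, and it runs on GPU 0 even though eight GPUs are visible.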

dshahrokhian · 2019-03-21T15:25:09Z

Try reducing the batch size in the config file; that solved it for me.

chituma110 · 2019-03-21T15:47:14Z

I reduced the batch size from 16 to 8, but got the same error.

MenGuangwen-CN-0411 · 2019-03-22T05:32:44Z

@chituma110 Maybe a high num_workers value adds extra overhead on some machines. Set num_workers=0 and give it a try, then let me know whether it works.
With the default settings for 320x320-VGG I got OOM, and setting the batch size to 2 still gave OOM. After setting num_workers=0, it ran fine. I have one GTX 1080 and am using Windows 10 with PyTorch 1.0.
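
For reference, this is roughly what the num_workers=0 setting looks like on a plain PyTorch `DataLoader`. It is only a sketch: the dataset here is random stand-in data, and where this repo actually reads num_workers from is not shown in this thread.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 8 random "images" with integer labels (hypothetical data).
dataset = TensorDataset(torch.randn(8, 3, 32, 32), torch.randint(0, 5, (8,)))

# num_workers=0 loads every batch in the main process,
# so no worker subprocesses are started.
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

for images, labels in loader:
    print(images.shape, labels.shape)
```

With num_workers=0, loading is slower but runs in a single process, which makes it a cheap thing to rule out first.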

MenGuangwen-CN-0411 · 2019-03-27T02:01:08Z

@dshahrokhian Sir, I would like to know whether you were able to get the COCO 2014 or VOC results described in the paper "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid".

DayChan · 2019-04-01T06:39:23Z

> @dshahrokhian Sir, I would like to know whether you were able to get the COCO 2014 or VOC results described in the paper "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid".

Did you get the result of vgg16+m2det320 in the paper? I just can't reproduce it.

TekiLi · 2019-06-21T01:31:41Z

> I reduced the batch size from 16 to 8, but got the same error.

You may be using PyTorch 0.3. Try changing to PyTorch 0.4 or 1.0.

primary-studyer · 2019-07-20T03:13:24Z

> I reduced the batch size from 16 to 8, but got the same error.

Setting the batch size a bit smaller fixes it, but training will be very slow and epoch_size will be very large.
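
To illustrate why epoch_size grows as the batch shrinks, here is a rough sketch. It assumes epoch_size is the number of images divided by the batch size, which this thread does not confirm for train.py; the dataset size and the `iterations_per_epoch` helper are hypothetical.

```python
def iterations_per_epoch(num_images, batch_size):
    """Number of full batches needed to pass over the dataset once."""
    return num_images // batch_size


num_images = 16000  # hypothetical dataset size, for illustration only

for batch_size in (16, 8, 4, 2):
    print(f"batch size {batch_size:2d}: "
          f"{iterations_per_epoch(num_images, batch_size)} iterations per epoch")
```

Under that assumption, halving the batch size doubles the number of iterations per epoch, which is where the slowdown comes from.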
