Nan problems during training #13

Closed
ashnair1 opened this issue Dec 31, 2018 · 7 comments

Comments

@ashnair1

I'm running into a possible gradient issue during training. While training on batches, the following output appears:

Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463884, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463872, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.463883, Avg Recall: -nan, count: 0
....

Does anyone know how to fix this?

@ashnair1 ashnair1 changed the title Training Problems Nan problems during training Dec 31, 2018
@avanetten
Owner

What are the input image sizes you are training on? Sometimes inputs that are too large can give this error.

@ashnair1
Author

ashnair1 commented Jan 10, 2019

My images are 900 x 900 pixels. They're from the SpaceNet Off-Nadir Dataset (AOI Atlanta).

@avanetten
Owner

My guess would be an issue related to the label format, but I can't be sure without more information.

@ashnair1
Author

ashnair1 commented Jan 13, 2019

Each label line for an image has the format <class_id> <x_center> <y_center> <width> <height>:

1 0.876977 0.555887 0.074074 0.042762
1 0.974364 0.451216 0.052320 0.040534
1 0.870561 0.447706 0.133511 0.036080
1 0.883241 0.374917 0.086092 0.034906
1 0.975940 0.373796 0.048898 0.031485

@avanetten
Owner

I will have a better chance of helping if you can also provide the command you're using to begin training. One further possibility is that the labels need to be zero-indexed, so if you only have a single object class the labels should appear as:

0 0.876977 0.555887 0.074074 0.042762
0 0.974364 0.451216 0.052320 0.040534
0 0.870561 0.447706 0.133511 0.036080
0 0.883241 0.374917 0.086092 0.034906
0 0.975940 0.373796 0.048898 0.031485
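
A quick way to check for this before training is to scan the label lines for class ids that fall outside the zero-indexed range. The following is only a minimal sketch: the function name `find_label_problems` is hypothetical and not part of simrdwn, and it assumes each line holds five whitespace-separated fields with box values normalized to [0, 1], as in the examples above.

```python
def find_label_problems(lines, num_classes):
    """Return (line_number, message) pairs for label lines that look invalid.

    Assumes the format: class_id x_center y_center width height,
    with zero-indexed class ids and box values normalized to [0, 1].
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append((number, "expected 5 fields, got %d" % len(fields)))
            continue
        try:
            class_id = int(fields[0])
            box = [float(value) for value in fields[1:]]
        except ValueError:
            problems.append((number, "non-numeric field"))
            continue
        # With a single class, the only valid id is 0.
        if not 0 <= class_id < num_classes:
            problems.append(
                (number, "class id %d outside 0..%d" % (class_id, num_classes - 1))
            )
        if any(not 0.0 <= value <= 1.0 for value in box):
            problems.append((number, "box value outside [0, 1]"))
    return problems
```

With a single class, passing `num_classes=1` flags any line whose first field is `1`, which is the situation in the labels posted earlier in this thread.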

@ashnair1
Author

ashnair1 commented Jan 14, 2019

This is the command I used to begin training.

python simrdwn/core/simrdwn.py --framework=yolt --mode=train --outname=dense_buildings --yolt_object_labels_str=building --yolt_cfg_file=ave_dense.cfg --weight_file=yolov2-voc.weights --label_map_path=simrdwn/data/class_labels_building.pbtxt --nbands=3 --max_batches=30000 --batch_size=64 --subdivisions=16 --gpu=0

Oh, I thought 0 was reserved for background and categories were labelled starting from 1. I will look into this and get back to you.

@ashnair1
Author

You were right. The problem was that the class labels were not zero-indexed. Thanks.
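
For anyone hitting the same problem, one-indexed label lines can be shifted to zero-indexed ones with a few lines of Python. This is a rough sketch rather than anything from simrdwn: the helper name `shift_class_ids` is made up for illustration, and it assumes the class id is the first whitespace-separated field on each line.

```python
def shift_class_ids(lines, offset=-1):
    """Return label lines with the leading class id shifted by `offset`.

    Assumes the class id is the first whitespace-separated field;
    the remaining fields are passed through unchanged.
    """
    shifted = []
    for line in lines:
        fields = line.split()
        if not fields:
            shifted.append(line)  # keep blank lines as they are
            continue
        new_id = int(fields[0]) + offset
        if new_id < 0:
            raise ValueError("class id would become negative: %r" % line)
        shifted.append(" ".join([str(new_id)] + fields[1:]))
    return shifted
```

Applied to `1 0.876977 0.555887 0.074074 0.042762`, this yields `0 0.876977 0.555887 0.074074 0.042762`. Running it twice on the same labels would raise an error instead of silently producing negative ids.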
