Fix None grad problem during training TOOD by adding SigmoidGeometricMean #7090
Conversation
from torch.nn import functional as F
...
class SigmoidGeometricMean(Function):
How about we implement an interface named sigmoid_geometric_mean = SigmoidGeometricMean.apply here so that in tood_head we can simply use sigmoid_geometric_mean(xxx)?
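For reference, a minimal sketch of the suggested interface; the `tood_head` call shown in the comment is illustrative, not the exact code in the PR:

```python
# Functional alias so callers don't have to spell out .apply.
sigmoid_geometric_mean = SigmoidGeometricMean.apply

# In tood_head, usage could then look like (illustrative):
# cls_score = sigmoid_geometric_mean(cls_logits, cls_prob)
```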
LGTM
Codecov Report
@@ Coverage Diff @@
## dev #7090 +/- ##
==========================================
+ Coverage 62.41% 62.46% +0.04%
==========================================
Files 330 330
Lines 26199 26216 +17
Branches 4436 4437 +1
==========================================
+ Hits 16353 16375 +22
+ Misses 8976 8966 -10
- Partials 870 875 +5
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to review. If you do not understand some items, don't worry, just open the pull request and ask the maintainers for help.
Motivation
The training of TOOD often encounters None gradients during backpropagation, which then cause None tensors in the next training step. Some issues in the original repo (fcjian/TOOD#11) might also be due to this error. The problem is caused by the naive implementation of the sigmoid geometric mean, `cls_score = (cls_logits.sigmoid() * cls_prob.sigmoid()).sqrt()`. Its output can be 0 if `cls_logits` or `cls_prob` is a very negative value, which causes either inf or None gradients during backpropagation, as illustrated by the sketch below.
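A minimal sketch of how the failure can be triggered, assuming a logit negative enough that its sigmoid underflows to zero (the values are illustrative, not taken from the PR):

```python
import torch

# Naive sigmoid geometric mean: autograd differentiates through sqrt(u),
# whose gradient contains 1 / (2 * sqrt(u)); when either sigmoid underflows
# to 0, the product u is exactly 0 and the gradient degenerates.
cls_logits = torch.tensor([-200.0], requires_grad=True)  # sigmoid underflows to 0
cls_prob = torch.tensor([0.0], requires_grad=True)

cls_score = (cls_logits.sigmoid() * cls_prob.sigmoid()).sqrt()
cls_score.sum().backward()
print(cls_logits.grad, cls_prob.grad)  # degenerate (nan/inf), not finite gradients
```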
Modification
A reimplementation of the `SigmoidGeometricMean` class as a subclass of `torch.autograd.Function` is proposed. Its backward function is derived analytically, which avoids inf or None gradients during backpropagation; see the sketch after this paragraph.
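A minimal sketch of such an autograd Function, assuming the two-input form used above. The backward uses d/dx sqrt(sigmoid(x) * sigmoid(y)) = z * (1 - sigmoid(x)) / 2, where z is the forward output, so no division by a possibly-zero square root appears. This illustrates the approach and is not necessarily the exact code in the PR:

```python
from torch.autograd import Function


class SigmoidGeometricMean(Function):
    """z = sqrt(sigmoid(x) * sigmoid(y)) with a hand-derived backward.

    Since z^2 = sigmoid(x) * sigmoid(y), dz/dx = z * (1 - sigmoid(x)) / 2,
    which stays finite even when the sigmoids underflow to 0.
    """

    @staticmethod
    def forward(ctx, x, y):
        x_sigmoid = x.sigmoid()
        y_sigmoid = y.sigmoid()
        z = (x_sigmoid * y_sigmoid).sqrt()
        ctx.save_for_backward(x_sigmoid, y_sigmoid, z)
        return z

    @staticmethod
    def backward(ctx, grad_output):
        x_sigmoid, y_sigmoid, z = ctx.saved_tensors
        grad_x = grad_output * z * (1 - x_sigmoid) / 2
        grad_y = grad_output * z * (1 - y_sigmoid) / 2
        return grad_x, grad_y


# Functional interface, as suggested in the review comment above.
sigmoid_geometric_mean = SigmoidGeometricMean.apply
```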
Checklist