Thanks for sharing! When I reimplemented your model, I ran into severe gradient vanishing, and after debugging I traced it to the feature attention module. I suspect the cause is the ReLU at the end of that module: it zeroes out all negative values, so their gradients are zero during backpropagation. It might be better to remove the last ReLU or replace it with LeakyReLU.
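For reference, here is a minimal sketch of the change I have in mind. The layer sizes and the internal structure of the attention block are just illustrative assumptions, not your actual code; the point is the trailing LeakyReLU in place of the final ReLU.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Hypothetical feature attention block; layer sizes and structure
    are illustrative assumptions, not the repository's actual code."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
            # Using LeakyReLU (or removing this activation entirely) keeps a
            # small nonzero gradient for negative pre-activations, which is
            # the change proposed above.
            nn.LeakyReLU(negative_slope=0.01),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise reweighting of the input features by the attention scores.
        return x * self.score(x)
```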
Thanks for the feedback! I haven't run into the gradient vanishing problem myself, so I'm not sure what caused it, but the choice of activation function could certainly be a factor, and I agree that changing ReLU to LeakyReLU can solve it. Also, if you are trying to reproduce my results, check the PyTorch version, initialization method, optimizer, and learning rate to see where the discrepancy comes from.
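As a rough checklist for that comparison, something like the sketch below can help confirm the setup matches. The concrete choices here (Xavier initialization, Adam, lr=1e-3) are placeholders, not the settings actually used in this repository; substitute whatever the training script specifies.

```python
import torch
import torch.nn as nn

# Confirm the installed PyTorch version matches the one the repo was tested with.
print("PyTorch:", torch.__version__)

# Placeholder model standing in for the full network.
model = nn.Sequential(nn.Linear(128, 64), nn.LeakyReLU(0.01), nn.Linear(64, 128))

def init_weights(m: nn.Module) -> None:
    # Xavier init is one common choice; swap in the repo's actual scheme.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)

# Optimizer and learning rate are frequent sources of mismatch when reproducing results.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```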