
Gradient Vanishing #6

Closed
Asichurter opened this issue Oct 6, 2019 · 1 comment

@Asichurter

Thanks for sharing your work! When I reimplemented your model, I ran into severe gradient vanishing, and after debugging I traced it to the feature attention module. I think the cause is that the feature attention module ends with a ReLU, which clamps all negative values to zero, so those units receive zero gradient during backpropagation. It might be better to remove the last ReLU or replace it with LeakyReLU.
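For illustration, here is a minimal sketch of the suggested change, assuming the feature attention module is a small convolutional scorer that currently ends in a ReLU. The module name, layer sizes, and kernel sizes below are hypothetical placeholders, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical feature attention module; the architecture is a placeholder,
# only the final-activation swap is the point of the sketch.
class FeatureAttention(nn.Module):
    def __init__(self, num_channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, num_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(num_channels, 1, kernel_size=3, padding=1),
            # The trailing ReLU is the suspected culprit: negative pre-activations
            # are clamped to 0 and get zero gradient on the backward pass.
            # LeakyReLU keeps a small gradient for negative inputs instead.
            nn.LeakyReLU(negative_slope=0.01),  # instead of nn.ReLU()
        )

    def forward(self, x):
        # x: (batch, features) -> treat the feature dimension as a 1D signal
        score = self.conv(x.unsqueeze(1))  # (batch, 1, features)
        return score.squeeze(1)            # (batch, features) attention scores
```

The other option mentioned above is simply to drop the final activation and leave the scores unbounded.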

@gaotianyu1350
Collaborator

Thanks for the feedback! I haven't run into the gradient vanishing problem myself, so I am not sure what caused it, but the choice of activation function could certainly be one of the reasons, and I agree that changing ReLU to LeakyReLU could solve it. Also, if you are trying to reproduce my results, you can check the PyTorch version, initialization method, optimizer, and learning rate to see where the problem is.
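As a rough illustration of that reproduction checklist, a sketch like the one below can help compare setups. The initializer, optimizer, and learning rate shown are common defaults chosen for the example, not the repository's documented settings:

```python
import torch
import torch.nn as nn

# 1. Confirm the PyTorch version matches the one the authors used.
print(torch.__version__)

model = FeatureAttention()  # hypothetical module from the sketch above

# 2. Check the initialization method (Xavier here is only an example choice).
for m in model.modules():
    if isinstance(m, nn.Conv1d):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

# 3. Check the optimizer and learning rate (placeholder values).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-5)
```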
