Thanks for sharing! When I reimplemented your model, I ran into severe gradient vanishing, and after debugging I traced it to the feature attention module. I suspect the cause is the ReLU at the end of that module: it zeroes out all negative values, so their gradients are zero during backpropagation. It might be better to remove the last ReLU or replace it with LeakyReLU.
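For reference, here is a minimal sketch of the change I have in mind. The layer sizes and the internal structure of the attention block are just illustrative assumptions, not your actual code; the point is the trailing LeakyReLU in place of the final ReLU.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Hypothetical feature attention block; layer sizes and structure
    are illustrative assumptions, not the repository's actual code."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
            # Using LeakyReLU (or removing this activation entirely) keeps a
            # small nonzero gradient for negative pre-activations, which is
            # the change proposed above.
            nn.LeakyReLU(negative_slope=0.01),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise reweighting of the input features by the attention scores.
        return x * self.score(x)
```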
Thanks for the feedback! I haven't run into the gradient vanishing problem myself, so I'm not sure what caused it, but the choice of activation function could certainly be a factor, and I agree that changing ReLU to LeakyReLU can solve it. Also, if you are trying to reproduce my results, check the PyTorch version, initialization method, optimizer, and learning rate to see where the discrepancy comes from.
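As a rough checklist for that comparison, something like the sketch below can help confirm the setup matches. The concrete choices here (Xavier initialization, Adam, lr=1e-3) are placeholders, not the settings actually used in this repository; substitute whatever the training script specifies.

```python
import torch
import torch.nn as nn

# Confirm the installed PyTorch version matches the one the repo was tested with.
print("PyTorch:", torch.__version__)

# Placeholder model standing in for the full network.
model = nn.Sequential(nn.Linear(128, 64), nn.LeakyReLU(0.01), nn.Linear(64, 128))

def init_weights(m: nn.Module) -> None:
    # Xavier init is one common choice; swap in the repo's actual scheme.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(init_weights)

# Optimizer and learning rate are frequent sources of mismatch when reproducing results.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```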