Forget gate bias should probably be initialized to 1 #3

Open
twoletters opened this issue May 14, 2024 · 1 comment

Comments

@twoletters

" self.bf = nn.Parameter(torch.randn(1))\n",

The training of traditional LSTMs benefits from initializing the forget gate bias to 1. It prevents the LSTM from forgetting until it has learned to do so, speeding up training.

It seems to me that sLSTM is essentially the same as the traditional LSTM in that regard, and initializing the forget gate biases to 1 should speed up training. Don't take my word for it, though. Test, don't trust.
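A minimal sketch of the change I have in mind, assuming a scalar bias as in the notebook (the `bf_random` / `bf_ones` names are mine, for illustration only):

```python
import torch
import torch.nn as nn

# Sketch only, not the repo's actual cell: the one change is the constant
# used to initialize the forget-gate bias.
bf_random = nn.Parameter(torch.randn(1))  # current init in the notebook
bf_ones = nn.Parameter(torch.ones(1))     # proposed init

# Early in training the rest of the forget-gate pre-activation is roughly
# zero, so the bias dominates the gate value.
preact = torch.zeros(4)
print(torch.sigmoid(preact + bf_random))  # centered around 0.5, varies per run
print(torch.sigmoid(preact + bf_ones))    # ~0.73, biased toward keeping the cell state
```

With the bias at 1 the gate starts in the "keep" regime, so gradients can flow through the cell state from the first updates instead of being damped by a half-open gate.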

@twoletters
Author

I amend my comment: this is useful only if the sigmoid is used as the activation function of the forget gate (one of the options proposed in the paper). If the exponential is used instead, the forget gate already starts close to 1 when the parameters are close to zero (exp(0) = 1), so the bias trick adds nothing there.
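To make the amendment concrete, a quick check of both activation choices at a roughly zero pre-activation (illustration only, not the repo's code):

```python
import torch

# At initialization the forget-gate pre-activation is roughly 0.
z = torch.zeros(1)

print(torch.sigmoid(z))      # 0.5   -> sigmoid gate benefits from bias = 1
print(torch.sigmoid(z + 1))  # ~0.73 -> what the proposed init gives
print(torch.exp(z))          # 1.0   -> exp gate already starts fully open
```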
