add non competitive gates
add an option for layernorming the output of product key memory
ready transformers with customizable product key memory for some independent research
patch
ability to add gumbel noise for stochastic sampling of memories
bring in differentiable topk, based on coordinate descent from wright et al.
add an attention dropout, default to layernorm on queries
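
For readers unfamiliar with the gumbel noise commit above, here is a minimal sketch of the general idea: add Gumbel(0, 1) noise to the memory scores before taking the top k, so that selection becomes stochastic. The function name `gumbel_topk` and its `temperature` and `rng` parameters are hypothetical and chosen for illustration; this is not the repository's implementation, which may differ.

```python
import math
import random


def gumbel_topk(scores, k, temperature=1.0, rng=None):
    """Return the indices of the k highest scores after adding Gumbel noise.

    Hypothetical helper for illustration only, not the repository's API.
    temperature scales the noise; 0.0 reduces to a plain deterministic top-k.
    """
    rng = rng or random.Random()
    noisy = []
    for index, score in enumerate(scores):
        # Sample u in (0, 1); clamp away from 0 so log(u) stays finite.
        u = max(rng.random(), 1e-20)
        gumbel = -math.log(-math.log(u))
        noisy.append((score + temperature * gumbel, index))
    # Sort by noisy score, highest first, and keep the k best indices.
    noisy.sort(reverse=True)
    return [index for _, index in noisy[:k]]


# Usage: pick 2 of 4 memory slots, with and without noise.
scores = [0.1, 2.0, -1.0, 1.5]
deterministic = gumbel_topk(scores, k=2, temperature=0.0)
stochastic = gumbel_topk(scores, k=2, temperature=1.0, rng=random.Random(0))
```

With `temperature=0.0` the noise vanishes and the highest-scoring slots are always chosen; larger values let lower-scoring slots be selected occasionally.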