add hyper connections
address #4
for adaptive attention, allow each head to have different adaptive weight gating
offer attention softclamping
address #1
complete sidequest and bump to 0.1.0
handle the final norm at the end of the transformer, but make optional
ability to add a few register tokens
fix logic for skipping last feedforward in mmdit