orthogonal lora layer init #2389

Open · winglian wants to merge 1 commit into main
Conversation

@BenjaminBossan (Member) left a comment


Thanks a lot for adding an option for orthogonal initialization of LoRA weights.

Note that the OLoRA initialization is also aimed at orthogonal initialization. Maybe it would be worth it to compare the two. A disadvantage of OLoRA is, however, that the base weights are also modified, which requires users to take some extra steps if they want to load the model with other LoRA adapters, for instance. Pinging @tokenizer-decode just in case they wanna check this PR.

Before merging, we would also need some more additions to this PR:

  • Update the docstring of LoraConfig, similar to the help. How about also adding a link to the blog post (AFAICT there is no paper?).
  • Add a unit test. Check out the tests in this test class; a rough sketch is included right after this list.
  • Let's run make style to satisfy the linter.
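
For the unit test item, a very rough sketch of what such a test could look like (the model choice, target modules, adapter name, and assertions are all assumptions; the real test should follow the conventions of the existing LoRA init test class mentioned above):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def test_lora_orthogonal_init_produces_nonzero_factors():
    # Tiny model just to exercise the init path; any LoRA-compatible model works.
    model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-gpt2")
    config = LoraConfig(r=8, init_lora_weights="orthogonal", target_modules=["c_attn"])
    model = get_peft_model(model, config)
    for module in model.modules():
        if hasattr(module, "lora_A") and "default" in module.lora_A:
            # Unlike the default init, where lora_B starts at zero, the orthogonal
            # init is expected to leave both factors with non-zero entries.
            assert module.lora_A["default"].weight.abs().sum() > 0
            assert module.lora_B["default"].weight.abs().sum() > 0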

X = torch.randn(rank, rank)
Q, _ = torch.linalg.qr(X)  # Q is orthogonal, so its rows are orthonormal
set1 = Q[0::2, :]  # rows 0, 2, 4, ... (even indices)
set2 = Q[1::2, :]  # rows 1, 3, 5, ... (odd indices)
Review comment (Member):

r needs to be even for this to work, right? Let's check it and raise an error with a helpful message if it's not.
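
A minimal sketch of what that check could look like (the exact message and its placement in the init code are assumptions):

# Guard for the orthogonal init path; `rank` matches the snippet above.
if rank % 2 != 0:
    raise ValueError(
        f"Orthogonal initialization of LoRA weights requires an even rank, but got r={rank}. "
        "Please use an even `r` or a different `init_lora_weights` setting."
    )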

@tokenizer-decode (Contributor)

This is just OLoRA but starting from random weights. How can starting from random weights, rather than extracting that information from the pretrained weights, converge faster? Did you actually run tests? In our research, and in every subsequent study, OLoRA and other derivatives like PiSSA performed better than any random initialization. For a list of studies, see.

@@ -369,6 +369,7 @@ class LoraConfig(PeftConfig):
 "nonnegative integer. "
 "Passing `'corda'` results in CorDA initialization. "
 "Pass `'loftq'` to use LoftQ initialization."
+"Pass `'orthogonal'` to use orthogonal initialization."
Review comment (Contributor):

I think this is confusing to the user.

@BenjaminBossan (Member)

@tokenizer-decode Thanks for commenting. It would indeed be nice to see a comparison with OLoRA or PiSSA, which the linked blog post didn't test. I could see an argument for the proposed initialization method being easier to use, since the base weights are unchanged, so even if it's not as good, there could be some value. WDYT?

@tokenizer-decode (Contributor)

I honestly don't see the performance benefit. But if you think there is an ease-of-use benefit, there could be some value.

This goes for every other decomposition method, e.g. SVD. If the value lies in not updating the base weights, we could always let the user run such a method with a parameter like no_update that turns off the part where the base weights are updated.

But I might add, for future readers who are confused: updating the base weights is generally where the performance comes from.
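
For illustration, a rough sketch of what such a no_update-style option could look like for a QR-based init (the function name, signature, and scaling details are assumptions, not PEFT's OLoRA implementation):

import torch

def qr_init(base_weight: torch.Tensor, rank: int, update_base: bool = True):
    # QR-decompose the pretrained weight to obtain orthonormal directions.
    Q, R = torch.linalg.qr(base_weight)
    lora_B = Q[:, :rank].clone()  # (out_features, rank)
    lora_A = R[:rank, :].clone()  # (rank, in_features)
    if update_base:
        # OLoRA-style: subtract the initial product so the effective weight
        # (base + B @ A) matches the original pretrained weight at step 0.
        base_weight = base_weight - lora_B @ lora_A
    return base_weight, lora_A, lora_B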

@winglian (Contributor, Author) commented Feb 21, 2025

[Screenshot attached, 2025-02-20.] Here's GRPO + PEFT. OLoRA initialization goes straight to 0.0 rewards after the first step. Orthogonal outperforms DoRA too.

If it's easier, I can convert this so that init_lora_weights accepts a callable and users can provide their own initialization function.

EDIT: something like

from typing import Protocol

class InitLoraWeights(Protocol):
    def __call__(self, layer, adapter_name) -> None:
        pass

and the Config typing would look something like:

bool | Literal[...] | InitLoraWeights
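
For illustration, usage could then look roughly like this (a hypothetical API, not something LoraConfig currently accepts; the initializer body is just an example):

import torch
from peft import LoraConfig

# Hypothetical user-supplied initializer matching the InitLoraWeights protocol.
def my_orthogonal_init(layer, adapter_name) -> None:
    # Orthogonal A and zero B keep the adapted output unchanged at step 0.
    torch.nn.init.orthogonal_(layer.lora_A[adapter_name].weight)
    torch.nn.init.zeros_(layer.lora_B[adapter_name].weight)

# Hypothetical: passing a callable is not part of the current LoraConfig API.
config = LoraConfig(r=8, init_lora_weights=my_orthogonal_init)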

@BenjaminBossan (Member)

> Here's GRPO + PEFT. OLoRA initialization goes straight to 0.0 rewards after the first step.

Thanks for running the tests 🎉 Is the script open so that we can check what might be going on with OLoRA?

> If it's easier, I can convert this so that init_lora_weights accepts a callable and users can provide their own initialization function.

In general, we would like to avoid this, even though it could be practical. The reason is that we wouldn't be able to serialize the LoraConfig into JSON with values that are Python code.
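
A quick illustration of that constraint (hypothetical config dict, just to show the failure mode):

import json

def my_init(layer, adapter_name):
    pass  # stand-in for a user-provided initializer

json.dumps({"init_lora_weights": "orthogonal"})  # fine: plain strings serialize
json.dumps({"init_lora_weights": my_init})       # TypeError: Object of type function is not JSON serializable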

In sum, I think we can still proceed with the orthogonal weight initialization method. As I mentioned, even if it did not outperform OLoRA or similar methods, it could still be valuable as a more user-friendly option.
