New method suggestions for additional merge potential (code+output comparisons included) #264
Replies: 30 comments 6 replies
-
Thanks for the suggestion. I've been interested in lossless merging, but the complexity of the code has kept me from getting into it; I will think about what you suggested. As for the Save button, it used to exist, but it has been removed. This is because merging has to be done again even when the Save button is pressed: the loaded model is in fp16 format and needs to be merged again.
-
No worries. I'm especially glad, then, that I was able to share here how I ended up implementing it, and how another method could be added too :) Good luck whenever you get to it. Ah, if that's the case it makes sense: with a button you would expect to be able to save immediately, but if there's that limitation and it needs to merge again anyway, that button would be confusing. Thanks for clarifying.
-
Added features.
-
You can actually get a lot of the performance back when using the filters by offloading them to the GPU with CuPy. It shouldn't be too difficult to implement.
-
Very smooth implementation, thank you for the great work :) However, I found an error in the latest update when trying to save a file specifically as a safetensors file (with both normal and cosine calculation):
Also, with how smooth this latest implementation is, I was able to add another version of the cosine merge, which mixes the weights separately before calculating cosine similarity. You can see in the comparison demonstrated below how this can result in favoring the structures of A and the details of B, or the other way around (calculated in a different sequence; it's not a result you would get by just swapping the A/B models around), so I added and adjusted them as the cosineA and cosineB calculation modes. I've also added, as a calculation mode, smoothAdd, which is the smoother filtered add-difference method from the original post that was missed here. In supermerger.py I replace
with
Then in mergers.py I replace
with
and
with
and I added these necessary imports
@mariaWitch Thanks for the suggestion. However, though I believe I got the code for that method right, it looks to be restricted to CUDA devices, and while trying to pip install it, it wouldn't work (it couldn't find CUDA for some reason) and tried to build itself on my system, which failed. So I'm not sure myself how that would be properly implemented, especially when just the necessary import causes these problems.
-
I actually properly implemented it in Bayesian Merger, which has a very similar code structure to SuperMerger. You can check it out here: github.com/mariaWitch/sd-webui-bayesian-merger/blob/double-diff-cosine/sd_webui_bayesian_merger/merger.py#L250-L263

Essentially, I had to convert the tensor that gets passed to SciPy (in this case CuPyX) to a DLPack, use CuPy to convert it into a CuPy array, and then pass that to the filters. Once that was done, I converted it back into a DLPack, and then back into a tensor with from_dlpack. The reason we have to convert the tensor to a DLPack is that this is the only supported way of doing a zero-copy transfer of a CPU tensor into a CuPy array (which is on the GPU, as CuPyX does not support standard NumPy arrays). By doing it this way, we avoid costly memory transfers between system RAM and VRAM that would otherwise decrease performance.

```python
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
import cupy as cp
import cupyx.scipy as scipy
from cupyx.scipy.ndimage._filters import median_filter as filter
```

These would be the imports you would bring in if CuPy was installed (imported in place of scipy).
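To make the idea concrete, here is a minimal sketch of the "use CuPy when present, fail over to SciPy otherwise" pattern discussed in this thread. The function name `smooth_median` and the NumPy-array interface are illustrative assumptions, not SuperMerger's actual code; the DLPack zero-copy route mentioned above applies when the input is already a GPU torch tensor.

```python
import numpy as np

try:
    # GPU path: cupyx.scipy mirrors the SciPy filter API on CUDA
    # (and experimentally on ROCm).
    import cupy as cp
    from cupyx.scipy.ndimage import median_filter as _median_filter
    _HAS_CUPY = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    # Fail over to plain SciPy where CuPy cannot be installed or finds no GPU.
    from scipy.ndimage import median_filter as _median_filter
    _HAS_CUPY = False


def smooth_median(diff: np.ndarray, size: int = 3) -> np.ndarray:
    """Median-filter a weight-difference array, on the GPU when possible."""
    if _HAS_CUPY:
        # For a torch tensor already on the GPU, converting through DLPack
        # (cp.from_dlpack(to_dlpack(t))) makes this hand-off zero-copy; for
        # a NumPy array one host-to-device copy is unavoidable.
        gpu_diff = cp.asarray(diff)
        return cp.asnumpy(_median_filter(gpu_diff, size=size))
    return _median_filter(diff, size=size)
```

Because the import failure is caught, the script degrades gracefully on machines without CUDA, which was exactly the installation problem reported above.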
-
As for your installation issues, I have had no such bad luck. But I think CuPy has experimental support for ROCm as well. Either way, it can exist as something that works if the script can import it, and otherwise it can just fail over.
-
Also, could you elaborate a little on what you mean by the "structure" of a model and the "details" of a model in the context you used them in? It seems a bit abstract and could mean a lot of different things.
-
@SwiftIllusion @mariaWitch
-
No worries :) glad to hear it. I don't have the technical wizardry or verbal expertise of some, but I've tried, with my own observations of its development/output alongside ChatGPT, to provide some more guidance/details below, as I know what it's like to see new tech and have no idea what it's doing or how to take advantage of it. Hope it helps.

**normal**
Available modes: All
Normal calculation method. Can be used in all modes.

**cosineA/cosineB**
Available modes: weight sum
The comparison of the two models is performed using cosine similarity, centered on the set ratio, and is calculated to eliminate loss due to merging. See below for further details.

The original simple weight mode is the most basic method and works by linearly interpolating between the two models based on a given weight alpha. At alpha = 0 the output is the first model (model A), and at alpha = 1 the output is the second model (model B). Any other value of alpha results in a weighted average of the two models.
One key advantage of the cosine methods over the original simple weight mode is that they take into account the structural similarity between the two models, which can lead to better results when the two models are similar but not identical. Another advantage of the cosine methods is that they can help prevent overfitting and improve generalization by limiting the amount of detail from one model that is incorporated into the other.

In the case of cosineA, we normalize the vectors of the first model (model A) before merging, so the resulting merged model will favor the structure of the first model while incorporating details from the second model. This is because we are essentially aligning the direction of the first model's vectors with the direction of the corresponding vectors in the second model.
Detail-wise, for example, note how above and below, in all cases more blur is preserved in the background compared to the foreground, instead of the linear difference of the original merge.

On the other hand, in cosineB, we normalize the vectors of the second model (model B) before merging, so the resulting merged model will favor the structure of the second model while incorporating details from the first model. This is because we are aligning the direction of the second model's vectors with the direction of the corresponding vectors in the first model.
In summary, the choice between cosineA and cosineB depends on which model's structure you want to prioritize in the resulting merged model. If you want to prioritize the structure of the first model, use cosineA; if you want to prioritize the structure of the second model, use cosineB. Note also how the second model is more the 'reference point' for the merging (compare alpha 1 with the changes at 0), so the order of the models can also change the end result toward your desired output.

**smoothAdd**
Available modes: Add difference
A method of add difference that mixes the benefits of median and Gaussian filters to add model differences in a smoother way, trying to avoid the negative 'burning' effect that can be seen when adding too many models this way. It also achieves more than simply adding the difference at a lower value.
The functionality and result of just the Median filter
The functionality and result of just the Gaussian filter
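To make the smoothAdd description above concrete, here is a minimal sketch in NumPy/SciPy. It is an assumption-laden illustration, not SuperMerger's exact code: the function name `smooth_add` is made up, plain arrays stand in for torch tensors, and the two filters are applied in sequence (median first, then Gaussian) as one plausible way of "mixing" their benefits.

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def smooth_add(theta_a, theta_b, theta_c, alpha=1.0):
    """Add difference with smoothing (illustrative sketch).

    The difference (B - C) is passed through a median filter, which knocks
    out extreme outlier weights, and then a Gaussian filter, which softens
    what remains, before being added to A. This is what tames the 'burning'
    that plain add-difference shows after repeated merges.
    """
    merged = {}
    for key in theta_a:
        diff = theta_b[key] - theta_c[key]
        diff = median_filter(diff, size=3)        # remove spikes
        diff = gaussian_filter(diff, sigma=1.0)   # smooth the remainder
        merged[key] = theta_a[key] + alpha * diff
    return merged
```

A handy sanity check: if B and C are identical, the filtered difference is zero and A comes back unchanged.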
-
@mariaWitch Regrettably, without being able to install the requirements for your method's imports, I've been unable to test it here. I also spent hours trying to implement other methods/performance improvements for the filters within the existing scope, but the closest I got was a different method for one of the two filters that produced a completely different/wrong output; the rest of the time was spent on errors, so I've had to consider it beyond the scope of what I can achieve.
-
So "structure" refers to the background and pose, and "details" refer to the actual character details of the subject. That makes it a lot clearer.
-
@SwiftIllusion
-
I just checked the A/B cosine and the results are impressive. Thanks a lot!
-
@recoilme Awesome, I'm really happy to hear that :), thank you very much for the original inspiration.
-
@SwiftIllusion Why was this changed from `theta_0[key] = theta_0[key] * (1-k) + theta_1[key] * k` to the line above?
-
@mariaWitch This was to fix the fact that it was previously merging backwards (if you set the weight to 0.75, 0.75 went to model A instead of model B). Now, as per the examples in the guide (which was made after this fix), the output correctly goes from A at 0 to B at 1.
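The corrected orientation, together with a cosine-derived per-layer ratio, can be sketched as follows. This is a rough illustration only: the function name `cosine_weight_sum` and the specific mapping from similarity to `k` are made up for the example, and the real cosineA/cosineB modes additionally normalize one model's vectors before comparing.

```python
import numpy as np

def cosine_weight_sum(theta_a, theta_b, alpha, eps=1e-8):
    """Weight-sum merge where each layer's mix ratio k is nudged by the
    cosine similarity of the two weight tensors (illustrative sketch)."""
    merged = {}
    for key in theta_a:
        a, b = theta_a[key].ravel(), theta_b[key].ravel()
        sim = float(np.dot(a, b) /
                    (np.linalg.norm(a) * np.linalg.norm(b) + eps))
        # dissimilar layers push k away from plain alpha; clamp to [0, 1]
        k = float(np.clip(alpha * (1.0 + (1.0 - sim)), 0.0, 1.0))
        # corrected orientation: k is the share of model B,
        # so alpha = 0 returns model A and alpha = 1 returns model B
        merged[key] = theta_a[key] * (1 - k) + theta_b[key] * k
    return merged
```

With this orientation, setting the weight to 0.75 really does mean "75% model B", matching the guide's examples.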
-
Thank you! That is clear enough for me.
-
@hako-mikan I hope you can add this whenever you get the opportunity, and I would appreciate it. I've provided the code at the end of this post as before, adding it into the latest version (commits of Jun 5, 2023) as a new choice for Add Difference.

**The guide for this new method**

**trainDifference**
Available modes: Add difference
At its simplest, this method can be thought of as a 'super LoRA' for permanent merges.

**Comparisons**
**Usage guidance**

**Possibilities and general usage**

*Expand a model with new concepts, or reinforce existing concepts (and quality output), instead of mixing*
Sci-Fi Diffusion, as an example (https://civitai.com/models/4404?modelVersionId=4980), was trained on general sci-fi images.

*Direction of trainDifference and style of the difference matters*
It is harder for a model to learn to be realistic than to be stylistic.

*trainDifference is not always the best solution*
Sometimes, depending on the type/scope of the difference, a cosine similarity merge can provide better results (if the differences aren't from SDv1.5 already, trainDifference both onto SDv1.5, then cosine similarity merge them from there before you trainDifference the result back onto your working model).

*Gain the benefits of a trained model anywhere*
Models like knollingcase and Bubble Toys are cool, but their effort has been limited by the framework they were trained on. Now you can trainDifference them onto any of the newer models that people have developed.

**Limitations and what to avoid/problems and solutions**

*Knowing and having access to the origin of the model pre-training is required*
A lot of models have some mix of SDv1.4 now. This trainDifference merge is accurate enough that if you were to, for example, train 'rev animated' onto 'Sci-Fi Diffusion' with SDv1.5 as model C, the merge would negatively affect the output (the 'training' would be offset/distorted), because 'rev animated's origin is an unknown ratio between SDv1.4 and SDv1.5 (and a mix of individual in/out weights too). But you could trainDifference 'Sci-Fi Diffusion' onto 'rev animated', because it was trained on SDv1.5.

*After enough time / with similar materials, 'burning'/'over-training' can eventually occur*
You can 'pull back' the model at this point by cosine similarity merging it with SDv1.5, which helps ground it while keeping more qualities from the training.
*After enough merges, the 'clip/comprehension' can become heavy, negatively affecting simple prompts*
For example, complex prompts may still look good, but 'female portrait, blue eyes' could spill the 'blue' concept too much.

**Practical demonstration**
**The code for this new method**

In supermerger.py replace
with
Then in mergers.py replace
with
and replace
with
and replace
with
and between the "cosineB" and "smoothAdd" methods, add
and after the last "del theta_1" add
-
Added trainDifference.
-
No worries, thank you very much for your work on this/implementing it :)
-
@SwiftIllusion @mariaWitch I just want to say that @sverfier8807 has implemented multi-threading for smoothAdd and now it works much faster.
-
@SwiftIllusion @hako-mikan Maybe here, instead of the 1.8 multiplier, it should be 2?
-
@StAlKeR7779 Sorry, I don't know the value of the math you're displaying, and I appreciate the thought of improving it further, but I ran many tests across different merges (e.g. merging models trained on SDv1.4 onto SDv1.5, or SDv1.5 onto practically the same model, and different LoRA comparisons, which you can see in the guide correlate in strength). Anything beyond 1.8 started to 'burn/over-train' in a way that appeared greater than the original (I tested from 1 to 2). Even 1.9 appeared to be too much, which surprised me at the time, as I was expecting 2 to be the most natural value if it required more than 1; but 1.8 was the most representative, and I've used it extensively since then with that value.
-
Q: What are the functional differences between these multi-LoRA merge algorithms (assuming no checkpoints), with regard to overlapping ideas (or similar designs) vs. independent concepts? (for @SwiftIllusion)
-
@TomLucidor
-
As for my recent tries, model B always wins. Even with alpha = 0, the result of any cosineA/B merge is never model A; it is always greatly changed instead. Even the super-strong Haveall X fails when sitting in the model A position.
-
Just a note that "train difference" is really just add difference, but with an element-wise scale factor applied to the difference. Because of this, since weights are typically initialized randomly for training and the distribution after training is similar to the initial distribution,
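One plausible reading of that element-wise scale factor, based on the descriptions in this thread, can be sketched as follows. This is an illustration, not SuperMerger's exact code: the function name `train_difference` reuses the mode's name but the body is a reconstruction, plain NumPy arrays stand in for torch tensors, and the 1.8 strength is the empirically chosen value discussed earlier in the thread.

```python
import numpy as np

def train_difference(theta_a, theta_b, theta_c, alpha=1.0):
    """Add difference with an element-wise scale factor (illustrative sketch).

    Instead of adding (B - C) uniformly, each element of the difference is
    scaled by how far A already sits from B relative to the total distance,
    so weights that A has already 'absorbed' receive less of the difference.
    """
    merged = {}
    for key in theta_a:
        diff = theta_b[key] - theta_c[key]
        dist_bc = np.abs(diff)
        dist_ba = np.abs(theta_b[key] - theta_a[key])
        denom = dist_bc + dist_ba
        denom[denom == 0] = 1.0   # avoid 0/0 where A == B == C
        scale = dist_ba / denom
        merged[key] = theta_a[key] + np.sign(diff) * dist_bc * scale * (alpha * 1.8)
    return merged
```

Note how the scale collapses to zero wherever B and C agree, so nothing is "trained in" for weights the difference never touched.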
-
While trying to find methods to improve models, one of the things I looked into was merging, and hopefully the discoveries below are valuable in helping improve merging/providing additional options for it.
Sum merging
Initially I started with inspiration from https://github.com/recoilme/losslessmix. However, through ChatGPT (regardless of your feelings about utilizing it or concerns about accuracy, the outputs below hopefully show the interactions held value), I found that approach to be working only with the vector orientations. I expanded it to also take the magnitude into account and combined the results, for the best merging outputs in my comparisons.
One of the difficulties with sum merging is that you can lose some things through the merge; below is a comparison, with different prompts and two seeds, between a regular merge and the new method.
You can see below the improved details/depth, especially in the jewelry; in the top girl's background; in the top bird's twig, which connects better besides the extra details; and in the improved hands of the guy on the right.
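The "orientation plus magnitude" idea described above can be sketched like this. It is an interpretation under stated assumptions, not the exact script: the function name `lossless_sum` is made up, plain NumPy arrays stand in for per-layer torch tensors, and the recombination of direction and norm is one straightforward way to realize the description.

```python
import numpy as np

def lossless_sum(a: np.ndarray, b: np.ndarray, alpha: float,
                 eps: float = 1e-8) -> np.ndarray:
    """Weight-sum merge that interpolates direction and magnitude separately.

    Plain linear interpolation of two weight vectors shortens the result
    whenever the vectors point in different directions, which reads as lost
    detail in the merged model. Interpolating the unit direction and the
    norm independently, then recombining, preserves the overall scale.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    norm_a, norm_b = np.linalg.norm(a_flat), np.linalg.norm(b_flat)
    # interpolate orientation (linearly on unit vectors, then renormalize)
    direction = (1 - alpha) * a_flat / (norm_a + eps) + alpha * b_flat / (norm_b + eps)
    direction /= np.linalg.norm(direction) + eps
    # interpolate magnitude separately, then recombine
    magnitude = (1 - alpha) * norm_a + alpha * norm_b
    return (direction * magnitude).reshape(a.shape)
```

For two orthogonal unit vectors at alpha = 0.5, a plain lerp would shrink the norm to about 0.707, while this recombination keeps it at 1.0, which is the "loss" the method recovers.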
Add difference merging
One of the things that has been most difficult with add merging is the rate at which consecutive merges, in an attempt to gain more learning, can lead to burnt/overexposed-looking colors and edges.
To demonstrate this and what the new method achieves, these are comparisons starting with seekArtMega20, then adding dreamlike and openjourneyV2 (with SDv1.5 as model C for the difference).
Note this does meaningfully increase merge time (roughly 5-10x or more), but for a better outcome in future image generations that's absolutely worth it.
The code
This may be a bit messy to implement; I just replaced the existing methods for merging/adding (I don't have the ability or experience to turn this into additional options/a pull request), but here is what I used.
Relevant changes
Replacing `theta_0[key] = (1 - current_alpha) * theta_0[key] + current_alpha * theta_1[key]` with
Requires `pip install scipy` in the Automatic1111 directory for the filters.
Add above `theta_0[key] = theta_0[key] + current_alpha * theta_1[key]`
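The filtering step added above that line can be sketched as follows. This is a hedged illustration, not the verbatim replacement code: the helper name `filtered_difference` is hypothetical, and SciPy's median and Gaussian filters are applied in sequence as one plausible reading of the smoothing described earlier in the post.

```python
from scipy.ndimage import median_filter, gaussian_filter

def filtered_difference(diff, size=3, sigma=1.0):
    """Smooth a raw model difference before it is added at current_alpha:
    the median filter removes spikes, the Gaussian filter softens edges."""
    return gaussian_filter(median_filter(diff, size=size), sigma=sigma)

# the add-difference line then becomes, conceptually:
# theta_0[key] = theta_0[key] + current_alpha * filtered_difference(diff)
```

Both filters leave a constant array unchanged, so a zero difference still adds nothing, which is the behavior the original add-difference line had.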
The full script
(The above within the full context of the merger script as it was at the time I made the above changes)
Minor suggestion
Not necessary, but it would be appreciated if "save settings" had a "save merge" button so you didn't need to toggle "save model" and then re-merge (more relevant given the longer time the above methods take to merge).