New method suggestions for additional merge potential (code+output comparisons included) #264
Replies: 30 comments 6 replies
-
Thanks for the suggestion. I've been interested in lossless merging, but the complexity of the code has kept me from getting into it; I will think about what you suggested. As for the Save button, it used to exist, but it has been removed. This is because merging has to be done again even when the Save button is pressed: the loaded model is in fp16 format and needs to be merged again.
-
No worries. I'm especially glad, then, that I was able to share here how I ended up implementing it, and how another method could be added too :) Good luck whenever you get to it. Ah, if that's the case it makes sense: with a button you would expect to be able to save immediately, but if there's that limitation and it needs to merge again anyway, that button would be confusing. Thanks for clarifying.
-
Added features.
-
You can actually get a lot of the performance back when using the filters by offloading them to the GPU with CuPy. It shouldn't be too difficult to implement.
-
Very smooth implementation, thank you for the great work :) However, I found an error in the latest update when trying to save a file specifically as a safetensors file (with both normal and cosine calculation):
Also, with how smooth this latest implementation is, I was able to add another version of the cosine merge, which mixes the weights separately before calculating cosine similarity. You can see in the comparison demonstrated below how this can result in favoring the structures of A and the details of B, or the other way around (calculated in a different sequence; it's not a result you would get by just swapping the A/B models around), so I added and adjusted them as the cosineA and cosineB calculation modes. I've also added, as a calculation mode, smoothAdd, which is the smoother filtered add-difference method from the original post that was missed here. In supermerger.py I replace
with
Then in mergers.py I replace
with
and
with
and I added these necessary imports
@mariaWitch Thanks for the suggestion. However, though I believe I got the code for that method right, it looks to be restricted to CUDA devices, and while trying to pip install it, it wouldn't work (it couldn't find CUDA for some reason) and tried to build itself on my system, which failed. So I'm not sure myself how that would be properly implemented, especially when just the necessary import causes these problems.
-
I actually properly implemented it in Bayesian Merger, which has a very similar code structure to SuperMerger. You can check it out here: github.com/mariaWitch/sd-webui-bayesian-merger/blob/double-diff-cosine/sd_webui_bayesian_merger/merger.py#L250-L263

Essentially, I had to convert the tensor that gets passed to SciPy (in this case CuPyX) to a DLPack, use CuPy to convert it into a CuPy array, and then pass that to the filters. Once that was done, I converted it back into a DLPack, and then back into a tensor with from_dlpack. The reason we have to convert the tensor to a DLPack is that this is the only supported way of doing a zero-copy transfer of a CPU tensor into a CuPy array (which is on the GPU, as CuPyX does not support standard NumPy arrays). By doing it this way, we avoid costly memory transfers between system RAM and VRAM that would otherwise decrease performance.

```python
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
import cupy as cp
import cupyx.scipy as scipy
from cupyx.scipy.ndimage._filters import median_filter as filter
```

These would be the imports you would bring in if CuPy was installed (imported in place of scipy).
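To make the idea concrete, here is a minimal sketch of the "use CuPy when present, fail over to SciPy otherwise" pattern discussed in this thread. The function name `smooth_median` and the NumPy-array interface are illustrative assumptions, not SuperMerger's actual code; the DLPack zero-copy route mentioned above applies when the input is already a GPU torch tensor.

```python
import numpy as np

try:
    # GPU path: cupyx.scipy mirrors the SciPy filter API on CUDA
    # (and experimentally on ROCm).
    import cupy as cp
    from cupyx.scipy.ndimage import median_filter as _median_filter
    _HAS_CUPY = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    # Fail over to plain SciPy where CuPy cannot be installed or finds no GPU.
    from scipy.ndimage import median_filter as _median_filter
    _HAS_CUPY = False


def smooth_median(diff: np.ndarray, size: int = 3) -> np.ndarray:
    """Median-filter a weight-difference array, on the GPU when possible."""
    if _HAS_CUPY:
        # For a torch tensor already on the GPU, converting through DLPack
        # (cp.from_dlpack(to_dlpack(t))) makes this hand-off zero-copy; for
        # a NumPy array one host-to-device copy is unavoidable.
        gpu_diff = cp.asarray(diff)
        return cp.asnumpy(_median_filter(gpu_diff, size=size))
    return _median_filter(diff, size=size)
```

Because the import failure is caught, the script degrades gracefully on machines without CUDA, which was exactly the installation problem reported above.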
-
As for your installation issues, I have had no such bad luck. But I think CuPy has experimental support for ROCm as well. Either way, it can exist as something that works if the script can import it, and otherwise it can just fail over.
-
Also, could you elaborate a little on what you mean by the "structure" of a model and the "details" of a model in the context you used them in? It seems a bit abstract and could mean a lot of different things.
-
@SwiftIllusion @mariaWitch
-
No worries :) glad to hear it. I don't have the technical wizardry or verbal expertise of some, but I've tried, with my own observations of its development/output alongside ChatGPT, to provide some more guidance/details below, as I know what it's like to see new tech and have no idea what it's doing or how to take advantage of it. Hope it helps.

**normal**
Available modes: All
Normal calculation method. Can be used in all modes.

**cosineA/cosineB**
Available modes: weight sum
The comparison of the two models is performed using cosine similarity, centered on the set ratio, and is calculated to eliminate loss due to merging. See below for further details.

The original simple weight mode is the most basic method and works by linearly interpolating between the two models based on a given weight alpha. At alpha = 0 the output is the first model (model A), and at alpha = 1 the output is the second model (model B). Any other value of alpha results in a weighted average of the two models.
One key advantage of the cosine methods over the original simple weight mode is that they take into account the structural similarity between the two models, which can lead to better results when the two models are similar but not identical. Another advantage of the cosine methods is that they can help prevent overfitting and improve generalization by limiting the amount of detail from one model that is incorporated into the other.

In the case of cosineA, we normalize the vectors of the first model (model A) before merging, so the resulting merged model will favor the structure of the first model while incorporating details from the second model. This is because we are essentially aligning the direction of the first model's vectors with the direction of the corresponding vectors in the second model.
Detail-wise, for example, note how above and below, in all cases more blur is preserved in the background compared to the foreground, instead of the linear difference of the original merge.

On the other hand, in cosineB, we normalize the vectors of the second model (model B) before merging, so the resulting merged model will favor the structure of the second model while incorporating details from the first model. This is because we are aligning the direction of the second model's vectors with the direction of the corresponding vectors in the first model.
In summary, the choice between cosineA and cosineB depends on which model's structure you want to prioritize in the resulting merged model. If you want to prioritize the structure of the first model, use cosineA; if you want to prioritize the structure of the second model, use cosineB. Note also how the second model is more the 'reference point' for the merging (compare alpha 1 with the changes at 0), so the order of the models can also change the end result toward your desired output.

**smoothAdd**
Available modes: Add difference
A method of add difference that mixes the benefits of median and Gaussian filters to add model differences in a smoother way, trying to avoid the negative 'burning' effect that can be seen when adding too many models this way. It also achieves more than simply adding the difference at a lower value.
The functionality and result of just the Median filter
The functionality and result of just the Gaussian filter
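To make the smoothAdd description above concrete, here is a minimal sketch in NumPy/SciPy. It is an assumption-laden illustration, not SuperMerger's exact code: the function name `smooth_add` is made up, plain arrays stand in for torch tensors, and the two filters are applied in sequence (median first, then Gaussian) as one plausible way of "mixing" their benefits.

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def smooth_add(theta_a, theta_b, theta_c, alpha=1.0):
    """Add difference with smoothing (illustrative sketch).

    The difference (B - C) is passed through a median filter, which knocks
    out extreme outlier weights, and then a Gaussian filter, which softens
    what remains, before being added to A. This is what tames the 'burning'
    that plain add-difference shows after repeated merges.
    """
    merged = {}
    for key in theta_a:
        diff = theta_b[key] - theta_c[key]
        diff = median_filter(diff, size=3)        # remove spikes
        diff = gaussian_filter(diff, sigma=1.0)   # smooth the remainder
        merged[key] = theta_a[key] + alpha * diff
    return merged
```

A handy sanity check: if B and C are identical, the filtered difference is zero and A comes back unchanged.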
-
@mariaWitch Regrettably, without being able to install the requirements for your method's imports, I've been unable to test it here. I also spent hours trying to implement other methods/performance improvements for the filters within the existing scope, but the closest I got was a different method for one of the two filters that produced a completely different/wrong output; the rest of the time was spent on errors, so I've had to consider it beyond the scope of what I can achieve.
-
So "structure" refers to the background and pose, and "details" refer to the actual character details of the subject. That makes it a lot clearer.
-
@SwiftIllusion
-
I just checked the A/B cosine and the results are impressive. Thanks a lot!
-
@recoilme Awesome, I'm really happy to hear that :), thank you very much for the original inspiration.
-
@SwiftIllusion Why was this changed from `theta_0[key] = theta_0[key] * (1-k) + theta_1[key] * k` to the line above?
-
@mariaWitch This was to fix the fact that it was previously merging backwards (if you set the weight to 0.75, 0.75 went to model A instead of model B). Now, as per the examples in the guide (which was made after this fix), the output correctly goes from A at 0 to B at 1.
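The corrected orientation, together with a cosine-derived per-layer ratio, can be sketched as follows. This is a rough illustration only: the function name `cosine_weight_sum` and the specific mapping from similarity to `k` are made up for the example, and the real cosineA/cosineB modes additionally normalize one model's vectors before comparing.

```python
import numpy as np

def cosine_weight_sum(theta_a, theta_b, alpha, eps=1e-8):
    """Weight-sum merge where each layer's mix ratio k is nudged by the
    cosine similarity of the two weight tensors (illustrative sketch)."""
    merged = {}
    for key in theta_a:
        a, b = theta_a[key].ravel(), theta_b[key].ravel()
        sim = float(np.dot(a, b) /
                    (np.linalg.norm(a) * np.linalg.norm(b) + eps))
        # dissimilar layers push k away from plain alpha; clamp to [0, 1]
        k = float(np.clip(alpha * (1.0 + (1.0 - sim)), 0.0, 1.0))
        # corrected orientation: k is the share of model B,
        # so alpha = 0 returns model A and alpha = 1 returns model B
        merged[key] = theta_a[key] * (1 - k) + theta_b[key] * k
    return merged
```

With this orientation, setting the weight to 0.75 really does mean "75% model B", matching the guide's examples.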
-
Thank you! That is clear enough for me.
-
@hako-mikan I hope you can add this whenever you get the opportunity, and I would appreciate it. I've provided the code at the end of this post as before, adding it into the latest version (commits of Jun 5, 2023) as a new choice for Add Difference.

**The guide for this new method**

**trainDifference**
Available modes: Add difference
At its simplest, this method can be thought of as a 'super LoRA' for permanent merges.

**Comparisons**
**Usage guidance**

**Possibilities and general usage**

*Expand a model with new concepts, or reinforce existing concepts (and quality output), instead of mixing*
Sci-Fi Diffusion, as an example (https://civitai.com/models/4404?modelVersionId=4980), was trained on general sci-fi images.

*Direction of trainDifference and style of the difference matters*
It is harder for a model to learn to be realistic than to be stylistic.

*trainDifference is not always the best solution*
Sometimes, depending on the type/scope of the difference, a cosine similarity merge can provide better results (if the differences aren't from SDv1.5 already, trainDifference both onto SDv1.5, then cosine similarity merge them from there before you trainDifference the result back onto your working model).

*Gain the benefits of a trained model anywhere*
Models like knollingcase and Bubble Toys are cool, but their effort has been limited by the framework they were trained on. Now you can trainDifference them onto any of the newer models that people have developed.

**Limitations and what to avoid/problems and solutions**

*Knowing and having access to the origin of the model pre-training is required*
A lot of models have some mix of SDv1.4 now. This trainDifference merge is accurate enough that if you were to, for example, train 'rev animated' onto 'Sci-Fi Diffusion' with SDv1.5 as model C, the merge would negatively affect the output (the 'training' would be offset/distorted), because 'rev animated's origin is an unknown ratio between SDv1.4 and SDv1.5 (and a mix of individual in/out weights too). But you could trainDifference 'Sci-Fi Diffusion' onto 'rev animated', because it was trained on SDv1.5.

*After enough time / with similar materials, 'burning'/'over-training' can eventually occur*
You can 'pull back' the model at this point by cosine similarity merging it with SDv1.5, which helps ground it while keeping more qualities from the training.
*After enough merges, the 'clip/comprehension' can become heavy, negatively affecting simple prompts*
For example, complex prompts may still look good, but 'female portrait, blue eyes' could spill the 'blue' concept too much.

**Practical demonstration**
**The code for this new method**

In supermerger.py replace
with
Then in mergers.py replace
with
and replace
with
and replace
with
and between the "cosineB" and "smoothAdd" methods, add
and after the last "del theta_1" add
-
Added trainDifference.
-
No worries, thank you very much for your work on this/implementing it :)
-
@SwiftIllusion @mariaWitch I just want to say that @sverfier8807 has implemented multi-threading for smoothAdd and now it works much faster.
-
@SwiftIllusion @hako-mikan Maybe here, instead of the 1.8 multiplier, it should be 2?
-
@StAlKeR7779 Sorry, I don't know the value of the math you're displaying, and I appreciate the thought of improving it further, but I ran many tests across different merges (e.g. merging models trained on SDv1.4 onto SDv1.5, or SDv1.5 onto practically the same model, and different LoRA comparisons, which you can see in the guide correlate in strength). Anything beyond 1.8 started to 'burn/over-train' in a way that appeared greater than the original (I tested from 1 to 2). Even 1.9 appeared to be too much, which surprised me at the time, as I was expecting 2 to be the most natural value if it required more than 1; but 1.8 was the most representative, and I've used it extensively since then with that value.
-
Q: What are the functional differences between these multi-LoRA merge algorithms (assuming no checkpoints), with regard to overlapping ideas (or similar designs) vs. independent concepts? (for @SwiftIllusion)
-
@TomLucidor
-
As for my recent tries, model B always wins. Even with alpha = 0, the result of any cosineA/B merge is never model A; it is always greatly changed instead. Even the super-strong Haveall X fails when sitting in the model A position.
-
Just a note that "train difference" is really just add difference, but with an element-wise scale factor applied to the difference. Because of this, since weights are typically initialized randomly for training and the distribution after training is similar to the initial distribution,
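One plausible reading of that element-wise scale factor, based on the descriptions in this thread, can be sketched as follows. This is an illustration, not SuperMerger's exact code: the function name `train_difference` reuses the mode's name but the body is a reconstruction, plain NumPy arrays stand in for torch tensors, and the 1.8 strength is the empirically chosen value discussed earlier in the thread.

```python
import numpy as np

def train_difference(theta_a, theta_b, theta_c, alpha=1.0):
    """Add difference with an element-wise scale factor (illustrative sketch).

    Instead of adding (B - C) uniformly, each element of the difference is
    scaled by how far A already sits from B relative to the total distance,
    so weights that A has already 'absorbed' receive less of the difference.
    """
    merged = {}
    for key in theta_a:
        diff = theta_b[key] - theta_c[key]
        dist_bc = np.abs(diff)
        dist_ba = np.abs(theta_b[key] - theta_a[key])
        denom = dist_bc + dist_ba
        denom[denom == 0] = 1.0   # avoid 0/0 where A == B == C
        scale = dist_ba / denom
        merged[key] = theta_a[key] + np.sign(diff) * dist_bc * scale * (alpha * 1.8)
    return merged
```

Note how the scale collapses to zero wherever B and C agree, so nothing is "trained in" for weights the difference never touched.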
-
While trying to find methods to improve models, one of the things I looked into was merging, and hopefully the discoveries below are valuable in helping improve merging/providing additional options for it.
Sum merging
Initially I started with inspiration from https://github.com/recoilme/losslessmix. However, through ChatGPT (regardless of your feelings about utilizing it or concerns about accuracy, the outputs below hopefully show the interactions held value), I found that approach to be working only with the vector orientations. I expanded it to also take the magnitude into account and combined the results, for the best merging outputs in my comparisons.
One of the difficulties with sum merging is that you can lose some things through the merge; below is a comparison, with different prompts and two seeds, between a regular merge and the new method.
You can see below the improved details/depth, especially in the jewelry; in the top girl's background; in the top bird's twig, which connects better besides the extra details; and in the improved hands of the guy on the right.
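The "orientation plus magnitude" idea described above can be sketched like this. It is an interpretation under stated assumptions, not the exact script: the function name `lossless_sum` is made up, plain NumPy arrays stand in for per-layer torch tensors, and the recombination of direction and norm is one straightforward way to realize the description.

```python
import numpy as np

def lossless_sum(a: np.ndarray, b: np.ndarray, alpha: float,
                 eps: float = 1e-8) -> np.ndarray:
    """Weight-sum merge that interpolates direction and magnitude separately.

    Plain linear interpolation of two weight vectors shortens the result
    whenever the vectors point in different directions, which reads as lost
    detail in the merged model. Interpolating the unit direction and the
    norm independently, then recombining, preserves the overall scale.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    norm_a, norm_b = np.linalg.norm(a_flat), np.linalg.norm(b_flat)
    # interpolate orientation (linearly on unit vectors, then renormalize)
    direction = (1 - alpha) * a_flat / (norm_a + eps) + alpha * b_flat / (norm_b + eps)
    direction /= np.linalg.norm(direction) + eps
    # interpolate magnitude separately, then recombine
    magnitude = (1 - alpha) * norm_a + alpha * norm_b
    return (direction * magnitude).reshape(a.shape)
```

For two orthogonal unit vectors at alpha = 0.5, a plain lerp would shrink the norm to about 0.707, while this recombination keeps it at 1.0, which is the "loss" the method recovers.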
Add difference merging
One of the things that has been most difficult with add merging is the rate at which consecutive merges, in an attempt to gain more learning, can lead to burnt/overexposed-looking colors and edges.
To demonstrate this and what the new method achieves, these are comparisons starting with seekArtMega20, then adding dreamlike and openjourneyV2 (with SDv1.5 as model C for the difference).
Note this does meaningfully increase merge time (roughly 5-10x or more), but for a better outcome in future image generations that's absolutely worth it.
The code
This may be a bit messy to implement; I just replaced the existing methods for merging/adding (I don't have the ability or experience to turn this into additional options/a pull request), but here is what I used.
Relevant changes
Replacing `theta_0[key] = (1 - current_alpha) * theta_0[key] + current_alpha * theta_1[key]` with
Requires `pip install scipy` in the Automatic1111 directory for the filters.
Add above `theta_0[key] = theta_0[key] + current_alpha * theta_1[key]`
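The filtering step added above that line can be sketched as follows. This is a hedged illustration, not the verbatim replacement code: the helper name `filtered_difference` is hypothetical, and SciPy's median and Gaussian filters are applied in sequence as one plausible reading of the smoothing described earlier in the post.

```python
from scipy.ndimage import median_filter, gaussian_filter

def filtered_difference(diff, size=3, sigma=1.0):
    """Smooth a raw model difference before it is added at current_alpha:
    the median filter removes spikes, the Gaussian filter softens edges."""
    return gaussian_filter(median_filter(diff, size=size), sigma=sigma)

# the add-difference line then becomes, conceptually:
# theta_0[key] = theta_0[key] + current_alpha * filtered_difference(diff)
```

Both filters leave a constant array unchanged, so a zero difference still adds nothing, which is the behavior the original add-difference line had.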
The full script
(The above within the full context of the merger script as it was at the time I made the above changes)
Minor suggestion
Not necessary, but it would be appreciated if "save settings" had a "save merge" button so you didn't need to toggle "save model" and then re-merge (more relevant given the longer time the above methods take to merge).