---
title: "SegMoE: Segmind Mixture of Diffusion Experts"
thumbnail: /blog/assets/segmoe/thumbnail.png
authors:
- user: Warlord-K
  guest: true
- user: Icar
  guest: true
- user: harishp
  guest: true
---

# SegMoE: Segmind Mixture of Diffusion Experts

SegMoE is an exciting framework for creating Mixture-of-Experts Diffusion models from scratch! SegMoE is comprehensively integrated within the Hugging Face ecosystem and is supported in `diffusers` 🔥!

Among the features and integrations being released today:

- [Models on the Hub](https://huggingface.co/models?search=segmind/SegMoE), with their model cards and licenses (Apache 2.0)
- [GitHub Repository](https://github.com/segmind/segmoe) to create your own MoE-style models.

## Table of Contents

- [What is SegMoE?](#what-is-segmoe)
  - [About the name](#about-the-name)
- [Inference](#inference)
  - [Samples](#samples)
  - [Using 🤗 Diffusers](#using-🤗-diffusers)
  - [Using a Local Model](#using-a-local-model)
- [Comparison](#comparison)
- [Creating your Own SegMoE](#creating-your-own-segmoe)
- [Disclaimers and ongoing work](#disclaimers-and-ongoing-work)
- [Conclusion](#conclusion)
- [Additional Resources](#additional-resources)

## What is SegMoE?

SegMoE models follow the same architecture as Stable Diffusion. Like [Mixtral 8x7b](https://huggingface.co/blog/mixtral), a SegMoE model combines multiple models in one. This works by replacing some feed-forward layers with a sparse MoE layer: the MoE layer contains a router network that selects which experts process which tokens most efficiently.
You can use the `segmoe` package to create your own MoE models! The process takes just a few minutes. For further information, please visit [the GitHub repository](https://github.com/segmind/segmoe). We took inspiration from the popular library [`mergekit`](https://github.com/arcee-ai/mergekit) when designing `segmoe`, and we thank its contributors for such a useful library.

For more details on MoEs, see the Hugging Face 🤗 post: [hf.co/blog/moe](https://huggingface.co/blog/moe).
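
To make the routing idea concrete, here is a minimal, self-contained sketch of a sparse MoE feed-forward layer in PyTorch. This is purely illustrative — the class and all names in it are inventions for this post, not SegMoE's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFeedForward(nn.Module):
    """Illustrative sparse MoE layer: a router scores all experts per token,
    and only the top-k expert feed-forward networks are evaluated and mixed."""

    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        logits = self.router(x)                           # (batch, tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Per token, only `top_k` of the `num_experts` feed-forward networks run, which is how a mixture of several large experts stays tractable at inference time.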

**SegMoE release TL;DR:**

- Release of SegMoE-4x2, SegMoE-2x1 and SegMoE-SD4x2 versions
- Release of custom MoE-making code
### About the name

The SegMoE MoEs are called **SegMoE-AxB**, where `A` is the number of expert models mixed together and `B` is the number of experts involved in the generation of each image. For example, SegMoE-4x2 combines four expert models and uses two of them per generation. Only some layers of the model (the feed-forward blocks, attentions, or all) are replicated, depending on the configuration settings; the rest of the parameters are the same as in a Stable Diffusion model. For more details about how MoEs work, please refer to [the "Mixture of Experts Explained" post](https://huggingface.co/blog/moe).

## Inference

We release 3 merges on the Hub:

1. [SegMoE 2x1](https://huggingface.co/segmind/SegMoE-2x1-v0) has two expert models.
2. [SegMoE 4x2](https://huggingface.co/segmind/SegMoE-4x2-v0) has four expert models.
3. [SegMoE SD 4x2](https://huggingface.co/segmind/SegMoE-SD-4x2-v0) has four Stable Diffusion 1.5 expert models.

### Samples

Images generated using [SegMoE 4x2](https://huggingface.co/segmind/SegMoE-4x2-v0):

Images generated using [SegMoE 2x1](https://huggingface.co/segmind/SegMoE-2x1-v0):

Images generated using [SegMoE SD 4x2](https://huggingface.co/segmind/SegMoE-SD-4x2-v0):

### Using 🤗 Diffusers

Run the following command to install the `segmoe` package. Make sure you have the latest versions of `diffusers` and `transformers` installed.
```bash
pip install -U segmoe diffusers transformers
```

The following snippet loads the second model ("SegMoE 4x2") from the list above and runs generation on it.

```python
from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("segmind/SegMoE-4x2-v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")
```

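For reproducible results, you can pass a seeded generator. Note the assumption here: this sketch relies on `SegMoEPipeline` forwarding standard `diffusers` generation kwargs (such as `generator`) to the underlying pipeline, which we have not verified for every argument:

```python
import torch

# Assumption: SegMoEPipeline forwards standard diffusers kwargs like `generator`.
generator = torch.Generator("cuda").manual_seed(42)
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=generator,
).images[0]
```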

### Using a Local Model

Alternatively, a local model can be loaded, where `segmoe_v0` is the path to the directory containing the local SegMoE model. Check out [Creating your Own SegMoE](#creating-your-own-segmoe) to learn how to build your own!

```python
from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("segmoe_v0", device="cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")
```

## Comparison

Prompt understanding seems to improve, as shown in the images below. Each image shows the following models, left to right: [SegMoE-2x1-v0](https://huggingface.co/segmind/SegMoE-2x1-v0), [SegMoE-4x2-v0](https://huggingface.co/segmind/SegMoE-4x2-v0), and the base model ([RealVisXL_V3.0](https://huggingface.co/SG161222/RealVisXL_V3.0)).

<div align="center">three green glass bottles</div>
<br>

<div align="center">panda bear with aviator glasses on its head</div>
<br>

<div align="center">the statue of Liberty next to the Washington Monument</div>

<div align="center">Taj Mahal with its reflection. detailed charcoal sketch.</div>

## Creating your Own SegMoE

Simply prepare a `config.yaml` file with the following structure:

```yaml
base_model: Base Model Path, Model Card or CivitAI Download Link
num_experts: Total number of expert models to combine
moe_layers: Type of layers to mix (can be "ff", "attn" or "all"). Defaults to "attn"
num_experts_per_tok: Number of experts to use per token during generation
experts:
  - source_model: Expert 1 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive prompt for computing gate weights
    negative_prompt: Negative prompt for computing gate weights
  - source_model: Expert 2 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive prompt for computing gate weights
    negative_prompt: Negative prompt for computing gate weights
  - source_model: Expert 3 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive prompt for computing gate weights
    negative_prompt: Negative prompt for computing gate weights
  - source_model: Expert 4 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive prompt for computing gate weights
    negative_prompt: Negative prompt for computing gate weights
```
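
For illustration, here is what a filled-in config might look like. The base model is the one referenced elsewhere in this post; the second expert ID and all prompts are placeholders you would replace with your own choices:

```yaml
# Illustrative example — the second expert ID below is a hypothetical placeholder.
base_model: SG161222/RealVisXL_V3.0
num_experts: 2
moe_layers: all
num_experts_per_tok: 1
experts:
  - source_model: SG161222/RealVisXL_V3.0
    positive_prompt: "photorealistic portrait, sharp focus, detailed skin texture"
    negative_prompt: "cartoon, painting, blurry, low quality"
  - source_model: your-username/your-anime-sdxl-model
    positive_prompt: "anime illustration, vibrant colors, clean lineart"
    negative_prompt: "photorealistic, grainy, low quality"
```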

Any number of models can be combined. For detailed information on how to create a config file, please refer to the [GitHub repository](https://github.com/segmind/segmoe).

**Note:**
Both Hugging Face and CivitAI models are supported. For CivitAI models, paste the download link of the model, for example: "https://civitai.com/api/download/models/239306".

Then run the following command:

```bash
segmoe config.yaml segmoe_v0
```

This will create a folder called `segmoe_v0` with the following structure:

```bash
├── model_index.json
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── model.safetensors
├── text_encoder_2
│   ├── config.json
│   └── model.safetensors
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── tokenizer_2
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors
```

Alternatively, you can use the Python API to create a mixture-of-experts model:

```python
from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("config.yaml", device="cuda")

pipeline.save_pretrained("segmoe_v0")
```
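
The saved folder can then be reloaded for inference exactly like the local model shown earlier:

```python
# Reload the freshly created mixture from disk:
pipeline = SegMoEPipeline("segmoe_v0", device="cuda")
```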

### Push to Hub

The model can be pushed to the Hub via `huggingface-cli`:

```bash
huggingface-cli upload segmind/segmoe_v0 ./segmoe_v0
```

The model can also be pushed to the Hub directly from Python:

```python
from huggingface_hub import create_repo, upload_folder

model_id = "segmind/SegMoE-v0"

repo_id = create_repo(repo_id=model_id, exist_ok=True).repo_id

upload_folder(
    repo_id=repo_id,
    folder_path="segmoe_v0",
    commit_message="Initial commit",
    ignore_patterns=["step_*", "epoch_*"],
)
```

Detailed usage instructions can be found [here](https://huggingface.co/docs/huggingface_hub/guides/upload).

## Disclaimers and ongoing work

- **Slower Speed**: If the number of experts per token is larger than 1, the MoE performs computation across several expert models. This makes it slower than a single SD 1.5 or SDXL model.

- **High VRAM usage**: MoEs run inference very quickly but still need a large amount of VRAM (and hence an expensive GPU). This makes it challenging to use them in local setups, but they are great for deployments with multiple GPUs. As a reference point, SegMoE-4x2 requires 24GB of VRAM in half-precision.

## Conclusion

We built SegMoE to provide the community with a new tool that can potentially create SOTA diffusion models with ease, just by combining pretrained models while keeping inference times low. We're excited to see what you can build with it!

## Additional Resources

- [Mixture of Experts Explained](https://huggingface.co/blog/moe)
- [Mixture of Experts Models on Hugging Face](https://huggingface.co/models?other=moe)