add sdxl to prompt weighting (huggingface#4439)
* add sdxl to prompt weighting

* Update docs/source/en/using-diffusers/weighted_prompts.md

* Update docs/source/en/using-diffusers/weighted_prompts.md

* add sdxl to prompt weighting

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

* Update docs/source/en/using-diffusers/weighted_prompts.md

* Apply suggestions from code review

* correct

---------

Co-authored-by: Steven Liu <[email protected]>
patrickvonplaten and stevhliu authored Aug 3, 2023
1 parent e391b78 commit 1a8843f
Showing 1 changed file with 62 additions and 3 deletions.
65 changes: 62 additions & 3 deletions docs/source/en/using-diffusers/weighted_prompts.md
@@ -24,13 +24,16 @@ This is called "prompt-weighting" and has been a highly demanded feature by the

## How to do prompt-weighting in Diffusers

We believe the role of `diffusers` is to be a toolbox that provides essential features that enable other projects, such as [InvokeAI](https://github.com/invoke-ai/InvokeAI) or [diffuzers](https://github.com/abhishekkrthakur/diffuzers), to build powerful UIs. To support arbitrary methods of manipulating prompts, `diffusers` exposes a [`prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.prompt_embeds) function argument and an optional [`negative_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.negative_prompt_embeds) function argument in many pipelines, such as [`StableDiffusionPipeline`], [`StableDiffusionControlNetPipeline`], and [`StableDiffusionXLPipeline`], so that the "prompt-weighted"/scaled text embeddings can be passed directly to the pipeline.

The [compel library](https://github.com/damian0815/compel) provides an easy way to emphasize or de-emphasize portions of the prompt for you. We strongly recommend it instead of preparing the embeddings yourself.
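To make the idea of "scaled text embeddings" concrete, here is a toy sketch in plain Python. It is only an illustration under simplifying assumptions: the helper `scale_token_embeddings` and the two-dimensional vectors are hypothetical, and this is not how `compel` computes its weighted embeddings, which uses a more involved scheme.

```py
# Toy illustration only: NOT compel's actual algorithm.
def scale_token_embeddings(embeddings, weights):
    """Multiply each token's embedding vector by that token's weight."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per token embedding")
    return [[w * x for x in vec] for vec, w in zip(embeddings, weights)]


# Hypothetical 2-dimensional embeddings for the tokens "red", "cat", "ball".
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
token_weights = [1.0, 1.0, 1.5]  # upweight only "ball"

weighted = scale_token_embeddings(token_embeddings, token_weights)
print(weighted)  # [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
```

In a real pipeline the embeddings come from the text encoder and are passed on through `prompt_embeds`; the sketch only shows that weighting acts on the embeddings rather than on the prompt string.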

Let's look at a simple example. Imagine you want to generate an image of `"a red cat playing with a ball"` as
follows:


### StableDiffusionPipeline

```py
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
```

@@ -53,8 +56,8 @@ As you can see, there is no "ball" in the image. Let's emphasize this part!

For this we should install the `compel` library:

```bash
pip install compel --upgrade
```

and then create a `Compel` object:
@@ -108,3 +111,59 @@ compel = Compel(

Also, please check out the documentation of the [compel](https://github.com/damian0815/compel) library for
more information.

### StableDiffusionXLPipeline

For Stable Diffusion XL, we need to pass not only `prompt_embeds` (and optionally `negative_prompt_embeds`) but also [`pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.pooled_prompt_embeds) and, optionally, [`negative_pooled_prompt_embeds`](https://huggingface.co/docs/diffusers/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLInpaintPipeline.__call__.negative_pooled_prompt_embeds).
In addition, [`StableDiffusionXLPipeline`] has two tokenizers and two text encoders, and both pairs must be used to weight the prompt.
Luckily, [`compel`](https://github.com/damian0815/compel) takes care of SDXL's special needs: all we have to do is pass both tokenizers and both text encoders to the `Compel` class.


```py
import torch
from compel import Compel, ReturnedEmbeddingsType
from diffusers import DiffusionPipeline

# load the SDXL base model in half precision
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
).to("cuda")

# pass both tokenizers and both text encoders;
# pooled embeddings are requested only from the second text encoder
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)
```

Let's try our example from above again. We use the same seed for both prompts, upweighting "ball" by a factor of 1.5 in the first prompt and downweighting it by 40% (to a weight of 0.6) in the second.

```py
# upweight "ball" in the first prompt and downweight it in the second
prompt = ["a red cat playing with a (ball)1.5", "a red cat playing with a (ball)0.6"]
conditioning, pooled = compel(prompt)

# generate one image per prompt, using the same seed for both
generator = [torch.Generator().manual_seed(33) for _ in range(len(prompt))]
images = pipeline(
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    generator=generator,
    num_inference_steps=30,
).images
```

Let's have a look at the result.

<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball1.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)1.5"</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/compel/sdxl_ball2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">"a red cat playing with a (ball)0.6"</figcaption>
</div>
</div>

We can see that the ball is almost completely gone in the right image, while it is clearly visible in the left image.
For more information and further tricks you can use with `compel`, please have a look at the [compel docs](https://github.com/damian0815/compel/blob/main/doc/syntax.md) as well.
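
To show how the `(ball)1.5` and `(ball)0.6` notation used in the prompts breaks down into text fragments and weights, here is a minimal sketch of a splitter for that one form. The function `split_weighted_prompt` is a hypothetical helper written for illustration, not part of `compel`; it handles only a non-nested `(text)weight` pattern and ignores the rest of the syntax described in the compel docs.

```py
import re

# Matches a parenthesised fragment followed by a numeric weight, e.g. "(ball)1.5".
_WEIGHTED = re.compile(r"\(([^()]+)\)(\d+(?:\.\d+)?)")


def split_weighted_prompt(prompt):
    """Return (text, weight) pairs; text without a marker gets weight 1.0."""
    fragments = []
    position = 0
    for match in _WEIGHTED.finditer(prompt):
        # plain text between the previous match and this one
        plain = prompt[position:match.start()].strip()
        if plain:
            fragments.append((plain, 1.0))
        fragments.append((match.group(1), float(match.group(2))))
        position = match.end()
    # any plain text after the last weighted fragment
    tail = prompt[position:].strip()
    if tail:
        fragments.append((tail, 1.0))
    return fragments


print(split_weighted_prompt("a red cat playing with a (ball)1.5"))
# [('a red cat playing with a', 1.0), ('ball', 1.5)]
```

The real library performs this parsing internally when you call `compel(prompt)`, so a helper like this is useful only for understanding or inspecting prompts.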
