Merge branch 'main' into rl
natolambert committed Nov 14, 2022
2 parents 915c41e + 33d7e89 commit c901889
Showing 55 changed files with 2,501 additions and 1,187 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pr_tests.yml
@@ -136,7 +136,7 @@ jobs:
- name: Run fast PyTorch tests on M1 (MPS)
shell: arch -arch arm64 bash {0}
run: |
${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps tests/
${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
- name: Failure short reports
if: ${{ failure() }}
2 changes: 1 addition & 1 deletion README.md
@@ -428,7 +428,7 @@ If you just want to play around with some web demos, you can try out the followi
<p>

**Schedulers**: Algorithm class for both **inference** and **training**.
The class provides functionality to compute the previous image according to the alpha/beta schedule, as well as to predict noise for training.
The class provides functionality to compute the previous image according to the alpha/beta schedule, as well as to predict noise for training. Also known as **Samplers**.
*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902)
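To make this concrete, a minimal denoising loop might look like the following sketch (the checkpoint id is only an example; any unconditional `UNet2DModel` works the same way):

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

model = UNet2DModel.from_pretrained("google/ddpm-cat-256")  # example checkpoint
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)

for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    # the scheduler turns the model output into the previous, slightly less noisy sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```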

<p align="center">
3 changes: 2 additions & 1 deletion docs/source/api/pipelines/ddim.mdx
@@ -20,7 +20,8 @@ The abstract of the paper is the following:

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

The original codebase of this paper can be found [here](https://github.com/ermongroup/ddim).
The original codebase of this paper can be found here: [ermongroup/ddim](https://github.com/ermongroup/ddim).
For questions, feel free to contact the author on [tsong.me](https://tsong.me/).

## Available Pipelines:

5 changes: 5 additions & 0 deletions docs/source/api/pipelines/latent_diffusion.mdx
@@ -33,10 +33,15 @@ The original codebase can be found [here](https://github.com/CompVis/latent-diff
| Pipeline | Tasks | Colab
|---|---|:---:|
| [pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) | *Text-to-Image Generation* | - |
| [pipeline_latent_diffusion_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion_superresolution.py) | *Super Resolution* | - |

## Examples:
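A minimal sketch of the new super-resolution pipeline (the checkpoint id and exact argument names below are assumptions for illustration, not taken from this page):

```python
import torch
from PIL import Image
from diffusers import LDMSuperResolutionPipeline

pipeline = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")  # assumed checkpoint id
pipeline = pipeline.to("cuda")

low_res = Image.open("low_res.png").convert("RGB").resize((128, 128))
upscaled = pipeline(low_res, num_inference_steps=100, eta=1.0).images[0]
upscaled.save("upscaled.png")
```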


## LDMTextToImagePipeline
[[autodoc]] pipelines.latent_diffusion.pipeline_latent_diffusion.LDMTextToImagePipeline
- __call__

## LDMSuperResolutionPipeline
[[autodoc]] pipelines.latent_diffusion.pipeline_latent_diffusion_superresolution.LDMSuperResolutionPipeline
- __call__
2 changes: 1 addition & 1 deletion docs/source/api/schedulers.mdx
@@ -16,7 +16,7 @@ Diffusers contains multiple pre-built schedule functions for the diffusion proce

## What is a scheduler?

The schedule functions, denoted *Schedulers* in the library, take in the output of a trained model, a sample which the diffusion process is iterating on, and a timestep, and return a denoised sample.
The schedule functions, denoted *Schedulers* in the library, take in the output of a trained model, a sample which the diffusion process is iterating on, and a timestep, and return a denoised sample. That's why schedulers may also be called *Samplers* in other diffusion model implementations.

- Schedulers define the methodology for iteratively adding noise to an image or for updating a sample based on model outputs.
- Adding noise in different manners represents the algorithmic processes used to train a diffusion model by adding noise to images (a short sketch follows below).
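As a rough sketch of that training-side use (shapes and values here are purely illustrative):

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

clean_images = torch.randn(4, 3, 64, 64)  # stand-in for a batch of training images
noise = torch.randn_like(clean_images)
timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (clean_images.shape[0],))

# the scheduler produces the noisy samples the model learns to denoise
noisy_images = scheduler.add_noise(clean_images, noise, timesteps)
```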
2 changes: 1 addition & 1 deletion docs/source/using-diffusers/img2img.mdx
@@ -33,7 +33,7 @@ url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/st

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
init_image.thumbnail((768, 768))
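# note: unlike resize((768, 512)), thumbnail keeps the aspect ratio, only ever shrinks
# the image so both sides fit within 768 px, and modifies it in place (it returns None)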

prompt = "A fantasy landscape, trending on artstation"

17 changes: 14 additions & 3 deletions examples/community/README.md
@@ -179,9 +179,20 @@ images = pipe.inpaint(prompt=prompt, init_image=init_image, mask_image=mask_imag
As shown above, this one pipeline can run "text-to-image", "image-to-image", and "inpainting" all in one pipeline.

### Long Prompt Weighting Stable Diffusion

The Pipeline lets you input prompt without 77 token length limit. And you can increase words weighting by using "()" or decrease words weighting by using "[]"
The Pipeline also lets you use the main use cases of the stable diffusion pipeline in a single class.
Features of this custom pipeline:
- Input a prompt without the 77 token length limit.
- Includes txt2img, img2img, and inpainting pipelines.
- Emphasize/weight part of your prompt with parentheses, like so: `a baby deer with (big eyes)`
- De-emphasize part of your prompt, like so: `a [baby] deer with big eyes`
- Precisely weight part of your prompt, like so: `a baby deer with (big eyes:1.3)`

Prompt weighting equivalents:
- `a baby deer with` == `(a baby deer with:1.0)`
- `(big eyes)` == `(big eyes:1.1)`
- `((big eyes))` == `(big eyes:1.21)`
- `[big eyes]` == `(big eyes:0.91)`
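The weights compound multiplicatively, consistent with the equivalents above: each level of `(...)` multiplies the weight by 1.1 and each level of `[...]` divides it by 1.1.

```python
weight = 1.1
print(round(weight ** 2, 2))  # 1.21 -> "((big eyes))"
print(round(1 / weight, 2))   # 0.91 -> "[big eyes]"
```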

You can run this custom pipeline like so:

#### pytorch
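A minimal, illustrative way to load it (the base checkpoint id is just an example and the generation call is simplified):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",         # example base checkpoint
    custom_pipeline="lpw_stable_diffusion",  # loads this community pipeline
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a baby deer with (big eyes:1.3), masterpiece, best quality"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("deer.png")
```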

30 changes: 26 additions & 4 deletions examples/community/clip_guided_stable_diffusion.py
@@ -5,7 +5,14 @@
from torch import nn
from torch.nn import functional as F

from diffusers import AutoencoderKL, DiffusionPipeline, LMSDiscreteScheduler, PNDMScheduler, UNet2DConditionModel
from diffusers import (
AutoencoderKL,
DDIMScheduler,
DiffusionPipeline,
LMSDiscreteScheduler,
PNDMScheduler,
UNet2DConditionModel,
)
from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion import StableDiffusionPipelineOutput
from torchvision import transforms
from transformers import CLIPFeatureExtractor, CLIPModel, CLIPTextModel, CLIPTokenizer
@@ -56,7 +63,7 @@ def __init__(
clip_model: CLIPModel,
tokenizer: CLIPTokenizer,
unet: UNet2DConditionModel,
scheduler: Union[PNDMScheduler, LMSDiscreteScheduler],
scheduler: Union[PNDMScheduler, LMSDiscreteScheduler, DDIMScheduler],
feature_extractor: CLIPFeatureExtractor,
):
super().__init__()
@@ -123,7 +130,7 @@ def cond_fn(
# predict the noise residual
noise_pred = self.unet(latent_model_input, timestep, encoder_hidden_states=text_embeddings).sample

if isinstance(self.scheduler, PNDMScheduler):
if isinstance(self.scheduler, (PNDMScheduler, DDIMScheduler)):
alpha_prod_t = self.scheduler.alphas_cumprod[timestep]
beta_prod_t = 1 - alpha_prod_t
# compute predicted original sample from predicted noise also called
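# "predicted x_0" (Eq. 12 of the DDIM paper, https://arxiv.org/abs/2010.02502),
# i.e. roughly (sample - beta_prod_t ** 0.5 * noise_pred) / alpha_prod_t ** 0.5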
@@ -176,6 +183,7 @@ def __call__(
num_inference_steps: Optional[int] = 50,
guidance_scale: Optional[float] = 7.5,
num_images_per_prompt: Optional[int] = 1,
eta: float = 0.0,
clip_guidance_scale: Optional[float] = 100,
clip_prompt: Optional[Union[str, List[str]]] = None,
num_cutouts: Optional[int] = 4,
@@ -275,6 +283,20 @@ def __call__(
# scale the initial noise by the standard deviation required by the scheduler
latents = latents * self.scheduler.init_noise_sigma

# prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
# eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
# eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
# and should be between [0, 1]
accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
extra_step_kwargs = {}
if accepts_eta:
extra_step_kwargs["eta"] = eta

# check if the scheduler accepts generator
accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
if accepts_generator:
extra_step_kwargs["generator"] = generator

for i, t in enumerate(self.progress_bar(timesteps_tensor)):
# expand the latents if we are doing classifier free guidance
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
@@ -306,7 +328,7 @@ def __call__(
)

# compute the previous noisy sample x_t -> x_t-1
latents = self.scheduler.step(noise_pred, t, latents).prev_sample
latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample

# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
16 changes: 0 additions & 16 deletions examples/dreambooth/train_dreambooth_flax.py
@@ -327,22 +327,6 @@ def main():
if args.seed is not None:
set_seed(args.seed)

if jax.process_index() == 0:
if args.push_to_hub:
if args.hub_model_id is None:
repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
else:
repo_name = args.hub_model_id
repo = Repository(args.output_dir, clone_from=repo_name)

with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
if "step_*" not in gitignore:
gitignore.write("step_*\n")
if "epoch_*" not in gitignore:
gitignore.write("epoch_*\n")
elif args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)

rng = jax.random.PRNGKey(args.seed)

if args.with_prior_preservation:
11 changes: 9 additions & 2 deletions examples/unconditional_image_generation/train_unconditional.py
@@ -11,10 +11,12 @@
from accelerate import Accelerator
from accelerate.logging import get_logger
from datasets import load_dataset
from diffusers import DDPMPipeline, DDPMScheduler, UNet2DModel
from diffusers import DDPMPipeline, DDPMScheduler, UNet2DModel, __version__
from diffusers.optimization import get_scheduler
from diffusers.training_utils import EMAModel
from diffusers.utils import deprecate
from huggingface_hub import HfFolder, Repository, whoami
from packaging import version
from torchvision.transforms import (
CenterCrop,
Compose,
@@ -28,6 +30,7 @@


logger = get_logger(__name__)
diffusers_version = version.parse(version.parse(__version__).base_version)


def _extract_into_tensor(arr, timesteps, broadcast_shape):
@@ -406,7 +409,11 @@ def transforms(examples):
scheduler=noise_scheduler,
)

generator = torch.manual_seed(0)
deprecate("todo: remove this check", "0.10.0", "when the most used version is >= 0.8.0")
if diffusers_version < version.parse("0.8.0"):
generator = torch.manual_seed(0)
else:
generator = torch.Generator(device=pipeline.device).manual_seed(0)
# run pipeline in inference (sample random noise and denoise)
images = pipeline(
generator=generator,
27 changes: 14 additions & 13 deletions scripts/convert_ldm_original_checkpoint_to_diffusers.py
@@ -112,9 +112,9 @@ def assign_to_checkpoint(
continue

# Global renaming happens here
new_path = new_path.replace("middle_block.0", "mid.resnets.0")
new_path = new_path.replace("middle_block.1", "mid.attentions.0")
new_path = new_path.replace("middle_block.2", "mid.resnets.1")
new_path = new_path.replace("middle_block.0", "mid_block.resnets.0")
new_path = new_path.replace("middle_block.1", "mid_block.attentions.0")
new_path = new_path.replace("middle_block.2", "mid_block.resnets.1")

if additional_replacements is not None:
for replacement in additional_replacements:
@@ -175,15 +175,16 @@ def convert_ldm_checkpoint(checkpoint, config):
attentions = [key for key in input_blocks[i] if f"input_blocks.{i}.1" in key]

if f"input_blocks.{i}.0.op.weight" in checkpoint:
new_checkpoint[f"downsample_blocks.{block_id}.downsamplers.0.conv.weight"] = checkpoint[
new_checkpoint[f"down_blocks.{block_id}.downsamplers.0.conv.weight"] = checkpoint[
f"input_blocks.{i}.0.op.weight"
]
new_checkpoint[f"downsample_blocks.{block_id}.downsamplers.0.conv.bias"] = checkpoint[
new_checkpoint[f"down_blocks.{block_id}.downsamplers.0.conv.bias"] = checkpoint[
f"input_blocks.{i}.0.op.bias"
]
continue

paths = renew_resnet_paths(resnets)
meta_path = {"old": f"input_blocks.{i}.0", "new": f"downsample_blocks.{block_id}.resnets.{layer_in_block_id}"}
meta_path = {"old": f"input_blocks.{i}.0", "new": f"down_blocks.{block_id}.resnets.{layer_in_block_id}"}
resnet_op = {"old": "resnets.2.op", "new": "downsamplers.0.op"}
assign_to_checkpoint(
paths, new_checkpoint, checkpoint, additional_replacements=[meta_path, resnet_op], config=config
@@ -193,18 +194,18 @@ def convert_ldm_checkpoint(checkpoint, config):
paths = renew_attention_paths(attentions)
meta_path = {
"old": f"input_blocks.{i}.1",
"new": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}",
"new": f"down_blocks.{block_id}.attentions.{layer_in_block_id}",
}
to_split = {
f"input_blocks.{i}.1.qkv.bias": {
"key": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.key.bias",
"query": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.query.bias",
"value": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.value.bias",
"key": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.key.bias",
"query": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.query.bias",
"value": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.value.bias",
},
f"input_blocks.{i}.1.qkv.weight": {
"key": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.key.weight",
"query": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.query.weight",
"value": f"downsample_blocks.{block_id}.attentions.{layer_in_block_id}.value.weight",
"key": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.key.weight",
"query": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.query.weight",
"value": f"down_blocks.{block_id}.attentions.{layer_in_block_id}.value.weight",
},
}
assign_to_checkpoint(
15 changes: 14 additions & 1 deletion scripts/convert_original_stable_diffusion_to_diffusers.py
@@ -30,6 +30,9 @@
from diffusers import (
AutoencoderKL,
DDIMScheduler,
DPMSolverMultistepScheduler,
EulerAncestralDiscreteScheduler,
EulerDiscreteScheduler,
LDMTextToImagePipeline,
LMSDiscreteScheduler,
PNDMScheduler,
@@ -647,7 +650,7 @@ def convert_ldm_clip_checkpoint(checkpoint):
"--scheduler_type",
default="pndm",
type=str,
help="Type of scheduler to use. Should be one of ['pndm', 'lms', 'ddim']",
help="Type of scheduler to use. Should be one of ['pndm', 'lms', 'ddim', 'euler', 'euler-ancest', 'dpm']",
)
parser.add_argument(
"--extract_ema",
@@ -686,6 +689,16 @@ def convert_ldm_clip_checkpoint(checkpoint):
)
elif args.scheduler_type == "lms":
scheduler = LMSDiscreteScheduler(beta_start=beta_start, beta_end=beta_end, beta_schedule="scaled_linear")
elif args.scheduler_type == "euler":
scheduler = EulerDiscreteScheduler(beta_start=beta_start, beta_end=beta_end, beta_schedule="scaled_linear")
elif args.scheduler_type == "euler-ancestral":
scheduler = EulerAncestralDiscreteScheduler(
beta_start=beta_start, beta_end=beta_end, beta_schedule="scaled_linear"
)
elif args.scheduler_type == "dpm":
scheduler = DPMSolverMultistepScheduler(
beta_start=beta_start, beta_end=beta_end, beta_schedule="scaled_linear"
)
elif args.scheduler_type == "ddim":
scheduler = DDIMScheduler(
beta_start=beta_start,
1 change: 1 addition & 0 deletions src/diffusers/__init__.py
@@ -35,6 +35,7 @@
DDPMPipeline,
KarrasVePipeline,
LDMPipeline,
LDMSuperResolutionPipeline,
PNDMPipeline,
RePaintPipeline,
ScoreSdeVePipeline,
3 changes: 3 additions & 0 deletions src/diffusers/models/attention.py
@@ -557,6 +557,9 @@ def _sliced_attention(self, query, key, value, sequence_length, dim):
return hidden_states

def _memory_efficient_attention_xformers(self, query, key, value):
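# xformers' fused attention kernels generally require contiguous inputs, so the reshaped
# query/key/value tensors are made contiguous before calling memory_efficient_attention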
query = query.contiguous()
key = key.contiguous()
value = value.contiguous()
hidden_states = xformers.ops.memory_efficient_attention(query, key, value, attn_bias=None)
hidden_states = self.reshape_batch_dim_to_heads(hidden_states)
return hidden_states