This extension aims to integrate AnimateDiff with CLI into AUTOMATIC1111 Stable Diffusion WebUI with ControlNet. Once the extension is enabled, you can generate GIFs in exactly the same way as you generate images.
This extension implements AnimateDiff in a different way. It does not require you to clone the whole SD1.5 repository. It also applies (probably) the least modification to `ldm`, so that you do not need to reload your model weights if you don't want to.
You might also be interested in another extension I created: Segment Anything for Stable Diffusion WebUI.
- Update
- How to Use
- WebUI Parameters
- Img2GIF
- Motion LoRA
- Prompt Travel
- ControlNet V2V
- SDXL
- Optimizations
- Model Zoo
- VRAM
- Batch Size
- Demo
- Tutorial
- Thanks
- Star History
- Sponsor
- 2023/07/20 v1.1.0: Fix GIF duration, add loop number, remove auto-download, remove xformers, remove instructions on gradio UI, refactor README, add sponsor QR code.
- 2023/07/24 v1.2.0: Fix incorrect insertion of motion modules, add option to change path to motion modules in `Settings/AnimateDiff`, fix loading different motion modules.
- 2023/09/04 v1.3.0: Support any community models with the same architecture; fix grey problem via #63.
- 2023/09/11 v1.4.0: Support official v2 motion module (different architecture: GroupNorm not hacked, UNet middle layer has motion module).
- 2023/09/14 v1.4.1: Always change `beta`, `alpha_comprod` and `alpha_comprod_prev` to resolve grey problem in other samplers.
- 2023/09/16 v1.5.0: Randomize init latent to support better img2gif; add other output formats and infotext output; add appending reversed frames; refactor code to ease maintenance.
- 2023/09/19 v1.5.1: Support xformers, sdp, sub-quadratic attention optimization. VRAM usage decreases to 5.60GB with default settings.
- 2023/09/22 v1.5.2: Option to disable xformers at `Settings/AnimateDiff` due to a bug in xformers, API support, option to enable GIF palette optimization at `Settings/AnimateDiff`, gifsicle optimization moved to `Settings/AnimateDiff`.
- 2023/09/25 v1.6.0: Motion LoRA supported. See Motion LoRA for more information.
- 2023/09/27 v1.7.0: ControlNet supported. See ControlNet V2V for more information. Safetensors for some motion modules are also available now.
- 2023/09/29 v1.8.0: Infinite generation supported. See WebUI Parameters for more information.
- 2023/10/01 v1.8.1: Now you can uncheck `Batch cond/uncond` in `Settings/Optimization` if you want. This will reduce your VRAM usage (5.31GB -> 4.21GB for SDP) but take longer.
- 2023/10/08 v1.9.0: Prompt travel supported. You must have ControlNet installed (you do not need to enable ControlNet) to try it. See Prompt Travel for how to trigger this feature.
- 2023/10/11 v1.9.1: Use state_dict key to guess mm version, replace match case with if else to support python<3.10, option to save PNG to custom dir (see `Settings/AnimateDiff` for detail), move hints to js, install imageio[ffmpeg] automatically when MP4 save fails.
- 2023/10/16 v1.9.2: Add context generator to completely remove any closed loop, prompt travel supports closed loop, infotext fully supported including prompt travel, README refactor.
- 2023/10/19 v1.9.3: Support webp output format. See #233 for more information.
- 2023/10/21 v1.9.4: Save prompt travel to output images, `Reverse` merged to `Closed loop` (see WebUI Parameters), remove `TimestepEmbedSequential` hijack, remove `hints.js`, better explanation of several context-related parameters.
- 2023/10/25 v1.10.0: Support img2img batch. You need ControlNet installed to make it work properly (you do not need to enable ControlNet). See ControlNet V2V for more information.
- 2023/10/29 v1.11.0: Support HotShot-XL for SDXL. See SDXL for more information.
- 2023/11/06 v1.11.1: Optimize VRAM for ControlNet V2V, patch encode_pil_to_base64 so that the API returns a video, save frames to `AnimateDiff/yy-mm-dd/`, recover from assertion error, optional request id for API.
- 2023/11/10 v1.12.0: AnimateDiff for SDXL supported. See SDXL for more information. You need to add `--disable-safe-unpickle` to your command line arguments to get rid of the bad file error.
- 2023/11/16 v1.12.1: FP8 precision and LCM sampler supported. See Optimizations for more information. You can also optionally upload videos to AWS S3 storage by configuring appropriately via `Settings/AnimateDiff AWS`.
For the future update plan, please see here.
1. Update your WebUI to v1.6.0 and ControlNet to v1.1.410, then install this extension via link. I do not plan to support older versions.
2. Download motion modules and put the model weights under `stable-diffusion-webui/extensions/sd-webui-animatediff/model/`. If you want to use another directory to save model weights, please go to `Settings/AnimateDiff`. See model zoo for a list of available motion modules.
3. Enable `Pad prompt/negative prompt to be same length` in `Settings/Optimization` and click `Apply settings`. You must do this to prevent generating two separate unrelated GIFs. Checking `Batch cond/uncond` is optional; it can improve speed but increases VRAM usage.
4. DO NOT disable hash calculation, otherwise AnimateDiff will have trouble figuring out when you switch motion modules.
5. Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
6. Choose an SD1.5 checkpoint, write prompts, and set configurations such as image width/height. If you want to generate multiple GIFs at once, please change batch number instead of batch size.
7. Enable the AnimateDiff extension, set up each parameter, then click `Generate`.
8. You should see the output GIF in the output gallery. You can access the GIF output at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff/{yy-mm-dd}`. You can also access image frames at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/{yy-mm-dd}`. You may choose to save frames for each generation into separate directories in `Settings/AnimateDiff`.
It is quite similar to the way you use ControlNet. The API will return a video in base64 format. In `format`, `PNG` means to save frames to your file system without returning all the frames. If you want your API to return all frames, please add `Frame` to the `format` list. For the most up-to-date parameters, please read here.
```python
'alwayson_scripts': {
  'AnimateDiff': {
    'args': [{
        'model': 'mm_sd_v15_v2.ckpt',   # Motion module
        'format': ['GIF'],              # Save format, 'GIF' | 'MP4' | 'PNG' | 'WEBP' | 'WEBM' | 'TXT' | 'Frame'
        'enable': True,                 # Enable AnimateDiff
        'video_length': 16,             # Number of frames
        'fps': 8,                       # FPS
        'loop_number': 0,               # Display loop number
        'closed_loop': 'R+P',           # Closed loop, 'N' | 'R-P' | 'R+P' | 'A'
        'batch_size': 16,               # Context batch size
        'stride': 1,                    # Stride
        'overlap': -1,                  # Overlap
        'interp': 'Off',                # Frame interpolation, 'Off' | 'FILM'
        'interp_x': 10,                 # Interp X
        'video_source': 'path/to/video.mp4',  # Video source
        'video_path': 'path/to/frames', # Video path
        'latent_power': 1,              # Latent power
        'latent_scale': 32,             # Latent scale
        'last_frame': None,             # Optional last frame
        'latent_power_last': 1,         # Optional latent power for last frame
        'latent_scale_last': 32,        # Optional latent scale for last frame
        'request_id': ''                # Optional request id. If provided, outputs will have request id as filename suffix
      }
    ]
  }
},
```
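As a minimal sketch of how the fragment above might be embedded in a full request body, the helper below builds a txt2img payload with a subset of the documented AnimateDiff arguments and rejects save formats that are not in the documented list. The helper name `build_animatediff_payload` and the sample prompt are hypothetical; the body would typically be sent as JSON to the WebUI API (for example the `/sdapi/v1/txt2img` endpoint of a WebUI started with `--api`), which is not done here.

```python
import json

# Save formats listed in the parameter comments above.
VALID_FORMATS = {"GIF", "MP4", "PNG", "WEBP", "WEBM", "TXT", "Frame"}


def build_animatediff_payload(prompt, **overrides):
    """Return a request body with AnimateDiff enabled (illustrative helper)."""
    args = {
        "enable": True,
        "model": "mm_sd_v15_v2.ckpt",  # motion module file name
        "format": ["GIF"],
        "video_length": 16,            # number of frames
        "fps": 8,
        "loop_number": 0,
        "closed_loop": "R+P",
        "batch_size": 16,              # context batch size
    }
    args.update(overrides)
    unsupported = set(args["format"]) - VALID_FORMATS
    if unsupported:
        raise ValueError(f"Unsupported format(s): {sorted(unsupported)}")
    return {
        "prompt": prompt,
        "alwayson_scripts": {"AnimateDiff": {"args": [args]}},
    }


# 'Frame' asks the API to return every frame in addition to the GIF.
payload = build_animatediff_payload("1girl, smile", format=["GIF", "Frame"])
body = json.dumps(payload)  # JSON text to use as the HTTP POST body
print(body)
```

Arguments that are left out of `args` here (such as `stride` or `video_source`) can be passed as keyword overrides in the same way.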
- Save format: Format of the output. Choose at least one of "GIF" | "MP4" | "WEBP" | "WEBM" | "PNG". Check "TXT" if you want infotext, which will live in the same directory as the output GIF. Infotext is also accessible via `stable-diffusion-webui/params.txt` and in outputs of all formats.
  - You can optimize GIF with `gifsicle` (`apt install gifsicle` required, read #91 for more information) and/or `palette` (read #104 for more information). Go to `Settings/AnimateDiff` to enable them.
  - You can set quality and lossless for WEBP via `Settings/AnimateDiff`. Read #233 for more information.
  - If you are using the API, by adding "PNG" to `format`, you can save all frames to your file system without returning all the frames. If you want your API to return all frames, please add `Frame` to the `format` list.
- Number of frames: Choose whatever number you like.

  If you enter 0 (default):
  - If you submit a video via `Video source` / enter a video path via `Video path` / enable ANY batch ControlNet, the number of frames will be the number of frames in the video (the shortest is used if more than one video is submitted).
  - Otherwise, the number of frames will be your `Context batch size` described below.

  If you enter a non-zero number smaller than your `Context batch size`: you will get the first `Number of frames` frames as your output GIF from your whole generation. All following frames will not appear in your generated GIF, but will be saved as PNGs as usual. Do not set `Number of frames` to a non-zero value smaller than `Context batch size`, because of #213.
- FPS: Frames per second, which is how many frames (images) are shown every second. If 16 frames are generated at 8 frames per second, your GIF's duration is 2 seconds. If you submit a source video, your FPS will be the same as the source video.
- Display loop number: How many times the GIF is played. A value of `0` means the GIF never stops playing.
- Context batch size: How many frames will be passed into the motion module at once. The SD1.5 motion modules are trained with 16 frames, so they give the best results when the number of frames is set to `16`. SDXL HotShotXL motion modules are trained with 8 frames instead. Choose [1, 24] for V1 / HotShotXL motion modules and [1, 32] for V2 / AnimateDiffXL motion modules.
- Closed loop: The extension will try to make the last frame the same as the first frame.
  - When `Number of frames` > `Context batch size` (including when ControlNet is enabled, the source video has more frames than `Context batch size`, and `Number of frames` is 0), closed loop is performed by the AnimateDiff infinite context generator.
  - When `Number of frames` <= `Context batch size`, the AnimateDiff infinite context generator has no effect. Only when you choose `A` will AnimateDiff append the reversed list of frames to the original list of frames to form a closed loop.

  See below for an explanation of each choice:
  - `N` means absolutely no closed loop. This is the only available option if `Number of frames` is non-zero and smaller than `Context batch size`.
  - `R-P` means that the extension will try to reduce the number of closed-loop contexts. The prompt travel will not be interpolated to be a closed loop.
  - `R+P` means that the extension will try to reduce the number of closed-loop contexts. The prompt travel will be interpolated to be a closed loop.
  - `A` means that the extension will aggressively try to make the last frame the same as the first frame. The prompt travel will be interpolated to be a closed loop.
- Stride: Max motion stride as a power of 2 (default: 1).
  - Due to the limitation of the infinite context generator, this parameter is effective only when `Number of frames` > `Context batch size` (including when ControlNet is enabled, the source video has more frames than `Context batch size`, and `Number of frames` is 0).
  - "Absolutely no closed loop" is only possible when `Stride` is 1.
  - For each 1 <= $2^i$ <= `Stride`, the infinite context generator will try to make frames $2^i$ apart temporally consistent. For example, if `Stride` is 4 and `Number of frames` is 8, it will make the following frames temporally consistent:
    - `Stride` == 1: [0, 1, 2, 3, 4, 5, 6, 7]
    - `Stride` == 2: [0, 2, 4, 6], [1, 3, 5, 7]
    - `Stride` == 4: [0, 4], [1, 5], [2, 6], [3, 7]
- Overlap: Number of frames to overlap in context. If overlap is -1 (default), your overlap will be `Context batch size` // 4.
  - Due to the limitation of the infinite context generator, this parameter is effective only when `Number of frames` > `Context batch size` (including when ControlNet is enabled, the source video has more frames than `Context batch size`, and `Number of frames` is 0).
- Frame Interpolation: Interpolate between frames with Deforum's FILM implementation. Requires the Deforum extension. See #128.
- Interp X: Replace each input frame with X interpolated output frames. See #128.
- Video source: [Optional] Video source file for ControlNet V2V. You MUST enable ControlNet. It will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel. You can of course submit one control image via the `Single Image` tab or an input directory via the `Batch` tab, which will override this video source input and work as usual.
- Video path: [Optional] Folder for source frames for ControlNet V2V, but with lower priority than `Video source`. You MUST enable ControlNet. It will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to ControlNet. You can of course submit one control image via the `Single Image` tab or an input directory via the `Batch` tab, which will override this video path input and work as usual.
  - For people who want to inpaint videos: enter a folder which contains two sub-folders `image` and `mask` on the ControlNet inpainting unit. These two sub-folders should contain the same number of images. This extension will match them according to the same sequence. Using my Segment Anything extension can make your life much easier.
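The frame grouping given in the `Stride` example can be reproduced with a few lines of Python. This is only a sketch of the grouping itself, not of the infinite context generator; the function name `stride_groups` is hypothetical.

```python
def stride_groups(stride, num_frames):
    """For each power-of-two step <= stride, group frame indices `step` apart."""
    groups = {}
    step = 1
    while step <= stride:
        # One group per starting offset 0 .. step - 1.
        groups[step] = [list(range(start, num_frames, step)) for start in range(step)]
        step *= 2
    return groups


for step, frames in stride_groups(4, 8).items():
    print(f"step {step}: {frames}")
```

With `Stride` 4 and 8 frames this yields the three groupings listed for steps 1, 2 and 4.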
Please read
- Img2GIF for extra parameters on img2gif panel.
- Motion LoRA for how to use Motion LoRA.
- Prompt Travel for how to trigger prompt travel.
- ControlNet V2V for how to use ControlNet V2V.
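The defaults described for `Number of frames`, `Overlap` and `FPS` interact, so a small worked example may help. The sketch below only restates those rules as arithmetic; the function name and the `source_frame_counts` argument are hypothetical, and it does not model the case where the FPS is taken from a source video.

```python
def resolve_frames(number_of_frames, context_batch_size, overlap=-1, fps=8,
                   source_frame_counts=()):
    """Apply the documented defaults and return the resulting settings."""
    if number_of_frames == 0:
        if source_frame_counts:
            # Shortest source wins when several videos are submitted.
            frames = min(source_frame_counts)
        else:
            frames = context_batch_size
    else:
        frames = number_of_frames
    if overlap == -1:
        overlap = context_batch_size // 4
    return {
        "frames": frames,
        "overlap": overlap,
        "duration_seconds": frames / fps,
        # Stride, overlap and closed loop via the generator only matter here.
        "uses_infinite_context": frames > context_batch_size,
    }


print(resolve_frames(0, 16))
print(resolve_frames(0, 16, source_frame_counts=[48, 36]))
```

The first call corresponds to the default setting (16 frames at 8 FPS, a 2 second GIF); the second shows a ControlNet V2V case where the shorter of two sources decides the length.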
You need to go to img2img and submit an init frame via A1111 panel. You can optionally submit a last frame via extension panel.
By default, your `init_latent` will be changed to

```
init_alpha = (1 - frame_number ^ latent_power / latent_scale)
init_latent = init_latent * init_alpha + random_tensor * (1 - init_alpha)
```

If you upload a last frame, your `init_latent` will be changed in a similar way. Read this code to understand how it works.
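To get a feel for the default blending, the following sketch evaluates the formula above for a few frames. It assumes that `^` denotes exponentiation and uses a short Python list as a stand-in for the latent tensor; it is not the extension's implementation, and it does not clamp `init_alpha` for frame numbers beyond `latent_scale`.

```python
import random


def init_alpha(frame_number, latent_power=1, latent_scale=32):
    """Weight of the init frame's latent at a given frame (1.0 at frame 0)."""
    return 1 - frame_number ** latent_power / latent_scale


def blend_latent(init_latent, noise, alpha):
    """Element-wise init_latent * alpha + noise * (1 - alpha)."""
    return [x * alpha + n * (1 - alpha) for x, n in zip(init_latent, noise)]


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # stand-in for a flattened latent tensor
for frame in (0, 8, 16):
    alpha = init_alpha(frame)
    noise = [rng.gauss(0, 1) for _ in latent]
    print(frame, alpha, blend_latent(latent, noise, alpha))
```

With the default `latent_power` of 1 and `latent_scale` of 32, the init frame's weight falls linearly, so later frames contain progressively more random noise.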
Download and use them like any other LoRA you use (example: download a motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lora:v2_lora_PanDown:0.8>` to your positive prompt). Motion LoRA only supports V2 motion modules.
Write the positive prompt following the example below.
The first line is the head prompt, which is optional. You can write zero, one, or multiple lines of head prompts.
The second and third lines are for prompt interpolation, in the format `frame number: prompt`. Your `frame number` values should be in ascending order and smaller than the total `Number of frames`. Frame numbers are 0-indexed.
The last line is the tail prompt, which is optional. You can write zero, one, or multiple lines of tail prompts. If you don't need this feature, just write prompts in the old way.
1girl, yoimiya (genshin impact), origen, line, comet, wink, Masterpiece, BestQuality. UltraDetailed, <lora:LineLine2D:0.7>, <lora:yoimiya:0.8>,
0: closed mouth
8: open mouth
smile
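The layout of the example above (head lines, `frame number: prompt` lines, tail lines) can be checked with a small parser. This is an illustrative sketch, not the extension's parser: the function name is hypothetical, and it assumes that every line after the first frame line that is not itself a frame line belongs to the tail prompt.

```python
import re

FRAME_LINE = re.compile(r"^(\d+)\s*:\s*(.+)$")


def parse_prompt_travel(prompt, number_of_frames):
    """Split a prompt into head lines, (frame, prompt) keyframes and tail lines."""
    head, keyframes, tail = [], [], []
    for raw_line in prompt.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = FRAME_LINE.match(line)
        if match:
            frame, text = int(match.group(1)), match.group(2)
            if keyframes and frame <= keyframes[-1][0]:
                raise ValueError("frame numbers must be in ascending order")
            if frame >= number_of_frames:
                raise ValueError("frame number must be smaller than Number of frames")
            keyframes.append((frame, text))
        elif keyframes:
            tail.append(line)
        else:
            head.append(line)
    return head, keyframes, tail


example = """1girl, wink, <lora:yoimiya:0.8>,
0: closed mouth
8: open mouth
smile"""
print(parse_prompt_travel(example, number_of_frames=16))
```

A head line such as `1girl, ...` is not mistaken for a frame line because a frame line must start with digits followed directly by a colon.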
You need to go to txt2img / img2img-batch and submit a source video or a path to frames. Each ControlNet unit will find control images according to this priority:
1. ControlNet `Single Image` tab or `Batch` tab. Simply uploading a control image or a directory of control frames is enough.
2. Img2img Batch tab `Input directory` if you are using img2img batch. If you upload a directory of control frames, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.
3. AnimateDiff `Video Source`. If you upload a video through `Video Source`, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.
4. AnimateDiff `Video Path`. If you upload a path to frames through `Video Path`, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.

`Number of frames` will be capped to the minimum number of images among all folders you provide. Each control image in each folder will be applied to one single frame. If you upload one single image for a ControlNet unit, that image will control ALL frames.
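The priority order and the frame cap described above can be summarised in a short sketch. The function and argument names are hypothetical and the logic is a simplified reading of the rules in this section, not the extension's code.

```python
def pick_control_source(unit_input=None, img2img_batch_dir=None,
                        video_source=None, video_path=None):
    """Return the first available control source, in documented priority order."""
    candidates = [
        ("controlnet_unit", unit_input),         # Single Image / Batch tab
        ("img2img_batch_dir", img2img_batch_dir),
        ("video_source", video_source),
        ("video_path", video_path),
    ]
    for name, value in candidates:
        if value:
            return name, value
    return None


def cap_frame_count(requested, folder_sizes):
    """Cap the requested frame count to the smallest folder (0 means 'use the cap')."""
    cap = min(folder_sizes)
    return cap if requested == 0 else min(requested, cap)


print(pick_control_source(video_source="dance.mp4", video_path="frames/"))
print(cap_frame_count(32, [24, 40]))
```

Here a unit without its own input falls back to `Video Source` before `Video Path`, and a request for 32 frames is reduced to the 24 images available in the smaller folder.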
For people who want to inpaint videos: enter a folder which contains two sub-folders `image` and `mask` on the ControlNet inpainting unit. These two sub-folders should contain the same number of images. This extension will match them according to the same sequence. Using my Segment Anything extension can make your life much easier.
AnimateDiff in img2img batch is available since v1.10.0.
AnimateDiffXL and HotShot-XL have identical architecture to AnimateDiff-SD1.5. The differences are:
- HotShot-XL is trained with 8 frames instead of 16 frames. You are recommended to set `Context batch size` to 8 for HotShot-XL.
- AnimateDiffXL is still trained with 16 frames. You do not need to change `Context batch size` for AnimateDiffXL.
- AnimateDiffXL & HotShot-XL have fewer layers compared to AnimateDiff-SD1.5 because of SDXL.
- AnimateDiffXL is trained with higher resolution compared to HotShot-XL.
Although AnimateDiffXL & HotShot-XL have identical structure to AnimateDiff-SD1.5, I strongly discourage you from using AnimateDiff-SD1.5 for SDXL, or using HotShot / AnimateDiffXL for SD1.5: you will get severe artifacts if you do that. I have decided not to support that, despite the fact that it is not hard for me to do so.
Technically all features available for AnimateDiff + SD1.5 are also available for (AnimateDiff / HotShot) + SDXL. However, I have not tested all of them. I have tested infinite context generation and prompt travel; I have not tested ControlNet. If you find any bug, please report it to me.
For download link, please read Model Zoo. For VRAM usage, please read VRAM. For demo, please see demo.
Optimizations can significantly improve speed and reduce VRAM usage. With attention optimization, FP8 and unchecking `Batch cond/uncond` in `Settings/Optimization`, I am able to run 4 x ControlNet + AnimateDiff + Stable Diffusion to generate 36 frames of 1024 * 1024 images with 18GB VRAM.
Adding `--xformers` / `--opt-sdp-attention` to your command line arguments can significantly reduce VRAM usage and improve speed. However, due to a bug in xformers, you may or may not get a CUDA error. If you get a CUDA error, please either completely switch to `--opt-sdp-attention`, or keep `--xformers` -> go to `Settings/AnimateDiff` -> choose "Optimize attention layers with sdp (torch >= 2.0.0 required)".
FP8 requires torch >= 2.1.0 and the WebUI test-fp8 branch by @KohakuBlueleaf. Follow these steps to enable FP8:
1. Switch to the `test-fp8` branch via `git checkout test-fp8` in your `stable-diffusion-webui` directory.
2. Reinstall torch by adding `--reinstall-torch` ONCE to your command line arguments.
3. Add `--opt-unet-fp8-storage` to your command line arguments and launch WebUI.
Latent Consistency Model is a recent breakthrough in the Stable Diffusion community. I provide a "gift" to everyone who updates this extension to >= v1.12.1: you will find the `LCM` sampler in the normal place where you select samplers in WebUI. You can generate images / videos within 6-8 steps if you
- select the `Euler A` / `Euler` / `LCM` sampler (other samplers may also work, subject to further experiments)
- use LCM LoRA
- use a low CFG scale (1-2 is recommended)

Note that the LCM sampler is still experimental and subject to change, in line with @luosiallen's wishes.
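For API users, the three recommendations above might translate into request fields along the following lines. This is a hedged sketch: `steps`, `cfg_scale` and `sampler_name` are standard WebUI txt2img API fields, but the helper name and the LoRA name `my_lcm_lora` are placeholders, and the defaults simply pick values inside the recommended ranges.

```python
def lcm_request_fields(prompt, lcm_lora_name, steps=8, cfg_scale=1.5,
                       sampler_name="LCM"):
    """Return txt2img request fields for a low-step LCM run (illustrative)."""
    return {
        # The LCM LoRA is referenced in the prompt like any other LoRA.
        "prompt": f"{prompt}, <lora:{lcm_lora_name}:1>",
        "steps": steps,            # 6-8 steps are suggested above
        "cfg_scale": cfg_scale,    # 1-2 is recommended above
        "sampler_name": sampler_name,
    }


fields = lcm_request_fields("1girl, smile", "my_lcm_lora")
print(fields)
```

These fields can be merged into the same request body that carries the AnimateDiff `alwayson_scripts` arguments.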
Benefits of using this extension instead of sd-webui-lcm are
- you do not need to install diffusers
- you can use LCM sampler with any other extensions, such as ControlNet and AnimateDiff
- Remove any VRAM-heavy arguments such as `--no-half`. These arguments can significantly increase VRAM usage and reduce speed.
- Check `Batch cond/uncond` in `Settings/Optimization` to improve speed; uncheck it to reduce VRAM usage.
- `mm_sd_v14.ckpt` & `mm_sd_v15.ckpt` & `mm_sd_v15_v2.ckpt` & `mm_sdxl_v10_beta.ckpt` by @guoyww: Google Drive | HuggingFace | CivitAI
- `mm_sd_v14.safetensors` & `mm_sd_v15.safetensors` & `mm_sd_v15_v2.safetensors` by @neph1: HuggingFace
- `mm_sd_v14.fp16.safetensors` & `mm_sd_v15.fp16.safetensors` & `mm_sd_v15_v2.fp16.safetensors` by @neggles: HuggingFace
- `mm-Stabilized_high.pth` & `mm-Stabilized_mid.pth` by @manshoety: HuggingFace
- `temporaldiff-v1-animatediff.ckpt` by @CiaraRowles: HuggingFace
- `hsxl_temporal_layers.safetensors` & `hsxl_temporal_layers.f16.safetensors` by @hotshotco: HuggingFace
Actual VRAM usage depends on your image size and context batch size. You can try to reduce image size or context batch size to reduce VRAM usage.
The following data are for SD1.5 + AnimateDiff, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=W=512, frame=16 (default setting). `w/` / `w/o` means `Batch cond/uncond` in `Settings/Optimization` is checked/unchecked.
Optimization | VRAM w/ | VRAM w/o |
---|---|---|
No optimization | 12.13GB | |
xformers/sdp | 5.60GB | 4.21GB |
sub-quadratic | 10.39GB | |
For SDXL + HotShot + SDP, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=W=512, frame=8 (default setting), you need 8.66GB VRAM.
For SDXL + AnimateDiff + SDP, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=1024, W=768, frame=16, you need 13.87GB VRAM.
Batch size on WebUI will be replaced by the GIF frame number internally: 1 full GIF is generated in 1 batch. If you want to generate multiple GIFs at once, please change batch number.
Batch number is NOT the same as batch size. In A1111 WebUI, batch number is above batch size. Batch number is the number of sequential runs, whereas batch size is the number of items processed in parallel. You do not have to worry too much when you increase batch number, but you do need to worry about your VRAM when you increase batch size (which, in this extension, is the video frame number). You do not need to change batch size at all when you are using this extension.
We are currently developing an approach to support batch size on WebUI in the near future.
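As a small worked example of the distinction, the sketch below shows the request fields one might use to generate several GIFs in a row and the total number of frames that results. In the WebUI API, `n_iter` is the batch number (batch count) and `batch_size` is the batch size; the helper name `plan_gif_batches` is hypothetical.

```python
def plan_gif_batches(num_gifs, video_length):
    """Return (request_fields, total_frames) for `num_gifs` sequential GIFs."""
    request_fields = {
        "n_iter": num_gifs,  # batch number: GIFs are generated one after another
        "batch_size": 1,     # left alone; replaced internally by the frame count
    }
    total_frames = num_gifs * video_length
    return request_fields, total_frames


fields, total = plan_gif_batches(num_gifs=3, video_length=16)
print(fields, total)
```

Three GIFs of 16 frames produce 48 frames in total, but only 16 of them are in memory at any one time, which is why raising batch number does not raise peak VRAM in the way a larger frame count does.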
AnimateDiff | Extension | img2img |
---|---|---|
No LoRA | PanDown | PanLeft |
---|---|---|
The prompt is similar to above.
You should be able to read infotext to understand how I generated this sample.
TODO
TODO
I thank researchers from Shanghai AI Lab, especially @guoyww for creating AnimateDiff. I also thank @neggles and @s9roll7 for creating and improving AnimateDiff CLI Prompt Travel. This extension could not be made possible without these creative works.
I also thank community developers, especially
- @zappityzap who developed the majority of the output features
- @TDS4874 and @opparco for resolving the grey issue, which significantly improves performance
- @talesofai who developed i2v in this forked repo
- @rkfg for developing GIF palette optimization
and many others who have contributed to this extension.
I also thank community users, especially @streamline who provided dataset and workflow of ControlNet V2V. His workflow is extremely amazing and definitely worth checking out.
You can sponsor me via WeChat, AliPay or PayPal. You can also support me via patreon, ko-fi or afdian.
AliPay | PayPal | |
---|---|---|