For context: Stable Diffusion is an AI image generation tool and AUTOMATIC1111/stable-diffusion-webui is a web ui for that tool.
Parseq is a parameter sequencer for AUTOMATIC1111/stable-diffusion-webui. You can use it to generate videos with tight control and flexible interpolation over many Stable Diffusion parameters (such as seed, scale, prompt weights, denoising strength...), as well as input processing parameters (such as zoom, pan, 3D rotation...).
It can be used in 2 ways: with a custom script for Automatic1111, or with the Deforum extension for Automatic1111.
You can jump straight into the UI here: https://sd-parseq.web.app/ .
Preferred approach. Use this if you like animating with Deforum and want to use Parseq as an alternative to Deforum's math-based keyframing. To get started:
- Have a working installation of Automatic1111's Stable Diffusion UI
- Install the Deforum extension from this branch.
- Relaunch Auto1111 – You should now see a `Parseq` tab under the `Deforum` extension:
This is mostly a legacy approach: Deforum is a far more powerful animation back-end. Use this approach only if you don't want to use Deforum for some reason, and would prefer to use Parseq's own back-end integration with A1111.
- Have a working installation of Automatic1111's Stable Diffusion UI
- Install ffmpeg
- Ensure A1111 can resolve the python library `ffmpeg-python`:
  - On Windows: edit `requirements_versions.txt` in the top level directory of A1111, and add the line: `ffmpeg-python==0.2.0`.
  - On Mac/Linux: edit `requirements.txt` in the top level directory of A1111, and add the line: `ffmpeg-python`.
- From this repository, copy `scripts/parseq_script.py` and `scripts/parseq_core.py` to the `/scripts` subdirectory in your SD webui installation.
- Restart the webui (or do a full Gradio reload from the settings screen). You should now see `SD Parseq <version>` as a script available in the img2img section:
Here are some examples of what you can do with this. Most of these were generated at 20fps then smoothed to 60fps with ffmpeg minterpolate.
- Img2img loopback using Deforum, with fluctuations on many parameters, and synced to audio:
- Vid2vid using the Parseq script backend, with fluctuations on many different params to attempt to synchronise param changes and image movement to music. The input audio is an excerpt of The Prodigy - Smack My Bitch Up (Noisia Remix), and the original input video was generated with Butterchurn. Includes a side-by-side comparison of the original input video, the "dry run" video (which includes all pre-processing but no Stable Diffusion), and the final output:
- Loopback using Deforum video where we oscillate between a few famous faces with some 3d movement and occasional denoising spikes to reset the context:
- Go to https://sd-parseq.web.app/ or https://sd-parseq.web.app/deforum (or run the UI yourself from this repo with `npm start`).
- Edit the table at the top to specify your keyframes and parameter values at those keyframes. See below for more information about what you can do here.
- Hit `Render` to generate the JSON config for every frame.
- Copy the contents of the textbox at the bottom.
- Head to the SD web UI, go to the img2img tab and select the SD Parseq script OR the Deforum tab and then the Parseq tab.
- Paste in the JSON blob you copied earlier.
- Fiddle with any other settings you want to tweak.
- Click generate.
- UI can get sluggish with 1000s of frames. Lots of room for optimisation.
- The script deliberately overrides/ignores various settings that are defined outside of the script's parameters, including: seed, denoise strength, denoise strength scale factor, color correction, output path, etc... This is intentional but may be a source of confusion.
- Does not yet support batches. Only 1 output is ever generated per run. Batch size and batch count are ignored.
- Does not yet add noise in the blank areas created when zooming out or rotating.
- Chokes on .mov inputs because of a failure to get the total frame count. Seems to work with mp4 (so you just need to preprocess with ffmpeg).
- Rotation and zoom params have a very different impact on loopback. For example, if you linearly interpolate z-rotation from 0 to 360 over 36 frames, with vid2vid you'll get a single full rotation (10deg per frame), whereas with loopback you'll get an accelerating rotation because each looped-back input frame is already rotated.
- A seed value of -1 will be re-evaluated as a random value on every frame. If you want a fixed random seed on all frames, pick a number and lock it in over all your frames in Parseq.
Parseq's main feature is advanced control over parameter values, with interesting interpolation features.
This all happens in the grid. Start by selecting the values you want to work with in the "fields to display" drop-down on the right. In this example, we'll use `denoise`, `prompt_1_weight`, and `prompt_2_weight`:
Next we'll set the number of frames we want to work with by setting the frame number of the last row. We'll set it to 101 frames. You can always change this later. Tip: if you want to match a frame count from an input video, you can count the video's frames quickly from the CLI with ffmpeg's `ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -print_format csv <input_video.mp4>`.
Now we'll add some keyframes. We'll set them at frames 25, 50 and 75. We can always change them later or add more.
(Note that the first and last frames MUST have values for all fields. Rendering will fail if you remove any because start and end values are required for interpolation.)
In this video, we'd like prompt 1 to start off weak, become strong in the middle of the video, and then become weak again. Easy! Put in some values for `prompt_1_weight` and hit render. You'll see it interpolates linearly by default, and if a value is empty in a keyframe we interpolate straight through it.
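To make the default behaviour concrete, here's a minimal sketch of linear keyframe interpolation in Python (the keyframe positions and weights are illustrative values, not Parseq's actual code):

```python
import numpy as np

# Illustrative keyframes: prompt 1 starts weak, peaks mid-video, ends weak.
keyframe_frames = [0, 50, 100]
keyframe_values = [0.1, 1.0, 0.1]

# Linear interpolation fills in every in-between frame, passing straight
# through any keyframe that doesn't pin a value of its own.
frames = np.arange(101)
prompt_1_weight = np.interp(frames, keyframe_frames, keyframe_values)

print(prompt_1_weight[25])  # 0.55, halfway between 0.1 and 1.0
```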
You might be wondering what the arrow (➟) columns are next to the value columns. These are the interpolation columns, and they let you specify how the value should "travel" from this point onwards. The default is linear interpolation, but you can override this with `S` for Step, `C` for Cubic, and `P` for Polynomial. Let's give it a go:
You can also switch interpolation part way through.
But that's not all! Let's say you want to make something happen rhythmically, such as synchronising prompt strength to the beat of a song. Adding a keyframe for each beat would be a pain in the arse. Instead, you can use oscillators. Here, we enter `sin(0.5, 0, 50, 0.5)` to make prompt 2's weight oscillate along a sine wave with y offset 0.5 and 0 phase shift, with a period of 50 frames and an amplitude of 0.5:
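For intuition, here's one plausible reading of those four arguments as a formula. This is a sketch of the maths only, not Parseq's actual implementation:

```python
import math

def osc_sin(frame, offset=0.0, phase=0.0, period=50.0, amplitude=1.0):
    """Assumed form of sin(offset, phase, period, amplitude) as described above."""
    return offset + amplitude * math.sin(2 * math.pi * (frame + phase) / period)

# sin(0.5, 0, 50, 0.5): oscillates between 0 and 1, repeating every 50 frames.
for f in (0, 12, 25, 37, 50):
    print(f, round(osc_sin(f, offset=0.5, phase=0, period=50, amplitude=0.5), 2))
```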
You can experiment with other oscillators such as `tri` for a triangle wave, `sq` for a square wave, `saw` for a sawtooth wave and `pulse` for a pulse wave. See below for more information.
Parseq also supports simple expressions so you can combine oscillators and even mix them with the interpolation values, as well as if/else statements:
TODO: descriptions and examples.
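As a hedged illustration in the meantime (exact syntax may differ slightly from the current UI), expressions let you combine the building blocks documented below: for example, `0.5 + 0.5 * sin(period=100)` rescales a sine wave into the 0–1 range, and `if (sin(period=100) > 0) 1 else tri(period=25, amplitude=0.5)` alternates between a fixed value and a triangle wave.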
All functions can be called either with unnamed args (e.g. `sin(10)`) or named args (e.g. `sin(period=10, amplitude=2)`). Most arguments have long and short names.
| function | description | example |
|---|---|---|
| `sin()` | Sine wave oscillator | |
| `sq()` | Square wave oscillator | |
| `tri()` | Triangle wave oscillator | |
| `saw()` | Sawtooth wave oscillator | |
| `pulse()` | Pulse wave oscillator | |
| `bez()` | Return a point on a Bézier curve between the previous and next keyframe. Arguments are the same as https://cubic-bezier.com/ . If none are specified, defaults to `bez(0.5,0,0.5,1)`. | |
| `min()` | Return the minimum of 2 arguments | |
| `max()` | Return the maximum of 2 arguments | |
| `abs()` | Return the absolute value of the argument | |
Units can be used to modify numbers representing frame ranges so that they match second or beat offsets calculated using the FPS and BPM values. This is particularly useful when specifying the period of an oscillator.
| unit | description | example |
|---|---|---|
| `f` | (default) frames | |
| `s` | seconds | |
| `b` | beats | |
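For example, at 20 FPS and 120 BPM (illustrative values), one beat lasts 0.5 s, i.e. 10 frames, so `sin(period=1b)` would describe the same wave as `sin(period=10)`.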
| if expression | description | example |
|---|---|---|
| `if <cond> <consequent> else <alt>` | Evaluates to `<consequent>` if `<cond>` is true, otherwise to `<alt>`. | |
| operator | description | example |
|---|---|---|
| `+` | Addition | |
| `-` | Subtraction | |
| `*` | Multiplication | |
| `/` | Division | |
| `%` | Modulus | |
| `!=` | Not equal to | |
| `==` | Equal to | |
| `<` | Less than | |
| `<=` | Less than or equal to | |
| `>=` | Greater than or equal to | |
| `>` | Greater than | |
| `and` | Logical and | |
| `or` | Logical or | |
TODO: list & describe keyframable parameters. TODO: describe purpose of Delta values
Only relevant if you are not using Deforum. I recommend using Deforum. :)
- To process an input video, simply specify the path of the input video in the SD UI before hitting generate.
- To loopback over the input image loaded into SD's img2img UI, leave the input video field blank.
- You can approximate txt2vid by pinning the denoise strength param to 1, which means input images will be ignored entirely. It's not really the same as txt2vid though (frames are not identical to what you'd get with txt2img).
You can control the value of the following SD parameters:
- Seed: you can even use decimal seed values, which will translate to an adjacent subseed with subseed strength proportional to the decimal part (see the sketch below).
- Prompt weights: you can specify up to 4 prompts, and control the weight of each one, allowing you to morph between them.
- Scale
- Denoising strength
Values specified in the main SD GUI for the above parameters will be ignored in favour of those submitted through Parseq.
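As a sketch of what "adjacent subseed with strength proportional to the decimal part" means in practice (a hypothetical helper for illustration, not the script's actual code):

```python
import math

def split_decimal_seed(seed: float):
    """Hypothetical illustration: a decimal seed such as 10.75 behaves like
    seed 10 blended towards subseed 11 with subseed strength 0.75."""
    base = math.floor(seed)
    strength = seed - base
    return base, base + 1, strength  # (seed, subseed, subseed_strength)

print(split_decimal_seed(10.75))  # (10, 11, 0.75)
```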
In addition to SD parameters, Parseq also allows you to control the following pre-processing steps on each input:
- Pan & Zoom
- Pseudo-3d rotation (on x, y and z axes)
- Historical frame blending: choose how many previously generated frames should be blended into the input, and with what decay (see the sketch below).
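Here's a minimal sketch of one way such blending could work. The exponential-decay weighting and all names here are assumptions for illustration, not Parseq's actual implementation:

```python
import numpy as np

def blend_history(current, history, n=3, decay=0.5):
    """Blend the current input frame with the last n generated frames,
    weighting older frames less (assumed scheme: decay ** age)."""
    frames = [current] + history[-n:][::-1]   # newest first
    weights = np.array([decay ** i for i in range(len(frames))])
    weights /= weights.sum()                  # normalise to sum to 1
    stacked = np.stack([f.astype(np.float32) for f in frames])
    blended = (weights[:, None, None, None] * stacked).sum(axis=0)
    return blended.astype(np.uint8)
```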
You can specify color correction window size and slide rate as specified in https://github.com/rewbs/stable-diffusion-loopback-color-correction-script, and optionally force the input frame to always be included in the target histogram. Only recommended for loopback. Set window size to 0 for vid2vid (this is the default).
Dump a video without applying Stable Diffusion. This is very valuable for debugging, and for confirming your param sequence is synchronised in the way you want. You can also overlay text to see the parameter values at each frame (see the examples above for what that looks like).
The script applies processing in the following order:
- Retrieve frame from video or from previous iteration if doing loopback.
- Resize the input frame to match the desired output.
- Blend in historical frames.
- Apply zoom, pan & 3d rotation.
- Apply color correction.
- Feed into SD.
- Save video frame and optionally the standalone image.
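In pseudocode, the per-frame loop looks roughly like this. All function and parameter names are illustrative, not the actual `parseq_core.py` API:

```python
def process_frame(i, params, prev_outputs, input_video, loopback):
    # 1. Retrieve frame from video, or from the previous iteration if looping back.
    frame = prev_outputs[-1] if loopback else input_video[i]
    # 2. Resize the input frame to match the desired output.
    frame = resize(frame, params.width, params.height)
    # 3. Blend in historical frames.
    frame = blend_history(frame, prev_outputs)
    # 4. Apply zoom, pan & 3d rotation.
    frame = transform(frame, params.zoom, params.pan, params.rotation)
    # 5. Apply color correction.
    frame = color_correct(frame, params.cc_window_size, params.cc_slide_rate)
    # 6. Feed into SD.
    out = img2img(frame, params)
    # 7. Save the video frame (and optionally the standalone image).
    save(out, i)
    return out
```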
Coming soon?
- To run the Parseq UI locally, `npm start`.
- To develop the python script independently of Stable Diffusion, take a look at `parseq_test.py`.
This script includes ideas and code sourced from many other scripts. Thanks in particular to the following sources of inspiration:
- Everyone behind Deforum: https://github.com/deforum-art/
- Filarus for their vid2vid script: https://github.com/Filarius/stable-diffusion-webui/blob/master/scripts/vid2vid.py .
- Animator-Anon for their animation script: https://github.com/Animator-Anon/Animator/blob/main/animation.py . I picked up some good ideas from this.
- Yownas for their seed travelling script: https://github.com/yownas/seed_travel . sd-parseq can only travel between consecutive seeds so only offers a fraction of the possible seed variations that Yownas's script does.
- feffy380 for the prompt-morph script https://github.com/feffy380/prompt-morph
- eborboihuc for the clear implementation of 3d rotations using `cv2.warpPerspective()`: https://github.com/eborboihuc/rotate_3d/blob/master/image_transformer.py