</PhotoView>
</PhotoProvider>

Note that there is one difference in this diagram compared with what was introduced earlier: the first step is not a Text Encoder but an Encoder. This is easy to understand, because not only the text prompt but also the image prompt needs to be fed into the model.

If you compare it to sculpture again, this process is like giving the sculptor a text instruction (prompt) while also showing the sculptor a reference picture; the sculptor then re-sculpts a statue (output image) according to your instruction and the picture.

Additionally, how the image data is used also affects the generated result:

* If you use the image as a supplement to the text prompt, the generated image will contain elements from the original image. For example, in the diagram illustrating the principle above, I fed a picture with a bridge, a river, and a city wall into the model, and the text prompt did not mention any of them. Yet the final image contains that information (bridge, river, city wall), and even the style is very similar to the original. This kind of workflow is generally referred to as the unCLIP model workflow.
* The other approach is to feed only the style of the original image into the model, so the generated image keeps the style of the original but not its content. This kind of workflow is generally referred to as the Style model workflow.

<Subscribe />
## Simple img2img workflow

Since the principle is simple, you can probably guess how to set up a simple img2img workflow. Why not try building one yourself first? 😎

<Tabs items={['Hint', 'Answer']}>
<Tabs.Tab>

* The key is to feed the image into KSampler, and KSampler only accepts latent images.
* You can find the node for loading images via right-click → Add Node → image.

</Tabs.Tab>
<Tabs.Tab>

<Steps>

### Add Load Image Node

First, start from the Default workflow. Then, on top of it, add a Load Image node, which you can find via right-click → Add Node → image.

### Connect Nodes

As mentioned in the Hint, the key is to convert the image into a latent-space image, which is what VAE Encode does. So you just need to connect Load Image to VAE Encode, and then connect VAE Encode to KSampler.

### Set up KSampler

Additionally, there are a few small details to note (a sketch of the full workflow in ComfyUI's API format follows these steps):

1. The denoise value in KSampler can be set a little lower, which makes the generated image more similar to the input image; the lower it is, the more similar the result. The sculpture analogy makes this easier to understand: before the sculptor (model) reworks the old statue (input image), some plaster (noise) is smeared onto it, and the new statue (output image) is then sculpted according to your instructions. The denoise option controls how much plaster is smeared on; the less you smear, the more the result resembles the old statue (input image).
2. If the Checkpoint model you use was trained at 1024x1024, it is best for the loaded image to also be 1024x1024, with a matching aspect ratio. The generated image will turn out better this way.

<Callout type="info">
Note: I used the Dreamshaper model to generate images, so the images you generate may not look exactly like mine.
</Callout>

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-img2img/001.png">
<img src="/comfyui-img2img/001.png" alt="" />
</PhotoView>
</PhotoProvider>

</Steps>
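
As a reference, here is a minimal sketch of how this graph can be expressed in ComfyUI's API-format workflow JSON, written as a Python dict and queued over the local API. The checkpoint name, image name, prompts, seed, and sampler settings are placeholder assumptions, and exact input fields can vary slightly between ComfyUI versions, so treat it as a sketch rather than a drop-in file:

```python
import json
import urllib.request

# Placeholder file names; replace with a checkpoint and image you actually have.
img2img_workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "dreamshaper_8.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "example.png"}},
    # VAE Encode turns the loaded pixels into a latent image for KSampler.
    "3": {"class_type": "VAEEncode",
          "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a castle by a river, sunset", "clip": ["1", 1]}},
    "5": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["3", 0],   # latent comes from VAE Encode, not Empty Latent Image
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 0.6}},           # lower denoise = closer to the input image
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "img2img"}},
}

# Queue the workflow on a locally running ComfyUI instance (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": img2img_workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

The only structural difference from the default text-to-image graph is that Empty Latent Image is replaced by Load Image → VAE Encode, plus the lowered denoise value.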
</Tabs.Tab>
</Tabs>

## unCLIP model workflow

After covering redrawing, let's talk about referencing. First, the unCLIP model workflow. The principle section above did not explain how to implement it, so let me walk through it in detail using a ComfyUI workflow.

<Steps>

### Add Load Image Node

The first step is to start from the Default workflow. Then, on top of it, add a Load Image node, which you can find via right-click → Add Node → image.

### Add CLIP Vision Encode Node

In the second step, we need to feed the image into the model, so the image must first be encoded into vectors. Add a CLIP Vision Encode node, which you can find via right-click → Add Node → conditioning.

Then connect the Load Image node to the CLIP Vision Encode node.

### Add unCLIPConditioning Node

Next, we need to fuse the encoded image data with the Text Encode data, so add an unCLIPConditioning node. You can find it via right-click → Add Node → conditioning.

Connect the Positive Prompt to the unCLIPConditioning node, then connect the CLIP Vision Encode node to the unCLIPConditioning node. Finally, connect the unCLIPConditioning node to the KSampler's positive input.

The node has two parameters worth introducing (see the workflow sketch after these steps):

* strength: This sets the influence strength of the prompt; the larger the number, the more closely the result follows it. It is somewhat similar to the weight syntax used when writing prompts in CLIP Text Encode, such as (red hat:1.2).
* noise_augmentation: This mainly controls how close the new image is to the old image, with 0 being the closest. I usually set it between 0.1 and 0.3.

### Replace Load Checkpoint Node with unCLIPCheckpointLoader Node

In the final step, you will notice that one input on the CLIP Vision Encode node added in the second step is still unconnected. It reads clip_vision, and what it actually needs to connect to is a node called unCLIPCheckpointLoader, which you can find via right-click → Add Node → loaders.

Delete the original Load Checkpoint node, replace it with the unCLIPCheckpointLoader node, connect it to the CLIP Vision Encode node, and set its model to the sd21-unclip-h model.

<Callout type="warning" emoji="⚠️">
Note that although this loader is called unCLIPCheckpointLoader, you will not find a dedicated folder for it in the ComfyUI project folder. It actually shares a folder with the Checkpoint node, so you need to put the unCLIP model files into the checkpoints folder.
</Callout>

<Callout emoji="💡">
When downloading the unCLIP model, you can choose the h or the l version. The h version produces clearer images but is slower to generate.
</Callout>

Also, adjust the Empty Latent Image to 768x768, because the unCLIP model we use is trained on SD2.1. The final workflow should look like this:

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-img2img/005.png">
<img src="/comfyui-img2img/005.png" alt="" />
</PhotoView>
</PhotoProvider>

</Steps>
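
For reference, the fragment below sketches how the unCLIP-specific nodes connect in API-format JSON, again written as a Python dict. The node ids, checkpoint file name, image name, and prompt are illustrative assumptions, and input names may differ slightly between ComfyUI versions, so verify them against your install:

```python
# Only the nodes that differ from the basic text-to-image graph are shown.
unclip_fragment = {
    # unCLIPCheckpointLoader outputs MODEL, CLIP, VAE and a CLIP_VISION model (index 3).
    "1": {"class_type": "unCLIPCheckpointLoader",
          "inputs": {"ckpt_name": "sd21-unclip-h.ckpt"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "reference.png"}},
    # Encode the reference image with the CLIP Vision model from the unCLIP checkpoint.
    "3": {"class_type": "CLIPVisionEncode",
          "inputs": {"clip_vision": ["1", 3], "image": ["2", 0]}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a city by a river", "clip": ["1", 1]}},
    # Fuse the image embedding with the positive text conditioning.
    "5": {"class_type": "unCLIPConditioning",
          "inputs": {"conditioning": ["4", 0], "clip_vision_output": ["3", 0],
                     "strength": 1.0, "noise_augmentation": 0.2}},
    # The SD2.1-based unCLIP model prefers 768x768 latents.
    "6": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 768, "height": 768, "batch_size": 1}},
    # Node "5" then feeds KSampler's positive input in place of the plain text conditioning.
}
```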

From the workflow it's not hard to see that, essentially, the image is encoded first, then the text prompt, and the two are fused and fed into the model. Images generated this way will contain elements from the original image.

You can also try feeding multiple images into the model, and the generated images will contain elements from all of them. For example, like this:

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-img2img/006.png">
<img src="/comfyui-img2img/006.png" alt="" />
</PhotoView>
</PhotoProvider>

## Style model workflow

The other reference approach uses only the style of the image and not its content. The implementation is very similar to the unCLIP model workflow, except that the image content is removed and only the style is kept. Let's go through its ComfyUI workflow:

<Steps>

### Replace Nodes

The first step is to continue from the workflow in the previous example and replace a few nodes. First, replace the unCLIPCheckpointLoader with a Load Checkpoint node and switch the model to a v1.5 one. Then replace the unCLIPConditioning node with the Apply Style Model node, which you can find via right-click → Add Node → conditioning → style_model.

You can then connect these nodes. You'll find that two endpoints are still unconnected, which brings us to the next step.

### Add Load CLIP Vision and Load Style Model Nodes

Both nodes can be found via right-click → Add Node → loaders. Connect them to the CLIP Vision Encode node and the Apply Style Model node respectively. With everything connected, the complete workflow is easy to understand: an image is loaded and encoded, Apply Style Model filters out only the style information from it, and that style is fused with the text prompt and passed to KSampler.

What differs from the unCLIP model workflow is that only the style is used. So the content of the result may not be very similar to the original image, but the style will look alike.
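
For reference, here is a sketch of the style-specific nodes in API-format JSON (Python dict). The file names are examples of a v1.5 checkpoint, a CLIP Vision model, and a T2I-Adapter style model, not prescriptions, and input names may vary by ComfyUI version:

```python
# Style-specific fragment; adjust the file names to the models you actually have.
style_fragment = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "style_reference.png"}},
    # Standalone CLIP Vision model and style model loaders.
    "3": {"class_type": "CLIPVisionLoader",
          "inputs": {"clip_name": "clip_vision_vit_l.safetensors"}},
    "4": {"class_type": "StyleModelLoader",
          "inputs": {"style_model_name": "t2iadapter_style_sd14v1.pth"}},
    "5": {"class_type": "CLIPVisionEncode",
          "inputs": {"clip_vision": ["3", 0], "image": ["2", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a portrait of a woman", "clip": ["1", 1]}},
    # Apply Style Model merges only the style information into the text conditioning.
    "7": {"class_type": "StyleModelApply",
          "inputs": {"conditioning": ["6", 0], "style_model": ["4", 0],
                     "clip_vision_output": ["5", 0]}},
    # Node "7" then feeds KSampler's positive input.
}
```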

<Callout type="warning" emoji="⚠️">
It's important to note that this style model only understands common styles. It works quite well for classic paintings or images of people, but if you feed it niche, abstract images, the results will be poor.
</Callout>

</Steps>

After you have generated an image with Stable Diffusion, you might be unsatisfied with parts of it. For instance, if you are unhappy with the character's hair, you might try modifying the prompt by adding "red hair" and regenerating the image. But with this method the whole portrait might turn red, and other areas might change as well.

So how do you change only a certain area without affecting the rest?

<Callout type="warning" emoji="⚠️">
Please download the following model and place the model file in the corresponding folder before starting this chapter:

* [Dreamshaper 8-inpainting](https://civitai.com/models/4384?modelVersionId=131004): Place it in the models/checkpoints folder in ComfyUI.
</Callout>

## Inpainting Workflow

Let me explain how to build an inpainting workflow using the following scene as an example. The picture on the left was generated with the text-to-image workflow. I was not satisfied with the color of the character's hair, so I used ComfyUI to regenerate the character with red hair based on the original image. Compared with the original, the new image is unchanged everywhere except the hair:

<PhotoProvider>
<PhotoView src="/comfyui-inoutpainting/003.png">
<img src="/comfyui-inoutpainting/003.png" alt="" />
</PhotoView>
</PhotoProvider>

To replace a part of the image with something else, the main technique used is inpainting, and the steps are very similar to text-to-image generation:

<Steps>

### Add VAE Encode (for Inpainting)

As usual, we start with the default workflow. Click the Load Default button on the right panel to load it. Then double-click a blank area, type "Inpainting", and add the VAE Encode (for Inpainting) node. You'll see a configuration item on this node called grow_mask_by, which I usually set to 6-8. Generally speaking, a larger value makes the newly generated part blend more naturally with the original image.

### Add Load Image Node

After adding the node, load the image you want to modify. Then connect the Load Image node to VAE Encode (for Inpainting), and connect VAE Encode (for Inpainting) to KSampler.

### Smear the Area You Want to Change

Once everything is connected, right-click the Load Image node and choose Open in MaskEditor from the menu. In the MaskEditor you can smear over the places you want to change. For example, to change the character's hair to red, I just smear over the hair. You can use the Thickness slider below to change the brush size. When you're done, click Save to node in the bottom-right corner.

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-inoutpainting/001.png">
<img src="/comfyui-inoutpainting/001.png" alt="" />
</PhotoView>
</PhotoProvider>

### Switch the Model and Configure

Next, switch the model in Load Checkpoint to an inpainting model. Here's a tip: it's best if the inpainting model matches the one used for the original image. For example, the astronaut image I want to modify was generated with Dreamshaper, so for inpainting I use the Dreamshaper inpainting model.

Besides switching the model, we also need to set the denoise in KSampler to 0.85. Of course, you can adjust this based on your own testing; the smaller it is, the closer the result stays to the original image. (A sketch of the full workflow in ComfyUI's API format follows these steps.)

### Input Prompt

Lastly, enter the change you want in the positive prompt. For example, I want to change the hair color to red, so I enter "red hair". Then click Queue Prompt to generate the image:

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-inoutpainting/002.png">
<img src="/comfyui-inoutpainting/002.png" alt="" />
</PhotoView>
</PhotoProvider>

</Steps>
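
For reference, a minimal sketch of the inpainting graph in API-format JSON (Python dict). File names, prompts, and the seed are placeholder assumptions, and input names may differ slightly across ComfyUI versions:

```python
inpaint_workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "dreamshaper_8Inpainting.safetensors"}},
    # LoadImage also exposes the mask painted in MaskEditor (output index 1).
    "2": {"class_type": "LoadImage", "inputs": {"image": "astronaut.png"}},
    "3": {"class_type": "VAEEncodeForInpaint",
          "inputs": {"pixels": ["2", 0], "mask": ["2", 1], "vae": ["1", 2],
                     "grow_mask_by": 6}},   # 6-8 blends the repaint more naturally
    "4": {"class_type": "CLIPTextEncode", "inputs": {"text": "red hair", "clip": ["1", 1]}},
    "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry", "clip": ["1", 1]}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["3", 0],
                     "seed": 7, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 0.85}},     # lower values stay closer to the original
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "inpaint"}},
}
```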

## Outpainting workflow

Besides modifying parts of the content, another common need is to extend the original image. For example, in the astronaut image above, the astronaut's right hand and lower body were never generated. I would like the AI to continue generating them, something like this:

<br/>
<PhotoProvider>
<PhotoView src="/comfyui-inoutpainting/004.png">
<img src="/comfyui-inoutpainting/004.png" alt="" />
</PhotoView>
</PhotoProvider>

The steps are similar to inpainting, so you can continue from the inpainting workflow:

<Steps>

### Add Pad Image for Outpainting Node

Double-click any blank spot, then search for and add the Pad Image for Outpainting node. It has several settings: left, top, right, and bottom specify how many pixels to expand in each direction, and feathering adjusts how smooth the edge is. Increasing it makes the transition more natural, but too much gives the picture a smeared look.
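
As a small reference, this is how the node looks in API-format JSON (Python dict). The node id "2" for Load Image and the pixel values are assumptions for illustration only:

```python
# Expand the canvas 256 px to the right and bottom; feathering softens the seam.
outpaint_fragment = {
    "9": {"class_type": "ImagePadForOutpaint",
          "inputs": {"image": ["2", 0],
                     "left": 0, "top": 0, "right": 256, "bottom": 256,
                     "feathering": 40}},
    # Output 0 (the padded image) goes to the pixels input of VAE Encode (for Inpainting),
    # output 1 (the mask of the new area) goes to its mask input.
}
```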

### Connect Nodes

In the second step, connect the Load Image node to the image input on the left side of Pad Image for Outpainting. The outputs on the right side of Pad Image for Outpainting then connect to the inputs of VAE Encode (for Inpainting).

### Adjust parameters

In CLIP Text Encode (Prompt) you can add extra information. Also, in my experience, the grow_mask_by parameter in VAE Encode (for Inpainting) needs to be slightly larger here; I usually set it to more than 10.