Oh My Assistant is a generative AI service for comics and illustration artists. It learns each artist's unique drawing style, converts realistic images into that style, and transfers characters into a desired pose.
- The service is deployed on the following link (As of August 2024, the link may not work).
Demo videos: background.mp4 (background generation), pose.mp4 (character pose transfer)
The background generation service selects either the Img2Img or the Txt2Img model of Stable Diffusion, depending on whether the user provides a source image. It then follows these steps to create webtoon-style images.
- Noise Initialization
  - If an original image is provided, the initial noisy input is produced by gradually adding noise to that image. If no original image is given, the initial input is a completely random tensor.
- Inject Condition
  - The model takes a prompt as input and uses the text encoder of the CLIP model to extract embedding vectors, which serve as the condition.
- Denoising Process
  - Starting from the noise generated earlier, the model progressively removes noise to produce a high-resolution image. A cross-attention mechanism guides the noise removal according to the embedded prompt features.
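The noise initialization step can be sketched in a few lines of plain Python. This is a minimal illustration, not the service's actual code: the function names `alpha_bar` and `init_latent`, the `strength` parameter, and the linear beta schedule are all assumptions made for the example, and a flat list of floats stands in for the encoded image. It uses the standard closed form of the forward diffusion process, which gives the same result as adding noise step by step.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_i) over the first t steps.

    Assumes a linear beta schedule; the real service may use another one.
    """
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def init_latent(size, source=None, strength=0.6, num_steps=1000, seed=0):
    """Return (mode, start_step, latent) for the denoising loop.

    source: optional flat list of floats standing in for the encoded
    source image (hypothetical representation for this sketch).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]

    if source is None:
        # Txt2Img: start from a completely random tensor.
        return "txt2img", num_steps, noise

    # Img2Img: noise the source up to an intermediate step t.
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    t = int(strength * num_steps)
    a = alpha_bar(t, num_steps)
    latent = [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * e
        for x, e in zip(source, noise)
    ]
    return "img2img", t, latent
```

A larger `strength` pushes the starting point closer to pure noise, so less of the source image survives; with `strength=0.0` the source is returned unchanged.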
In summary, given an original image or a prompt, a model trained in webtoon style adds and then removes noise conditioned on that input to produce the background image the artist needs.
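To make the cross-attention guidance in the denoising step more concrete, here is a toy scaled dot-product attention in plain Python. It is only a sketch of the general mechanism: the function names are hypothetical, the learned query, key, and value projections used in a real UNet are omitted, and small lists stand in for image features (queries) and prompt embeddings (keys and values).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (image feature) attends over all text tokens.

    queries: list of vectors derived from the noisy image features
    keys, values: lists of vectors derived from the prompt embeddings
    """
    d = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity between this image feature and every text token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted mix of the text values for this image position.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(out_dim)
        ])
    return outputs
```

Each output row is a mixture of prompt information weighted by how strongly the corresponding image position matches each token, which is how the prompt steers noise removal.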
Character Pose Transfer is divided into two steps: Pose Estimation and Pose Transfer.
The Pose Estimation step uses DWPose. DWPose uses a detector to find landmarks in the input image and a classifier to categorize body keypoints, predicting a total of 133 keypoints (COCO Whole Body) covering the face, hands, arms, and legs.
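As a small illustration of what the 133 keypoints contain, the sketch below splits a keypoint list into body parts. It assumes the estimator returns keypoints in the standard COCO-WholeBody order (17 body, 6 foot, 68 face, 21 left-hand, and 21 right-hand points); the helper `split_keypoints` and the `(x, y, score)` tuple format are hypothetical and not part of DWPose itself.

```python
# Index ranges (start inclusive, end exclusive) in COCO-WholeBody order.
WHOLEBODY_GROUPS = {
    "body": (0, 17),
    "feet": (17, 23),
    "face": (23, 91),
    "left_hand": (91, 112),
    "right_hand": (112, 133),
}


def split_keypoints(keypoints):
    """Split a flat list of 133 keypoints into named body-part groups.

    keypoints: list of 133 items, e.g. (x, y, score) tuples
    (the tuple layout is an assumption for this sketch).
    """
    if len(keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(keypoints)}")
    return {
        name: keypoints[start:end]
        for name, (start, end) in WHOLEBODY_GROUPS.items()
    }


# Usage with dummy data standing in for a pose estimator's output.
dummy = [(float(i), float(i), 1.0) for i in range(133)]
groups = split_keypoints(dummy)
```

Grouping keypoints this way makes it easier to draw the target pose image part by part, or to drop low-confidence parts such as hands before pose transfer.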
The Pose Transfer step uses a diffusion model based on AnimateAnyone. The target pose image extracted in the estimation step and the character image provided by the user are embedded using a VAE and a CLIP encoder. A Denoising UNet and a ReferenceNet then generate the pose-changed character image from noise, and the VAE decoder decodes it to produce the final result.