If you are familiar with Chinese, you can read the Chinese version of the README.
FaceChain is a deep-learning toolchain for generating your Digital-Twin. With as little as one portrait photo, you can create your own Digital-Twin and start generating personal photos in different settings (work photos to start with!). You can train your Digital-Twin model and generate photos via FaceChain's Python scripts or via the familiar Gradio interface. You can also try FaceChain directly in our ModelScope Studio.
FaceChain is powered by ModelScope.
The following are the environment dependencies that have been verified:
- python: py3.8, py3.10
- pytorch: torch2.0.0, torch2.0.1
- tensorflow: 2.7.0, tensorflow-cpu
- CUDA: 11.7
- CUDNN: 8+
- OS: Ubuntu 20.04, CentOS 7.9
- GPU: Nvidia-A10 24G
- GPU memory: About 19G
- Disk: About 50GB
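To compare a local setup against the verified list above, a minimal sketch such as the following may help. It uses only the standard library. The names `VERIFIED_PYTHON`, `is_verified_python`, and `is_installed` are illustrative and not part of FaceChain, and the check covers only the Python version and whether packages can be found, not their versions.

```python
import importlib.util
import sys

# Python versions listed as verified in this README (py3.8, py3.10).
VERIFIED_PYTHON = {(3, 8), (3, 10)}


def is_verified_python(version_info=None):
    """Return True if (major, minor) is one of the verified Python versions."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in VERIFIED_PYTHON


def is_installed(package_name):
    """Return True if the top-level package can be found without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Python verified:", is_verified_python())
    for name in ("torch", "tensorflow", "gradio"):
        print(f"{name} installed:", is_installed(name))
```

An unverified version does not necessarily mean FaceChain will fail; it only means the combination is not on the list above.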
The following installation methods are supported:
- ModelScope notebook (recommended)
The ModelScope notebook has a free tier that allows you to run the FaceChain application; refer to ModelScope Notebook.
Besides the ModelScope notebook and ECS, you may also start a PAI-DSW instance with the ModelScope (GPU) image to create a ready-to-use environment.
# Step1
My Notebook -> PAI-DSW -> GPU environment
# Step2
Open the Terminal and clone FaceChain from GitHub:
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git
# Step3
Enter the following in a Notebook cell:
import os
os.chdir('/mnt/workspace/facechain')
print(os.getcwd())
!pip3 install gradio
!python3 app.py
# Step4
Click "public URL" or "local URL"
- Docker
If you are familiar with Docker, we recommend this method:
# Step1
Prepare a GPU environment, locally or in the cloud. We recommend Alibaba Cloud ECS; refer to: https://www.aliyun.com/product/ecs
# Step2
Download the Docker image (to install Docker Engine, refer to https://docs.docker.com/engine/install/)
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.0
# Step3
docker images
docker run -it --name facechain -p 7860:7860 --gpus all your_xxx_image_id /bin/bash
(Note: you may need to install nvidia-container-runtime; refer to https://github.com/NVIDIA/nvidia-container-runtime)
# Step4
Install gradio in the Docker container:
pip3 install gradio
# Step5
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git
cd facechain
python3 app.py
# Step6
Open the running app: click the "public URL" printed in the terminal, which has the form https://xxx.gradio.live
FaceChain supports direct training and inference in a Python environment. Run the following command in the cloned folder to start training:
PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output"
Parameter meanings:
ly261666/cv_portrait_model: The Stable Diffusion base model from the ModelScope model hub, used for training; no need to change.
v2.0: The version number of this base model; no need to change.
film/film: The base model may contain multiple subdirectories for different styles; film/film is currently used; no need to change.
./imgs: Replace this with the actual value: a local directory containing the original photos used for training and generation.
./processed: The folder for the preprocessed images; the same value must be passed during inference; no need to change.
./output: The folder where the model weights are stored after training; no need to change.
Wait 5-20 minutes for training to complete. You can also adjust other training hyperparameters. The hyperparameters supported by training are listed in train_lora.sh, and the complete hyperparameter list is in facechain/train_text_to_image_lora.py.
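If you prefer to launch training from Python rather than the shell, the command above can be assembled programmatically. The following is a minimal sketch, assuming it runs from the cloned folder; `build_train_command` and `run_training` are hypothetical helpers, not part of FaceChain, and they only wrap the documented `train_lora.sh` invocation.

```python
import os
import subprocess


def build_train_command(imgs_dir, processed_dir="./processed", output_dir="./output"):
    """Build the argument list for train_lora.sh using the README defaults."""
    return [
        "sh", "train_lora.sh",
        "ly261666/cv_portrait_model",  # base model, no need to change
        "v2.0",                        # base model version
        "film/film",                   # style subdirectory of the base model
        imgs_dir,                      # folder with your original photos
        processed_dir,                 # must match processed_dir at inference
        output_dir,                    # where the trained weights are stored
    ]


def run_training(imgs_dir):
    """Run training with PYTHONPATH=. as in the shell command (hypothetical wrapper)."""
    if not os.path.isdir(imgs_dir):
        raise FileNotFoundError(f"No such photo directory: {imgs_dir}")
    env = dict(os.environ, PYTHONPATH=".")
    subprocess.run(build_train_command(imgs_dir), env=env, check=True)
```

Calling `run_training("./imgs")` would then be equivalent to the shell command shown earlier, with an early error if the photo folder is missing.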
For inference, edit the following settings in run_inference.py:
# Fill in the folder of the preprocessed images from above; it must be the same as during training
processed_dir = './processed'
# The number of images to generate in inference
num_generate = 5
# The stable diffusion base model used in training, no need to be changed
base_model = 'ly261666/cv_portrait_model'
# The version number of this base model, no need to be changed
revision = 'v2.0'
# This base model may contain multiple subdirectories of different styles; currently we use film/film, no need to be changed
base_model_sub_dir = 'film/film'
# The folder where the model weights are stored after training; it must be the same as during training
train_output_dir = './output'
# Specify a folder to save the generated images, this parameter can be modified as needed
output_dir = './generated'
Then execute:
python run_inference.py
You can find the generated personal portrait photos in the output_dir.
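To confirm that inference produced files, you can list the contents of the output folder. This is a small sketch using only the standard library; the helper name `list_generated_images` is hypothetical, and the set of image suffixes is an assumption, since the README does not state the output format.

```python
from pathlib import Path

# Assumed output formats; adjust if your generated files use another suffix.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_generated_images(output_dir="./generated"):
    """Return image files in output_dir sorted by path; empty list if the folder is missing."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


if __name__ == "__main__":
    for path in list_generated_images():
        print(path)
```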
The personal portrait model builds on the text-to-image capability of the Stable Diffusion model, which takes a piece of text or a series of prompt words as input and outputs corresponding images. Two main factors affect the quality of generated personal portraits: portrait style information and the user's identity information. To capture them, we use a style LoRA model trained offline and a face LoRA model trained online. LoRA is a fine-tuning method that adds only a small number of trainable parameters. In Stable Diffusion, information from a small number of input images can be injected into the LoRA model through text-to-image training. The personal portrait model therefore works in two stages, training and inference. The training stage generates image and text-label data for fine-tuning the Stable Diffusion model and produces the face LoRA model. The inference stage generates personal portrait images based on the face LoRA model and the style LoRA model.
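To make the "fewer trainable parameters" point concrete, here is a minimal sketch of the general LoRA idea: a frozen weight matrix W receives a low-rank update, W' = W + scale * (B A). This illustrates the standard formulation in pure Python, not FaceChain's training code; the matrix sizes, the `scale` value, and all function names are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning, LoRA) trainable parameter counts for one matrix."""
    return d_out * d_in, rank * (d_out + d_in)


# A 2x3 frozen weight with a rank-1 update: B is 2x1, A is 1x3.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0, 1.0]]
merged = merge_lora(W, B, A, scale=2.0)
print(merged)  # [[2.0, 0.0, 2.0], [2.0, 1.0, 4.0]]

# For a 768x768 matrix at rank 4, LoRA trains far fewer parameters.
print(trainable_params(768, 768, 4))  # (589824, 6144)
```

Merging a second adapter, such as a style LoRA alongside a face LoRA, amounts to calling `merge_lora` again on the already merged weight.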
Input: User-uploaded images that contain clear face areas
Output: Face LoRA model
Description: In the training stage, we first process the user-uploaded images using an image rotation model based on orientation judgment and a refined face rotation method based on face detection and keypoint models, obtaining images that contain front-facing faces. Next, we use a human parsing model and a portrait beautification model to obtain high-quality face training images. We then use a face attribute model and a text annotation model, combined with tag post-processing, to generate fine-grained labels for the training images. Finally, we use these images and labels to fine-tune the Stable Diffusion model and obtain the face LoRA model.
Input: User-uploaded images in the training phase, preset input prompt words for generating personal portraits
Output: Personal portrait image
Description: In the inference stage, we first fuse the weights of the face LoRA model and the style LoRA model into the Stable Diffusion model. Next, we use the model's text-to-image capability to generate preliminary personal portrait images from the preset prompt words. We then refine the face details of these portraits using the face fusion model; the template face used for fusion is selected from the training images by the face quality assessment model. Finally, we use the face recognition model to compute the similarity between each generated portrait and the template face, sort the portraits by this similarity, and output the top-ranked image as the final result.
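The final ranking step can be sketched as follows. This is a hedged illustration only: it assumes each image has already been mapped to an embedding vector by a face recognition model, and it assumes cosine similarity as the measure, which the description above does not specify. The embeddings and the names `cosine_similarity` and `rank_by_similarity` are made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_by_similarity(template_embedding, candidates):
    """Return candidate names sorted by similarity to the template, best first."""
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(template_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked]


# Toy 2-D embeddings; real face embeddings have many more dimensions.
template = [1.0, 0.0]
candidates = [
    ("img_0", [0.0, 1.0]),
    ("img_1", [1.0, 0.1]),
    ("img_2", [1.0, 1.0]),
]
ranking = rank_by_similarity(template, candidates)
print(ranking)  # ['img_1', 'img_2', 'img_0']
```

The first entry of `ranking` plays the role of the portrait that is returned as the final output.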
The models used in FaceChain:
[1] Face detection model DamoFD: https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd
[2] Image rotating model, offered in the ModelScope studio
[3] Human parsing model M2FP: https://modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing
[4] Skin retouching model ABPN: https://modelscope.cn/models/damo/cv_unet_skin-retouching
[5] Face attribute recognition model FairFace: https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface
[6] DeepDanbooru model: https://github.com/KichangKim/DeepDanbooru
[7] Face quality assessment FQA: https://modelscope.cn/models/damo/cv_manual_face-quality-assessment_fqa
[8] Face fusion model: https://modelscope.cn/models/damo/cv_unet-image-face-fusion_damo
[9] Face recognition model RTS: https://modelscope.cn/models/damo/cv_ir_face-recognition-ood_rts
ModelScope Library provides the foundation for building the model-ecosystem of ModelScope, including the interface and implementation to integrate various models into ModelScope.
This project is licensed under the Apache License (Version 2.0).