Accepted to SIGGRAPH 2022 (Journal Track)
A tall and skinny female soldier that is arguing. | A skinny ninja that is raising both arms. | An overweight sumo wrestler that is sitting. | A tall and fat Iron Man that is running. |
This repository contains the official implementation of AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars.
[05/2022] Add a Colab Demo for avatar generation!
[05/2022] Support converting the generated avatar to the animatable FBX format! Check out how to use the FBX models, or see the instructions for the conversion code.
[05/2022] Code release for avatar generation part!
[04/2022] AvatarCLIP is accepted to SIGGRAPH 2022 (Journal Track) 🥳!
If you find our work useful for your research, please consider citing the paper:
@article{hong2022avatarclip,
title={AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars},
author={Hong, Fangzhou and Zhang, Mingyuan and Pan, Liang and Cai, Zhongang and Yang, Lei and Liu, Ziwei},
journal={ACM Transactions on Graphics (TOG)},
volume={41},
number={4},
pages={1--19},
year={2022},
publisher={ACM New York, NY, USA}
}
Visit our project page and go to the Avatar Gallery section. Pick a model that you like and click 'Load Model' below it. Then click the 'Download FBX' link at the bottom of the pop-up viewer.
The FBX models are already rigged. Use your motion library to animate them!
To make use of the rich motion library provided by Mixamo, you can also upload the FBX model to Mixamo. The rigging process is completely automatic!
We recommend using Anaconda to manage the Python environment. The setup commands below are provided for your reference.
git clone https://github.com/hongfz16/AvatarCLIP.git
cd AvatarCLIP
conda create -n AvatarCLIP python=3.7
conda activate AvatarCLIP
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
In addition to the steps above, install neural_renderer by following its instructions. Before compiling neural_renderer (doing it after compiling also works), remember to add the following three lines to neural_renderer/perspective.py after line 19.
x[z<=0] = 0
y[z<=0] = 0
z[z<=0] = 0
This quick fix addresses a rendering issue where objects behind the camera are also rendered. Be careful when using this patched version of neural_renderer in your other projects, because the fix makes the rendering process non-differentiable.
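To make the effect of the three lines concrete, here is a minimal pure-Python sketch of the same masking. It is an illustration only, not the neural_renderer code: the function name `mask_behind_camera` is hypothetical, and plain lists stand in for the tensors that `perspective.py` operates on.

```python
def mask_behind_camera(xs, ys, zs):
    """Return copies of the coordinate lists with every point at z <= 0 moved to (0, 0, 0).

    Mirrors `x[z<=0] = 0; y[z<=0] = 0; z[z<=0] = 0` element by element.
    """
    out_x, out_y, out_z = [], [], []
    for x, y, z in zip(xs, ys, zs):
        if z <= 0:  # point lies on or behind the camera plane
            x, y, z = 0.0, 0.0, 0.0
        out_x.append(x)
        out_y.append(y)
        out_z.append(z)
    return out_x, out_y, out_z


# Three points: in front of the camera, behind it, and exactly on the camera plane.
mx, my, mz = mask_behind_camera([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, -1.0, 0.0])
print(mx, my, mz)
```

Only the first point keeps its coordinates; the other two collapse to the origin.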
Register and download SMPL models here. Put the downloaded models in the folder smpl_models. The folder structure should look like
./
├── ...
└── smpl_models/
    ├── smpl/
        ├── SMPL_FEMALE.pkl
        ├── SMPL_MALE.pkl
        └── SMPL_NEUTRAL.pkl
This download is only needed for coarse shape generation; you can skip it if you only want to use the other parts. Download the pretrained weights and other required data here. Put them in the folder AvatarGen so that the folder structure looks like
./
├── ...
└── AvatarGen/
    └── ShapeGen/
        └── data/
            ├── codebook.pth
            ├── model_VAE_16.pth
            ├── nongrey_male_0110.jpg
            ├── smpl_uv.mtl
            └── smpl_uv.obj
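Before running anything, it can help to confirm that both downloads landed in the right place. The following is a small sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the file list is copied from the two folder listings above, assuming the script is run from the repository root.

```python
from pathlib import Path

# Expected files, copied from the two folder listings in this README.
EXPECTED = {
    "smpl_models/smpl": ["SMPL_FEMALE.pkl", "SMPL_MALE.pkl", "SMPL_NEUTRAL.pkl"],
    "AvatarGen/ShapeGen/data": [
        "codebook.pth",
        "model_VAE_16.pth",
        "nongrey_male_0110.jpg",
        "smpl_uv.mtl",
        "smpl_uv.obj",
    ],
}


def missing_files(repo_root="."):
    """Return the relative paths of expected files that are absent under repo_root."""
    root = Path(repo_root)
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            rel = Path(folder) / name
            if not (root / rel).is_file():
                missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

If you skip coarse shape generation, the entries under `AvatarGen/ShapeGen/data` can be ignored.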
Folder AvatarGen/ShapeGen contains the code for this part. Run the following command to generate the coarse shape corresponding to the shape description 'a strong man'. We recommend using the prompt augmentation 'a 3d rendering of xxx in unreal engine' for better results. The generated coarse body mesh will be stored under AvatarGen/ShapeGen/output/coarse_shape.
python main.py --target_txt 'a 3d rendering of a strong man in unreal engine'
Then we need to render the mesh for initialization of the implicit avatar representation. Use the following command for rendering.
python render.py --coarse_shape_obj output/coarse_shape/a_3d_rendering_of_a_strong_man_in_unreal_engine.obj --output_folder ${RENDER_FOLDER}
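The two commands above are linked by the prompt: the augmented prompt is passed to `main.py`, and the mesh file passed to `render.py` is named after it. The sketch below makes that relationship explicit. Both helper names are hypothetical, and the naming rule (spaces replaced by underscores, `.obj` appended) is an assumption inferred from the single example in this README, so check the actual output folder if your prompt contains other characters.

```python
from pathlib import Path


def augment_prompt(description):
    """Wrap a shape description in the recommended prompt template."""
    return f"a 3d rendering of {description} in unreal engine"


def expected_obj_path(prompt, output_dir="output/coarse_shape"):
    """Guess the coarse mesh path for a prompt.

    ASSUMPTION: spaces become underscores, as in the README example.
    """
    return (Path(output_dir) / (prompt.replace(" ", "_") + ".obj")).as_posix()


prompt = augment_prompt("a strong man")
print(prompt)
print(expected_obj_path(prompt))
```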
Note that all the code was tested on an NVIDIA V100 (32GB memory). To run on GPUs with less memory, try scaling down the network or lowering max_ray_num in the config files. You can refer to confs/examples_small/example.conf or our Colab demo for a scaled-down version of AvatarCLIP.
Folder AvatarGen/AppearanceGen contains the code for this part. We provide data, a pretrained model, and scripts to perform shape sculpting and texture generation on a zero-beta body (the mean shape defined by SMPL). We provide many example scripts under AvatarGen/AppearanceGen/confs/examples. For example, to generate 'Abraham Lincoln', which is defined in the config file confs/examples/abrahamlincoln.conf, use the following command.
python main.py --mode train_clip --conf confs/examples/abrahamlincoln.conf
Results will be stored in AvatarCLIP/AvatarGen/AppearanceGen/exp/smpl/examples/abrahamlincoln.
If you wish to perform shape sculpting and texture generation on the previously generated coarse shape, we also provide example config files in confs/base_models/astrongman.conf and confs/astrongman/*.conf. Two steps of optimization are required, as follows.
# Initialization of the implicit avatar
python main.py --mode train --conf confs/base_models/astrongman.conf
# Shape sculpting and texture generation on the initialized implicit avatar
python main.py --mode train_clip --conf confs/astrongman/hulk.conf
To extract meshes from the generated implicit avatar, one may use the following command.
python main.py --mode validate_mesh --conf confs/examples/abrahamlincoln.conf
The final high resolution mesh will be stored as AvatarCLIP/AvatarGen/AppearanceGen/exp/smpl/examples/abrahamlincoln/meshes/00030000.ply
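For batch runs, the appearance-generation steps can be assembled in order from a small script. This is only a sketch: `appearance_commands` is a hypothetical helper, it uses nothing beyond the `--mode` and `--conf` flags shown above, and it assumes that `validate_mesh` takes the same config as `train_clip` for coarse-shape avatars too (the README only shows it for the zero-beta example).

```python
def appearance_commands(clip_conf, base_conf=None):
    """Build the main.py invocations for appearance generation, in execution order.

    clip_conf: config for shape sculpting and texture generation (--mode train_clip).
    base_conf: optional config for initializing the implicit avatar from a
               previously generated coarse shape (--mode train).
    """
    cmds = []
    if base_conf is not None:
        # Step needed only when starting from a generated coarse shape.
        cmds.append(["python", "main.py", "--mode", "train", "--conf", base_conf])
    cmds.append(["python", "main.py", "--mode", "train_clip", "--conf", clip_conf])
    # ASSUMPTION: mesh extraction reuses the train_clip config.
    cmds.append(["python", "main.py", "--mode", "validate_mesh", "--conf", clip_conf])
    return cmds


for cmd in appearance_commands(
    "confs/astrongman/hulk.conf", base_conf="confs/base_models/astrongman.conf"
):
    print(" ".join(cmd))
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True, cwd="AvatarGen/AppearanceGen")`, which stops at the first failing step.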
See the instructions here.
TBA
Distributed under the MIT License. See LICENSE for more information.
There are lots of wonderful works that inspired our work or came around the same time as ours.
Dream Fields enables zero-shot text-driven general 3D object generation using CLIP and NeRF.
Text2Mesh proposes to edit a template mesh by predicting offsets and colors per vertex using CLIP and differentiable rendering.
CLIP-NeRF can manipulate 3D objects represented by NeRF with natural language or exemplar images by leveraging CLIP.
Text to Mesh facilitates zero-shot text-driven general mesh generation by deforming from a sphere mesh guided by CLIP.
This study is supported by NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
We thank the following repositories for their contributions to our implementation: NeuS, smplx, vposer, Smplx2FBX.