DressRecon: Freeform 4D Human Reconstruction from Monocular Video

3DV 2025 (Oral)

Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, Gengshan Yang

Teaser video

About

DressRecon is a method for freeform 4D human reconstruction, with support for dynamic clothing and human-object interactions. Given a monocular video as input, it reconstructs a time-consistent body model, including shape, appearance, articulation of body+clothing, and 3D tracks. The software is licensed under the MIT license.

Release Plan

  • Training code
  • Data preprocessing scripts
  • Pretrained checkpoints

Installation

  1. Clone DressRecon
git clone https://github.com/jefftan969/dressrecon
cd dressrecon
  2. Create the environment
conda create -y -n dressrecon -c conda-forge python=3.9
conda activate dressrecon
pip install torch==2.4.1
conda install -y -c conda-forge absl-py numpy==1.24.4 tqdm trimesh tensorboard opencv scipy scikit-image matplotlib urdfpy networkx=3 einops imageio-ffmpeg pyrender open3d
pip install pysdf geomloss
pip install -e .
# (Optional) Visualization dependencies
pip install viser
  3. Install third-party libraries
# CUDA kernels for fast dual-quaternion skinning
pip install -e lab4d/third_party/dqtorch
# CUDA kernels for 3D Gaussian refinement
pip install -e lab4d/diffgs/third_party/simple-knn
pip install git+https://github.com/gengshan-y/gsplat-dev.git
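
To sanity-check the environment before training (a quick check, not part of the official setup), verify that PyTorch sees the GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"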

Data

We provide two ways to obtain data: download our preprocessed data (dna-0121_02.zip), or process your own data following the instructions (coming soon!).

Preprocessed data is available for download for the sequences in the paper. Each sequence is about 1.7 GB compressed and 2.3 GB uncompressed.

To unzip preprocessed data:

mkdir -p database/processed
cd database/processed
unzip {path_to_downloaded_zip}
cd ../..
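
After unzipping, the sequence folder should appear under database/processed/. Its folder name is the sequence name to use as {data_sequence_name} in the commands below (for the zip above, we assume this is dna-0121_02):

ls database/processed  # expect one folder per sequence, e.g. dna-0121_02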

Demo

This example shows how to reconstruct a human from a monocular video. To begin, download preprocessed data above or process your own videos.

Training neural fields

To optimize a body model given an input monocular video:

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 96 --seqname {data_sequence_name} --logname {name_of_this_experiment}

On a 4090 GPU, 240 optimization rounds should take ~8-9 hours. Checkpoints are saved to logdir/{seqname}-{logname}. For faster experiments, you can pass --num_rounds 40 to train a lower-quality model that is not yet fully converged.
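
For example, to train on the preprocessed sequence above (assuming its sequence name matches the zip, dna-0121_02; the experiment name base is a placeholder of your choosing):

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 96 --seqname dna-0121_02 --logname base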

The training command above assumes 24GB of GPU memory. If you have 10GB of GPU memory, use:

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 32 --grad_accum 3 --seqname {data_sequence_name} --logname {name_of_this_experiment}
The checkpoint directory contains:

logdir/{seqname}-{logname}
  - ckpt_*.pth         => (Saved model checkpoints)
  - metadata.pth       => (Saved dataset metadata)
  - opts.log           => (Command-line options)
  - params.txt         => (Learning rates for each optimizable parameter)
  - uncertainty/*.npy  => (Per-pixel uncertainty cache for weighted pixel sampling during training)
  - *-fg-gauss.ply     => (Body Gaussians over all optimization iterations)
  - *-fg-proxy.ply     => (Body shape and cameras over all optimization iterations)
  - *-fg-sdf.ply     => (Deformation fields' range of influence over all optimization iterations)
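
To peek inside the latest checkpoint (a minimal sketch using standard PyTorch loading; substitute the placeholders first — the key layout is printed rather than assumed):

python - <<'EOF'
import glob
import torch

# Pick the most recent checkpoint in the log directory
path = sorted(glob.glob("logdir/{seqname}-{logname}/ckpt_*.pth"))[-1]
ckpt = torch.load(path, map_location="cpu")
print(path)
print(list(ckpt) if isinstance(ckpt, dict) else type(ckpt))
EOF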

Exporting meshes

To extract time-consistent meshes and render the shape and body+clothing Gaussians:

python lab4d/export.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/export_0000.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - export_0000
      - render-shape-*.mp4     => (Rendered time-consistent body shapes)
      - render-boneonly-*.mp4  => (Rendered body+clothing Gaussians)
      - render-bone-*.mp4      => (Body+clothing Gaussians, overlaid on top of body shape)
      - fg-mesh.ply            => (Canonical shape exported as a mesh)
      - camera.json            => (Saved camera intrinsics)
      - fg
          - bone/*.ply         => (Time-varying body+clothing Gaussians, exported as meshes)
          - mesh/*.ply         => (Time-varying body shape, exported as time-consistent meshes)
          - motion.json        => (Saved camera poses and time-varying articulations)
  - renderings_proxy
      - fg.mp4                 => (Bird's-eye view of cameras and body shape over all optimization iterations)
To visualize the canonical shape, deformation by body Gaussians only, or deformation by clothing Gaussians only:

python lab4d/export.py --flag canonical --flagfile=logdir/{seqname}-{logname}/opts.log
python lab4d/export.py --flag body_only --flagfile=logdir/{seqname}-{logname}/opts.log
python lab4d/export.py --flag cloth_only --flagfile=logdir/{seqname}-{logname}/opts.log
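
The exported files use standard formats, so they can be inspected with the dependencies installed above. A minimal sketch (substitute the placeholders first; the JSON schema is printed rather than assumed):

python - <<'EOF'
import json
import trimesh

export_dir = "logdir/{seqname}-{logname}/export_0000"  # substitute placeholders
# Canonical shape, exported as a triangle mesh
mesh = trimesh.load(export_dir + "/fg-mesh.ply")
print(mesh)
# Camera intrinsics; the exact schema is repo-specific, so just inspect it
with open(export_dir + "/camera.json") as f:
    print(json.load(f))
EOF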

Rendering neural fields

To render RGB, normals, masks, and the other modalities described below:

python lab4d/render.py --flagfile=logdir/{seqname}-{logname}/opts.log

On a 4090 GPU, rendering each frame at 512x512 resolution should take ~20 seconds. Results are saved to logdir/{seqname}-{logname}/renderings_0000. For faster rendering, you can render every N-th frame by passing --stride <N> to the command above.
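
For example, to render every 10th frame:

python lab4d/render.py --flagfile=logdir/{seqname}-{logname}/opts.log --stride 10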

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - depth.mp4    => (Rendered depth, colorized as RGB)
          - feature.mp4  => (Rendered features)
          - mask.mp4     => (Rendered mask)
          - normal.mp4   => (Rendered normal)
          - rgb.mp4      => (Rendered RGB)
Additional videos are rendered for debugging purposes:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - eikonal.mp4     => (Rendered magnitude of eikonal loss)
          - gauss_mask.mp4  => (Rendered silhouette of deformation field)
          - ref_*.mp4       => (Rendered input signals, after cropping to tight bounding box and reshaping)
          - sdf.mp4         => (Rendered magnitude of signed distance field)
          - vis.mp4         => (Rendered visibility field)
          - xyz.mp4         => (Rendered world-frame canonical XYZ coordinates)
          - xyz_cam.mp4     => (Rendered camera-frame XYZ coordinates)
          - xyz_t.mp4       => (Rendered world-frame time-t XYZ coordinates)

3D Gaussian refinement

Training refined 3D Gaussian model

This step requires a pretrained model from the previous section, which we assume is located at logdir/{seqname}-{logname}. To run refinement with 3D Gaussians:

bash scripts/train_diffgs_refine.sh {seqname} {logname}

On a 4090 GPU, 240 optimization rounds should take ~8-9 hours. Checkpoints are saved to logdir/{seqname}-diffgs-{logname}. For faster experiments, you can pass --num_rounds 40 to train a lower-quality model that is not yet fully converged.
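
As with the low-memory example below, extra flags are forwarded to the training script. For instance, for a faster low-quality run:

bash scripts/train_diffgs_refine.sh {seqname} {logname} --num_rounds 40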

The training script above assumes 24GB of GPU memory. If you have 10GB of GPU memory, use:

bash scripts/train_diffgs_refine.sh {seqname} {logname} --imgs_per_gpu 4 --grad_accum 4
The checkpoint directory contains:

logdir/{seqname}-diffgs-{logname}
  - ckpt_*.pth       => (Saved model checkpoints)
  - opts.log         => (Command-line options)
  - params.txt       => (Learning rates for each optimizable parameter)
  - *-all-gauss.ply  => (Body Gaussians over all optimization iterations)
  - *-all-proxy.ply  => (3D Gaussians and cameras over all optimization iterations)

Exporting 3D Gaussians

To produce mesh renderings of the dynamic 3D Gaussians:

python lab4d/diffgs/export.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/export_0000.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - export_0000
      - fg
          - mesh/*.ply      => (Dynamic 3D Gaussians, exported as meshes)
          - motion.json     => (Saved camera poses and time-varying articulations)
      - camera.json         => (Saved camera intrinsics)
      - fg-mesh.ply         => (Canonical 3D Gaussians)
      - render-shape-*.mp4  => (Mesh-rendered dynamic 3D Gaussians)
  - renderings_proxy
      - all.mp4             => (Bird's-eye view of cameras and body shape over all optimization iterations)

Rendering 3D Gaussians

To render RGB, normals, masks, and the other modalities described below:

python lab4d/diffgs/render.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/renderings_0000. For faster rendering, you can render every N-th frame by passing --stride <N> to the command above.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - depth.mp4    => (Rendered depth, colorized as RGB)
          - feature.mp4  => (Rendered features)
          - alpha.mp4    => (Rendered mask)
          - rgb.mp4      => (Rendered RGB)
Additional videos are rendered for debugging purposes:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - ref_*.mp4  => (Rendered input signals, after cropping to tight bounding box and reshaping)
          - xyz.mp4    => (Rendered world-frame canonical XYZ coordinates)

Acknowledgement

  • Our codebase is built upon Lab4D; thanks for building a comprehensive 4D reconstruction framework!
  • Our pre-processing pipeline is built upon several open-source repos.

Bibtex

@inproceedings{tan2025dressrecon,
  title={DressRecon: Freeform 4D Human Reconstruction from Monocular Video},
  author={Tan, Jeff and Xiang, Donglai and Tulsiani, Shubham and Ramanan, Deva and Yang, Gengshan},
  booktitle={3DV},
  year={2025}
}
