DressRecon: Freeform 4D Human Reconstruction from Monocular Video

3DV 2025 (Oral)

Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, Gengshan Yang

Teaser video

About

DressRecon is a method for freeform 4D human reconstruction, with support for dynamic clothing and human-object interactions. Given a monocular video as input, it reconstructs a time-consistent body model, including shape, appearance, articulation of body+clothing, and 3D tracks. The software is licensed under the MIT license.

Release Plan

  • Training code
  • Data preprocessing scripts
  • Pretrained checkpoints

Installation

  1. Clone DressRecon
git clone https://github.com/jefftan969/dressrecon
cd dressrecon
  2. Create the environment
conda create -y -n dressrecon -c conda-forge python=3.9
conda activate dressrecon
pip install torch==2.4.1
conda install -y -c conda-forge absl-py numpy==1.24.4 tqdm trimesh tensorboard opencv scipy scikit-image matplotlib urdfpy networkx=3 einops imageio-ffmpeg pyrender open3d
pip install pysdf geomloss
pip install -e .
# (Optional) Visualization dependencies
pip install viser
  3. Install third-party libraries
# CUDA kernels for fast dual-quaternion skinning
pip install -e lab4d/third_party/dqtorch
# CUDA kernels for 3D Gaussian refinement
pip install -e lab4d/diffgs/third_party/simple-knn
pip install git+https://github.com/gengshan-y/gsplat-dev.git
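
To sanity-check the environment before training (a quick check, not part of the official setup), verify that PyTorch sees the GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"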

Data

We provide two ways to obtain data: download our preprocessed data (dna-0121_02.zip), or process your own data following the instructions (coming soon!).

Preprocessed data is available for download for the sequences in the paper. Each sequence is about 1.7 GB compressed and 2.3 GB uncompressed.

To unzip preprocessed data:

mkdir -p database/processed
cd database/processed
unzip {path_to_downloaded_zip}
cd ../..
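
After unzipping, the sequence folder should appear under database/processed/. Its folder name is the sequence name to use as {data_sequence_name} in the commands below (for the zip above, we assume this is dna-0121_02):

ls database/processed  # expect one folder per sequence, e.g. dna-0121_02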

Demo

This example shows how to reconstruct a human from a monocular video. To begin, download preprocessed data above or process your own videos.

Training neural fields

To optimize a body model given an input monocular video:

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 96 --seqname {data_sequence_name} --logname {name_of_this_experiment}

On a 4090 GPU, 240 optimization rounds should take ~8-9 hours. Checkpoints are saved to logdir/{seqname}-{logname}. For faster experiments, you can pass --num_rounds 40 to train a lower-quality model that is not yet fully converged.
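
For example, to train on the preprocessed sequence above (assuming its sequence name matches the zip, dna-0121_02; the experiment name base is a placeholder of your choosing):

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 96 --seqname dna-0121_02 --logname base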

The training command above assumes 24GB of GPU memory. If you have 10GB of GPU memory, use:

python lab4d/train.py --num_rounds 240 --imgs_per_gpu 32 --grad_accum 3 --seqname {data_sequence_name} --logname {name_of_this_experiment}
The checkpoint directory contains:

logdir/{seqname}-{logname}
  - ckpt_*.pth         => (Saved model checkpoints)
  - metadata.pth       => (Saved dataset metadata)
  - opts.log           => (Command-line options)
  - params.txt         => (Learning rates for each optimizable parameter)
  - uncertainty/*.npy  => (Per-pixel uncertainty cache for weighted pixel sampling during training)
  - *-fg-gauss.ply     => (Body Gaussians over all optimization iterations)
  - *-fg-proxy.ply     => (Body shape and cameras over all optimization iterations)
  - *-fg-sdf.ply     => (Deformation fields' range of influence over all optimization iterations)
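
To peek inside the latest checkpoint (a minimal sketch using standard PyTorch loading; substitute the placeholders first — the key layout is printed rather than assumed):

python - <<'EOF'
import glob
import torch

# Pick the most recent checkpoint in the log directory
path = sorted(glob.glob("logdir/{seqname}-{logname}/ckpt_*.pth"))[-1]
ckpt = torch.load(path, map_location="cpu")
print(path)
print(list(ckpt) if isinstance(ckpt, dict) else type(ckpt))
EOF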

Exporting meshes

To extract time-consistent meshes and render the shape and body+clothing Gaussians:

python lab4d/export.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/export_0000.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - export_0000
      - render-shape-*.mp4     => (Rendered time-consistent body shapes)
      - render-boneonly-*.mp4  => (Rendered body+clothing Gaussians)
      - render-bone-*.mp4      => (Body+clothing Gaussians, overlaid on top of body shape)
      - fg-mesh.ply            => (Canonical shape exported as a mesh)
      - camera.json            => (Saved camera intrinsics)
      - fg
          - bone/*.ply         => (Time-varying body+clothing Gaussians, exported as meshes)
          - mesh/*.ply         => (Time-varying body shape, exported as time-consistent meshes)
          - motion.json        => (Saved camera poses and time-varying articulations)
  - renderings_proxy
      - fg.mp4                 => (Bird's-eye view of cameras and body shape over all optimization iterations)
To visualize the canonical shape, deformation by body Gaussians only, or deformation by clothing Gaussians only:

python lab4d/export.py --flag canonical --flagfile=logdir/{seqname}-{logname}/opts.log
python lab4d/export.py --flag body_only --flagfile=logdir/{seqname}-{logname}/opts.log
python lab4d/export.py --flag cloth_only --flagfile=logdir/{seqname}-{logname}/opts.log
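
The exported files use standard formats, so they can be inspected with the dependencies installed above. A minimal sketch (substitute the placeholders first; the JSON schema is printed rather than assumed):

python - <<'EOF'
import json
import trimesh

export_dir = "logdir/{seqname}-{logname}/export_0000"  # substitute placeholders
# Canonical shape, exported as a triangle mesh
mesh = trimesh.load(export_dir + "/fg-mesh.ply")
print(mesh)
# Camera intrinsics; the exact schema is repo-specific, so just inspect it
with open(export_dir + "/camera.json") as f:
    print(json.load(f))
EOF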

Rendering neural fields

To render RGB, normals, masks, and the other modalities described below:

python lab4d/render.py --flagfile=logdir/{seqname}-{logname}/opts.log

On a 4090 GPU, rendering each frame at 512x512 resolution should take ~20 seconds. Results are saved to logdir/{seqname}-{logname}/renderings_0000. For faster rendering, you can render every N-th frame by passing --stride <N> to the command above.
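
For example, to render every 10th frame:

python lab4d/render.py --flagfile=logdir/{seqname}-{logname}/opts.log --stride 10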

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - depth.mp4    => (Rendered depth, colorized as RGB)
          - feature.mp4  => (Rendered features)
          - mask.mp4     => (Rendered mask)
          - normal.mp4   => (Rendered normal)
          - rgb.mp4      => (Rendered RGB)
Additional videos are rendered for debugging purposes:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - eikonal.mp4     => (Rendered magnitude of eikonal loss)
          - gauss_mask.mp4  => (Rendered silhouette of deformation field)
          - ref_*.mp4       => (Rendered input signals, after cropping to tight bounding box and reshaping)
          - sdf.mp4         => (Rendered magnitude of signed distance field)
          - vis.mp4         => (Rendered visibility field)
          - xyz.mp4         => (Rendered world-frame canonical XYZ coordinates)
          - xyz_cam.mp4     => (Rendered camera-frame XYZ coordinates)
          - xyz_t.mp4       => (Rendered world-frame time-t XYZ coordinates)

3D Gaussian refinement

Training refined 3D Gaussian model

This step requires a pretrained model from the previous section, which we assume is located at logdir/{seqname}-{logname}. To run refinement with 3D Gaussians:

bash scripts/train_diffgs_refine.sh {seqname} {logname}

On a 4090 GPU, 240 optimization rounds should take ~8-9 hours. Checkpoints are saved to logdir/{seqname}-diffgs-{logname}. For faster experiments, you can pass --num_rounds 40 to train a lower-quality model that is not yet fully converged.
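
As with the low-memory example below, extra flags are forwarded to the training script. For instance, for a faster low-quality run:

bash scripts/train_diffgs_refine.sh {seqname} {logname} --num_rounds 40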

The training script above assumes 24GB of GPU memory. If you have 10GB of GPU memory, use:

bash scripts/train_diffgs_refine.sh {seqname} {logname} --imgs_per_gpu 4 --grad_accum 4
The checkpoint directory contains:

logdir/{seqname}-diffgs-{logname}
  - ckpt_*.pth       => (Saved model checkpoints)
  - opts.log         => (Command-line options)
  - params.txt       => (Learning rates for each optimizable parameter)
  - *-all-gauss.ply  => (Body Gaussians over all optimization iterations)
  - *-all-proxy.ply  => (3D Gaussians and cameras over all optimization iterations)

Exporting 3D Gaussians

To produce mesh renderings of the dynamic 3D Gaussians:

python lab4d/diffgs/export.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/export_0000.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - export_0000
      - fg
          - mesh/*.ply      => (Dynamic 3D Gaussians, exported as meshes)
          - motion.json     => (Saved camera poses and time-varying articulations)
      - camera.json         => (Saved camera intrinsics)
      - fg-mesh.ply         => (Canonical 3D Gaussians)
      - render-shape-*.mp4  => (Mesh-rendered dynamic 3D Gaussians)
  - renderings_proxy
      - all.mp4             => (Bird's-eye view of cameras and body shape over all optimization iterations)

Rendering 3D Gaussians

To render RGB, normals, masks, and the other modalities described below:

python lab4d/diffgs/render.py --flagfile=logdir/{seqname}-{logname}/opts.log

Results are saved to logdir/{seqname}-{logname}/renderings_0000. For faster rendering, you can render every N-th frame by passing --stride <N> to the command above.

The output directory structure is as follows:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - depth.mp4    => (Rendered depth, colorized as RGB)
          - feature.mp4  => (Rendered features)
          - alpha.mp4    => (Rendered mask)
          - rgb.mp4      => (Rendered RGB)
Additional videos are rendered for debugging purposes:

logdir/{seqname}-{logname}
  - renderings_0000
      - ref
          - ref_*.mp4  => (Rendered input signals, after cropping to tight bounding box and reshaping)
          - xyz.mp4    => (Rendered world-frame canonical XYZ coordinates)

Acknowledgement

  • Our codebase is built upon Lab4D; thanks for building a comprehensive 4D reconstruction framework!
  • Our pre-processing pipeline is built upon several open-source repos.

Bibtex

@inproceedings{tan2025dressrecon,
  title={DressRecon: Freeform 4D Human Reconstruction from Monocular Video},
  author={Tan, Jeff and Xiang, Donglai and Tulsiani, Shubham and Ramanan, Deva and Yang, Gengshan},
  booktitle={3DV},
  year={2025}
}
