Mehwish Ghafoor, Arif Mahmood, Muhammad Bilal
The proposed Dual Transformer Fusion (DTF) architecture takes severely occluded 2D joint positions as input and estimates realistic 3D poses.
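As a minimal sketch of what "severely occluded" input might look like, the snippet below randomly masks joints in a 2D pose sequence before it would be fed to the model. The function name, the zero mask value, and the input layout `(frames, joints, 2)` are illustrative assumptions, not the repo's actual preprocessing.

```python
import numpy as np

def occlude_joints(pose_2d, num_occluded, mask_value=0.0, rng=None):
    """Mask randomly chosen joints in each frame of a 2D pose sequence.

    pose_2d:      array of shape (frames, joints, 2)
    num_occluded: how many joints to mask per frame
    mask_value:   value written into occluded coordinates (assumption)
    """
    rng = rng or np.random.default_rng()
    occluded = pose_2d.copy()
    frames, joints, _ = pose_2d.shape
    for f in range(frames):
        idx = rng.choice(joints, size=num_occluded, replace=False)
        occluded[f, idx, :] = mask_value
    return occluded

# Example: mask 16 of the 17 Human3.6M joints in every frame,
# matching the severe-occlusion demo setting below.
poses = np.random.rand(4, 17, 2)
severely_occluded = occlude_joints(poses, num_occluded=16)
```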
The code was developed and tested in the following environment:

- Install PyTorch 1.7.1 and Torchvision 0.8.2 following the official instructions.
- Install the remaining dependencies:

```bash
pip3 install -r requirements.txt
```
- Download the dataset from the Human 3.6M website.
- Set up the Human3.6M dataset as per the VideoPose3D instructions.
- Alternatively, download the processed data from here.
```
${DTF_Occ}/
|-- dataset
|   |-- data_3d_h36m.npz
|   |-- data_2d_h36m_gt.npz
|   |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
```
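After downloading, a quick sanity check that the expected `.npz` files are in place can look like the sketch below. The helper name is hypothetical, and the array keys inside each file depend on the VideoPose3D preprocessing, so the code only lists whatever keys it finds rather than assuming any.

```python
from pathlib import Path
import numpy as np

# File names taken from the directory layout above.
EXPECTED_FILES = [
    "data_3d_h36m.npz",
    "data_2d_h36m_gt.npz",
    "data_2d_h36m_cpn_ft_h36m_dbb.npz",
]

def check_dataset(dataset_dir):
    """Return a dict mapping each expected file name to its list of
    array keys, or None when the file is missing."""
    report = {}
    for name in EXPECTED_FILES:
        path = Path(dataset_dir) / name
        if not path.exists():
            report[name] = None
            continue
        with np.load(path, allow_pickle=True) as data:
            report[name] = list(data.keys())
    return report
```

Running `check_dataset("dataset")` from the repo root prints nothing itself; inspect the returned dict to see which files (if any) are missing.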
You can download the pretrained model for Human 3.6M from here.
For MPI-INF-3DHP, we follow the settings of P-STMO.
Training with 351 frames on Human 3.6M:

```bash
python3 main_h36m.py --frames 351 --batch_size 32
```
Test:

```bash
python3 main_h36m.py --test --previous_dir 'checkpoint/351_severe' --frames 351
```
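The `--frames 351` flag suggests each output pose is predicted from a long temporal window of 2D inputs. As a hedged sketch only (not the repo's actual data pipeline), one common way to cut such windows is edge-replicated padding so every frame of a clip gets a full, centered 351-frame window:

```python
import numpy as np

def extract_windows(sequence, frames=351):
    """Cut a centered window of `frames` frames around every frame,
    replicating the first/last frame at the borders.

    sequence: (num_frames, joints, coords)
    returns:  (num_frames, frames, joints, coords)
    """
    pad = frames // 2
    padded = np.concatenate(
        [np.repeat(sequence[:1], pad, axis=0),   # replicate first frame
         sequence,
         np.repeat(sequence[-1:], pad, axis=0)], # replicate last frame
        axis=0,
    )
    return np.stack([padded[i:i + frames] for i in range(len(sequence))])

seq = np.random.rand(10, 17, 2)   # a short clip: 10 frames, 17 joints
w = extract_windows(seq, frames=351)
```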
Video Demo - Human 3.6M
3D pose estimation for the action "Eating" with 16 of 17 joints randomly occluded:
Using proposed DTF
eat_16miss.mp4
Using MHFormer
eat_16_mhformer.mp4
Using STCFormer
eat_16_stcformer.mp4
Using PSTMO