
Training the model #20

Open
dy1ngs0ul opened this issue Jan 9, 2020 · 10 comments

@dy1ngs0ul

Hello Nvidia AI-IOT team,

First of all, thank you very much for your effort in creating this code. I am Zeyan, currently working on a real-time pose estimation implementation on the Jetson AGX Xavier.
My goal is to use depth images (from an Intel RealSense camera) and check whether the depth information could help improve the performance of pose estimation.

Before I conduct my experiments, I first wish to train the model to serve as a baseline. From your training script it seems a config.json file is required to train the network. As I wish to follow your parameters for this baseline training, it would be great if you could provide your config file so that I can follow your steps and parameters to train your model.

Thanks in advance for your help and support. I look forward to your reply.

Thanks
Dr. Zeyan Oo

@jaybdub
Collaborator

jaybdub commented Feb 4, 2020

Hi dy1ngs0ul,

Thanks for reaching out!

You may find the training configuration files here:

https://github.com/NVIDIA-AI-IOT/trt_pose/blob/master/tasks/human_pose/experiments/resnet18_baseline_att_224x224_A.json

Please let me know if you have any questions.

Best,
John

@dy1ngs0ul
Author

Thanks for your help

@kinglintianxia

kinglintianxia commented Apr 26, 2020

@jaybdub , Thanks for your excellent work!
So far I understand that cmap_channels is the number of keypoints and paf_channels equals 2 × the number of connections. Can you explain what upsample_channels means?

"model": {
        "name": "densenet121_baseline_att",
        "kwargs": {
            "cmap_channels": 18,
            "paf_channels": 42,
            "upsample_channels": 256,
            "num_upsample": 3
        }
    },
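For reference, one hedged reading of these kwargs (not confirmed by the devs): cmap_channels is the number of keypoints (one confidence map each), paf_channels is 2 × the number of skeleton links (an x and y vector field per link), and upsample_channels is the channel width of the upsampling head, applied num_upsample times. A minimal sketch deriving the first two counts from a topology definition like human_pose.json (the helper function and the inline topology dict are hypothetical, not part of trt_pose):

```python
# Sketch: derive cmap/paf channel counts from a trt_pose-style topology.
# The counts mirror human_pose.json (18 keypoints, 21 skeleton links).

def derive_channels(topology):
    cmap_channels = len(topology["keypoints"])    # one confidence map per keypoint
    paf_channels = 2 * len(topology["skeleton"])  # x and y field per skeleton link
    return cmap_channels, paf_channels

human_pose = {
    "keypoints": ["kp%d" % i for i in range(18)],  # 18 keypoints incl. the added neck
    "skeleton": [[0, 0]] * 21,                     # 21 limb connections
}

cmap, paf = derive_channels(human_pose)
print(cmap, paf)  # 18 42
```

With the stock 17-keypoint COCO topology (19 links) the same arithmetic gives 17 and 38, which matches the tensor-size mismatch reported below.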

@NicolaGugole

Hi guys! Have any of you successfully completed any training using the script provided within the repo?
I'm trying to prune the models but I can't seem to be able to proceed with retraining using train.py because of an inconsistency between the PAF tensors' sizes:

Traceback (most recent call last):
  File "provaTrain.py", line 150, in <module>
    paf_mse = torch.mean(mask * (paf_out - paf)**2)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/wrap.py", line 58, in wrapper
    return orig_fn(*new_args, **kwargs)
RuntimeError: The size of tensor a (42) must match the size of tensor b (38) at non-singleton dimension 1

@NicolaGugole

I solved it myself, thank you anyway!

@OliverGuy

OliverGuy commented Jul 1, 2020

Hey all, I'm having a similar error to @NicolaGugole's, using the training dataset downloaded through the provided shell script.
Any tips on how to fix this would be greatly appreciated!
Edit: Never mind, one just has to edit the model attribute of the JSON file referenced earlier to match the tensor sizes.

@NicolaGugole

> Hey all, I'm having a similar error as @NicolaGugole using the training dataset downloaded through the provided shell script.
> Any tips on how to fix this would be greatly appreciated !
> Edit:Nevermind, one just has to edit the model attribute of the json file referenced earlier to match tensor sizes.

In my case I had to change the annotation file, because I noticed a difference between the annotation keypoint count (17 keypoints) and the human_pose.json count (18 keypoints). This difference in tensor sizes is weird in my opinion.
Forcing these sizes to match did not produce a fruitful training in my case, I assume because the annotation files contain values created for 17 keypoints while we modified the model to expect 18.

I noticed that in this config file (https://github.com/NVIDIA-AI-IOT/trt_pose/blob/master/tasks/human_pose/experiments/resnet18_baseline_att_224x224_A.json) the devs use a "modified" version of the annotation json files. I hope in the near future we'll have the opportunity to take a look at these modified files (maybe the devs could upload them to this repo).

So I have a question, @OliverGuy: did you just change the cmap_channels and paf_channels kwargs in the json file referenced earlier? Did that do the job? I tried the same but ended up with other conflicts.

Sorry for bothering you all,
Have a nice day!

@OliverGuy

OliverGuy commented Jul 6, 2020

@NicolaGugole I only modified those in the json, but I'm having issues with cuDNN not finding the convolution algorithm (see #54).

@silent-code

@NicolaGugole

You have to pre-process the COCO annotations. This adds the "neck" keypoint (midpoint of the shoulders) so that you have 18 keypoints. Use the command:

python3 preprocess_coco_person.py annotations/person_keypoints_train2017.json annotations/person_keypoints_train2017_modified.json
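The core idea of that preprocessing, sketched under stated assumptions: COCO stores 17 keypoints per person as a flat [x, y, visibility] list, and the neck is inserted at the midpoint of the two shoulders (standard COCO ordering puts left_shoulder at index 5 and right_shoulder at index 6). The function below is a hypothetical illustration; the real preprocess_coco_person.py may differ in detail.

```python
# Sketch: append a "neck" keypoint to a COCO-format 17-keypoint
# annotation, as the midpoint of the two shoulders.  Returns a new
# 18-keypoint flat list [x1, y1, v1, ..., x18, y18, v18].

def add_neck(keypoints):
    ls = keypoints[5 * 3:5 * 3 + 3]  # left shoulder  (x, y, v)
    rs = keypoints[6 * 3:6 * 3 + 3]  # right shoulder (x, y, v)
    if ls[2] > 0 and rs[2] > 0:      # both shoulders annotated
        neck = [(ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2, min(ls[2], rs[2])]
    else:
        neck = [0, 0, 0]             # unlabeled, per COCO's convention
    return keypoints + neck
```

After this step the annotation-derived counts (18 keypoints, and with the extended skeleton 21 links) line up with cmap_channels=18 and paf_channels=42 in the config.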

@sinuku

sinuku commented Aug 26, 2021

> @jaybdub , Thanks for your excellent work!
> So far I think cmap_channels means keypoint numbers, paf_channels equals to 2*connections, Can you explain upsample_channels means?
>
> "model": {
>         "name": "densenet121_baseline_att",
>         "kwargs": {
>             "cmap_channels": 18,
>             "paf_channels": 42,
>             "upsample_channels": 256,
>             "num_upsample": 3
>         }
>     },

Did you figure out what upsample_channels means?
I am struggling with the same issue as you.
