
Questions about inference speed #51

Closed
zhangtingyu11 opened this issue Dec 6, 2023 · 2 comments

Comments

@zhangtingyu11

I use a single RTX 4090 to test the inference speed. After generating the depth map, I run the following command:

python3 test.py --cfg_file cfgs/models/kitti/VirConv-T.yaml --batch_size 1 --ckpt ../weights/VirConv-T2.pth

and I use the following code (only a partial snippet is shown) to test the inference speed:

  import time
  import torch

  time_total = 0
  time_cnt = 0

  def get_milliseconds():
      return int(round(time.time() * 1000))

  for i, batch_dict in enumerate(dataloader):
      load_data_to_gpu(batch_dict)
      # make sure previously queued GPU work has finished before starting the timer
      torch.cuda.synchronize()
      start_time = get_milliseconds()
      with torch.no_grad():
          pred_dicts, ret_dict, batch_dict = model(batch_dict)
      # CUDA kernels run asynchronously, so synchronize before stopping the timer
      torch.cuda.synchronize()
      end_time = get_milliseconds()
      time_total += end_time - start_time
      time_cnt += 1
      print("running average inference time: {} ms".format(time_total / time_cnt))

However, the result shows that it takes about 120 ms to process each frame. I have two questions: 1. The RTX 4090 is faster than the RTX 3090, so why does it appear slower than the speed reported in the paper? 2. Does the pipeline regenerate the pseudo points, or does it only use the pseudo points that were already generated? If it does not regenerate them, then since pseudo point generation costs 80 ms per frame on my machine, would the total processing time be about 200 ms (120 + 80)?
Looking forward to your reply. The performance of your work is amazing.
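
(For reference, a minimal sketch of an alternative timing approach using CUDA events instead of a wall-clock timer; it assumes the same model, dataloader, and load_data_to_gpu objects as in the snippet above and is not part of the original report.)

  import torch

  # CUDA events are recorded on the GPU stream; elapsed_time() returns the time
  # between the two recorded points, so no manual millisecond timer is needed.
  start_event = torch.cuda.Event(enable_timing=True)
  end_event = torch.cuda.Event(enable_timing=True)

  elapsed_ms = 0.0
  num_frames = 0
  for batch_dict in dataloader:
      load_data_to_gpu(batch_dict)
      start_event.record()
      with torch.no_grad():
          pred_dicts, ret_dict, batch_dict = model(batch_dict)
      end_event.record()
      torch.cuda.synchronize()  # wait until both events have completed
      elapsed_ms += start_event.elapsed_time(end_event)
      num_frames += 1

  print("average time per frame: {:.2f} ms".format(elapsed_ms / num_frames))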

@hailanyi
Owner

hailanyi commented Dec 7, 2023

The speed in the paper is mostly detection only (since PENet reports a very fast speed, although I now notice that this is not accurate) and was evaluated with two refinement stages (RPN+2refine). The released code uses (RPN+3refine), so it is slightly slower than the paper. Besides, the speed also depends on the CPU, because some operations, such as the multiple transformations, run on the CPU. Note that using 2refine may decrease performance, so additional parameter tuning is required.

@hailanyi hailanyi closed this as completed Dec 7, 2023
@zhangtingyu11
Author

Thanks for your reply. I further found that it is the CPU burden that causes the difference in detection speed. I tested again on the same sample 10000 times (without the CPU-intensive dataloading process), and the average inference time is 67.35 ms. I think this inference time sounds reasonable compared with the RTX 3090.
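
(For reference, a minimal sketch of this kind of measurement: inference is repeated on a single preloaded batch so that CPU-bound dataloading is excluded. The warm-up count and the reuse of one batch_dict are assumptions for illustration, not the exact script used.)

  import time
  import torch

  # take one batch and move it to the GPU once, so dataloading is excluded from the timing
  batch_dict = next(iter(dataloader))
  load_data_to_gpu(batch_dict)

  # a few warm-up passes so CUDA initialization and autotuning do not distort the average
  # (if the model mutates batch_dict in place, a fresh copy may be needed per iteration)
  with torch.no_grad():
      for _ in range(10):
          model(batch_dict)
  torch.cuda.synchronize()

  num_iters = 10000
  start = time.time()
  with torch.no_grad():
      for _ in range(num_iters):
          model(batch_dict)
  torch.cuda.synchronize()

  avg_ms = (time.time() - start) * 1000.0 / num_iters
  print("average inference time: {:.2f} ms".format(avg_ms))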
