
Questions about inference speed #51

Closed
zhangtingyu11 opened this issue Dec 6, 2023 · 2 comments

Comments

@zhangtingyu11

I use a single RTX 4090 to test the inference speed. After generating the depth map, I run the following command:

python3 test.py --cfg_file cfgs/models/kitti/VirConv-T.yaml --batch_size 1 --ckpt ../weights/VirConv-T2.pth

and I use the following code (only a partial snippet is shown) to test the inference speed:

  import time
  import torch

  time_total = 0
  time_cnt = 0

  def get_milliseconds():
      return int(round(time.time() * 1000))

  for i, batch_dict in enumerate(dataloader):
      load_data_to_gpu(batch_dict)
      # make sure previously queued GPU work has finished before starting the timer
      torch.cuda.synchronize()
      start_time = get_milliseconds()
      with torch.no_grad():
          pred_dicts, ret_dict, batch_dict = model(batch_dict)
      # CUDA kernels run asynchronously, so synchronize before stopping the timer
      torch.cuda.synchronize()
      end_time = get_milliseconds()
      time_total += end_time - start_time
      time_cnt += 1
      print("running average inference time: {} ms".format(time_total / time_cnt))

However, the result shows that it takes about 120 ms to process each frame. I have two questions: 1. The RTX 4090 is faster than the RTX 3090, so why does it appear slower than the speed reported in the paper? 2. Does the pipeline regenerate the pseudo points, or does it only use the pseudo points that were already generated? If it does not regenerate them, then since pseudo point generation costs 80 ms per frame on my machine, would the total processing time be about 200 ms (120 + 80)?
Looking forward to your reply. The performance of your work is amazing.
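
(For reference, a minimal sketch of an alternative timing approach using CUDA events instead of a wall-clock timer; it assumes the same model, dataloader, and load_data_to_gpu objects as in the snippet above and is not part of the original report.)

  import torch

  # CUDA events are recorded on the GPU stream; elapsed_time() returns the time
  # between the two recorded points, so no manual millisecond timer is needed.
  start_event = torch.cuda.Event(enable_timing=True)
  end_event = torch.cuda.Event(enable_timing=True)

  elapsed_ms = 0.0
  num_frames = 0
  for batch_dict in dataloader:
      load_data_to_gpu(batch_dict)
      start_event.record()
      with torch.no_grad():
          pred_dicts, ret_dict, batch_dict = model(batch_dict)
      end_event.record()
      torch.cuda.synchronize()  # wait until both events have completed
      elapsed_ms += start_event.elapsed_time(end_event)
      num_frames += 1

  print("average time per frame: {:.2f} ms".format(elapsed_ms / num_frames))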

@hailanyi
Owner

hailanyi commented Dec 7, 2023

The speed in the paper is mostly detection only (since PENet reports a very fast speed, although I now notice that this is not accurate) and was evaluated with two refinement stages (RPN+2refine). The released code uses (RPN+3refine), so it is slightly slower than the paper. Besides, the speed also depends on the CPU, because some operations, such as the multiple transformations, run on the CPU. Note that using 2refine may decrease performance, so additional parameter tuning is required.

@hailanyi hailanyi closed this as completed Dec 7, 2023
@zhangtingyu11
Author

Thanks for your reply. I further found that it is the CPU burden that causes the difference in detection speed. I tested again on the same sample 10000 times (without the CPU-intensive dataloading process), and the average inference time is 67.35 ms. I think this inference time sounds reasonable compared with the RTX 3090.
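
(For reference, a minimal sketch of this kind of measurement: inference is repeated on a single preloaded batch so that CPU-bound dataloading is excluded. The warm-up count and the reuse of one batch_dict are assumptions for illustration, not the exact script used.)

  import time
  import torch

  # take one batch and move it to the GPU once, so dataloading is excluded from the timing
  batch_dict = next(iter(dataloader))
  load_data_to_gpu(batch_dict)

  # a few warm-up passes so CUDA initialization and autotuning do not distort the average
  # (if the model mutates batch_dict in place, a fresh copy may be needed per iteration)
  with torch.no_grad():
      for _ in range(10):
          model(batch_dict)
  torch.cuda.synchronize()

  num_iters = 10000
  start = time.time()
  with torch.no_grad():
      for _ in range(num_iters):
          model(batch_dict)
  torch.cuda.synchronize()

  avg_ms = (time.time() - start) * 1000.0 / num_iters
  print("average inference time: {:.2f} ms".format(avg_ms))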
