
RuntimeError: dim() called on undefined Tensor #69

Closed
jinwchoi opened this issue Feb 26, 2019 · 12 comments
jinwchoi commented Feb 26, 2019

Hi @dutran, thank you for releasing the code.
I am trying to run HMDB fine-tuning, but I get this error:

Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
INFO:train_net:Namespace(base_learning_rate=0.0002, batch_size=4, clip_length_of=8, clip_length_rgb=32, crop_size=112, cudnn_workspace_limit_mb=64, db_type='pickle', display_iter=10, do_flow_aggregation=0, epoch_size=40000, file_store_path='.', flow_data_type=0, frame_gap_of=2, gamma=0.1, get_video_id=0, gpus='0', input_type=0, is_checkpoint=0, model_depth=18, model_name='r2plus1d', num_channels=3, num_decode_threads=4, num_epochs=8, num_gpus=1, num_labels=51, pred_layer_name=None, pretrained_model='/mnt/disks/data/models/r2plus1d/kinetics/l32/r2.5d_d18_l32.pkl', profiling=0, sampling_rate_of=2, sampling_rate_rgb=1, scale_h=128, scale_w=171, step_epoch=2, test_data='/mnt/disks/data/dataset/hmdb51/lmdb/hmdb51_test01', train_data='/mnt/disks/data/dataset/hmdb51/lmdb/hmdb51_train01', use_cudnn=1, use_dropout=0, use_local_file=0, weight_decay=0.005)
INFO:model_builder:Validated: r2plus1d with 18 layers
INFO:model_builder:with input 32x112x112
INFO:train_net:Running on GPUs: [0]
INFO:train_net:Using epoch size: 40000
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO:train_net:train set has 3570 examples
INFO:data_parallel_model:Parallelizing model for devices: [0]
INFO:data_parallel_model:Create input and model training operators
INFO:data_parallel_model:Model for GPU : 0
INFO:model_helper:outputing rgb data
INFO:model_builder:creating r2plus1d, depth=18...
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 230
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 460
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 921
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:data_parallel_model:Adding gradient operators
INFO:data_parallel_model:Add gradient all-reduces for SyncSGD
INFO:data_parallel_model:Post-iteration operators for updating params
INFO:data_parallel_model:Add initial parameter sync
WARNING:data_parallel_model:------- DEPRECATED API, please use data_parallel_model.OptimizeGradientMemory() -----
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Memonger memory optimization took 0.0146450996399 secs
INFO:train_net:----- Create test net ----
WARNING:root:[====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO:train_net:test set has 1530 examples
INFO:data_parallel_model:Parallelizing model for devices: [0]
INFO:data_parallel_model:Create input and model training operators
WARNING:data_parallel_model:
WARNING:data_parallel_model:############# WARNING #############
WARNING:data_parallel_model:Model r2plus1d_test/<caffe2.python.cnn.CNNModelHelper object at 0x7feedb2721d0> is used for testing/validation but
WARNING:data_parallel_model:has init_params=True!
WARNING:data_parallel_model:This can conflict with model training.
WARNING:data_parallel_model:Please ensure model = ModelHelper(init_params=False)
WARNING:data_parallel_model:####################################
WARNING:data_parallel_model:
INFO:data_parallel_model:Model for GPU : 0
INFO:model_helper:outputing rgb data
INFO:model_builder:creating r2plus1d, depth=18...
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 144
INFO:video_model:Number of middle filters: 230
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 288
INFO:video_model:Number of middle filters: 460
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 576
INFO:video_model:Number of middle filters: 921
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:video_model:Number of middle filters: 1152
INFO:data_parallel_model:Parameter update function not defined --> only forward
WARNING:caffe2.python.workspace:Original python traceback for operator 0 in network r2plus1d_test in exception above (most recent call last):
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 501, in <module>
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 496, in main
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 334, in Train
WARNING:caffe2.python.workspace: File "/home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
WARNING:caffe2.python.workspace: File "/home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 231, in Parallelize
WARNING:caffe2.python.workspace: File "tools/train_net.py", line 326, in test_input_fn
WARNING:caffe2.python.workspace: File "/home/sharepds_gmail_com/src/VMZ/lib/utils/model_helper.py", line 131, in AddVideoInput
Traceback (most recent call last):
File "tools/train_net.py", line 501, in <module>
main()
File "tools/train_net.py", line 496, in main
Train(args)
File "tools/train_net.py", line 337, in Train
workspace.CreateNet(test_model.net)
File "/home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 172, in CreateNet
StringifyProto(net), overwrite,
File "/home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 198, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: dim() called on undefined Tensor (dim at /home/sharepds_gmail_com/pkg/caffe2_2/c10/core/UndefinedTensorImpl.cpp:24)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fefbe94831a in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::UndefinedTensorImpl::dim() const + 0xca (0x7fefbe94404a in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: bool c10::TensorImpl::SetDimsTemplate<long, void>(c10::ArrayRef<long>) + 0x147 (0x7fefd4a3cff7 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #3: caffe2::VideoInputOp<caffe2::CUDAContext>::VideoInputOp(caffe2::OperatorDef const&, caffe2::Workspace*) + 0x1c79 (0x7fefc002ef39 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x104612e (0x7fefc003012e in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: std::_Function_handler<std::unique_ptr<caffe2::OperatorBase, std::default_delete<caffe2::OperatorBase> > (caffe2::OperatorDef const&, caffe2::Workspace*), std::unique_ptr<caffe2::OperatorBase, std::default_delete<caffe2::OperatorBase> > (*)(caffe2::OperatorDef const&, caffe2::Workspace*)>::_M_invoke(std::_Any_data const&, caffe2::OperatorDef const&, caffe2::Workspace*&&) + 0x23 (0x7fefd6629143 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #6: + 0x17008dd (0x7fefd47e08dd in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: + 0x17030c9 (0x7fefd47e30c9 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: caffe2::CreateOperator(caffe2::OperatorDef const&, caffe2::Workspace*, int) + 0x3b9 (0x7fefd47e3549 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: caffe2::dag_utils::prepareOperatorNodes(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0xe3a (0x7fefd47fe61a in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: caffe2::AsyncNetBase::AsyncNetBase(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x23f (0x7fefd478085f in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #11: caffe2::AsyncSchedulingNet::AsyncSchedulingNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x9 (0x7fefd478a7c9 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #12: + 0x16ac18e (0x7fefd478c18e in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #13: + 0x16ac043 (0x7fefd478c043 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #14: caffe2::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0xab9 (0x7fefd478f659 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #15: caffe2::Workspace::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, bool) + 0xfd (0x7fefd47a509d in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #16: caffe2::Workspace::CreateNet(caffe2::NetDef const&, bool) + 0x8f (0x7fefd47a64df in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #17: + 0x50d90 (0x7fefd661ed90 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #18: + 0x50fee (0x7fefd661efee in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #19: + 0x90f90 (0x7fefd665ef90 in /home/sharepds_gmail_com/anaconda3/envs/caffe2_p2.7_2/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)

frame #35: __libc_start_main + 0xf0 (0x7fefe4a3c830 in /lib/x86_64-linux-gnu/libc.so.6)

Do you have any solution to this?


kiyoon commented Mar 3, 2019

Same issue here.

Which Caffe2 version should I use?

I am on Ubuntu 18.04 with CUDA 9.2, and I built OpenCV from source with CUDA support.
I built Caffe2 as instructed on the official webpage, adding the options this project specifies.

Building the master branch succeeded, but I hit the same error when running the code.
Checking out v0.4.1 or v1.0.1 failed during compilation.

Any help would be appreciated.


zzjbug commented Mar 4, 2019

In caffe2/video/video_input_op.h, line 487:

```cpp
prefetched_label_.Resize(
    vector<int64_t>(1, batch_size_ * clip_per_video_ * multi_crop_count_));
```

change to:

```cpp
ReinitializeTensor(
    &prefetched_label_,
    vector<int64_t>(1, batch_size_ * clip_per_video_ * multi_crop_count_),
    at::dtype<int>().device(CPU));
```


kiyoon commented Mar 4, 2019

@zzjbug I will try it. But can you briefly explain why the error happens and what your change does? That is, why does switching from Resize to ReinitializeTensor fix the error?


zzjbug commented Mar 4, 2019

I traced the exception to this line of code.

There are a few tensors being initialized in this method: prefetched_clip_rgb_, prefetched_clip_of_, and prefetched_label_. Only this line is using the Resize method while all the others are using ReinitializeTensor.

I have read from another post that the caffe2 tensor interface has changed, probably due to the merge of caffe2 and pytorch. If you check the same file in pytorch 0.4.1, you'll notice that it is using Tensor.Resize everywhere. My guess is that ReinitializeTensor is the new interface to follow, but the porting of video_input_op is not completed yet.


kiyoon commented Mar 4, 2019

Wow, that's promising. Thanks for the explanation and suggestion. I'm building the modified source code, and I'll let you know the result.

dutran (Contributor) commented Mar 4, 2019

Yes, this is mainly because the tensor interface changed. We should have an updated VideoInput soon.


mooonick commented Mar 4, 2019

@zzjbug I ran into the same error; it runs correctly with your solution. Thanks very much!


kiyoon commented Mar 5, 2019

@zzjbug Thank you so much. That also solved the issue for me.


zhenheny commented May 6, 2019

Same here, thanks @zzjbug

dutran (Contributor) commented May 6, 2019

This PR in PyTorch should fix it: pytorch/pytorch#15884

@dutran dutran closed this as completed May 6, 2019
dutran (Contributor) commented May 13, 2019

Sorry, that PR corrects one place at line 485, but not line 487.

@dutran dutran reopened this May 13, 2019
dutran (Contributor) commented May 14, 2019

Should be fixed now.
