I'm using Apple's Metal Performance Shaders (MPS) as the GPU backend, but since I still get some warnings, I would like to confirm whether not using PyTorch automatic mixed precision has significant implications for model training. Are there any benchmark training statistics available?
With the default configuration I get the following results for my first batches:
INFO: Starting training:
Epochs: 5
Batch size: 1
Learning rate: 1e-05
Training size: 4580
Validation size: 508
Checkpoints: True
Device: mps
Images scaling: 0.5
Mixed Precision: False
Epoch 1/5: 0%| | 0/4580 [00:00<?, ?img/s]/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Epoch 1/5: 9%| | 432/4580 [06:56<1:06:37, 1.04img/s, loss (batch)
Epoch 1/5: 20%|▏| 916/4580 [16:25<59:22, 1.03img/s, loss (batch)=1
Epoch 1/5: 10%| | 460/4580 [09:06<25:52:14, 22.61s/img, loss (batch
Epoch 1/5: 22%|▏| 1002/4580 [19:51<1:10:56, 1.19s/img, loss (batch
Epoch 1/5: 20%|▏| 918/4580 [18:10<22:55:57, 22.54s/img, loss (batch
INFO: Saved interrupt
Traceback (most recent call last):
File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 265, in <module>
train_net(
File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 124, in train_net
grad_scaler.scale(loss).backward()
File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
KeyboardInterrupt
During this run, GPU utilization and memory allocation were around 70-100% and 50-80%, respectively.
I don't think mixed-precision optimizations (AMP) exist for MPS, so I train with amp=False.
However, I still get this CUDA-related warning:
/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
The warning comes from this context:
with torch.cuda.amp.autocast(enabled=amp):
    masks_pred = net(images)
    loss = criterion(masks_pred, true_masks) + dice_loss(
        F.softmax(masks_pred, dim=1).float(),
        F.one_hot(true_masks, net.n_classes)
        .permute(0, 3, 1, 2)
        .float(),
        multiclass=True,
    )

# just to be sure...
print(amp)  # False
# the warning can be reproduced by running:
# torch.cuda.amp.autocast()
# or torch.cuda.amp.autocast(enabled=False)
This actually makes sense, as this autocast variant has the device hard-coded to "cuda".
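For reference, torch.cuda.amp.autocast is essentially a thin wrapper around the generic autocast that always passes "cuda" as the device type; paraphrased from the PyTorch source (not a verbatim copy), it looks roughly like this:

# Paraphrased sketch: the device type is fixed to "cuda" no matter where the
# tensors actually live, which is why the warning fires on MPS even with
# enabled=False.
class autocast(torch.amp.autocast_mode.autocast):
    def __init__(self, enabled=True, dtype=torch.float16, cache_enabled=True):
        super().__init__("cuda", enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)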
Hi, can you try the latest master? I've added a check for the MPS device in the autocast. But since autocast only supports CPU and CUDA, you should still turn AMP off.
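For anyone else hitting this, the device-aware version presumably looks something like the sketch below (a sketch only, not the exact diff; the device and amp names are assumed from train.py):

import torch

# Sketch of a device-aware autocast: torch.autocast only supports the 'cuda'
# and 'cpu' device types, so on MPS we fall back to 'cpu' and keep AMP off.
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
amp = False  # AMP stays disabled on MPS

with torch.autocast(device.type if device.type != 'mps' else 'cpu', enabled=amp):
    pass  # forward pass and loss computation go here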