
Training on mac mps instead of cuda #393

Open
FlorisCalkoen opened this issue Oct 4, 2022 · 2 comments

@FlorisCalkoen

I'm using Apple's Metal Performance Shaders (MPS) as the GPU backend, but since I still get some warnings, I would like to confirm whether not using PyTorch automatic mixed precision has significant implications for model training. Are there any benchmark training statistics available?

Using the default configuration, I get the following results for the first batches:

INFO: Starting training:
        Epochs:          5
        Batch size:      1
        Learning rate:   1e-05
        Training size:   4580
        Validation size: 508
        Checkpoints:     True
        Device:          mps
        Images scaling:  0.5
        Mixed Precision: False
Epoch 1/5:   0%|                          | 0/4580 [00:00<?, ?img/s]/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Epoch 1/5:   9%| | 432/4580 [06:56<1:06:37,  1.04img/s, loss (batch)
Epoch 1/5:  20%|▏| 916/4580 [16:25<59:22,  1.03img/s, loss (batch)=1
Epoch 1/5:  10%| | 460/4580 [09:06<25:52:14, 22.61s/img, loss (batch
Epoch 1/5:  22%|▏| 1002/4580 [19:51<1:10:56,  1.19s/img, loss (batch
Epoch 1/5:  20%|▏| 918/4580 [18:10<22:55:57, 22.54s/img, loss (batch
INFO: Saved interrupt
Traceback (most recent call last):
  File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 265, in <module>
    train_net(
  File "/Users/calkoen/dev/Pytorch-UNet/train.py", line 124, in train_net
    grad_scaler.scale(loss).backward()
  File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
KeyboardInterrupt

During this run, GPU utilization and memory allocation were around 70-100% and 50-80%, respectively.

Some additional info below.

I'm setting the device with:

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(device)  # mps

I don't think mixed precision optimizations (amp) exist for MPS, so I train with amp=False.
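
For context, this is roughly how amp=False propagates to the CUDA-specific AMP helpers; a simplified sketch, not the exact train.py code:

import torch

amp = False  # no mixed-precision support on MPS, so AMP stays disabled

# With enabled=False the scaler is a no-op wrapper, so the training loop can
# keep calling grad_scaler.scale(loss).backward() and grad_scaler.step(optimizer)
# unchanged on devices without CUDA.
grad_scaler = torch.cuda.amp.GradScaler(enabled=amp)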

However, I still got this CUDA-related warning:

/Users/calkoen/miniconda3/envs/torch/lib/python3.10/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

Which comes from this context:

with torch.cuda.amp.autocast(enabled=amp):
      masks_pred = net(images)
      loss = criterion(masks_pred, true_masks) + dice_loss(
          F.softmax(masks_pred, dim=1).float(),
          F.one_hot(true_masks, net.n_classes)
          .permute(0, 3, 1, 2)
          .float(),
          multiclass=True,
      )

# just to be sure...
print(amp)  # False

# the warning can be reproduced by running:
torch.cuda.amp.autocast()  # or torch.cuda.amp.autocast(enabled=False)

This actually makes sense, as the CUDA variant of autocast has the device hard-coded to "cuda":

class autocast(torch.amp.autocast_mode.autocast):
    def __init__(self, enabled : bool = True, dtype : torch.dtype = torch.float16, cache_enabled : bool = True):
        if torch._jit_internal.is_scripting():
            self._enabled = enabled
            self.device = "cuda"
            self.fast_dtype = dtype
            return
        super().__init__("cuda", enabled=enabled, dtype=dtype, cache_enabled=cache_enabled)
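
In other words, the CUDA-specific subclass is just the generic autocast with the device type fixed, so the warning can also be reproduced through the parent API (a small sketch; runs on any machine without CUDA):

import torch

# Equivalent to torch.cuda.amp.autocast(enabled=False): the device_type is
# fixed to "cuda", so the constructor still checks CUDA availability and emits
# the "CUDA is not available. Disabling" warning.
with torch.autocast(device_type="cuda", enabled=False):
    pass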
@milesial (Owner) commented Dec 6, 2022

Hi, can you try the latest master? I've added a check for the MPS device in the autocast. But since autocast only supports CPU and CUDA, you should still turn AMP off.
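
A sketch of the idea (not necessarily the exact change on master): map device types that autocast does not support, such as MPS, to "cpu" before entering the context, and keep AMP off:

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
amp = False  # autocast only supports CUDA and CPU, so AMP stays off on MPS

# Fall back to "cpu" as the autocast device type when running on MPS, so the
# context manager never receives a device_type it cannot handle.
with torch.autocast(device.type if device.type != "mps" else "cpu", enabled=amp):
    ...  # forward pass and loss computation go here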

@FlorisCalkoen (Author)

@milesial, great, thanks. I'm currently out of office, but I'll check it asap.
