
Loss goes flat after 200 epochs #11

Open
bqm1111 opened this issue Apr 23, 2024 · 8 comments

Comments


bqm1111 commented Apr 23, 2024

Thank you for your interesting work. I tried your method on my custom dataset. The loss drops from 2.2 to 0.2 in under 200 epochs, then refuses to go down any further. Did you encounter this problem? What can I do to overcome it?

Zian-Xu (Owner) commented Apr 24, 2024

Since I'm not sure which dataset you're using, I can't tell whether there is an issue with the training itself or whether the difficulty of the upstream task is what keeps the loss from decreasing. You could also try the original MAE to see whether its loss decreases on your dataset.

bqm1111 (Author) commented Apr 24, 2024

I trained on the RGB images from the SUNRGBD dataset, which has over 5,000 training images. I tried the original MAE as well, and the result is the same. It seems the training process gets stuck in a local minimum. How many epochs did you train for, and is there anything special about your learning-rate scheduler?

Zian-Xu (Owner) commented Apr 24, 2024

The situation you described, in which the model gets stuck in a local optimum, is indeed possible, but I can't offer specific advice on how to address it. Typically I would try different loss functions, optimizers, and so on, though there's no guarantee that resolves the problem. Another possibility is that the upstream task on your dataset is inherently difficult, so the loss may simply not continue to decrease. The configuration I used can be found directly in the open-source project code; I didn't use any special tricks.
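For reference, MAE-style pre-training typically pairs AdamW with linear warmup followed by half-cycle cosine decay of the learning rate. A plain-Python sketch of that schedule (the `base_lr` and `warmup_steps` values here are illustrative defaults, not necessarily this repo's configuration):

```python
import math

def mae_lr(step, total_steps, base_lr=1.5e-4, warmup_steps=40):
    """Linear warmup to base_lr, then half-cycle cosine decay to zero.

    This mirrors the common MAE pre-training recipe; the concrete values
    (base_lr, warmup_steps) are illustrative, not this project's config.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

A flat loss can sometimes just mean the schedule has already decayed the learning rate to near zero, so it is worth checking where in the schedule epoch 200 falls.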

bqm1111 (Author) commented Apr 24, 2024

How small do you expect your loss to be for a good reconstruction?

Zian-Xu (Owner) commented Apr 24, 2024

The final loss Swin MAE reaches is not quite the same across datasets. For the two datasets I tried, it was roughly between 0.002 and 0.003; you can see the loss curves for each experiment in the paper.

bqm1111 (Author) commented Apr 24, 2024

Now I know what the problem is. I see that you did not use normalization or the RandomResizedCrop transform, as the original MAE does. When I drop normalization, the loss gets close to the values you report. Do you have any comment on the effect of those transformations?

bqm1111 (Author) commented Apr 24, 2024

As mentioned here: if your goal is to reconstruct a good-looking image, use unnormalized pixels; if your goal is to fine-tune for a downstream recognition task, use normalized pixels. Did you fine-tune the downstream task using normalized or unnormalized pixels?
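For context, MAE's `norm_pix_loss` option standardizes each patch's target pixels by that patch's own mean and variance before taking the MSE, which is why losses on normalized targets sit on a different numerical scale than losses on raw pixels. A minimal plain-Python sketch of the two loss variants (the function name and `eps` value are illustrative, not this repo's code):

```python
def patch_recon_loss(pred, target, norm_pix=True, eps=1e-6):
    """MSE over one flattened patch.

    With norm_pix=True the target patch is standardized by its own
    mean/variance first (MAE's norm_pix_loss); with norm_pix=False the
    loss is computed against raw pixel values.
    """
    if norm_pix:
        n = len(target)
        mean = sum(target) / n
        var = sum((t - mean) ** 2 for t in target) / n
        # Standardize the target per patch; eps guards flat patches.
        target = [(t - mean) / (var + eps) ** 0.5 for t in target]
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
```

Because the normalized target is a different quantity, a network trained with one variant will report losses that are not directly comparable to the other, so comparing absolute loss values across the two settings is not meaningful.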

Zian-Xu (Owner) commented May 9, 2024

MAE does not rely on data augmentation as heavily as contrastive learning does, and I believe RandomResizedCrop destroys the integrity of medical images, so it is not used in the experiments.
In previous experiments, unnormalized pixels were used. The role normalization plays needs further experimental verification.
