SegFormer Training extremely slow #996
Comments
Hey @omarequalmars, thanks for reporting the issue! It might be due to the high-resolution features in the head, but some profiling is needed to find the reason. cc @brianhou0208, maybe you have some insights.
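As a hedged illustration (not part of the original comment), one way to start that profiling is to time a full training step for SegFormer and another decoder on the same encoder. The sketch below assumes the installed segmentation_models.pytorch version exposes `smp.create_model` with the architecture names shown, and the 512x512 batch mirrors the image size mentioned later in the thread.

```python
import time

import torch
import segmentation_models_pytorch as smp


def seconds_per_step(model, x, y, steps=20):
    """Average wall-clock time of one optimizer step (forward + backward)."""
    loss_fn = torch.nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters())
    for _ in range(3):  # warm-up iterations
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return (time.perf_counter() - start) / steps


# Dummy batch; 512x512 matches the image size reported in the thread.
x = torch.randn(2, 3, 512, 512)
y = (torch.rand(2, 1, 512, 512) > 0.5).float()

for arch in ("unet", "fpn", "segformer"):
    model = smp.create_model(arch, encoder_name="tu-mobilevit_xxs",
                             encoder_weights=None, classes=1)
    print(f"{arch:10s} {seconds_per_step(model, x, y):.3f} s/step")
```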
Hi @omarequalmars, I tested your hyperparameter settings and resolved the performance issue. Please check #998. Given that you're using a lightweight backbone encoder, I recommend reducing the decoder channels. Regarding input resolution, since it wasn't specified: smaller resolutions are unlikely to cause significant speed differences, but larger resolutions could have a notable impact on performance. If you're exploring transformer-style semantic segmentation models, aside from SegFormer you may also consider TopFormer and SeaFormer, which are designed for efficiency as well.
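For concreteness, here is a minimal sketch of the suggested change, assuming the decoder width in segmentation_models.pytorch's SegFormer is controlled by the `decoder_segmentation_channels` argument; the value 64 is only illustrative, not a recommendation from the thread.

```python
import segmentation_models_pytorch as smp

# Hedged sketch: shrink the SegFormer MLP decoder so it better matches a
# lightweight encoder. 256 is the assumed library default; 64 is illustrative.
model = smp.Segformer(
    encoder_name="tu-mobilevit_xxs",
    encoder_weights="imagenet",
    decoder_segmentation_channels=64,
    in_channels=3,
    classes=1,
)
```

The intuition, consistent with the first reply, is that the SegFormer decoder fuses features at 1/4 of the input resolution, so its channel width has an outsized effect on runtime even when the total parameter count looks similar.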
Hi @brianhou0208, I'm trying it out now with the decoder channels reduced; it's running almost as fast as the others. I'm using 512x512 images and there are no noticeable slowdowns compared to the other models trained with identical image sizes. I will definitely look into TopFormer and SeaFormer for a comparison. Thank you so much for the quick fix, you're a lifesaver!
I've been training a series of models implemented by this package for a while, all using 'tu-mobilevit_xxs' as the encoder. However, I noticed that the latest addition, SegFormer, is extremely slow to train compared to the others, despite PyTorch Lightning reporting that it has the same number of parameters. Here is a visualization from TensorBoard:
And here is the parameter count reported by Lightning:
It's much slower than the rest during training and considerably slows down my laptop. Why does it consume so many resources despite being approximately the same size as the other models? Here are the architecture hyperparameters I used for every model, for reference: