Metric values suddenly drop to 0 during training #13

Open
MingboDuan opened this issue Oct 2, 2024 · 10 comments

Comments

@MingboDuan

Hello, while running train.py, at a certain epoch all metric values suddenly dropped to 0, as with the IoU shown below:
[screenshot: IoU metric curve]
My configuration is shown below:
[screenshot: training configuration]

Why does this happen? (The metrics collapse at roughly epoch 270.) It never occurred in my previous training runs, and the dataset has no missing or corrupted samples.
Could you please help me clear up this confusion? Thank you!

@MingboDuan
Author

Hello, author! I solved the problem above by moving the 'begin_test' starting epoch earlier (from 220 to 100)!
So I'd like to ask: for training on the SIRST3 dataset, what values are appropriate for the epoch at which testing begins and for the final epoch? And why does an unreasonable setting cause such a severe drop in the training metrics?

@xdFai
Owner

xdFai commented Oct 4, 2024

Hello, for SIRST3 I suggest starting testing at epoch 500 and ending at epoch 1000.
I'm sorry, but to be honest I never encountered the metrics dropping to 0 like this during my own training.
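For readers wondering how a 'begin_test' setting interacts with the training loop, here is a minimal, self-contained sketch of how such an option typically gates evaluation and best-model selection. The argument names, defaults, and stub functions are assumptions for illustration only, not the repository's actual train.py.

# Hypothetical sketch: evaluation only starts once epoch >= begin_test.
# The stubs and argument names are assumptions, not the repository's code.
import argparse

def train_one_epoch(epoch):
    # stub: the real loop would iterate the train loader and step the optimizer
    pass

def evaluate(epoch):
    # stub: the real code would compute pixAcc/mIoU/PD/FA on the test split
    return 0.0

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=1000)     # suggested end epoch for SIRST3
    parser.add_argument('--begin_test', type=int, default=500)  # suggested first evaluation epoch for SIRST3
    args = parser.parse_args()

    best_miou = 0.0
    for epoch in range(1, args.epochs + 1):
        train_one_epoch(epoch)
        # evaluation and best-model tracking are skipped before begin_test
        if epoch >= args.begin_test:
            miou = evaluate(epoch)
            best_miou = max(best_miou, miou)

if __name__ == '__main__':
    main()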

@MingboDuan
Author

MingboDuan commented Oct 4, 2024 via email

@xdFai
Owner

xdFai commented Oct 5, 2024

Speaking from experience, fairly good results usually appear between epochs 600 and 800, and you can stop training after epoch 800.

@MingboDuan
Author

MingboDuan commented Oct 6, 2024 via email

@arrowonstr

Same problem here: the metrics also dropped to 0, and the loss spiked to over a thousand at around epoch 200.

@xdFai
Owner

xdFai commented Dec 11, 2024

Hello, are you training on all three datasets together?

@arrowonstr

@xdFai I'm using IRSTD1K, with a 4:1 train-to-test split.
Optimizer: Adagrad, 500 epochs, begin_test at epoch 200.
I also added another IoU loss to the total loss for testing (see the sketch after this comment's log).
The log is as follows:

Dec  9 19:42:43 Epoch---10, total_loss---15.624369,
Dec  9 19:45:58 Epoch---20, total_loss---12.861537,
Dec  9 19:49:15 Epoch---30, total_loss---12.847774,
Dec  9 19:52:31 Epoch---40, total_loss---12.836510,
Dec  9 19:55:47 Epoch---50, total_loss---12.835945,
Dec  9 19:59:03 Epoch---60, total_loss---12.832285,
Dec  9 20:02:19 Epoch---70, total_loss---12.832659,
Dec  9 20:05:34 Epoch---80, total_loss---12.835690,
Dec  9 20:08:50 Epoch---90, total_loss---12.831832,
Dec  9 20:12:06 Epoch---100, total_loss---12.834795,
Dec  9 20:15:21 Epoch---110, total_loss---12.836212,
Dec  9 20:18:37 Epoch---120, total_loss---12.828453,
Dec  9 20:21:53 Epoch---130, total_loss---12.822076,
Dec  9 20:25:08 Epoch---140, total_loss---12.835477,
Dec  9 20:28:24 Epoch---150, total_loss---12.831886,
Dec  9 20:31:39 Epoch---160, total_loss---12.828893,
Dec  9 20:34:55 Epoch---170, total_loss---12.843395,
Dec  9 20:38:10 Epoch---180, total_loss---12.836482,
Dec  9 20:41:26 Epoch---190, total_loss---12.820898,
Dec  9 20:44:43 Epoch---200, total_loss---12.825828,
the best model epoch 	200
pixAcc, mIoU:	(0.842510461807251, np.float64(0.5948936996408116))
PD, FA:	(0.936026936026936, 6.841783033451065e-05)
Dec  9 20:50:35 Epoch---210, total_loss---111.857483,
Dec  9 20:56:29 Epoch---220, total_loss---113.037758,
Dec  9 21:02:21 Epoch---230, total_loss---113.045013,
Dec  9 21:08:13 Epoch---240, total_loss---113.042252,
Dec  9 21:14:06 Epoch---250, total_loss---113.056198,
Dec  9 21:19:58 Epoch---260, total_loss---113.046364,
Dec  9 21:25:50 Epoch---270, total_loss---113.059280,
Dec  9 21:31:43 Epoch---280, total_loss---113.042625,
Dec  9 21:37:37 Epoch---290, total_loss---113.056938,
Dec  9 21:43:30 Epoch---300, total_loss---113.048706,
Dec  9 21:49:23 Epoch---310, total_loss---113.051620,
Dec  9 21:55:16 Epoch---320, total_loss---113.050255,
Dec  9 22:01:10 Epoch---330, total_loss---113.046349,
Dec  9 22:07:03 Epoch---340, total_loss---113.040436,
Dec  9 22:12:56 Epoch---350, total_loss---113.055527,
Dec  9 22:18:49 Epoch---360, total_loss---113.044685,
Dec  9 22:24:43 Epoch---370, total_loss---113.042641,
Dec  9 22:30:36 Epoch---380, total_loss---113.047241,
Dec  9 22:36:29 Epoch---390, total_loss---113.046936,
Dec  9 22:42:24 Epoch---400, total_loss---113.043839,
Dec  9 22:48:16 Epoch---410, total_loss---113.050171,
Dec  9 22:54:11 Epoch---420, total_loss---113.042526,
Dec  9 23:00:06 Epoch---430, total_loss---113.046516,
Dec  9 23:05:59 Epoch---440, total_loss---113.046837,
Dec  9 23:11:53 Epoch---450, total_loss---113.053688,
Dec  9 23:17:47 Epoch---460, total_loss---113.042084,
Dec  9 23:23:41 Epoch---470, total_loss---113.057220,
Dec  9 23:29:34 Epoch---480, total_loss---113.051018,
Dec  9 23:35:27 Epoch---490, total_loss---113.048325,
Dec  9 23:41:21 Epoch---500, total_loss---113.043686,
pixAcc, mIoU:	(0.0, np.float64(0.0))
PD, FA:	(0.0, 0.0)

The mIoU at epoch 200 is abnormally high.
After epoch 200 the loss suddenly jumps to over 100, and the metrics all become 0.
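Since the comment above mentions adding an extra IoU loss on top of the original loss, here is a minimal sketch of a common soft-IoU formulation in PyTorch, with an explicit sigmoid and a smoothing constant. It is only an illustration of a numerically guarded variant, not the exact loss used in the run above.

# Generic soft-IoU loss sketch (PyTorch). The sigmoid and the smoothing term
# guard against degenerate values; this is NOT the repository's loss.
import torch
import torch.nn as nn

class SoftIoULoss(nn.Module):
    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.smooth = smooth

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits)                 # map raw logits to (0, 1)
        dims = tuple(range(1, probs.dim()))           # reduce over all but the batch dim
        inter = (probs * targets).sum(dim=dims)
        union = probs.sum(dim=dims) + targets.sum(dim=dims) - inter
        iou = (inter + self.smooth) / (union + self.smooth)  # smoothing avoids 0/0
        return 1.0 - iou.mean()

# usage: loss = SoftIoULoss()(model_output_logits, binary_masks.float())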

@xdFai
Owner

xdFai commented Dec 11, 2024 via email

@arrowonstr

@xdFai I suspect the reason PD drops to 0 is that, in

def update(self, preds, labels, size):
    # Raw predictions and labels are cast straight to int64 here;
    # note there is no sigmoid or threshold binarization before the cast.
    predits = np.array((preds).cpu()).astype('int64')
    labelss = np.array((labels).cpu()).astype('int64')

    # Connected-component labeling and region extraction for the
    # prediction map and the ground-truth map.
    image = measure.label(predits, connectivity=2)
    coord_image = measure.regionprops(image)
    label = measure.label(labelss, connectivity=2)
    coord_label = measure.regionprops(label)
there is no sigmoid or >threshold binarization applied to predits; casting directly to int64 means pixels that should be judged as part of the same connected region end up with different values.
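A minimal sketch of the binarization suggested above, applied before the connected-component step. The sigmoid followed by a 0.5 threshold is an assumed choice for illustration, not a confirmed fix for this repository's metric code.

# Sketch: threshold the raw predictions into a 0/1 mask before labeling.
# The 0.5 threshold after sigmoid is an assumption for illustration.
import numpy as np
import torch
from skimage import measure

def binarize_and_label(preds: torch.Tensor, threshold: float = 0.5):
    probs = torch.sigmoid(preds)                                  # logits -> probabilities
    predits = (probs > threshold).cpu().numpy().astype('int64')   # hard binary mask
    labeled = measure.label(predits, connectivity=2)              # connected components on the binary mask
    return measure.regionprops(labeled)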
