In your paper, only the adapter is updated. However, although your ImageNet code only explicitly enables gradients for the adapter, the model is already initialized with all of its parameters requiring gradients:
```python
def inject_trainable_vida(...):
    # model already initialized with all parameters requiring gradients
    for _module in model.modules():
        if _module.__class__.__name__ in target_replace_module:
            for name, _child_module in _module.named_modules():
                if _child_module.__class__.__name__ == "Linear":
                    # ... inject the adapter
                    _module._modules[name].vida_up.weight.requires_grad = True
                    _module._modules[name].vida_down.weight.requires_grad = True
                    require_grad_params.extend(
                        list(_module._modules[name].vida_up2.parameters())
                    )
                    require_grad_params.extend(
                        list(_module._modules[name].vida_down2.parameters())
                    )
                    _module._modules[name].vida_up2.weight.requires_grad = True
                    _module._modules[name].vida_down2.weight.requires_grad = True
                    names.append(name)

print([name for name, param in model.named_parameters() if param.requires_grad])
# will contain all modules of the model (backbone + ViDA)
```
This means that when the code actually runs, all parts of the model (backbone and adapter) are updated, not just the adapter.
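As a concrete check, I froze the backbone so that only the adapter parameters stay trainable, roughly like this (a minimal sketch with my own helper name, assuming the injected adapter parameters all carry a `vida_` prefix as in the snippet above; this is not code from your repo):

```python
import torch


def freeze_backbone_keep_vida(model: torch.nn.Module):
    """Freeze everything except the injected ViDA adapter parameters.

    Assumes the adapter parameter names contain 'vida_'
    (vida_up / vida_down / vida_up2 / vida_down2 as in the snippet above).
    """
    trainable_params = []
    for name, param in model.named_parameters():
        if "vida_" in name:
            param.requires_grad = True
            trainable_params.append(param)
        else:
            param.requires_grad = False
    return trainable_params


# Usage: pass only the adapter parameters to the optimizer.
# trainable_params = freeze_backbone_keep_vida(model)
# optimizer = torch.optim.Adam(trainable_params, lr=2e-7)
```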
With the backbone gradients disabled this way, I couldn't reach the expected performance; I got the results below, whose mean error is almost 1 percent higher than the result reported in the paper.
(Adapter_LR: 2e-7, EMA_MT: 0.8; the EMA_MT momentum is applied as in the sketch after the table.)
| Metric | Gaussian | Shot | Impulse | Defocus | Glass | Motion | Zoom | Snow | Frost | Fog | Brightness | Contrast | ElasticTransform | Pixelate | JPEG | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Error | 48.68 | 42.72 | 42.80 | 52.56 | 59.10 | 44.78 | 49.74 | 39.22 | 42.36 | 40.28 | 24.34 | 58.50 | 50.64 | 33.64 | 32.96 | 44.15 |
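For clarity, EMA_MT above is the momentum I use for the mean-teacher update; the helper below is my own sketch of the standard EMA rule, not code from your repo:

```python
import torch


@torch.no_grad()
def update_ema_teacher(teacher: torch.nn.Module,
                       student: torch.nn.Module,
                       momentum: float = 0.8):
    """Standard mean-teacher EMA update:
    teacher = momentum * teacher + (1 - momentum) * student
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```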
Do you have any idea why this happens? Any pointers would be a great help to us.
Also, could you please provide the code for the ResNet part and the warm-up model, or just the warm-up model? That would greatly help our research.
steven12138 changed the title from "About Gradient Problem" to "About Gradient Problem and Resnet Model" on May 1, 2024.