Thanks for sharing this great work.

In the paper, you mention that the model "transfer[s] rich multi-scale texture patterns from the source image distribution to the noise prediction". However, in the code, it seems that only the last-layer feature of the encoder is used for cross attention, since the `[-1]` index selects it:

```python
pose_out = self.cros_attn2(x=xt_feats[-1], cond=pose_feats[-1]).mean([2, 3])
```

Could you please briefly point me to the implementation of the "multi-scale" feature cross attention?
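For reference, here is my understanding of what such a call computes, as a minimal self-contained sketch (the module below and its internals are assumptions for illustration, not the repo's actual `cros_attn2`): queries come from the noisy-image features, keys and values come from the pose features, and `.mean([2, 3])` pools the attended map over its spatial dimensions.

```python
import torch
import torch.nn as nn

class SpatialCrossAttention(nn.Module):
    """Illustrative stand-in for a call like self.cros_attn2(x=..., cond=...)."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim, 1)  # queries from the noisy features x
        self.to_k = nn.Conv2d(dim, dim, 1)  # keys from the condition features
        self.to_v = nn.Conv2d(dim, dim, 1)  # values from the condition features
        self.scale = dim ** -0.5

    def forward(self, x, cond):
        b, c, h, w = x.shape
        q = self.to_q(x).flatten(2).transpose(1, 2)     # (B, HW, C)
        k = self.to_k(cond).flatten(2).transpose(1, 2)  # (B, H'W', C)
        v = self.to_v(cond).flatten(2).transpose(1, 2)  # (B, H'W', C)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out

# Mirrors the questioned line: only the deepest (last) feature maps are
# attended, then pooled into one vector per sample.
xt_feats = [torch.randn(2, 64, 16, 16)]    # stand-in encoder features
pose_feats = [torch.randn(2, 64, 16, 16)]  # stand-in pose features
cross_attn = SpatialCrossAttention(64)
pose_out = cross_attn(x=xt_feats[-1], cond=pose_feats[-1]).mean([2, 3])  # (B, C)
```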
Well, I think the actual main model is the class `BeatGANsAutoencModel` rather than `BeatGANsPoseGuideModel`, and the multi-scale condition features are stored in the variables `enc_cond_emb`, `mid_cond_emb`, and `dec_cond_emb`. Is that right?
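If that reading is right, the per-scale wiring would look roughly like the toy sketch below: one condition embedding per UNet stage, kept in lists analogous to `enc_cond_emb`, `mid_cond_emb`, and `dec_cond_emb`, and fused into the matching encoder, middle, and decoder feature maps. The block structure and the additive fusion here are assumptions for illustration, not the actual `BeatGANsAutoencModel` code.

```python
import torch
import torch.nn as nn

class TinyCondUNet(nn.Module):
    def __init__(self, chs=(32, 64)):
        super().__init__()
        self.enc = nn.ModuleList([
            nn.Conv2d(3, chs[0], 3, stride=2, padding=1),
            nn.Conv2d(chs[0], chs[1], 3, stride=2, padding=1),
        ])
        self.mid = nn.Conv2d(chs[1], chs[1], 3, padding=1)
        self.dec = nn.ModuleList([
            nn.ConvTranspose2d(chs[1], chs[0], 4, stride=2, padding=1),
            nn.ConvTranspose2d(chs[0], 3, 4, stride=2, padding=1),
        ])

    def forward(self, x, enc_cond_emb, mid_cond_emb, dec_cond_emb):
        # Encoder: each stage consumes the condition embedding at its own scale.
        for block, cond in zip(self.enc, enc_cond_emb):
            x = torch.relu(block(x)) + cond
        # Middle: a single condition embedding at the coarsest scale.
        x = torch.relu(self.mid(x)) + mid_cond_emb
        # Decoder: again one condition embedding per scale.
        for block, cond in zip(self.dec, dec_cond_emb):
            x = torch.relu(block(x)) + cond
        return x

# Condition embeddings at every scale, e.g. produced by a pose/texture encoder.
net = TinyCondUNet()
x = torch.randn(1, 3, 32, 32)
enc_cond = [torch.randn(1, 32, 16, 16), torch.randn(1, 64, 8, 8)]
mid_cond = torch.randn(1, 64, 8, 8)
dec_cond = [torch.randn(1, 32, 16, 16), torch.randn(1, 3, 32, 32)]
out = net(x, enc_cond, mid_cond, dec_cond)  # (1, 3, 32, 32)
```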