
Question about the sync loss in the clip stage #118

Open

liwang0621 opened this issue Jun 11, 2024 · 3 comments

Comments

@liwang0621

https://github.com/MRzzm/DINet/blob/master/dataset/dataset_DINet_clip.py#L43
source_image_data takes frames 2 to 7 of each clip, but the sync loss is computed with the audio features of the entire 9-frame clip (deep_speech_full). Is this a problem? Should the audio features (deep_speech_full) also be sliced to the 2:7 frame range?
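To make the question concrete, here is a minimal sketch of the slicing pattern being described. Only the index ranges follow the question; the shapes and dummy arrays are placeholders, not the repository's actual code from dataset_DINet_clip.py.

```python
# Minimal sketch of the slicing pattern being asked about (placeholder shapes).
import numpy as np

clip_len = 9                                          # frames per clip
video_frames = np.zeros((clip_len, 3, 104, 80))       # dummy video frames
deep_speech_feats = np.zeros((clip_len, 29))          # dummy per-frame DeepSpeech features

# Video side: only the 5 middle frames (indices 2..6) are used.
source_image_data = video_frames[2:7]                 # shape (5, 3, 104, 80)

# Audio side: the sync loss receives the features of the whole 9-frame clip.
deep_speech_full = deep_speech_feats[:]               # shape (9, 29)

print(source_image_data.shape, deep_speech_full.shape)
```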

@A11enCheung

No, you don't need to. Training with this setup works; as for why, the author didn't explain in detail.

@sunjian2015

Same question here.

@maggiez0138

My guess is that it is used this way because:
1) In the frame training stage, the driving audio given to each frame is a (5, 29) DeepSpeech feature, i.e. a single video frame is driven by the audio features of the 5 frames centered on it.
2) Accordingly, in the clip stage, when training on video frames (2, 2+5), the corresponding audio spans 2+5+2 = 9 frames in total, which is exactly the audio feature of the whole clip (see the sketch after this list).
3) Overall, video frames 0, 1, 7 and 8 of a clip barely participate in training, so this may waste a bit of the dataset.
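If this reading is right, the index arithmetic works out as sketched below. It assumes each video frame i is driven by a 5-frame DeepSpeech window (frames i-2 .. i+2), which is my interpretation of point 1, not something stated in the repository.

```python
# Sketch of the index arithmetic behind points 1) and 2): the 5 trained frames
# (indices 2..6 of a 9-frame clip), each with a 5-frame audio window, together
# cover exactly frames 0..8, i.e. the full deep_speech_full of the clip.
clip_len = 9
trained_frames = range(2, 7)                    # frame indices 2..6, matching [2:7]

audio_indices = set()
for i in trained_frames:
    audio_indices.update(range(i - 2, i + 3))   # 5-frame window around frame i

print(sorted(audio_indices))                    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
assert sorted(audio_indices) == list(range(clip_len))
```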
