Looking at https://github.com/MRzzm/DINet/blob/master/dataset/dataset_DINet_clip.py#L43: source_image_data takes frames 2 to 7 of each clip, but the audio feature used to compute the sync loss is the full 9-frame audio feature of the clip (deep_speech_full). Is that a problem? Doesn't the audio feature (deep_speech_full) also need to be sliced to frames 2:7 so it stays aligned with the video frames?
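For reference, a minimal sketch of the slicing pattern described above. The variable names source_image_data and deep_speech_full follow the linked dataset file, but the shapes and surrounding code here are simplified toy data, not the actual implementation:

```python
import numpy as np

# Toy data for one clip: 9 consecutive video frames and their per-frame
# 29-dim DeepSpeech features. This paraphrases the pattern described for
# dataset_DINet_clip.py; it is not the real file contents.
clip_frames = np.random.rand(9, 128, 128, 3)   # 9 video frames
deep_speech_feats = np.random.rand(9, 29)      # 9 audio-frame features

# source_image_data: only the 5 middle frames (indices 2..6) of the clip
source_image_data = clip_frames[2:7]           # shape (5, 128, 128, 3)

# deep_speech_full: the audio features of the whole 9-frame clip,
# which is what the sync loss later consumes
deep_speech_full = deep_speech_feats           # shape (9, 29)

print(source_image_data.shape, deep_speech_full.shape)  # 5 frames vs 9 audio frames
```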
No need. Training with this setup as-is works and converges; as for why, the author never explained it in detail either.
Same question here.
My guess is that it is used this way because: 1) In the frame-training stage, the driving audio for each frame is a (5, 29) DeepSpeech feature, i.e. a single video frame is driven by the audio features of the 5 frames around it. 2) Correspondingly, in the clip stage, when training on the video frames in the (2, 2+5) window, the matching audio spans 2+5+2 = 9 frames in total, which is exactly the audio feature of the whole clip. 3) Looking at a clip as a whole, video frames 0, 1, 7, and 8 basically never take part in training, so the dataset may be slightly wasted?
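To make point 2 concrete, here is a small sketch of the window arithmetic it implies, assuming each video frame uses the audio of the 2 frames before and after it; none of these names come from the DINet code:

```python
# Hypothetical illustration of the index relationship in point 2 above.
retained_video_frames = range(2, 7)  # the 5 frames kept as source_image_data

# Each retained frame i would be driven by audio frames [i-2, i+2].
audio_windows = {i: list(range(i - 2, i + 3)) for i in retained_video_frames}
for i, win in audio_windows.items():
    print(f"video frame {i} -> audio frames {win}")

# The union of all windows covers exactly audio frames 0..8, which is why
# the 9-frame deep_speech_full already matches the 5 retained video frames.
covered = sorted({f for win in audio_windows.values() for f in win})
assert covered == list(range(9))
print("union of audio windows:", covered)
```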