Incorrect mouth crops #23

Open
ahaliassos opened this issue Oct 17, 2024 · 8 comments

Comments

@ahaliassos

Hi,

Congrats on your work!

I have run the scripts to download and extract the clips, but when I tried to inspect some clips I noticed that they don't depict the mouth, for example for muavic/es/video/test/jej8qlzlAGw/jej8qlzlAGw_0018.mp4. I wonder if the face detector / alignment failed for many examples.

Could you please point me to the path (in the form of, e.g., muavic/es/video/test/jej8qlzlAGw/jej8qlzlAGw_0018.mp4) of a cropped video that is centered on the mouth, so that I can check whether it looks the same locally? I have looked through many videos and can't find a single one centered on the mouth, so I'm wondering whether I did something wrong.

Many thanks!

@longkhanh-fam

longkhanh-fam commented Oct 25, 2024

I'm currently encountering the same problem for the ar and de languages. The mtedx folder still contains the videos, but the output folders contain only black-screen videos instead of mouth crops.

Have you fixed it?

@roudimit

Hey @ahaliassos, I wrote up some suggestions and put the cropped video / landmarks here for debugging

@sungnyun

sungnyun commented Oct 31, 2024

I'm facing the same problem: all output videos are just black screens, while the original videos are fine. Is the metadata broken?

Edit: the landmark metadata @roudimit provided is the same as mine, so I think the problem comes from the cropping step.

@sungnyun

OK, I think I found the cause: my YouTube downloads were somehow set to 360p, so the pixel coordinates in the metadata fell outside the video's resolution.

You should check whether your downloaded videos are 1080p, since that is the resolution the metadata should be based on.
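A quick way to confirm this failure mode is to check whether the landmark coordinates even fit inside the downloaded frame. Below is a minimal sketch, assuming the landmarks are available as absolute (x, y) pixel coordinates (the actual MuAViC metadata format may differ); `landmarks_fit_frame` and `min_frame_size` are hypothetical helpers, not part of the repo, and the sample points are made up for illustration.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def landmarks_fit_frame(landmarks: Iterable[Point], width: int, height: int) -> bool:
    """Return True if every (x, y) landmark lies inside a width x height frame.

    Hypothetical helper: assumes landmarks are absolute pixel coordinates.
    """
    return all(0 <= x < width and 0 <= y < height for x, y in landmarks)


def min_frame_size(landmarks: Sequence[Point]) -> Tuple[int, int]:
    """Smallest (width, height) that would contain all landmarks."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return int(max(xs)) + 1, int(max(ys)) + 1


if __name__ == "__main__":
    # Made-up mouth landmarks in a 1920x1080 frame (illustrative only).
    mouth = [(1010.0, 700.0), (1090.0, 705.0), (1050.0, 740.0)]
    print(landmarks_fit_frame(mouth, 1920, 1080))  # True
    print(landmarks_fit_frame(mouth, 640, 360))    # False: outside a 360p frame
    print(min_frame_size(mouth))                   # (1091, 741)
```

If `landmarks_fit_frame` returns False for a clip, the crop window lies partly or entirely outside the frame, which would be consistent with the black-screen outputs described above.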

@longkhanh-fam

@sungnyun you're right; however, not all videos are in 1080p. For example, the video in de/video/test/r2tvb4-i4EE is only 720p. While 720p works, it doesn't provide the best quality. When I compared my output for this 720p video with the video @roudimit provided, the results weren't identical. This may affect the fairness of further benchmarking.
Do you have any idea why?

@roudimit

Nice debugging! I ran ffprobe mtedx/video/de/test/r2tvb4-i4EE.mp4 and the output was:

  Duration: 00:11:10.52, start: 0.000000, bitrate: 1386 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1257 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)

So my video is also 720p. In general, I'm guessing some videos are 1080p and some are lower resolution.
@longkhanh-fam what does your cropped video look like?
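To check resolutions across a whole split instead of one file at a time, a small wrapper around ffprobe could flag anything below 1080p. This is a sketch, assuming ffprobe is on PATH and the downloads follow the mtedx/video/<lang>/<split>/<id>.mp4 layout seen above; `parse_resolution`, `probe_resolution`, and `find_low_res` are hypothetical helper names, not part of the repo.

```python
import json
import subprocess
from typing import Iterable, List, Tuple


def parse_resolution(ffprobe_json: str) -> Tuple[int, int]:
    """Extract (width, height) of the first video stream from ffprobe JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return int(stream["width"]), int(stream["height"])


def probe_resolution(path: str) -> Tuple[int, int]:
    """Run ffprobe on one file and return the resolution of its first video stream."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=width,height",
            "-of", "json",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if ffprobe fails, e.g. on a missing or corrupt file
    )
    return parse_resolution(result.stdout)


def find_low_res(paths: Iterable[str], min_height: int = 1080) -> List[str]:
    """Return the paths whose video height is below min_height."""
    return [p for p in paths if probe_resolution(p)[1] < min_height]
```

For example, `find_low_res(glob.glob("mtedx/video/de/test/*.mp4"))` (after `import glob`) would list r2tvb4-i4EE.mp4, since the ffprobe output above reports 1280x720. Because some mTEDx videos appear to be available only below 1080p, a flagged file means "worth checking", not necessarily a bad download.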

@longkhanh-fam

Sorry for my late reply.
I've uploaded my cropped videos here. Comparing our videos, mine clearly includes the eyes and nose, whereas yours doesn't. I also checked the video statistics using ffprobe, and they match. The discrepancy could be due to my code or the quality of the downloaded video; I'll look into it.
  Duration: 00:11:10.56, start: 0.000000, bitrate: 1111 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 1109 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)

Additionally, could you please double-check another video? The path is muavic/de/video/test/pR_8SsedSLI/pR_8SsedSLI_0000.mp4. I noticed that the mouth isn't fully captured in the crop. Have you encountered a similar issue?

Sorry to bother you with this, and thank you for your help!

@roudimit

roudimit commented Nov 5, 2024

Your Dropbox link doesn't work; can you check its sharing settings?
I noticed your bitrate is lower, so maybe your video was downloaded from YouTube at a lower quality.

Here's 'muavic/de/video/test/pR_8SsedSLI/pR_8SsedSLI_0000.mp4'. As you can see, most of the video doesn't show a speaker, and the final part isn't cropped on the person's face.
https://github.com/user-attachments/assets/e28ab1dd-98b2-44bc-9115-9d7d5cad9d8c

FYI, it's a known issue that many of the multilingual mTedX videos don't show the speaker. "Visual Speech Recognition for Multiple Languages" proposed filtering mTedX to keep only the videos where the speaker is visible, which leaves much less data.
