[important] Please refer to our latest work AVLip, accepted at ICASSP 2023: https://github.com/DanielMengLiu/AudioVisualLip. A lot of optimization has been done in AVLip, including (but not limited to):
- better-performing systems
- audio-only and visual-only structures
- parallel data processing and score decision
- open-source audio-visual lip datasets, compressed in .mp4 format
- data augmentation
- preprocessing code for extracting lip regions (a minimal sketch is given after this list)
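The sketch below illustrates the general idea of lip-region preprocessing: detect facial landmarks per frame and crop a fixed-size mouth ROI. It is a minimal illustration assuming dlib's 68-point landmark model (mouth points 48-67) and OpenCV; the landmark model path and crop parameters are placeholders, and the actual AVLip preprocessing script may differ.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# NOTE: placeholder path; the 68-point landmark model must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_lip_frames(video_path, size=88, margin=0.3):
    """Crop a square lip ROI (mouth landmarks 48-67) from every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    lips = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        if not faces:
            continue  # skip frames with no detected face
        shape = predictor(gray, faces[0])
        pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
        cx, cy = pts.mean(axis=0)
        half = int((pts[:, 0].max() - pts[:, 0].min()) * (1 + margin) / 2) + 1
        x0, y0 = max(int(cx) - half, 0), max(int(cy) - half, 0)
        crop = frame[y0:y0 + 2 * half, x0:x0 + 2 * half]
        lips.append(cv2.resize(crop, (size, size)))
    cap.release()
    return np.stack(lips) if lips else np.empty((0, size, size, 3), dtype=np.uint8)
```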
deep-learning based audio-visual lip biometrics
ASRU paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9688240
trained models: https://drive.google.com/drive/folders/1IalsNtmDH-qFnfgmn_O92J1MUHCaQepl?usp=sharing (performance is similar to the ASRU paper, but not exactly the same)
Meng Liu, Chang Zeng, Hanyi Zhang
e-mail: [email protected]
This work was submitted to Interspeech 2021. It introduces a deep-learning based audio-visual lip biometrics framework, illustrated in the figure above. Since research on audio-visual lip biometrics has been hindered by the lack of an appropriate and sizeable database, this work presents a moderate-scale baseline database built from public lip datasets, together with a baseline system.
Audio-visual lip biometrics is an interesting topic. Unlike other audio-visual speaker recognition methods, it leverages multimodal information from audible speech and visual speech (i.e., lip movements). Much in this area remains unexplored. We will update the code and resources as our work progresses.
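As a concrete illustration of how the two streams can be combined, the sketch below performs score-level fusion: each modality produces a speaker embedding, enrollment and test embeddings are scored by cosine similarity, and the two scores are combined with a weight. The encoder arguments and the equal fusion weight are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch.nn.functional as F

def modality_score(encoder, x_enroll, x_test):
    """Cosine similarity between enrollment and test embeddings for one modality."""
    e = F.normalize(encoder(x_enroll), dim=-1)
    t = F.normalize(encoder(x_test), dim=-1)
    return (e * t).sum(dim=-1)

def fused_score(audio_encoder, lip_encoder, enroll, test, w_audio=0.5):
    """Score-level fusion of the two modalities; the weight is a placeholder."""
    s_audio = modality_score(audio_encoder, enroll["audio"], test["audio"])
    s_lip = modality_score(lip_encoder, enroll["lip"], test["lip"])
    return w_audio * s_audio + (1.0 - w_audio) * s_lip
```

In practice the per-modality embeddings would come from the pretrained audio and visual models linked above, and the fusion weight would be tuned on a development set.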
What has been done:
- establish a public DeepLip database, together with a well-performing baseline system
- prove the feasibility of deep-learning based audio-visual lip biometrics
- show the complementary power of fusing audible speech and visual speech
To do:
- complex multimodal fusion methods
- comparison with other audio-visual speaker recognition methods
- prove robustness against spoofing and in noisy environments
- text-dependent audio-visual lip biometrics
- collect a large audio-visual lip database
@inproceedings{liu2021deeplip,
  title={DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics},
  author={Liu, Meng and Wang, Longbiao and Lee, Kong Aik and Zhang, Hanyi and Zeng, Chang and Dang, Jianwu},
  booktitle={2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={122--129},
  year={2021},
  organization={IEEE}
}