# To Extract the Text-based Features
1.FastTextEmb_and_LASEREmbExtraction.py
2.BERTandHateXPlainEmbedding.py
3.AudioMFCC_Feat_andSpectrumGen.py
4.AudioVGG19andInceptionFeat.py
5.Model-ViT_featureExtract.py
- UnimodalANN_foldWise.py
6.Vision+lstm_foldWise.py
7.3DCNN_withFolds.py
- MultiModalFusionModelfoldWise.py
frameExtract.py
The file 'all__video_vosk_audioMap.p' must be generated with the Vosk speech recognition toolkit (https://alphacephei.com/vosk/). Its contents map each video name to its transcript, in JSON format as shown below:
{ "video_name1": "transcript1", "video_name2": "transcript2", ... "video_name3": "transcript3" }
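A minimal sketch of how such a map could be produced is given below. It is not the authors' script, and it rests on several assumptions: the audio of each video has already been extracted to a mono 16-bit PCM WAV file, the file name without its extension is used as the video name, and the '.p' extension means the dictionary is stored with Python's pickle module. The helper names (`transcribe_wav`, `build_transcript_map`, `save_transcript_map`) and the paths in the usage comment are hypothetical.

```python
import json
import pickle
import wave
from pathlib import Path


def transcribe_wav(wav_path, model):
    """Transcribe one mono 16-bit PCM WAV file with a loaded Vosk model."""
    # Imported lazily so the rest of the module works without Vosk installed.
    from vosk import KaldiRecognizer  # requires `pip install vosk`

    pieces = []
    with wave.open(str(wav_path), "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        # Flush whatever is left in the recognizer.
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def build_transcript_map(wav_paths, transcribe):
    """Map each file stem (assumed to be the video name) to its transcript."""
    return {Path(p).stem: transcribe(p) for p in wav_paths}


def save_transcript_map(mapping, out_path):
    """Store the map with pickle (assumed from the '.p' extension)."""
    with open(out_path, "wb") as f:
        pickle.dump(mapping, f)


# Example usage (paths are placeholders):
# from vosk import Model
# model = Model("path/to/vosk-model")
# wavs = sorted(Path("path/to/wav_dir").glob("*.wav"))
# mapping = build_transcript_map(wavs, lambda p: transcribe_wav(p, model))
# save_transcript_map(mapping, "all__video_vosk_audioMap.p")
```

Passing the transcription function as an argument keeps the map-building step independent of Vosk, so it can be checked with a stub before running the full recognizer over all videos.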