This model recognizes the speaker's gender in real call recordings. It is a Keras implementation of a CNN+LSTM that classifies short audio clips and uses those predictions to label the long recording they were cut from.
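A minimal sketch of such a network in Keras is shown below; the layer sizes, input shape, and three-way output (female / male / noise) are assumptions for illustration, not the exact architecture in this repo.

```python
# Hypothetical CNN+LSTM sketch in Keras; layer sizes, input shape, and class
# count are assumptions, not the exact architecture used in this repo.
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout

def build_model(n_frames=100, n_features=34, n_classes=3):
    """CNN front-end over per-frame features, LSTM over time, softmax over classes."""
    model = Sequential([
        Conv1D(64, kernel_size=3, activation='relu',
               input_shape=(n_frames, n_features)),
        MaxPooling1D(pool_size=2),
        Conv1D(64, kernel_size=3, activation='relu'),
        MaxPooling1D(pool_size=2),
        LSTM(64),
        Dropout(0.5),
        Dense(n_classes, activation='softmax'),  # female / male / noise
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```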
- Python 3.6+
- Keras 2.3
- scipy, numpy, pandas, pyAudioAnalysis, pydub, h5py
- webrtcvad 2.0.10
- scikit-learn
Generate short audio with:
generate_sample.py
It splits the long audio into short clips using VAD (Voice Activity Detection); you then need to label the clips and sort them into the three folders shown below.
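Roughly, the VAD split works like the sketch below, which cuts a 16 kHz mono WAV into voiced chunks with webrtcvad; the frame length, aggressiveness, and output naming are assumptions, and generate_sample.py may differ in detail.

```python
# Rough sketch of VAD-based splitting (not the exact logic of generate_sample.py).
# Assumes 16-bit mono WAV at 8/16/32/48 kHz; frame size and aggressiveness are guesses.
import wave
import webrtcvad

def split_voiced(path, out_prefix, frame_ms=30, aggressiveness=2):
    vad = webrtcvad.Vad(aggressiveness)
    with wave.open(path, 'rb') as wf:
        rate = wf.getframerate()
        assert wf.getnchannels() == 1 and wf.getsampwidth() == 2
        pcm = wf.readframes(wf.getnframes())

    frame_bytes = int(rate * frame_ms / 1000) * 2  # 16-bit samples -> 2 bytes each
    voiced, segments = bytearray(), []
    for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[i:i + frame_bytes]
        if vad.is_speech(frame, rate):
            voiced.extend(frame)
        elif voiced:                      # silence after speech -> close the segment
            segments.append(bytes(voiced))
            voiced = bytearray()
    if voiced:
        segments.append(bytes(voiced))

    for n, seg in enumerate(segments):    # write each voiced chunk as its own short WAV
        with wave.open('%s_%03d.wav' % (out_prefix, n), 'wb') as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(seg)
    return segments
```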
Training data layout:
├── ...
├── data
│   ├── long_audio          # wav files before VAD
│   ├── model               # saved models
│   └── short_audio         # wav files after VAD
│       ├── female          # wav files labeled female
│       ├── male            # wav files labeled male
│       └── noise           # wav files labeled noise
└── ...
- To train, set train=true in main.py.
- On the first run it builds the feature and label files x.npy, y.npy, and label.txt (see the sketch after these steps).
- If you use your own data, delete those cached files first so they are regenerated.
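The caching step could look roughly like the sketch below; the folder layout matches the tree above, while the feature routine (extract_features) is a hypothetical stand-in for whatever main.py actually computes.

```python
# Sketch of how x.npy / y.npy / label.txt could be built and cached.
# extract_features() is a hypothetical per-file feature routine; the folder
# names follow the data/short_audio layout shown above.
import os
import numpy as np

LABELS = ['female', 'male', 'noise']

def load_or_build_features(extract_features, short_audio_dir='data/short_audio'):
    """Load cached features if present, otherwise build them from the labeled folders."""
    if os.path.exists('x.npy') and os.path.exists('y.npy'):
        return np.load('x.npy'), np.load('y.npy')

    xs, ys = [], []
    for class_id, label in enumerate(LABELS):
        folder = os.path.join(short_audio_dir, label)
        for name in sorted(os.listdir(folder)):
            if name.endswith('.wav'):
                xs.append(extract_features(os.path.join(folder, name)))
                ys.append(class_id)

    x, y = np.array(xs), np.array(ys)
    np.save('x.npy', x)
    np.save('y.npy', y)
    with open('label.txt', 'w') as f:  # keep the class order for prediction
        f.write('\n'.join(LABELS))
    return x, y
```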
- To predict, set train=false and model_path in main.py.
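Prediction on a long recording follows the idea from the intro: classify each short VAD chunk and combine the chunk-level results. A minimal sketch, assuming a majority vote over chunk predictions (the actual aggregation in main.py may differ):

```python
# Sketch of long-audio prediction: classify each short VAD chunk, then take a
# majority vote across chunks. The aggregation rule is an assumption.
import numpy as np
from keras.models import load_model

def predict_long_audio(model_path, chunk_features, labels=('female', 'male', 'noise')):
    """chunk_features: array of per-chunk feature matrices shaped for the network."""
    model = load_model(model_path)
    probs = model.predict(np.asarray(chunk_features))  # one softmax row per chunk
    votes = probs.argmax(axis=1)
    counts = np.bincount(votes, minlength=len(labels))
    return labels[int(counts.argmax())]                 # most frequent chunk label wins
```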
| Gender | Precision | Recall |
|--------|-----------|--------|
| female | 0.896     | 0.89   |
| male   | 0.909     | 0.871  |