FCN-f0

L. Ardaillon and A. Roebel, "Fully-Convolutional Network for Pitch Estimation of Speech Signals", Proc. Interspeech, 2019.

We kindly request that academic publications making use of our FCN models cite the aforementioned paper.

Description

The code provided in this repository performs monophonic pitch (f0) estimation. It is partly based on the code from the CREPE repository [1] (https://github.com/marl/crepe).

Two different fully-convolutional pre-trained models are provided. These models were trained exclusively on speech data and may therefore not perform as well on other types of sounds.

The code currently provided only allows running pitch estimation on given sound files using the provided pretrained models (no code is currently provided for training the models on new data).

The models, algorithm, training, and evaluation procedures are described in the publication "Fully-Convolutional Network for Pitch Estimation of Speech Signals", presented at the Interspeech 2019 conference (see the citation above).
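For programmatic use, each pretrained model is distributed as a Keras architecture file (model.json) and a weights file (weights.h5), as referenced in the example commands below. The following is a minimal sketch of how such a model could be loaded and applied to an audio frame; the input shape, normalization, and frame size (e.g. 1953 samples at 16 kHz for FCN-1953) are assumptions inferred from the model names, and prediction.py in this repository remains the reference entry point.

```python
# Minimal sketch (assumptions: 16 kHz mono audio, frame size matching the model name,
# e.g. 1953 samples for FCN-1953; the actual pre/post-processing lives in prediction.py).
import numpy as np
from keras.models import model_from_json

def load_fcn_model(json_path, weights_path):
    """Load a pretrained FCN-f0 model from its architecture and weights files."""
    with open(json_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weights_path)
    return model

# Hypothetical usage on a mono 16 kHz signal `audio` (numpy array):
# model = load_fcn_model("models/FCN_1953/model.json", "models/FCN_1953/weights.h5")
# frame = audio[:1953].astype(np.float32)[np.newaxis, :, np.newaxis]  # (batch, time, 1) assumed
# activation = model.predict(frame)  # pitch activation; decoding to Hz is model-specific
```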

Below are the results of our evaluation comparing our models to the SWIPE algorithm and CREPE model:

| Dataset (threshold)   | FCN-1953       | FCN-993        | FCN-929        | CREPE          | CREPE-speech   | SWIPE           |
|-----------------------|----------------|----------------|----------------|----------------|----------------|-----------------|
| PAN-synth (25 cents)  | 93.62 ± 3.34%  | 94.31 ± 3.15%  | 93.50 ± 3.43%  | 77.62 ± 9.31%  | 86.92 ± 8.28%  | 84.56 ± 11.68%  |
| PAN-synth (50 cents)  | 98.37 ± 1.62%  | 98.53 ± 1.54%  | 98.27 ± 1.73%  | 91.23 ± 6.00%  | 97.27 ± 2.09%  | 93.10 ± 7.26%   |
| PAN-synth (200 cents) | 99.81 ± 0.64%  | 99.79 ± 0.65%  | 99.77 ± 0.73%  | 95.65 ± 5.17%  | 99.25 ± 1.07%  | 97.51 ± 4.90%   |
| manual (50 cents)     | 88.32 ± 6.33%  | 88.57 ± 5.77%  | 88.88 ± 5.73%  | 87.03 ± 7.35%  | 88.45 ± 5.70%  | 85.93 ± 7.62%   |
| manual (200 cents)    | 97.35 ± 3.02%  | 97.31 ± 2.56%  | 97.36 ± 2.51%  | 92.57 ± 5.22%  | 96.63 ± 2.91%  | 95.03 ± 4.04%   |
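The thresholds above are expressed in cents (hundredths of a semitone). As an illustration, the sketch below computes the percentage of voiced frames whose estimated f0 lies within a given cent threshold of the reference; this is a generic accuracy computation, not the exact evaluation script used for the paper.

```python
# Generic pitch-accuracy sketch (not the paper's evaluation code): percentage of
# reference-voiced frames whose estimated f0 is within `threshold_cents` of the reference.
import numpy as np

def pitch_accuracy(f0_ref, f0_est, threshold_cents=50.0):
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    voiced = f0_ref > 0                      # only score frames with a reference pitch
    cents = 1200.0 * np.abs(np.log2(f0_est[voiced] / f0_ref[voiced]))
    return 100.0 * np.mean(cents <= threshold_cents)

# Example: a 10-cent error counts as correct at the 25-cent threshold.
# pitch_accuracy([220.0], [220.0 * 2 ** (10 / 1200)], threshold_cents=25)  # -> 100.0
```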

And below is a comparison of latency and computation times for the different models and SWIPE:

|         | FCN-1953 | FCN-993 | FCN-929 | CREPE  | SWIPE  |
|---------|----------|---------|---------|--------|--------|
| latency | 0.122s   | 0.062s  | 0.058s  | 0.032s | 0.128s |
| GPU     | 0.016s   | 0.010s  | 0.021s  | 0.092s | X      |
| CPU     | 1.65s    | 0.89s   | 3.34s   | 14.79s | 0.63s  |

Example command-line usage (using provided pretrained models)

model FCN-1953

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_1953.f0.csv -m /path_to/FCN-f0/models/FCN_1953/model.json -w /path_to/FCN-f0/models/FCN_1953/weights.h5 --use_single_core --verbose --plot
```

or

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_1953-no_json.f0.csv -w /path_to/FCN-f0/models/FCN_1953/weights.h5 -is 1953 --use_single_core --verbose --plot
```

model FCN-929

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_929.f0.csv -m /path_to/FCN-f0/models/FCN_929/model.json -w /path_to/FCN-f0/models/FCN_929/weights.h5 --use_single_core --verbose --plot
```

or

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_929-no_json.f0.csv -w /path_to/FCN-f0/models/FCN_929/weights.h5 -is 929 --use_single_core --verbose --plot
```
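The commands above write the estimated f0 curve to a CSV file (and, with --plot, display it). As an illustration, here is a minimal sketch for loading and plotting such an output file; it assumes the CSV contains a time column followed by an f0 column in Hz, which should be checked against the actual output of prediction.py.

```python
# Minimal sketch for inspecting an output file such as test-FCN_1953.f0.csv.
# Assumption: one frame per row with columns (time in seconds, f0 in Hz);
# verify the actual layout written by prediction.py before relying on this.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("test-FCN_1953.f0.csv", delimiter=",")
times, f0 = data[:, 0], data[:, 1]

plt.plot(times, f0)
plt.xlabel("Time (s)")
plt.ylabel("f0 (Hz)")
plt.title("FCN-1953 pitch estimate")
plt.show()
```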

References

[1] Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello. "CREPE: A Convolutional Representation for Pitch Estimation", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
