FCN-f0

L. Ardaillon and A. Roebel, "Fully-Convolutional Network for Pitch Estimation of Speech Signals", Proc. Interspeech, 2019.

We kindly request that academic publications making use of our FCN models cite the aforementioned paper.

Description

The code provided in this repository performs monophonic pitch (f0) estimation. It is partly based on the code from the CREPE repository [1] (https://github.com/marl/crepe).

Two different fully-convolutional pre-trained models are provided. These models were trained exclusively on speech data and may therefore not perform as well on other types of sounds.

The code currently provided only allows running pitch estimation on given sound files using the provided pretrained models (no code is currently provided for training the models on new data).

The models, algorithm, training, and evaluation procedures are described in the publication "Fully-Convolutional Network for Pitch Estimation of Speech Signals", presented at the Interspeech 2019 conference (see the citation above).
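For programmatic use, each pretrained model is distributed as a Keras architecture file (model.json) and a weights file (weights.h5), as referenced in the example commands below. The following is a minimal sketch of how such a model could be loaded and applied to an audio frame; the input shape, normalization, and frame size (e.g. 1953 samples at 16 kHz for FCN-1953) are assumptions inferred from the model names, and prediction.py in this repository remains the reference entry point.

```python
# Minimal sketch (assumptions: 16 kHz mono audio, frame size matching the model name,
# e.g. 1953 samples for FCN-1953; the actual pre/post-processing lives in prediction.py).
import numpy as np
from keras.models import model_from_json

def load_fcn_model(json_path, weights_path):
    """Load a pretrained FCN-f0 model from its architecture and weights files."""
    with open(json_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weights_path)
    return model

# Hypothetical usage on a mono 16 kHz signal `audio` (numpy array):
# model = load_fcn_model("models/FCN_1953/model.json", "models/FCN_1953/weights.h5")
# frame = audio[:1953].astype(np.float32)[np.newaxis, :, np.newaxis]  # (batch, time, 1) assumed
# activation = model.predict(frame)  # pitch activation; decoding to Hz is model-specific
```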

Below are the results of our evaluation comparing our models to the SWIPE algorithm and CREPE model:

| Dataset (threshold)   | FCN-1953       | FCN-993        | FCN-929        | CREPE          | CREPE-speech   | SWIPE           |
|-----------------------|----------------|----------------|----------------|----------------|----------------|-----------------|
| PAN-synth (25 cents)  | 93.62 ± 3.34%  | 94.31 ± 3.15%  | 93.50 ± 3.43%  | 77.62 ± 9.31%  | 86.92 ± 8.28%  | 84.56 ± 11.68%  |
| PAN-synth (50 cents)  | 98.37 ± 1.62%  | 98.53 ± 1.54%  | 98.27 ± 1.73%  | 91.23 ± 6.00%  | 97.27 ± 2.09%  | 93.10 ± 7.26%   |
| PAN-synth (200 cents) | 99.81 ± 0.64%  | 99.79 ± 0.65%  | 99.77 ± 0.73%  | 95.65 ± 5.17%  | 99.25 ± 1.07%  | 97.51 ± 4.90%   |
| manual (50 cents)     | 88.32 ± 6.33%  | 88.57 ± 5.77%  | 88.88 ± 5.73%  | 87.03 ± 7.35%  | 88.45 ± 5.70%  | 85.93 ± 7.62%   |
| manual (200 cents)    | 97.35 ± 3.02%  | 97.31 ± 2.56%  | 97.36 ± 2.51%  | 92.57 ± 5.22%  | 96.63 ± 2.91%  | 95.03 ± 4.04%   |
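The thresholds above are expressed in cents (hundredths of a semitone). As an illustration, the sketch below computes the percentage of voiced frames whose estimated f0 lies within a given cent threshold of the reference; this is a generic accuracy computation, not the exact evaluation script used for the paper.

```python
# Generic pitch-accuracy sketch (not the paper's evaluation code): percentage of
# reference-voiced frames whose estimated f0 is within `threshold_cents` of the reference.
import numpy as np

def pitch_accuracy(f0_ref, f0_est, threshold_cents=50.0):
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_est = np.asarray(f0_est, dtype=float)
    voiced = f0_ref > 0                      # only score frames with a reference pitch
    cents = 1200.0 * np.abs(np.log2(f0_est[voiced] / f0_ref[voiced]))
    return 100.0 * np.mean(cents <= threshold_cents)

# Example: a 10-cent error counts as correct at the 25-cent threshold.
# pitch_accuracy([220.0], [220.0 * 2 ** (10 / 1200)], threshold_cents=25)  # -> 100.0
```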

And below is a comparison of latency and computation times for the different models and SWIPE:

|         | FCN-1953 | FCN-993 | FCN-929 | CREPE  | SWIPE  |
|---------|----------|---------|---------|--------|--------|
| latency | 0.122s   | 0.062s  | 0.058s  | 0.032s | 0.128s |
| GPU     | 0.016s   | 0.010s  | 0.021s  | 0.092s | X      |
| CPU     | 1.65s    | 0.89s   | 3.34s   | 14.79s | 0.63s  |

Example command-line usage (using provided pretrained models)

model FCN-1953

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_1953.f0.csv -m /path_to/FCN-f0/models/FCN_1953/model.json -w /path_to/FCN-f0/models/FCN_1953/weights.h5 --use_single_core --verbose --plot
```

or

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_1953-no_json.f0.csv -w /path_to/FCN-f0/models/FCN_1953/weights.h5 -is 1953 --use_single_core --verbose --plot
```

model FCN-929

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_929.f0.csv -m /path_to/FCN-f0/models/FCN_929/model.json -w /path_to/FCN-f0/models/FCN_929/weights.h5 --use_single_core --verbose --plot
```

or

```
python /path_to/FCN-f0/prediction.py -i /path_to/test.wav -o /path_to/test-FCN_929-no_json.f0.csv -w /path_to/FCN-f0/models/FCN_929/weights.h5 -is 929 --use_single_core --verbose --plot
```
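The commands above write the estimated f0 curve to a CSV file (and, with --plot, display it). As an illustration, here is a minimal sketch for loading and plotting such an output file; it assumes the CSV contains a time column followed by an f0 column in Hz, which should be checked against the actual output of prediction.py.

```python
# Minimal sketch for inspecting an output file such as test-FCN_1953.f0.csv.
# Assumption: one frame per row with columns (time in seconds, f0 in Hz);
# verify the actual layout written by prediction.py before relying on this.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("test-FCN_1953.f0.csv", delimiter=",")
times, f0 = data[:, 0], data[:, 1]

plt.plot(times, f0)
plt.xlabel("Time (s)")
plt.ylabel("f0 (Hz)")
plt.title("FCN-1953 pitch estimate")
plt.show()
```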

References

[1] Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello. "CREPE: A Convolutional Representation for Pitch Estimation", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
