optimize Binarize() performance when onset == offset
#1721
base: develop
Conversation
force-pushed from c6d73b0 to dc6b1a8
force-pushed from dc6b1a8 to 3f1a482
Fixed an off-by-one error in the new method. I also made a Google Colab notebook showcasing the improvements: https://colab.research.google.com/drive/1Me3GgQUPXxjuEn06DNVco_GIxlUoYPTE?usp=sharing

In summary, the new method gives a slight speedup for fully synthetic data, a 2x speedup for discrete (0s and 1s) synthetic data, and an almost 100x speedup for real data. The notebook also lets you extend the real data sample to however many hours is desired.

EDIT: I realized that I did not test this with various offsets and onsets when initializing the `Binarize` object.
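The notebook is the authoritative benchmark; for reference, a minimal local timing harness might look like the sketch below. It assumes `Binarize` is called on a `SlidingWindowFeature` as in `pyannote.audio.utils.signal`; the frame step and single-track scores are placeholders, not values taken from the notebook.

```python
import time
import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature
from pyannote.audio.utils.signal import Binarize

hours = 9
step = 0.017  # placeholder frame step, in seconds
num_frames = int(hours * 3600 / step)

# discrete (0s and 1s) synthetic scores, single track
data = (np.random.rand(num_frames, 1) > 0.5).astype(np.float32)
scores = SlidingWindowFeature(data, SlidingWindow(start=0.0, duration=step, step=step))

binarize = Binarize(onset=0.5, offset=0.5)  # onset == offset -> optimized path in this patch
t0 = time.perf_counter()
annotation = binarize(scores)
print(f"binarized {hours} h of scores in {time.perf_counter() - t0:.2f} s")
```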
force-pushed from 5666ab8 to afef580
force-pushed from 20395b9 to c3f1df8
I've updated this patch to only use the optimized method when onset == offset. The speed improvements show up in the speaker-diarization pipeline because its `to_annotation()` step relies on `Binarize` with equal onset and offset thresholds.

I'm regularly processing audio that is 5+ hours long, so the speed improvements here are beneficial for me. I've been running this patch regularly and have not noticed any issues, either. With shorter audio lengths, the improvements are trivial.

This patch is ready for review, but I understand if you don't want to merge it since the benefits only appear for long audio files, and I don't know how many people are processing 5+ hour audio samples. If others are experiencing this bottleneck, I'd be interested in hearing about it.
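For context, the gist of the optimization expressed in plain numpy (a sketch of the idea, not the PR's actual diff): when onset == offset there is no hysteresis, so active regions can be found with vectorized edge detection instead of a per-frame Python loop.

```python
import numpy as np

def active_regions(scores: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) pairs where scores > threshold."""
    active = scores > threshold
    # Pad with False so regions touching either boundary are still detected.
    padded = np.concatenate(([False], active, [False]))
    # +1 at rising edges, -1 at falling edges; flatnonzero finds their positions.
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[::2], edges[1::2]
    return list(zip(starts, ends))
```

Frame indices can then be mapped back to seconds with the sliding window's step, which avoids touching each frame individually in Python.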
force-pushed from c3f1df8 to 649c060
* let the user decide how to rename tracks, if necessary
* reduces a costly step for long audios
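On the first bullet: with renaming moved out of the hot path, callers who care about track names can rename them on the returned annotation themselves. A hedged illustration using `pyannote.core`'s `Annotation.rename_tracks` (the segments and labels are made up):

```python
from pyannote.core import Annotation, Segment

annotation = Annotation(uri="example")
annotation[Segment(0.0, 1.5)] = "SPEAKER_00"
annotation[Segment(2.0, 3.0)] = "SPEAKER_01"

# relabel tracks as "A", "B", ... only if/when downstream code needs it
renamed = annotation.rename_tracks(generator="string")
```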
While processing long audios in the `SpeakerDiarization` pipeline, I noticed that the `to_annotation()` method was taking a while, and I tracked it down to `pyannote.audio.utils.signal.Binarize.__call__()`, where it was looping over a numpy array which could end up being quite large.

In my tests, the original implementation took about 60 seconds for a 9-hour audio. With this new implementation, it takes about 0.5 seconds.
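For anyone hitting a similar slowdown, one way to confirm where the time goes is to profile the pipeline call with `cProfile` (a rough sketch, not necessarily the author's exact workflow; the checkpoint name and audio path are placeholders, and the pretrained pipeline may require a Hugging Face access token):

```python
import cProfile
import pstats
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")  # example checkpoint

with cProfile.Profile() as profiler:
    diarization = pipeline("long_audio.wav")  # hypothetical multi-hour file

stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats("pyannote")  # surfaces hot spots such as Binarize.__call__
```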
I've only tested this with the SpeakerDiarization pipeline, but the new implementation returns the same results as the original.