init commit
m-bain committed Dec 14, 2022
1 parent 9791862 commit 9f6fa61
Showing 38 changed files with 105,726 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
whisperx.egg-info/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 Max Bain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
include whisperx/assets/*
include whisperx/assets/gpt2/*
include whisperx/assets/multilingual/*
include whisperx/normalizers/english.json
49 changes: 47 additions & 2 deletions README.md
@@ -1,2 +1,47 @@
# whisperX
WhisperX: Time-Accurate Automatic Speech Recognition.
<h1 align="center">WhisperX</h1>

<p align="center">Whisper Automatic Speech Recognition with improved timestamp accuracy using forced alignment.

</p>


<h2 align="center">What is it</h2>

This repository refines the timestamps of OpenAI's Whisper model via forced alignment with phoneme-level ASR models (e.g. wav2vec2).


**Whisper** is an Automatic Speech Recognition model [developed by OpenAI](https://github.com/openai/whisper), trained on a large dataset of diverse audio. Whilst it does produce highly accurate transcriptions, the corresponding timestamps are at the utterance level, not per word, and can be inaccurate by several seconds.

**Forced Alignment** refers to the process by which orthographic transcriptions are aligned to audio recordings to automatically generate phone-level segmentation.
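
As a rough illustration of the idea (not the code in this repository), forced alignment can be sketched as a Viterbi search that assigns every audio frame to one token of a known transcript, in order. The function name `force_align`, the toy probabilities, and the omission of the CTC blank token are all assumptions made for this example; CTC-based aligners such as the one in the PyTorch tutorial also handle a blank token.

```python
import math


def force_align(log_probs, tokens):
    """Align `tokens` (in order) to frames; return (token, start, end) spans.

    log_probs[t][k] is the log-probability of token id k at frame t.
    `end` is exclusive, and every token receives at least one frame.
    Hypothetical helper for illustration only.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    if num_tokens == 0 or num_frames < num_tokens:
        raise ValueError("need at least one frame per token")

    neg_inf = -math.inf
    # score[t][j]: best log-probability of frames 0..t with frame t on tokens[j]
    score = [[neg_inf] * num_tokens for _ in range(num_frames)]
    # advanced[t][j]: 1 if the best path moved from tokens[j-1] to tokens[j] at t
    advanced = [[0] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_probs[0][tokens[0]]

    for t in range(1, num_frames):
        for j in range(num_tokens):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best, advanced[t][j] = move, 1
            else:
                best = stay
            score[t][j] = best + log_probs[t][tokens[j]]

    # Backtrack from the last frame, which must sit on the last token.
    j = num_tokens - 1
    frame_to_token = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        frame_to_token[t] = j
        j -= advanced[t][j]

    # Merge consecutive frames assigned to the same transcript position.
    spans, start = [], 0
    for t in range(1, num_frames + 1):
        if t == num_frames or frame_to_token[t] != frame_to_token[t - 1]:
            spans.append((tokens[frame_to_token[start]], start, t))
            start = t
    return spans


# Toy example: 4 frames, a vocabulary of 2 token ids, transcript [0, 1].
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
log_probs = [[math.log(p) for p in frame] for frame in probs]
print(force_align(log_probs, [0, 1]))  # [(0, 0, 2), (1, 2, 4)]
```

Multiplying the frame indices by the model's frame duration would turn these spans into timestamps.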

<h2 align="center">Setup</h2>
Install this package using

`pip install git+https://github.com/m-bain/whisperx.git`

You may also need to install ffmpeg, rust, etc. Follow OpenAI's instructions here: https://github.com/openai/whisper#setup.

<h2 align="center">Example</h2>

Run whisperx on the example segment (using default params):

`whisperx examples/sample01.wav --model medium.en --output examples/whisperx --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --align_extend 2`

This produces both word-level and phrase-level output.

<h2 align="center">Limitations</h2>

- This was hacked together quite quickly, so there may be some errors; please raise an issue if you encounter any.
- Currently only works, and has only been tested, for English.
- Whisper normalises spoken numbers, e.g. "fifty seven", to Arabic numerals ("57"). This normalisation needs to be performed after alignment so that the phonemes can be aligned. Currently, numbers are simply ignored.
- Assumes the initial Whisper timestamps are accurate to some degree (within a margin of 2 seconds; adjust if needed, but bigger margins are more prone to alignment errors).
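
The margin in the last point can be pictured as widening each Whisper segment before alignment is attempted. A minimal sketch, assuming a hypothetical helper `extend_window` (not a function from this package) and times in seconds:

```python
def extend_window(start, end, margin=2.0, audio_duration=None):
    """Widen a [start, end] segment by `margin` seconds on each side.

    Hypothetical helper for illustration: the window is clamped to the
    start of the audio and, if known, to its total duration.
    """
    new_start = max(0.0, start - margin)
    new_end = end + margin
    if audio_duration is not None:
        new_end = min(audio_duration, new_end)
    return new_start, new_end


print(extend_window(0.0, 3.0))                          # (0.0, 5.0)
print(extend_window(59.0, 88.0, audio_duration=88.5))   # (57.0, 88.5)
```

A larger margin gives the aligner more room to recover from a poor initial timestamp, at the cost of more audio that does not belong to the segment.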

<h2 align="center">Contact</h2>

Contact maxbain[at]robots.ox.ac.uk if you are using this at scale.

<h2 align="center">Acknowledgements</h2>

- OpenAI's whisper https://github.com/openai/whisper
- PyTorch forced alignment tutorial https://pytorch.org/tutorials/intermediate/forced_alignment_with_torchaudio_tutorial.html
Binary file added examples/sample01.wav
Binary file not shown.
140 changes: 140 additions & 0 deletions examples/whisper/sample01.wav.srt
@@ -0,0 +1,140 @@
1
00:00:00,000 --> 00:00:03,000
Bella, Gloria, love.

2
00:00:03,000 --> 00:00:04,000
Oh.

3
00:00:04,000 --> 00:00:05,000
How are you?

4
00:00:05,000 --> 00:00:07,000
Oh, I'm OK.

5
00:00:07,000 --> 00:00:08,000
I will be.

6
00:00:08,000 --> 00:00:09,000
I said she could stay with us tomorrow

7
00:00:09,000 --> 00:00:10,000
just until she feels better.

8
00:00:10,000 --> 00:00:11,000
Yeah.

9
00:00:11,000 --> 00:00:12,000
Of course she can.

10
00:00:12,000 --> 00:00:14,000
No, things won't be for long.

11
00:00:14,000 --> 00:00:16,000
Well, you can stay as long as you want, my love.

12
00:00:16,000 --> 00:00:18,000
I've really missed you.

13
00:00:18,000 --> 00:00:19,000
Pops.

14
00:00:19,000 --> 00:00:20,000
Great to see you, love.

15
00:00:20,000 --> 00:00:22,000
Oh.

16
00:00:22,000 --> 00:00:23,000
All right, shall we get you off to bed then?

17
00:00:23,000 --> 00:00:25,000
You should have given me some warm.

18
00:00:25,000 --> 00:00:26,000
I know.

19
00:00:26,000 --> 00:00:27,000
I'll have to put the electric blanket on.

20
00:00:27,000 --> 00:00:28,000
I'm sorry.

21
00:00:28,000 --> 00:00:29,000
All right, Bella.

22
00:00:29,000 --> 00:00:31,000
Freezing up there.

23
00:00:31,000 --> 00:00:34,000
In a bedroom, Peter unpacks her suitcase.

24
00:00:34,000 --> 00:00:38,000
The middle-aged woman opens her green case.

25
00:00:38,000 --> 00:00:39,000
Do you want your PJs?

26
00:00:39,000 --> 00:00:40,000
Yeah.

27
00:00:40,000 --> 00:00:42,000
Yeah.

28
00:00:42,000 --> 00:00:45,000
Lifting a bundle of pajamas, Peter finds a sheet of paper

29
00:00:45,000 --> 00:00:50,000
labeled Lancaster North Hospital discharge sheet.

30
00:00:50,000 --> 00:00:52,000
He closes the suitcase and brings Gloria the pajamas.

31
00:00:52,000 --> 00:00:54,000
There you go.

32
00:00:54,000 --> 00:00:55,000
Thank you.

33
00:00:55,000 --> 00:00:57,000
He picks up the locket.

34
00:00:57,000 --> 00:00:59,000
He kept it.

35
00:00:59,000 --> 00:01:28,000
Oh, cool.

140 changes: 140 additions & 0 deletions examples/whisperx/sample01.wav.srt
@@ -0,0 +1,140 @@
1
00:00:01,185 --> 00:00:03,273
Bella, Gloria, love.

2
00:00:03,754 --> 00:00:03,855
Oh.

3
00:00:04,496 --> 00:00:06,219
How are you?

4
00:00:06,723 --> 00:00:07,126
Oh, I'm OK.

5
00:00:08,412 --> 00:00:08,915
I will be.

6
00:00:09,215 --> 00:00:10,439
I said she could stay with us tomorrow

7
00:00:10,459 --> 00:00:11,351
just until she feels better.

8
00:00:11,733 --> 00:00:11,954
Yeah.

9
00:00:12,095 --> 00:00:13,238
Of course she can.

10
00:00:13,359 --> 00:00:15,012
No, things won't be for long.

11
00:00:15,173 --> 00:00:17,338
Well, you can stay as long as you want, my love.

12
00:00:17,621 --> 00:00:18,810
I've really missed you.

13
00:00:19,493 --> 00:00:19,795
Pops.

14
00:00:20,396 --> 00:00:21,679
Great to see you, love.

15
00:00:21,901 --> 00:00:23,213
Oh.

16
00:00:23,233 --> 00:00:24,378
All right, shall we get you off to bed then?

17
00:00:24,579 --> 00:00:26,052
You should have given me some warm.

18
00:00:26,313 --> 00:00:26,494
I know.

19
00:00:26,614 --> 00:00:28,940
I'll have to put the electric blanket on.

20
00:00:29,490 --> 00:00:29,817
I'm sorry.

21
00:00:29,980 --> 00:00:30,633
All right, Bella.

22
00:00:31,375 --> 00:00:31,897
Freezing up there.

23
00:00:31,897 --> 00:00:33,647
In a bedroom, Peter unpacks her suitcase.

24
00:00:34,268 --> 00:00:36,533
The middle-aged woman opens her green case.

25
00:00:38,095 --> 00:00:39,297
Do you want your PJs?

26
00:00:39,862 --> 00:00:40,185
Yeah.

27
00:00:42,394 --> 00:00:42,474
Yeah.

28
00:00:42,474 --> 00:00:45,418
Lifting a bundle of pajamas, Peter finds a sheet of paper

29
00:00:45,538 --> 00:00:49,251
labeled Lancaster North Hospital discharge sheet.

30
00:00:50,293 --> 00:00:52,858
He closes the suitcase and brings Gloria the pajamas.

31
00:00:54,187 --> 00:00:54,832
There you go.

32
00:00:55,655 --> 00:00:55,896
Thank you.

33
00:00:55,916 --> 00:00:56,742
He picks up the locket.

34
00:00:57,124 --> 00:00:57,627
He kept it.

35
00:00:58,874 --> 00:00:59,899
Oh, cool.
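
To compare the two example subtitle files, SRT timestamps can be converted to milliseconds. A minimal sketch, with hypothetical helper names `srt_time_to_ms` and `cue_duration_ms`; the two timing lines are copied from cue 35 of each file above.

```python
import re

# SRT timestamps look like HH:MM:SS,mmm
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ms(stamp):
    """Convert an SRT timestamp such as '00:01:28,000' to milliseconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def cue_duration_ms(timing_line):
    """Return the duration of a cue given its 'start --> end' line."""
    start, end = timing_line.split("-->")
    return srt_time_to_ms(end) - srt_time_to_ms(start)


whisper_cue = "00:00:59,000 --> 00:01:28,000"   # examples/whisper, cue 35
whisperx_cue = "00:00:58,874 --> 00:00:59,899"  # examples/whisperx, cue 35

print(cue_duration_ms(whisper_cue))   # 29000
print(cue_duration_ms(whisperx_cue))  # 1025
```

For this cue, the original Whisper subtitle spans 29 seconds, while the aligned one spans roughly one second.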
