Add option to carry initial_prompt with the sliding window #2343
Conversation
There are outstanding issues with this PR:
Closing this PR since I can't find a way to move it to draft.
Also a relevant discussion here: #1040 (comment)
It's part of the model dimensions itself, actually 448 tokens total, and half that for the prompt. The logic is in decoding.py if you look for where the prompt tokens are sliced.
@ryanheise Thank you for your input; it was helpful. Do you mind providing any additional feedback?

Aside: I did find the left-slice in the code, and it turns out that the docs are wrong: the maximum prompt length is actually `n_text_ctx // 2 - 1` = 223 tokens, not half of 448. Confirming with the model dimensions:

>>> medium = torch.load('/home/kittsil/.cache/whisper/medium.en.pt')
>>> medium['dims']
{'n_mels': 80, 'n_vocab': 51864, 'n_audio_ctx': 1500, 'n_audio_state': 1024, 'n_audio_head': 16, 'n_audio_layer': 24, 'n_text_ctx': 448, 'n_text_state': 1024, 'n_text_head': 16, 'n_text_layer': 24}
>>> medium['dims']['n_text_ctx'] // 2 - 1
223
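(A minimal sketch of that left-slice, assuming a tokenized prompt and the `n_text_ctx` value printed above; the function name and signature are illustrative, not whisper's actual decoding.py code.)

```python
def clamp_prompt(prompt_tokens: list[int], n_text_ctx: int = 448) -> list[int]:
    # The prompt gets just under half of the text context:
    # n_text_ctx // 2 - 1 == 223 for the dims printed above.
    max_prompt_len = n_text_ctx // 2 - 1
    # Left-slice: when the prompt is longer than this, only its most
    # recent tokens are kept.
    return prompt_tokens[-max_prompt_len:]
```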
Hello, if I merge this locally, what command do I add to prevent Whisper from losing punctuation during transcription? Can you also update it here so I can install it directly: https://github.com/kittsil/whisper/tree/patch-1
Why is this very important feature still not merged, @jongwook?
@kittsil I use the CLI, so adding `--carry_initial_prompt` will work, right?
@FurkanGozukara, that's an issue with

In general, though, I wouldn't comment on a PR for debugging help; it's best to keep PRs focused on the request / review process.
@kittsil Thank you so much; your PR saved me. I transcribed a 3-hour video, and without your PR I would have been devastated, because YouTube's auto timing also failed :D
Background

Whisper's `transcribe()` struggles with contextual proper nouns if they appear after the initial prompt has been consumed; see some experimental results here. This PR solves that issue by allowing the initial "context" prompt to be carried as the sliding window moves through the audio.

Changes

Add an option `carry_initial_prompt = False` to `whisper.transcribe()`. When `carry_initial_prompt` is set to `True`, `initial_prompt` is prepended to each internal `decode()` call's `prompt`. If there is not enough context space at the start of the prompt, the prompt is left-sliced to make space.
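To make the described behavior concrete, here is a minimal sketch of the per-window prompt assembly, assuming token lists as inputs; the function and variable names are illustrative, and this is not the PR's actual diff:

```python
def build_window_prompt(
    initial_prompt_tokens: list[int],
    previous_context_tokens: list[int],
    carry_initial_prompt: bool,
    max_prompt_len: int = 223,  # n_text_ctx // 2 - 1, per the discussion above
) -> list[int]:
    # Illustrative helper only; not code from the PR.
    if not carry_initial_prompt:
        # Existing behavior: only the most recent context is kept, so the
        # initial prompt drops out of the window once enough audio has passed.
        return previous_context_tokens[-max_prompt_len:]
    # carry_initial_prompt=True: always prepend initial_prompt, then left-slice
    # the carried context so the combined prompt fits in the available space.
    remaining = max_prompt_len - len(initial_prompt_tokens)
    if remaining <= 0:
        return initial_prompt_tokens[-max_prompt_len:]
    return initial_prompt_tokens + previous_context_tokens[-remaining:]
```

As described above, the option itself is exposed as a `carry_initial_prompt` keyword argument to `whisper.transcribe()`, passed alongside `initial_prompt`.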