Performance improvements for transcription (up to 20% faster transcription on CPU) #2516
base: main
Conversation
30abb70 to d3f9b82
Hi @ccoenen, sorry for the @. Would it be possible to get a review / feedback on this PR? Thank you
Hi, I think I was tagged by mistake? I'm not part of this project.
Ah, apologies, yes, that's the wrong username. Sorry about that.
Hi @jongwook, sorry for the @. Would it be possible to get a review / feedback on this PR? Thank you
whisper/model.py (outdated)

    # Optimisation: apply the precomputed CUDA mask if available.
    if torch.cuda.is_available():
        mask = self.mask_cuda[:n_token, :n_token]
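The idea behind the cached mask can be illustrated with a minimal, framework-free sketch. This is illustrative only: the PR itself works with a torch tensor kept on the GPU, and the names `build_causal_mask`, `NEG_INF`, and `sliced_mask` below are hypothetical, not part of the PR.

```python
NEG_INF = float("-inf")

def build_causal_mask(n_ctx):
    # Built once at model init: -inf above the diagonal blocks attention
    # to future tokens; 0.0 elsewhere leaves attention scores unchanged.
    return [[NEG_INF if j > i else 0.0 for j in range(n_ctx)]
            for i in range(n_ctx)]

# Precompute for the maximum context once.
MASK = build_causal_mask(8)

def sliced_mask(n_token):
    # Per decoding step, slice the cached mask instead of rebuilding it,
    # mirroring `self.mask_cuda[:n_token, :n_token]` above.
    return [row[:n_token] for row in MASK[:n_token]]
```

Slicing a cached mask avoids reallocating and refilling a tensor on every forward pass, which is where the saving comes from.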
There are some code formatting issues. You can check the flake8/black options in:
https://github.com/openai/whisper/blob/main/.pre-commit-config.yaml
I'll fix that, thanks for the feedback :)
@ryanheise thanks again for the review. I think my latest changes have fixed those code formatting issues; would you mind taking another look? Thank you :)
Implements a suite of optimizations targeting memory efficiency, tensor initialization, and model loading in the Whisper ASR system, improving transcription speed on CPU by up to 20%.
The changes reduce memory usage, improve code quality, and make model loading more reliable while maintaining functional equivalence.
Changes:
Gradient checkpointing:
Tensor initialization:
Model loading:
Before:
After:
Impact:
Testing:
python3 -m pytest --durations=0 -vv -k 'not test_transcribe or test_transcribe[tiny] or test_transcribe[tiny.en]' -m 'not requires_cuda'
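Since the gradient-checkpointing section of the description above is left empty, here is a hedged, framework-free sketch of the general technique (not the PR's actual code): keep activations only at every k-th layer during the forward pass, and recompute the ones in between on demand, trading extra compute for lower memory. All names below are illustrative.

```python
def forward_with_checkpoints(layers, x, every=2):
    # Keep activations only at checkpoint boundaries; anything between
    # two checkpoints can be recomputed later from the nearest one.
    saved = {0: x}
    h = x
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        if i % every == 0:
            saved[i] = h
    return h, saved

def recompute(layers, saved, i, every=2):
    # Recover the activation after layer i by replaying the forward
    # pass from the nearest earlier checkpoint.
    start = (i // every) * every
    h = saved[start]
    for layer in layers[start:i]:
        h = layer(h)
    return h
```

In PyTorch the same idea is provided by `torch.utils.checkpoint.checkpoint`, which wraps a module so its intermediate activations are recomputed during the backward pass instead of being stored.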