Pod-Helper is an advanced audio-processing tool that goes beyond lightning-fast transcription. It also offers audio repair powered by the MLM (Masked Language Model) objective, so your content keeps its quality and vibe.
- ⚡ Real-time audio transcription with a TRT-LLM-optimized Whisper model.
- 🛠️ Audio corruption repair via good old RoBERTa (sketched below).
- ✨ Sentiment analysis to gauge the mood of the content.
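Under the hood, the repair and sentiment features map naturally onto masked-language-model and text-classification pipelines. Here is a minimal sketch of the idea, assuming stock `roberta-base` and the Hugging Face default sentiment checkpoint rather than Pod-Helper's exact models:

```python
# Minimal sketch of MLM-based repair and sentiment analysis.
# Assumes stock checkpoints; Pod-Helper's actual models and
# pre/post-processing may differ.
from transformers import pipeline

# Fill-mask: patch a corrupted (masked) span in a transcript.
repair = pipeline("fill-mask", model="roberta-base")
damaged = "Welcome back to the <mask>, today we talk about TensorRT."
print(repair(damaged, top_k=1)[0]["sequence"])

# Sentiment analysis: gauge the mood of the (restored) text.
sentiment = pipeline("sentiment-analysis")
print(sentiment("Welcome back to the show, great episode today!"))
```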
See the demo of real-time ASR running locally on consumer hardware with only 2.5 GB of VRAM, or click here to watch the video.
```bash
# Clone the repository
git clone
cd pod-helper
```
- Install TensorRT-LLM for Windows from tensorrt-llm-windows. After that:
```bash
# Install requirements
pip install -r requirements.txt
```
Below we show how to run the main model behind this project, Whisper, in TensorRT-LLM on a single GPU. You can also run it on multiple GPUs, but that requires rebuilding the engine with the correct flags; see the Optional: Re-Build TensorRT engine(s) section for more details.
```bash
# Launch the Gradio interface
python3 app.py
```
Pod-Helper utilizes the TensorRT-LLM Whisper example code, primarily from `examples/whisper`.

Key components include:

- `run.py`: Performs inference on WAV file(s) using the built TensorRT engines.
- `app.py`: Provides a Gradio interface for microphone input or file upload, utilizing `run.py` modules.
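For orientation, here is a minimal sketch of how such an interface can be wired up; `transcribe_wav` is a hypothetical stand-in for run.py's inference entry point, and the `gr.Audio` arguments assume Gradio 4.x:

```python
# Minimal sketch of the app.py wiring, not the actual implementation.
import gradio as gr

def transcribe_wav(wav_path: str) -> str:
    # Hypothetical stand-in: in Pod-Helper this would call into run.py's
    # TensorRT engine to transcribe the recorded or uploaded file.
    return f"(transcript of {wav_path})"

demo = gr.Interface(
    fn=transcribe_wav,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Pod-Helper",
)

if __name__ == "__main__":
    demo.launch()
```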
You can either use the pre-converted models located in the `tinyrt` folder or download the Whisper checkpoint models from here.
```bash
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/multilingual.tiktoken
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
wget --directory-prefix=assets https://raw.githubusercontent.com/yuekaizhang/Triton-ASR-Client/main/datasets/mini_en/wav/1221-135766-0002.wav

# tiny model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt
```
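A quick orientation on these assets: `multilingual.tiktoken` is the tokenizer vocabulary, the WAV file is a short test clip, and `mel_filters.npz` holds Whisper's log-mel filter bank. A peek at the latter, assuming the standard `mel_80` key that OpenAI Whisper ships for 80-bin models such as tiny:

```python
# Inspect the downloaded mel filter bank (assumes the standard `mel_80`
# key used by OpenAI Whisper for 80-bin models like tiny).
import numpy as np

filters = np.load("assets/mel_filters.npz")
mel_80 = filters["mel_80"]
print(mel_80.shape)  # expected (80, 201): 80 mel bins over n_fft // 2 + 1 = 201 FFT bins
```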
TensorRT-LLM Whisper builds TensorRT engine(s) from the PyTorch checkpoint and saves the engine(s) to the specified directory. Skip this step if you are using the pre-converted models.
```bash
# install requirements first
pip install -r requirements.txt

# Build the tiny model using a single GPU with plugins.
python3 build.py --output_dir tinyrt --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin

# Build the tiny model using a single GPU with plugins, without the layernorm plugin.
python3 build.py --output_dir tinyrt_no_layernorm --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin

# Build the tiny model using a single GPU with weight-only quantization.
python3 build.py --output_dir tinyrt_weight_only --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --use_weight_only
```
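For intuition on what `--use_weight_only` buys you: the engine stores weights in 8 bits and dequantizes them on the fly, roughly halving weight memory versus FP16 at a small accuracy cost. A toy illustration of the idea, not TensorRT-LLM's actual scheme or kernels:

```python
# Toy per-row symmetric int8 weight quantization, for intuition only.
import numpy as np

w = np.random.randn(4, 8).astype(np.float32)           # original weights
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # per-row scale factor
w_int8 = np.round(w / scale).astype(np.int8)           # stored in 8 bits
w_dequant = w_int8.astype(np.float32) * scale          # recovered at inference

print("max abs error:", np.abs(w - w_dequant).max())   # on the order of scale / 2
```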
This project is a submission for the NVIDIA RTX PCs Developer Contest, under the General Generative AI Projects category. Pod-Helper showcases the potential of generative AI in transforming audio content creation and processing.
Category: General Generative AI Projects
Tested on the following system:
- Operating System: Windows 10
- Version: 22H2
- OS Build: 19045.3930
- TensorRT-LLM version: 0.7.1
- CUDA version: 12.4
- cuDNN version: 8.9.7.29
- GPU: NVIDIA RTX A1000
- Driver version: 551.23
- DataType: FP16
- Python version: 3.10.11
- PyTorch version: 2.1.0+cu121
Confirmation of entry:

> Great to see the entries coming in for our #DevContest. Thank you @Muhtasham9 for this cool demo. https://t.co/6uvKC5NXwO
>
> — NVIDIA AI Developer (@NVIDIAAIDev) February 9, 2024
- Add support for real-time Automatic Speech Recognition (ASR).
- Add support for more audio formats (requires installing ffmpeg).
- Port the BERT model for MLM and sentiment analysis to TensorRT-LLM, to support real-time audio repair and sentiment analysis. Waiting for this issue to be resolved: BERT model for MLM and Sentiment Analysis
- Add support for more audio repair capabilities.
- Add support for more languages.
Contributions to enhance and expand this project are welcome. Please see the `CONTRIBUTING.md` file for guidelines on how to contribute.
This project is licensed under the MIT License.