fix: real time gui not work #123

RoversCode · 2025-02-08T15:53:33Z

This PR addresses two issues and adds a pitch-shifting feature:

1. Fixed the issue where infer_wav_res produces meaningless noise
   The real-time self.resampler conversion was corrupting the audio output. While the root cause is still under investigation, librosa resampling has been implemented as a working solution.
2. Fixed the silent output issue
   When block time is too large, the vad chunk size becomes excessive, causing infinite loops in 'if not self.vad_speech_detected' condition, resulting in silent output.
3. Added pitch-shifting functionality
   Implemented pitch up/down features for input audio.
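
For readers unfamiliar with how a pitch up/down control maps to the audio, here is a minimal sketch of the arithmetic. It is not the code from this PR: the function names `semitones_to_ratio` and `naive_pitch_shift` are hypothetical, and the approach (reading the signal at a different speed with linear interpolation) is deliberately simple. It changes duration along with pitch, which a real-time implementation would have to compensate for.

```python
from typing import List


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def naive_pitch_shift(samples: List[float], semitones: float) -> List[float]:
    """Hypothetical helper: shift pitch by reading the input at a different speed.

    Positive semitones raise the pitch and shorten the output; negative
    semitones lower the pitch and lengthen it. Duration is NOT preserved.
    """
    if not samples:
        return []
    ratio = semitones_to_ratio(semitones)
    out_len = max(1, int(len(samples) / ratio))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * ratio                # fractional read position in the input
        lo = min(int(pos), last)       # sample at or before the read position
        hi = min(lo + 1, last)         # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

Shifting up by 12 semitones gives a ratio of 2.0, so a 1000-sample block comes back as 500 samples; shifting by 0 semitones returns the input unchanged.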

Plachtaa · 2025-02-11T03:17:40Z

Hi there, thanks for your hard work!
For 1., I question the claim that "The real-time self.resampler conversion was corrupting the audio output". The meaningless noise is most likely caused by the case in #125; please let me know if this is not your case.
For 2., this is a good bonus feature, but I cannot run your script because there are multiple model checkpoint loading issues. If you agree, I will make some changes to your script before merging your PR.

Thanks

RoversCode · 2025-02-11T12:59:18Z

Thank you for your feedback!

Regarding point 1, through debugging I've traced the issue to self.resampler: it converts valid speech signals into silence. I believe issue #125 is likely caused by the same problem, as I observe intermittent audio anomalies in real-time speech output even without making any modifications.

Regarding point 2, feel free to modify the code as needed, since I only added this feature in a simple way without much consideration for code optimization and performance.

I mainly wanted to report these two bugs. As for the self.resampler issue specifically, I'm also puzzled about the root cause. I've reviewed the code and found it's just using the basic torchaudio.transforms.Resample implementation. While strange, I can confirm the problem is definitely with this component.
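
One way to make the "valid speech becomes silence" observation reproducible is to compare signal level before and after the resampling step. The following is a rough sketch under assumptions: the helper names `rms`, `energy_drop_db`, and `looks_silenced` and the -40 dB threshold are illustrative and do not come from this repository. It works on plain Python float sequences, so tensors or arrays would need converting first (for example with `.tolist()`).

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def energy_drop_db(before: Sequence[float], after: Sequence[float],
                   eps: float = 1e-12) -> float:
    """Level change in dB from `before` to `after`; strongly negative means quieter."""
    return 20.0 * math.log10((rms(after) + eps) / (rms(before) + eps))


def looks_silenced(before: Sequence[float], after: Sequence[float],
                   threshold_db: float = -40.0) -> bool:
    """Hypothetical check: True if `after` is far quieter than `before`.

    The -40 dB default is an arbitrary illustrative threshold.
    """
    return energy_drop_db(before, after) < threshold_db


# Example: a 440 Hz tone at 16 kHz, compared against an all-zero output block.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
silent = [0.0] * 1600
half = [0.5 * s for s in tone]
```

Logging `energy_drop_db` for the input and output of the resampling call on each block would show whether the level collapses at that step or somewhere else in the pipeline.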

RoversCode added 4 commits February 8, 2025 23:46

fix: real time not work, 1 self.resampler 2 vad chunk size too big

ed1b4f3

refactor: optim ckpt load

4bd1861

feat: win ui

edb8127

fix: one key start

d8df3f7

fix: real-time-gui

51a76b2

fix float

0cde622
