Help with fine tuned accent #133

Closed

GUUser91 opened this issue Feb 20, 2025 · 0 comments

GUUser91 commented Feb 20, 2025 (edited)

I'm trying to fine-tune a model on a British accent. I could only gather about 3 minutes and 20 seconds of audio for the dataset. Here's the input file:
https://vocaroo.com/1gY2BK8MTiTF
Here's the reference file.
https://vocaroo.com/1bTycJauDp9i
Output file from the pretrained model:
https://vocaroo.com/1cdag3gy5958
Output file from the fine-tuned model:
https://vocaroo.com/1fvqjU7YBGrI

Edit: Never mind, I got the fine-tuned model to work by following the instructions in this comment:
#131 (comment)

In the config_dit_mel_seed_uvit_whisper_small_wavenet.yml config file, I changed the in_channels value from 768 to 1280.

Then I replaced the whisper-small entry with whisper-large-v3-turbo. I suggest using whisper-large-v3 if you're having trouble.
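
For anyone repeating this, the two config edits can be sketched in Python. This is only a minimal sketch, not the project's own tooling: the sample config text and its key names (speech_tokenizer, name, length_regulator) are assumptions made for illustration, and switch_whisper_encoder is a hypothetical helper, so check everything against your actual file. The numbers are the encoder hidden sizes of the Whisper checkpoints (768 for whisper-small, 1280 for whisper-large-v3 and whisper-large-v3-turbo), which is presumably why in_channels has to change together with the model name.

```python
import re

# Encoder hidden sizes (d_model) of the Whisper checkpoints mentioned here.
WHISPER_HIDDEN_SIZE = {
    "whisper-small": 768,
    "whisper-large-v3": 1280,
    "whisper-large-v3-turbo": 1280,
}


def switch_whisper_encoder(config_text, old="whisper-small",
                           new="whisper-large-v3-turbo"):
    """Return config text with the encoder name and in_channels swapped.

    Hypothetical helper: it works on plain text and assumes the width is
    stored under a key literally named `in_channels`.
    """
    old_dim = WHISPER_HIDDEN_SIZE[old]
    new_dim = WHISPER_HIDDEN_SIZE[new]
    # Swap the model name, but not when it is the prefix of a longer name.
    text = re.sub(rf"{re.escape(old)}(?![\w-])", new, config_text)
    # Only touch in_channels entries that still hold the old encoder width.
    text = re.sub(rf"(in_channels:\s*){old_dim}\b", rf"\g<1>{new_dim}", text)
    return text


# Assumed layout for illustration only; the real config may differ.
sample = """\
speech_tokenizer:
  name: "openai/whisper-small"
length_regulator:
  in_channels: 768
"""

print(switch_whisper_encoder(sample))
```

Working on the text rather than parsing the YAML keeps comments and formatting in the file intact; the trade-off is that it relies on the key being spelled exactly in_channels.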

Here is an output file from the improved fine-tuned model:
https://vocaroo.com/17qPajlqbHnK

GUUser91 closed this as completed
