
Help with fine tuned accent #133

Closed
GUUser91 opened this issue Feb 20, 2025 · 0 comments

I'm trying to fine-tune a model on a British accent. I could only gather about 3 minutes and 20 seconds of audio for the dataset. Here's the input file:
https://vocaroo.com/1gY2BK8MTiTF
Here's the reference file:
https://vocaroo.com/1bTycJauDp9i
Output file from the pretrained model:
https://vocaroo.com/1cdag3gy5958
Output file from the fine-tuned model:
https://vocaroo.com/1fvqjU7YBGrI

Edit: Never mind, I got the fine-tuned model to work by following the instructions in this comment:
#131 (comment)

In the config file config_dit_mel_seed_uvit_whisper_small_wavenet.yml, I changed the in_channels value from 768 to 1280.

Then I replaced the whisper-small entry with whisper-large-v3-turbo. I suggest using whisper-large-v3 if you're having trouble.
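
For anyone making the same change, here is a rough sanity check, not part of the project itself. The two edits have to agree: whisper-small's encoder outputs 768-dimensional features, while whisper-large-v3 and whisper-large-v3-turbo output 1280-dimensional ones, so in_channels must match whichever Whisper model the config names. The helper names below (`expected_in_channels`, `check_in_channels`) are hypothetical, and the sketch assumes you read the two values out of the YAML yourself.

```python
# Encoder feature width (d_model) of the Whisper checkpoints mentioned in this issue.
WHISPER_ENCODER_DIM = {
    "whisper-small": 768,
    "whisper-large-v3": 1280,
    "whisper-large-v3-turbo": 1280,
}


def expected_in_channels(whisper_name: str) -> int:
    """Return the in_channels value that matches the given Whisper model name.

    Accepts names with or without an organisation prefix such as "openai/".
    """
    short_name = whisper_name.rsplit("/", 1)[-1]
    if short_name not in WHISPER_ENCODER_DIM:
        raise KeyError(f"No known encoder width for {whisper_name!r}")
    return WHISPER_ENCODER_DIM[short_name]


def check_in_channels(whisper_name: str, in_channels: int) -> None:
    """Raise ValueError if in_channels does not match the Whisper encoder width."""
    expected = expected_in_channels(whisper_name)
    if in_channels != expected:
        raise ValueError(
            f"in_channels is {in_channels}, but {whisper_name} "
            f"produces {expected}-dimensional features"
        )


if __name__ == "__main__":
    # The combination described above: large-v3-turbo with in_channels set to 1280.
    check_in_channels("whisper-large-v3-turbo", 1280)
    print("config values are consistent")
```

Swapping the Whisper name without updating in_channels (or the other way round) would fail this check, which is the kind of mismatch the two edits above avoid.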

Here is an output file from the improved fine-tuned model:
https://vocaroo.com/17qPajlqbHnK