✨ SuperVoice VoiceBox

Feel free to join my Discord Server to discuss this model!

An independent VoiceBox implementation for voice synthesis. Currently in BETA.

Features

  • ⚡️ Natural sounding
  • 🎤 High quality - 24 kHz audio
  • 🤹‍♂️ Versatile - synthesized voices have high variability
  • 📕 Currently only English is supported, but nothing stops us from adding more languages.

Samples

sample_1.mp4
sample_2.mp4
sample_3.mp4
sample_4.mp4

How to use

Supervoice consists of three networks: a GPT model for phoneme and prosody generation, an audio model for mel spectrogram synthesis, and a vocoder for waveform generation. Supervoice is published via Torch Hub, so you can use it as follows:

import torch
from IPython.display import Audio, display

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Vocoder
vocoder = torch.hub.load(repo_or_dir='ex3ndr/supervoice-vocoder', model='bigvsan')
vocoder.to(device)
vocoder.eval()

# GPT Model
gpt = torch.hub.load(repo_or_dir='ex3ndr/supervoice-gpt', model='phonemizer')
gpt.to(device)
gpt.eval()

# Main Model
model = torch.hub.load(repo_or_dir='ex3ndr/supervoice-voicebox', model='phonemizer', gpt=gpt, vocoder=vocoder)
model.to(device)
model.eval()

# Generate audio
# Supervoice has three example voices: "voice_1", "voice_2" (my favorite), "voice_3"
# You can also omit the voice parameter to use a random one, or provide your own, but you need a TextGrid alignment for that.
# Steps controls the quality of the audio; recommended values are 4, 8 or 32.
# Alpha controls randomness and should be less than 1.0: 0.1 gives stable synthesis with small variations, 0.3 is a good value for more expressive synthesis, and 0.5 is the maximum recommended value.
output = model.synthesize("What time is it, Steve?", voice = "voice_1", steps = 8, alpha = 0.1)

# Output mel spectrogram
melspec = output['melspec']

# Output 1D tensor of 24 kHz audio (missing if vocoder is not provided)
waveform = output['wav']

# Play audio in notebook
display(Audio(data=waveform, rate=24000))
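
If you want to keep the result instead of just playing it in a notebook, the waveform can be written to disk with torchaudio. This is a minimal sketch, not part of the Supervoice API: it assumes torchaudio is installed and that output['wav'] is the 1D 24 kHz tensor produced above.

import torchaudio

# torchaudio.save expects a 2D (channels, samples) tensor on the CPU,
# so add a channel dimension and move the waveform off the GPU first
torchaudio.save("output.wav", waveform.cpu().unsqueeze(0), sample_rate=24000)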

License

MIT
