✨ Supervoice VALL-E 2

Feel free to join my Discord Server to discuss this model!

An independent VALL-E 2 reproduction for voice synthesis with voice cloning.


Features

⚡️ Natural sounding, with voice cloning at a human level
🎤 High quality - 24 kHz audio
🤹‍♂️ Versatile - synthesized voices have high variability
📕 Only English is supported for now, but nothing stops us from adding more languages.

Tips and tricks

The network can follow a reference voice, but results are best when the voice is in-domain: recordings from LibriLight, LibriTTS, or similar sources.

Architecture

The reproduction follows the papers as closely as possible, with a few minor changes:

Linear annealing is replaced with cosine annealing
Codec grouping is not implemented
No padding masking is used during training, since with flash attention it would make training 5 times slower
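
To illustrate the first change in the list above, here is a minimal sketch comparing a linear and a cosine annealing schedule. It assumes that "annealing" refers to the learning-rate decay; the function names, the peak learning rate, and the step counts are hypothetical and are not taken from this repository's training scripts.

```python
import math


def linear_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Decay from lr_max to lr_min along a straight line."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return lr_max + (lr_min - lr_max) * progress


def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Decay from lr_max to lr_min along half a cosine period."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    total, peak = 1000, 1e-3
    for step in (0, 250, 500, 750, 1000):
        print(step, linear_annealing(step, total, peak), cosine_annealing(step, total, peak))
```

Both schedules start and end at the same values. The cosine variant stays closer to the peak early on and flattens out near the end instead of decaying at a constant rate.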

How to use

import torch

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model from torch.hub
model = torch.hub.load(repo_or_dir="ex3ndr/supervoice-vall-e-2", model="supervoice")
model = model.to(device)

# Synthesize the same sentence with two bundled voices
in_voice_1 = model.synthesize("voice_1", "What time is it, Steve?", top_p=0.2).cpu()
in_voice_2 = model.synthesize("voice_2", "What time is it, Steve?", top_p=0.2).cpu()

# Experimental voices
in_emo_1 = model.synthesize("emo_1", "What time is it, Steve?", top_p=0.2).cpu()
in_emo_2 = model.synthesize("emo_2", "What time is it, Steve?", top_p=0.2).cpu()
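
To listen to a result outside a notebook, the waveform can be written to a WAV file. The sketch below uses only the Python standard library and rests on assumptions not confirmed by this README: that `synthesize` returns a mono waveform of floats in the range [-1, 1], sampled at the 24 kHz listed under Features. The helper `save_wav` is hypothetical and not part of this repository.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: matches the 24 kHz audio listed under Features


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
```

Under the same assumptions, a synthesized tensor could be saved with `save_wav(in_voice_1.flatten().tolist(), "voice_1.wav")`.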

License

MIT

