Merge pull request #3405 from coqui-ai/studio_speakers

Add studio speakers to open source XTTS!
coqui-ai · Dec 12, 2023 · 8c1a8b5
2 parents 934b87b + 8e6a7cb
Showing 18 changed files with 182 additions and 895 deletions.
diff --git a/.github/workflows/api_tests.yml b/.github/workflows/api_tests.yml
diff --git a/.github/workflows/zoo_tests_tortoise.yml b/.github/workflows/zoo_tests_tortoise.yml
diff --git a/Makefile b/Makefile
@@ -35,9 +35,6 @@ test_zoo:	## run zoo tests.
 inference_tests: ## run inference tests.
 	nose2 -F -v -B --with-coverage --coverage TTS tests.inference_tests
 
-api_tests: ## run api tests.
-	nose2 -F -v -B --with-coverage --coverage TTS tests.api_tests
-
 data_tests: ## run data tests.
 	nose2 -F -v -B --with-coverage --coverage TTS tests.data_tests
 

diff --git a/README.md b/README.md
@@ -7,8 +7,6 @@
 - 📣 [🐶Bark](https://github.com/suno-ai/bark) is now available for inference with unconstrained voice cloning. [Docs](https://tts.readthedocs.io/en/dev/models/bark.html)
 - 📣 You can use [~1100 Fairseq models](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) with 🐸TTS.
 - 📣 🐸TTS now supports 🐢Tortoise with faster inference. [Docs](https://tts.readthedocs.io/en/dev/models/tortoise.html)
-- 📣 **Coqui Studio API** is landed on 🐸TTS. - [Example](https://github.com/coqui-ai/TTS/blob/dev/README.md#-python-api)
-- 📣 [**Coqui Studio API**](https://docs.coqui.ai/docs) is live.
 - 📣 Voice generation with prompts - **Prompt to Voice** - is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin)!! - [Blog Post](https://coqui.ai/blog/tts/prompt-to-voice)
 - 📣 Voice generation with fusion - **Voice fusion** - is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin).
 - 📣 Voice cloning is live on [**Coqui Studio**](https://app.coqui.ai/auth/signin).
@@ -253,29 +251,6 @@ tts.tts_with_vc_to_file(
 )
 ```
 
-#### Example using [🐸Coqui Studio](https://coqui.ai) voices.
-You access all of your cloned voices and built-in speakers in [🐸Coqui Studio](https://coqui.ai).
-To do this, you'll need an API token, which you can obtain from the [account page](https://coqui.ai/account).
-After obtaining the API token, you'll need to configure the COQUI_STUDIO_TOKEN environment variable.
-
-Once you have a valid API token in place, the studio speakers will be displayed as distinct models within the list.
-These models will follow the naming convention `coqui_studio/en/<studio_speaker_name>/coqui_studio`
-
-```python
-# XTTS model
-models = TTS(cs_api_model="XTTS").list_models()
-# Init TTS with the target studio speaker
-tts = TTS(model_name="coqui_studio/en/Torcull Diarmuid/coqui_studio", progress_bar=False)
-# Run TTS
-tts.tts_to_file(text="This is a test.", language="en", file_path=OUTPUT_PATH)
-
-# V1 model
-models = TTS(cs_api_model="V1").list_models()
-# Run TTS with emotion and speed control
-# Emotion control only works with V1 model
-tts.tts_to_file(text="This is a test.", file_path=OUTPUT_PATH, emotion="Happy", speed=1.5)
-```
-
 #### Example text to speech using **Fairseq models in ~1100 languages** 🤯.
 For Fairseq models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`.
 You can find the language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
@@ -351,12 +326,6 @@ If you don't specify any models, then it uses LJSpeech based English model.
   $ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
   ```
 
-- Run TTS and define speed factor to use for 🐸Coqui Studio models, between 0.0 and 2.0:
-
-  ```
-  $ tts --text "Text for TTS" --model_name "coqui_studio/<language>/<dataset>/<model_name>" --speed 1.2 --out_path output/path/speech.wav
-  ```
-
 - Run a TTS model with its default vocoder model:
 
   ```

diff --git a/TTS/.models.json b/TTS/.models.json
@@ -3,12 +3,13 @@
         "multilingual": {
             "multi-dataset": {
                 "xtts_v2": {
-                    "description": "XTTS-v2.0.2 by Coqui with 16 languages.",
+                    "description": "XTTS-v2.0.3 by Coqui with 17 languages.",
                     "hf_url": [
                         "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/model.pth",
                         "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/config.json",
                         "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/vocab.json",
-                        "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/hash.md5"
+                        "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/hash.md5",
+                        "https://coqui.gateway.scarf.sh/hf-coqui/XTTS-v2/main/speakers_xtts.pth"
                     ],
                     "model_hash": "10f92b55c512af7a8d39d650547a15a7",
                     "default_vocoder": null,
@@ -45,7 +46,7 @@
                     "hf_url": [
                         "https://coqui.gateway.scarf.sh/hf/bark/coarse_2.pt",
                         "https://coqui.gateway.scarf.sh/hf/bark/fine_2.pt",
-                        "https://app.coqui.ai/tts_model/text_2.pt",
+                        "https://coqui.gateway.scarf.sh/hf/text_2.pt",
                         "https://coqui.gateway.scarf.sh/hf/bark/config.json",
                         "https://coqui.gateway.scarf.sh/hf/bark/hubert.pt",
                         "https://coqui.gateway.scarf.sh/hf/bark/tokenizer.pth"
@@ -270,7 +271,7 @@
                 "tortoise-v2": {
                     "description": "Tortoise tts model https://github.com/neonbjb/tortoise-tts",
                     "github_rls_url": [
-                        "https://app.coqui.ai/tts_model/autoregressive.pth",
+                        "https://coqui.gateway.scarf.sh/v0.14.1_models/autoregressive.pth",
                         "https://coqui.gateway.scarf.sh/v0.14.1_models/clvp2.pth",
                         "https://coqui.gateway.scarf.sh/v0.14.1_models/cvvp.pth",
                         "https://coqui.gateway.scarf.sh/v0.14.1_models/diffusion_decoder.pth",