Commit 5413969

author
Mark
committed
add: refactored the code + saving endpoints in a json file + transferring of smaller sections into individual functions + enum for voices
fix: Error #2: special characters such as emojis no longer raise an error when sound is generated from a txt file
1 parent 06adef8 commit 5413969

8 files changed

+288
-230
lines changed

README.md

+53-23
@@ -5,12 +5,11 @@ This is a simple Python program that gives you an `.mp3` file including the give
55
I thank all people that use this for their project. I love to contribute to the community. However, please credit me by using the GitHub project link.
66

77
## Usage
8-
98
To use this program, you need an internet connection, python 3.6+ and all of the required packages installed.
109
To install the required packages, run: `pip3 install -r requirements.txt`
1110

1211
### Create audio from file
13-
1. Make sure you have your text in plaintext. You can name it anything
12+
1. Make sure your text is in a plaintext file.
1413
2. Run `py main.py -txt FILENAME.txt -v VOICENAME` (see voices below)
1514

1615
Only latin characters are supported.
@@ -21,27 +20,58 @@ Only latin characters are supported.
2120
You can have non-latin characters (as long as it has a TTS supported voice).
2221

2322
### Create audio in python script
24-
1. Put the file `tiktokvoice.py` into your directory.
25-
2. Import the text-to-speech function with `from tiktokvoice import tts`.
26-
3. Execute `tts(TEXT, VOICENAME, OUTPUTFILENAME, PLAYSOUND)` in your code.
27-
28-
I provided an [example script](https://github.com/GiorDior/TikTok-Voice-TTS/blob/main/examplescript.py) which shows how the tts function could be used in a script.
29-
30-
## Supported languages:
31-
List of every voice and its designation: [voices](https://github.com/oscie57/tiktok-voice/wiki/Voice-Codes)
32-
33-
- Portuguese (Brazil)
34-
- German
35-
- English (Australia)
36-
- English (United Kingdom)
37-
- English (United States)
38-
- English (Disney)
39-
- Spanish
40-
- Spanish (Mexico)
41-
- French
42-
- Indonesian
43-
- Japanese
44-
- Korean
23+
1. Put the package folder `tiktok_voice` into your directory.
24+
2. Import the text-to-speech function and the voices with `from tiktok_voice import tts, Voice`.
25+
3. Execute `tts(TEXT, VOICE, OUTPUTFILENAME, PLAYSOUND)` in your code.
26+
27+
I provided an [example script](https://github.com/GiorDior/TikTok-Voice-TTS/blob/main/example_script.py) which shows how the tts function could be used in a script.
28+
29+
## Voices
30+
List of every voice and its designation:
31+
32+
| Name |
33+
| -------------------- |
34+
| GHOSTFACE |
35+
| CHEWBACCA |
36+
| C3PO |
37+
| STITCH |
38+
| STORMTROOPER |
39+
| ROCKET |
40+
| EN_AU_FEMALE_1 |
41+
| EN_AU_MALE_1 |
42+
| EN_UK_MALE_1 |
43+
| EN_UK_MALE_2 |
44+
| EN_US_FEMALE_1 |
45+
| EN_US_FEMALE_2 |
46+
| EN_US_MALE_1 |
47+
| EN_US_MALE_2 |
48+
| EN_US_MALE_3 |
49+
| EN_US_MALE_4 |
50+
| FR_MALE_1 |
51+
| FR_MALE_2 |
52+
| DE_FEMALE |
53+
| DE_MALE |
54+
| ES_MALE |
55+
| ES_MX_MALE |
56+
| BR_FEMALE_1 |
57+
| BR_FEMALE_2 |
58+
| BR_FEMALE_3 |
59+
| BR_MALE |
60+
| ID_FEMALE |
61+
| JP_FEMALE_1 |
62+
| JP_FEMALE_2 |
63+
| JP_FEMALE_3 |
64+
| JP_MALE |
65+
| KR_MALE_1 |
66+
| KR_FEMALE |
67+
| KR_MALE_2 |
68+
| EN_FEMALE_ALTO |
69+
| EN_MALE_TENOR |
70+
| EN_FEMALE_WARMY_BREEZE |
71+
| EN_MALE_SUNSHINE_SOON |
72+
| EN_MALE_NARRATION |
73+
| EN_MALE_FUNNY |
74+
| EN_FEMALE_EMOTIONAL |
4575

4676
## Samples
4777

examplescript.py → example_script.py

+10-11
@@ -1,11 +1,10 @@
1-
from tiktokvoice import tts
2-
3-
text = 'Tangerines are smaller and less rounded than the oranges. The taste is considered less sour, as well as sweeter and stronger, than that of an orange. A ripe tangerine is firm to slightly soft, and pebbly-skinned with no deep grooves, as well as orange in color. The peel is thin, with little bitter white mesocarp. All of these traits are shared by mandarins generally. Peak tangerine season lasts from autumn to spring. Tangerines are most commonly peeled and eaten by hand. The fresh fruit is also used in salads, desserts and main dishes. The peel is used fresh or dried as a spice or zest for baking and drinks. Fresh tangerine juice and frozen juice concentrate are commonly available in the United States.'
4-
voice = "en_us_006"
5-
6-
# arguments:
7-
# - input text
8-
# - vocie which is used for the audio
9-
# - output file name
10-
# - play sound after generating the audio
11-
tts(text, voice, "output.mp3", play_sound=True)
1+
from tiktok_voice import tts, Voice
2+
3+
text = 'Tangerines are smaller and less rounded than the oranges. The taste is considered less sour, as well as sweeter and stronger, than that of an orange. A ripe tangerine is firm to slightly soft, and pebbly-skinned with no deep grooves, as well as orange in color. The peel is thin, with little bitter white mesocarp. All of these traits are shared by mandarins generally. Peak tangerine season lasts from autumn to spring. Tangerines are most commonly peeled and eaten by hand. The fresh fruit is also used in salads, desserts and main dishes. The peel is used fresh or dried as a spice or zest for baking and drinks. Fresh tangerine juice and frozen juice concentrate are commonly available in the United States.'
4+
5+
# arguments:
6+
# - input text
7+
# - voice which is used for the audio
8+
# - output file name
9+
# - play sound after generating the audio
10+
tts(text, Voice.EN_US_MALE_1, "output.mp3", play_sound=True)

main.py

+14-16
@@ -1,42 +1,40 @@
11
# author: Giorgio
2-
# date: 19.03.2024
2+
# date: 23.08.2024
33
# topic: TikTok-Voice-TTS
4-
# version: 1.1
4+
# version: 1.3
55

6+
from codecs import BOM_UTF32
67
import argparse
7-
88
# the script in the directory
9-
import tiktokvoice
9+
from tiktok_voice import tts, Voice
1010

1111
def main():
1212
# adding arguments
1313
parser = argparse.ArgumentParser(description='TikTok TTS')
1414
parser.add_argument('-t', help='text input')
15-
parser.add_argument('-v', help='voice selection', choices=tiktokvoice.VOICES)
16-
parser.add_argument('-n', help='output filename', default='output.mp3')
17-
parser.add_argument('-txt', help='text input from a txt file', type=argparse.FileType('r'))
15+
parser.add_argument('-v', help='voice selection')
16+
parser.add_argument('-o', help='output filename', default='output.mp3')
17+
parser.add_argument('-txt', help='text input from a txt file', type=argparse.FileType('r', encoding="utf-8"))
1818
parser.add_argument('-play', help='play sound after generating audio', action='store_true')
1919

2020
args = parser.parse_args()
2121

2222
# checking if given values are valid
2323
if not args.t and not args.txt:
24-
print("Error: insert a valid text or txt file")
25-
return
24+
raise ValueError("insert a valid text or txt file")
2625

2726
if args.t and args.txt:
28-
print("Error: only one input type is possible")
29-
return
27+
raise ValueError("only one input type is possible")
3028

31-
if not args.v:
32-
print("Error: no voice has been selected")
33-
return
29+
voice: Voice | None = Voice.from_string(args.v)
30+
if voice is None:
31+
raise ValueError("no valid voice has been selected")
3432

3533
# executing script
3634
if args.t:
37-
tiktokvoice.tts(args.t, args.v, args.n, args.play)
35+
tts(args.t, voice, args.o, args.play)
3836
elif args.txt:
39-
tiktokvoice.tts(args.txt.read(), args.v, args.n, args.play)
37+
tts(args.txt.read(), voice, args.o, args.play)
4038

4139
if __name__ == "__main__":
4240
main()
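
The input checks in `main()` can also be delegated to `argparse` itself. The following is a minimal sketch, not the commit's implementation: `build_parser` and `read_text` are hypothetical helper names, and only `-t` and `-txt` are reproduced. It uses a required mutually exclusive group for the two input options and shows that a UTF-8 `FileType` reads emoji text without error.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a reduced parser with only the two input options."""
    parser = argparse.ArgumentParser(description="TikTok TTS (sketch)")
    # Exactly one of -t / -txt must be supplied; argparse reports violations itself.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-t", help="text input")
    source.add_argument(
        "-txt",
        help="text input from a txt file",
        type=argparse.FileType("r", encoding="utf-8"),
    )
    return parser


def read_text(args: argparse.Namespace) -> str:
    """Hypothetical helper: return the text from whichever input was given."""
    if args.t is not None:
        return args.t
    with args.txt as handle:  # close the file once it has been read
        return handle.read()


# Demo: write a UTF-8 file containing an emoji and read it back through the parser.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "input.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Hello 👋")
    demo_text = read_text(build_parser().parse_args(["-txt", path]))
```

With this layout, a missing or duplicated input exits with a usage message instead of raising a `ValueError` traceback; which behaviour is preferable depends on how the script is meant to be called.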

tiktok_voice/__init__.py

+2
@@ -0,0 +1,2 @@
1+
from .src.text_to_speech import tts
2+
from .src.voice import Voice

tiktok_voice/data/config.json

+14
@@ -0,0 +1,14 @@
1+
[
2+
{
3+
"url": "https://tiktok-tts.weilnet.workers.dev/api/generation",
4+
"response": "data"
5+
},
6+
{
7+
"url": "https://countik.com/api/text/speech",
8+
"response": "v_data"
9+
},
10+
{
11+
"url": "https://gesserit.co/api/tiktok-tts",
12+
"response": "base64"
13+
}
14+
]
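
Each entry pairs an endpoint URL with the JSON key under which that endpoint returns its audio data. The sketch below illustrates how such an entry might be consumed; `extract_audio_field` is a hypothetical helper and `fake_payload` is an invented response body, since the real response formats are not shown in this commit.

```python
import json
from typing import Dict, List

# Two entries copied from config.json above.
CONFIG_TEXT = """
[
  {"url": "https://tiktok-tts.weilnet.workers.dev/api/generation", "response": "data"},
  {"url": "https://countik.com/api/text/speech", "response": "v_data"}
]
"""


def extract_audio_field(endpoint: Dict[str, str], payload: Dict[str, str]) -> str:
    """Hypothetical helper: read the audio field named by the endpoint's 'response' key."""
    # An empty string signals that the expected key is missing from the payload.
    return payload.get(endpoint["response"], "")


endpoints: List[Dict[str, str]] = json.loads(CONFIG_TEXT)
fake_payload = {"v_data": "QUJD"}  # invented response body for illustration only
```

Keeping the key name in the config means a new endpoint with a different response shape can be added without touching the Python code, as long as the audio sits at the top level of the response.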

tiktok_voice/src/text_to_speech.py

+130
@@ -0,0 +1,130 @@
1+
# Python standard modules and the third-party requests package
2+
import os
3+
import requests
4+
import base64
5+
import re
6+
from json import load
7+
from threading import Thread
8+
from typing import Dict, List, Optional
9+
10+
# Downloaded modules
11+
from playsound import playsound
12+
13+
# Local files
14+
from .voice import Voice
15+
16+
def tts(
17+
text: str,
18+
voice: Voice,
19+
output_file_path: str = "output.mp3",
20+
play_sound: bool = False
21+
):
22+
"""Main function to convert text to speech and save to a file."""
23+
24+
# Validate input arguments
25+
_validate_args(text, voice)
26+
27+
# Load endpoint data from the config.json file
28+
endpoint_data: List[Dict[str, str]] = _load_endpoints()
29+
30+
31+
# Iterate over endpoints to find a working one
32+
for endpoint in endpoint_data:
33+
# Generate audio bytes from the current endpoint
34+
audio_bytes: Optional[bytes] = _fetch_audio_bytes(endpoint, text, voice)
35+
36+
if audio_bytes:
37+
# Save the generated audio to a file
38+
_save_audio_file(output_file_path, audio_bytes)
39+
40+
# Optionally play the audio file
41+
if play_sound:
42+
playsound(output_file_path)
43+
44+
# Stop after processing a valid endpoint
45+
break
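
As written, the loop in `tts` returns silently when no endpoint yields audio. One possible variation, sketched here with hypothetical names (`first_successful` and the lambda fetchers stand in for the per-endpoint calls), is to raise once every endpoint has been tried:

```python
from typing import Callable, Iterable, Optional


def first_successful(fetchers: Iterable[Callable[[], Optional[bytes]]]) -> bytes:
    """Hypothetical helper: return the first non-empty result, or raise if none succeeds."""
    for fetch in fetchers:
        audio = fetch()
        if audio:
            # Stop at the first endpoint that produced audio.
            return audio
    # Reached only when every fetcher returned None or empty bytes.
    raise RuntimeError("no endpoint returned audio")


# Demo: the first stand-in endpoint fails, the second succeeds.
result = first_successful([lambda: None, lambda: b"audio"])
```

Whether a failure should raise or stay silent is a design decision for the library; raising makes an outage of all endpoints visible to the caller.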
46+
47+
def _save_audio_file(output_file_path: str, audio_bytes: bytes):
48+
"""Write the audio bytes to a file."""
49+
if os.path.exists(output_file_path):
50+
os.remove(output_file_path)
51+
52+
with open(output_file_path, "wb") as file:
53+
file.write(audio_bytes)
54+
55+
def _fetch_audio_bytes(
56+
endpoint: Dict[str, str],
57+
text: str,
58+
voice: Voice
59+
) -> Optional[bytes]:
60+
"""Fetch audio data from an endpoint and decode it."""
61+
62+
# Split the text into chunks and reserve one audio slot per chunk
63+
text_chunks: List[str] = _split_text(text)
64+
audio_chunks: List[str] = ["" for _ in range(len(text_chunks))]
65+
66+
# Function to generate audio for each text chunk
67+
def generate_audio_chunk(index: int, text_chunk: str):
68+
try:
69+
response = requests.post(endpoint["url"], json={"text": text_chunk, "voice": voice.value})
70+
response.raise_for_status()
71+
audio_chunks[index] = response.json()[endpoint["response"]]
72+
except (requests.RequestException, KeyError):
73+
return
74+
75+
# Start threads for generating audio for each chunk
76+
threads = [Thread(target=generate_audio_chunk, args=(i, chunk)) for i, chunk in enumerate(text_chunks)]
77+
for thread in threads:
78+
thread.start()
79+
80+
for thread in threads:
81+
thread.join()
82+
83+
if any(not chunk for chunk in audio_chunks):
84+
return None
85+
86+
# Concatenate and decode audio data from all chunks
87+
return base64.b64decode("".join(audio_chunks))
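
The return statement joins the base64 strings first and decodes once. Assuming each endpoint encodes every chunk independently (the diff does not confirm this), `=` padding could end up in the middle of the joined string, which `base64.b64decode` may not handle as intended. A hedged alternative is to decode each chunk on its own and join the bytes; `decode_chunks` is a hypothetical helper name.

```python
import base64
from typing import List


def decode_chunks(encoded_chunks: List[str]) -> bytes:
    """Hypothetical helper: decode each base64 chunk separately, then join the bytes."""
    # Padding stays at the end of its own chunk, where the decoder expects it.
    return b"".join(base64.b64decode(chunk) for chunk in encoded_chunks)


# Demo with two independently encoded chunks, both of which carry padding.
first = base64.b64encode(b"A").decode("ascii")
second = base64.b64encode(b"BC").decode("ascii")
audio = decode_chunks([first, second])
```

If the endpoints never return padded chunks, both approaches give the same bytes, so this sketch only matters under the stated assumption.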
88+
89+
def _load_endpoints() -> List[Dict[str, str]]:
90+
"""Load endpoint configurations from a JSON file."""
91+
script_dir = os.path.dirname(__file__)
92+
json_file_path = os.path.join(script_dir, '../data', 'config.json')
93+
with open(json_file_path, 'r') as file:
94+
return load(file)
95+
96+
def _validate_args(text: str, voice: Voice):
97+
"""Validate the input arguments."""
98+
99+
# Check if the voice is of the correct type
100+
if not isinstance(voice, Voice):
101+
raise TypeError("'voice' must be of type Voice")
102+
103+
# Check if the text is not empty
104+
if not text:
105+
raise ValueError("text must not be empty")
106+
107+
def _split_text(text: str) -> List[str]:
108+
"""Split text into chunks of 300 characters or less."""
109+
110+
# Split text into chunks based on punctuation marks
111+
merged_chunks: List[str] = []
112+
separated_chunks: List[str] = re.findall(r'.*?[.,!?:;-]|.+', text)
113+
114+
# Further split any chunks longer than 300 characters
115+
for i, chunk in enumerate(separated_chunks):
116+
if len(chunk) > 300:
117+
separated_chunks[i:i+1] = re.findall(r'.*?[ ]|.+', chunk)
118+
119+
# Combine chunks into segments of 300 characters or less
120+
current_chunk: str = ""
121+
for separated_chunk in separated_chunks:
122+
if len(current_chunk) + len(separated_chunk) <= 300:
123+
current_chunk += separated_chunk
124+
else:
125+
merged_chunks.append(current_chunk)
126+
current_chunk = separated_chunk
127+
128+
# Append the last chunk
129+
merged_chunks.append(current_chunk)
130+
return merged_chunks
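
The chunking steps above can be sketched as a standalone function. This is an illustration under assumptions, not the commit's code: as far as the diff shows, a single word longer than 300 characters stays over the limit, and if it comes first an empty chunk is appended. The sketch hard-slices such words and skips empty chunks, both of which are additions. It reuses the two regular expressions from `_split_text`.

```python
import re
from typing import List

MAX_CHARS = 300  # limit taken from the docstring of _split_text


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters."""
    # Break at punctuation first; the second alternative catches trailing text.
    pieces = re.findall(r'.*?[.,!?:;-]|.+', text)

    # Break over-long pieces at spaces, then hard-slice anything still too long
    # (the hard slice is an addition that the original function does not have).
    words: List[str] = []
    for piece in pieces:
        if len(piece) <= limit:
            words.append(piece)
            continue
        for word in re.findall(r'.*?[ ]|.+', piece):
            words.extend(word[i:i + limit] for i in range(0, len(word), limit))

    # Greedily merge neighbours while the result stays within the limit.
    chunks: List[str] = []
    current = ""
    for word in words:
        if len(current) + len(word) <= limit:
            current += word
        else:
            if current:  # never emit an empty chunk
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Like the original pattern, `.` does not match newlines, so line breaks in the input are dropped rather than preserved.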

tiktok_voice/src/voice.py

+65
@@ -0,0 +1,65 @@
1+
# author: Giorgio
2+
# date: 23.08.2024
3+
# topic: TikTok-Voice-TTS
4+
# version: 1.3
5+
6+
from enum import Enum
7+
8+
# Enum to define available voices for text-to-speech conversion
9+
class Voice(Enum):
10+
# DISNEY VOICES
11+
GHOSTFACE = 'en_us_ghostface'
12+
CHEWBACCA = 'en_us_chewbacca'
13+
C3PO = 'en_us_c3po'
14+
STITCH = 'en_us_stitch'
15+
STORMTROOPER = 'en_us_stormtrooper'
16+
ROCKET = 'en_us_rocket'
17+
# ENGLISH VOICES
18+
EN_AU_FEMALE_1 = 'en_au_001'
19+
EN_AU_MALE_1 = 'en_au_002'
20+
EN_UK_MALE_1 = 'en_uk_001'
21+
EN_UK_MALE_2 = 'en_uk_003'
22+
EN_US_FEMALE_1 = 'en_us_001'
23+
EN_US_FEMALE_2 = 'en_us_002'
24+
EN_US_MALE_1 = 'en_us_006'
25+
EN_US_MALE_2 = 'en_us_007'
26+
EN_US_MALE_3 = 'en_us_009'
27+
EN_US_MALE_4 = 'en_us_010'
28+
# EUROPE VOICES
29+
FR_MALE_1 = 'fr_001'
30+
FR_MALE_2 = 'fr_002'
31+
DE_FEMALE = 'de_001'
32+
DE_MALE = 'de_002'
33+
ES_MALE = 'es_002'
34+
# AMERICA VOICES
35+
ES_MX_MALE = 'es_mx_002'
36+
BR_FEMALE_1 = 'br_001'
37+
BR_FEMALE_2 = 'br_003'
38+
BR_FEMALE_3 = 'br_004'
39+
BR_MALE = 'br_005'
40+
# ASIA VOICES
41+
ID_FEMALE = 'id_001'
42+
JP_FEMALE_1 = 'jp_001'
43+
JP_FEMALE_2 = 'jp_003'
44+
JP_FEMALE_3 = 'jp_005'
45+
JP_MALE = 'jp_006'
46+
KR_MALE_1 = 'kr_002'
47+
KR_FEMALE = 'kr_003'
48+
KR_MALE_2 = 'kr_004'
49+
# SINGING VOICES
50+
EN_FEMALE_ALTO = 'en_female_f08_salut_damour'
51+
EN_MALE_TENOR = 'en_male_m03_lobby'
52+
EN_FEMALE_WARMY_BREEZE = 'en_female_f08_warmy_breeze'
53+
EN_MALE_SUNSHINE_SOON = 'en_male_m03_sunshine_soon'
54+
# OTHER
55+
EN_MALE_NARRATION = 'en_male_narration'
56+
EN_MALE_FUNNY = 'en_male_funny'
57+
EN_FEMALE_EMOTIONAL = 'en_female_emotional'
58+
59+
# Function to check if a string matches any enum member name
60+
def from_string(input_string: str):
61+
# Iterate over all enum members
62+
for voice in Voice:
63+
if voice.name == input_string:
64+
return voice
65+
return None
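
The name lookup in `from_string` can also be written without a loop. A minimal sketch, assuming a reduced enum with two of the members listed above; the `classmethod` form and the `Optional` return annotation are additions, not part of the commit.

```python
from enum import Enum
from typing import Optional


class Voice(Enum):
    # Two members copied from the enum above; the rest are omitted for brevity.
    GHOSTFACE = "en_us_ghostface"
    EN_US_MALE_1 = "en_us_006"

    @classmethod
    def from_string(cls, name: str) -> Optional["Voice"]:
        """Return the member whose name matches, or None if there is none."""
        # __members__ maps member names to members, so no explicit loop is needed.
        return cls.__members__.get(name)


# Demo: a known name resolves to its member, an unknown name to None.
known = Voice.from_string("EN_US_MALE_1")
unknown = Voice.from_string("NOT_A_VOICE")
```

The lookup stays case-sensitive, matching the original behaviour, so `-v en_us_male_1` would still be rejected unless the caller upper-cases the input first.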
