Commit d873a29

modify intro texts
1 parent c95fd7f commit d873a29

2 files changed

+10
-6
lines changed

index.html

+5 -3
@@ -1259,10 +1259,11 @@ <h3>Prompted Generation</h3>
 <p class="lead">* please scroll horizontally to explore additional columns in the table.</p>
 </div>
 <div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
-<h3>Speech Inpainting</h3>
+<h3>Speech Editing</h3>
 <p class="lead">
-In this task, we evaluate on test set C. We mask fragments of the waveforms, and ask the models to generate the full waveforms. The masked sections are highlighted within the text.
-All speakers are unseen for all systems during training.
+We evaluated the performance of text-based speech editing on the speech inpainting task.
+The models generate complete waveforms given complete texts and partially masked waveforms. The masked sections are highlighted within the text.
+All speakers were unseen by all systems during training. The following 20 test cases are from test set C (long).
 </p>
 <div class="table-responsive" style="overflow-x: scroll">
 <table class="table table-sm">
@@ -2046,6 +2047,7 @@ <h3>Prompted Generation (Comparing with Proprietary Systems)</h3>
 <p class="lead">
 In this section, we compare our system with proprietary systems including NaturalSpeech 2/3, MegaTTS 2, UniAudio, CLaM-TTS, VoiceBox, and VALL-E. The source codes and model weights for these models are not available.
 The following samples are obtained from their online demo pages. All waveforms are downsampled to 16kHz.
+Please note that ARDiT's performance is influenced by the fact that the prompt waveforms are in 16kHz, not 24kHz, and the prompt texts are not semantically coherent with the target texts.
 </p>
 <p class="lead">1~4 are obtained from
 <a href="https://speechresearch.github.io/naturalspeech3/">NaturalSpeech 3</a> and 5~20 are obtained from

index.py

+5 -3
@@ -49,11 +49,12 @@
 
 with div(cls="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded"):
 from inpaint import get_table
-h3("Speech Inpainting")
+h3("Speech Editing")
 p(
 """
-In this task, we evaluate on test set C. We mask fragments of the waveforms, and ask the models to generate the full waveforms. The masked sections are highlighted within the text.
-All speakers are unseen for all systems during training.
+We evaluated the performance of text-based speech editing on the speech inpainting task.
+The models generate complete waveforms given complete texts and partially masked waveforms. The masked sections are highlighted within the text.
+All speakers were unseen by all systems during training. The following 20 test cases are from test set C (long).
 """,
 cls="lead"
 )
@@ -67,6 +68,7 @@
 """
 In this section, we compare our system with proprietary systems including NaturalSpeech 2/3, MegaTTS 2, UniAudio, CLaM-TTS, VoiceBox, and VALL-E. The source codes and model weights for these models are not available.
 The following samples are obtained from their online demo pages. All waveforms are downsampled to 16kHz.
+Please note that ARDiT's performance is influenced by the fact that the prompt waveforms are in 16kHz, not 24kHz, and the prompt texts are not semantically coherent with the target texts.
 """,
 cls="lead"
 )
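
Note on the revised intro text: the inpainting setup it describes (complete text plus a partially masked waveform as model input) can be illustrated with a small sketch. This is not taken from the repository. The helper name `mask_span`, the zero fill for masked samples, and the 16 kHz example clip are all assumptions made for illustration; the commit does not say how masked regions are represented.

```python
from typing import List, Tuple


def mask_span(
    waveform: List[float],
    sample_rate: int,
    start_s: float,
    end_s: float,
) -> Tuple[List[float], List[bool]]:
    """Zero out samples in [start_s, end_s) and return (masked waveform, mask flags).

    Hypothetical helper: zero fill is an assumed masking convention.
    """
    if not 0.0 <= start_s <= end_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    # Convert seconds to sample indices, clamped to the waveform length.
    start = min(int(round(start_s * sample_rate)), len(waveform))
    end = min(int(round(end_s * sample_rate)), len(waveform))
    # True marks a sample the model would be asked to regenerate.
    mask = [start <= i < end for i in range(len(waveform))]
    masked = [0.0 if m else x for x, m in zip(waveform, mask)]
    return masked, mask


# Hypothetical one-second clip at 16 kHz with the span 0.25 s to 0.5 s masked.
clip = [0.1] * 16000
masked_clip, mask = mask_span(clip, 16000, 0.25, 0.5)
```

The mask flags are returned alongside the waveform so that a caller can tell which samples were hidden, which mirrors the page's highlighting of masked sections in the text.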
