Does the prosody codes[0] work? #10

dalazymodder · 2024-06-30T15:23:38Z

I tried testing the code specifically for prosody, but it seemed like the prosody was tied to codes[1] along with the content?

Plachtaa · 2024-06-30T19:21:39Z

What kind of test did you perform on the prosody code?

dalazymodder · 2024-06-30T19:48:28Z

I took two different audio clips and changed the line

z2 = model.encoder(codes[0], codes[1], timbre2, use_p_code=False, n_c=1)

line 103 to

z2 = model.encoder(codes2[0], codes[1], timbre, use_p_code=False, n_c=1)

This should make it output a file where only the prosody is swapped between the two sound files, right?

But if you look at the resulting files, the unedited reconstruction and the new one that should have different prosody appear to be identical.
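One way to check "identical" numerically rather than by ear is to compare the two output waveforms sample by sample. This is only a minimal sketch: `compare_waveforms` is a hypothetical helper, not part of the repository, and it assumes both reconstructions are available as 1-D arrays at the same sample rate.

```python
import numpy as np

def compare_waveforms(a, b):
    """Return (max_abs_diff, rms_diff) for two 1-D waveforms.

    Hypothetical helper for illustration. The longer signal is truncated
    to the length of the shorter one before comparing.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0, 0.0
    a = np.asarray(a[:n], dtype=np.float64)
    b = np.asarray(b[:n], dtype=np.float64)
    diff = a - b
    max_abs_diff = float(np.max(np.abs(diff)))
    rms_diff = float(np.sqrt(np.mean(diff ** 2)))
    return max_abs_diff, rms_diff
```

If both values come out as exactly 0.0, the swapped prosody code had no effect on the output at all. Small non-zero values would suggest the code has some effect that is just hard to hear.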

I also tried changing the code to this, which changed content and prosody.

z2 = model.encoder(codes[0], codes2[1], timbre, use_p_code=False, n_c=1)

Sorry if I got anything wrong, I'm a novice at this, but isn't the prosody kind of like the emotion and timing of the speech?

I did add some lines to pad both audio files to the same length, but I don't think that should affect the prosody.

def main(args):
    source = args.source
    target = args.target
    source_audio = librosa.load(source, sr=24000)[0]
    ref_audio = librosa.load(target, sr=24000)[0]

    # Find the length of the longest audio and add a small buffer (e.g., 1 second)
    max_length = max(len(source_audio), len(ref_audio))
    target_length = max_length + 24000  # Add 1 second (24000 samples at 24kHz)

    # Pad both audios to the target length
    source_audio = np.pad(source_audio, (0, target_length - len(source_audio)), mode='constant')
    ref_audio = np.pad(ref_audio, (0, target_length - len(ref_audio)), mode='constant')

    # Convert to torch tensors
    source_audio = torch.tensor(source_audio).unsqueeze(0).float().to(device)
    ref_audio = torch.tensor(ref_audio).unsqueeze(0).float().to(device)
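Before comparing outputs, it may also help to confirm that the prosody codes of the two clips actually differ. The sketch below is an assumption-heavy illustration: `code_mismatch_rate` is a hypothetical helper, and it assumes codes[0] and codes2[0] are integer index arrays of the same shape (which padding both clips to the same length should give), converted with something like `.cpu().numpy()`.

```python
import numpy as np

def code_mismatch_rate(codes_a, codes_b):
    """Fraction of positions where two code index arrays differ.

    Hypothetical helper for illustration. Both inputs are assumed to be
    integer arrays of identical shape.
    """
    a = np.asarray(codes_a)
    b = np.asarray(codes_b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    if a.size == 0:
        return 0.0
    return float(np.mean(a != b))
```

A rate near 0.0 would mean the two clips were quantized to almost the same prosody codes, so swapping them could not change much. A high rate combined with identical outputs would point to the decoder path ignoring that input.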

Plachtaa · 2024-07-04T19:51:17Z

Thanks for your experiment. It was very helpful for us in understanding what exactly the prosody component represents.
I will try to replicate your experiment myself and will let you know if I can offer any explanation.
