How to improve speaker identification accuracy #411

Open
chenfuckthesky opened this issue Feb 26, 2025 · 4 comments

@chenfuckthesky

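While using wespeaker, I've found that it often fails to separate the speakers. Take the recording in the attachment: it is a conversation between a man and a woman, and their voices sound quite different, yet in the result below every segment is assigned to speaker 0. So my question is: is there a parameter, such as a similarity threshold, that can improve accuracy? I went through the Speaker class carefully but found nothing: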

('unk', 0.1, 1.9, 0)
('unk', 2.0, 4.1, 0)
('unk', 4.7, 5.7, 0)
('unk', 29.8, 30.3, 0)
('unk', 32.5, 33.2, 0)
('unk', 33.5, 36.1, 0)
('unk', 36.5, 38.9, 0)
('unk', 40.3, 44.8, 0)
('unk', 45.1, 46.2, 0)
('unk', 46.9, 50.6, 0)
('unk', 50.7, 56.4, 0)
('unk', 58.3, 64.4, 0)
('unk', 67.2, 67.7, 0)
('unk', 67.9, 73.1, 0)
('unk', 73.9, 74.9, 0)
('unk', 76.9, 78.4, 0)
('unk', 78.9, 81.4, 0)
('unk', 83.6, 85.4, 0)
('unk', 88.5, 91.6, 0)
('unk', 91.7, 92.3, 0)
('unk', 92.6, 94.1, 0)
('unk', 94.3, 94.7, 0)
('unk', 94.8, 95.4, 0)

1ac0e486-68a2-11ee-a110-591e6b00846c-all.zip
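A quick way to confirm the symptom is to tally talk time per speaker label. This is a minimal sketch that assumes each tuple is (utterance id, start in seconds, end in seconds, speaker label); that field meaning is inferred from the listing, not documented here, and `talk_time_per_speaker` is a hypothetical helper.

```python
from collections import defaultdict

# A few segments copied from the output above. Assumed field meaning:
# (utterance id, start time in seconds, end time in seconds, speaker label).
segments = [
    ('unk', 0.1, 1.9, 0),
    ('unk', 2.0, 4.1, 0),
    ('unk', 4.7, 5.7, 0),
    ('unk', 29.8, 30.3, 0),
]


def talk_time_per_speaker(segments):
    """Sum segment durations (seconds) for each speaker label."""
    totals = defaultdict(float)
    for _utt, start, end, spk in segments:
        totals[spk] += end - start
    return dict(totals)


totals = talk_time_per_speaker(segments)
print(sorted(totals))  # → [0]
```

A single label in the output means the clustering step decided there was only one speaker, so the place to look is the speaker-count estimation rather than the embedding extraction alone.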

@chenfuckthesky
Author

One more note: this recording is 8 kHz. I resampled it to 16 kHz and got the same result.

@JiJiJiang
Collaborator

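Is the audio fairly short? You can try setting topk or p to a relatively smaller value, which makes it more likely that more speakers are output!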
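Why would a smaller p give more speakers? In spectral clustering with p-pruning, p is commonly the fraction of the strongest similarities kept in each row of the affinity matrix; everything else is zeroed, so a smaller p gives a sparser graph that tends to fall apart into more groups. The sketch below illustrates that idea in pure Python under this assumption. It is not wespeaker's code: `cosine_affinity`, `prune_affinity`, and the rounding rule for how many entries to keep are hypothetical.

```python
import math


def cosine_affinity(embeddings):
    """Pairwise cosine similarity rescaled to [0, 1] as 0.5 * (1 + cos)."""
    unit = []
    for e in embeddings:
        norm = math.sqrt(sum(x * x for x in e))
        unit.append([x / norm for x in e])
    return [[0.5 * (1.0 + sum(a * b for a, b in zip(u, v))) for v in unit]
            for u in unit]


def prune_affinity(affinity, p):
    """Binarize each row: the top fraction p of entries become 1, the rest 0.

    The result is symmetrized as 0.5 * (A + A^T), so a link kept by only
    one of the two rows ends up with weight 0.5.
    """
    m = len(affinity)
    keep = max(1, math.ceil(p * m))  # hypothetical rounding rule
    pruned = []
    for row in affinity:
        order = sorted(range(m), key=lambda j: row[j], reverse=True)
        kept = set(order[:keep])
        pruned.append([1.0 if j in kept else 0.0 for j in range(m)])
    return [[0.5 * (pruned[i][j] + pruned[j][i]) for j in range(m)]
            for i in range(m)]


# Toy embeddings: two tight groups standing in for two speakers.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aff = cosine_affinity(emb)
for p in (1.0, 0.5):
    edges = sum(v > 0 for row in prune_affinity(aff, p) for v in row)
    print(p, edges)  # p=1.0 keeps all 16 links; p=0.5 keeps 8 (two blocks)
```

With p=1.0 every segment is linked to every other one, so the graph looks like a single speaker; with p=0.5 the toy graph separates into two disconnected blocks. The real function may round differently or handle short inputs specially, so its body is worth reading before tuning p.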

@chenfuckthesky
Author

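This audio is 100 seconds long, so it shouldn't count as short, right? Our use case is calls between customer service agents and users, which are usually not very long. In which file should the topk or p value you mentioned be changed? I did look through the code, but my skills are limited and I couldn't find it in any of the files that perform the clustering.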

@JiJiJiang
Collaborator

def cluster(embeddings, p=.01, num_spks=None, min_num_spks=1, max_num_spks=20):
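Besides p, the signature quoted above exposes num_spks, min_num_spks and max_num_spks. A common way such parameters are used in spectral clustering is the eigengap heuristic: sort the smallest eigenvalues of the graph Laplacian and pick the speaker count at the largest jump, unless a count is fixed. The sketch below assumes that behavior; whether wespeaker's `cluster` does exactly this should be checked in its body, and `estimate_num_spks` is a hypothetical helper that takes precomputed eigenvalues.

```python
def estimate_num_spks(eigenvalues, num_spks=None, min_num_spks=1, max_num_spks=20):
    """Eigengap heuristic (assumed, not taken from wespeaker).

    eigenvalues: eigenvalues of the graph Laplacian built from the pruned affinity.
    Returns the number of clusters to use.
    """
    if num_spks is not None:
        return num_spks  # a fixed speaker count bypasses estimation entirely
    vals = sorted(eigenvalues)[: max_num_spks + 1]
    if len(vals) < 2:
        return max(1, min_num_spks)
    gaps = [b - a for a, b in zip(vals, vals[1:])]
    k = gaps.index(max(gaps)) + 1  # largest gap sits right after the k-th smallest eigenvalue
    return max(k, min_num_spks)


# Hypothetical spectra: a clear gap after two eigenvalues vs. after one.
print(estimate_num_spks([0.0, 0.02, 0.9, 1.1]))                 # → 2
print(estimate_num_spks([0.0, 0.9, 1.0, 1.1]))                  # → 1
print(estimate_num_spks([0.0, 0.9, 1.0, 1.1], min_num_spks=2))  # → 2
print(estimate_num_spks([0.0, 0.9, 1.0, 1.1], num_spks=2))      # → 2
```

For two-party customer service calls, judging by the parameter names, passing num_spks=2 (or min_num_spks=2) wherever `cluster` is invoked would stop the estimate from collapsing to one speaker; the call site is not shown in this thread.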
