Talk abstract: Speaker recognition is a decision task, for which the optimal scoring scheme is the one that attains minimum Bayes risk (MBR). Which scoring model is MBR optimal depends on the distribution of the speaker embeddings. We will demonstrate that if the embeddings follow a linear Gaussian distribution, then PLDA scoring is MBR optimal, and cosine distance can be regarded as an approximation of PLDA. I will show that PLDA is optimal for both verification and identification, and discuss some useful properties and extensions. In particular, I will advocate that researchers pay attention to both the discrimination and the normalization of the embeddings, the latter of which has been largely ignored.
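A minimal sketch of the claim, assuming the standard two-covariance PLDA formulation (the symbols \mu, \Sigma_b, \Sigma_w below are standard notation, not taken from the talk): each embedding x is modeled as

    x = \mu + y + \epsilon, \quad y \sim \mathcal{N}(0, \Sigma_b) \ \text{(speaker factor)}, \quad \epsilon \sim \mathcal{N}(0, \Sigma_w) \ \text{(within-speaker residual)}.

For a trial with enrollment embedding x_1 and test embedding x_2, the MBR-optimal verification score is the log likelihood ratio

    \mathrm{LLR}(x_1, x_2) = \log \frac{p(x_1, x_2 \mid \text{same speaker})}{p(x_1)\, p(x_2)},

which has a closed form under the Gaussian assumptions. If, in addition, \Sigma_b and \Sigma_w are approximately isotropic and the embeddings are length-normalized, the LLR reduces, up to monotone affine terms, to the inner product x_1^{\top} x_2, which is one way to see cosine scoring as an approximation of PLDA.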

Speaker bio: Dong Wang (王东) is an associate researcher at the Center for Speech and Language Technologies, Tsinghua University. He has published more than 150 papers in the field of speech and language processing, authored books including Artificial Intelligence (《人工智能》) and Introduction to Machine Learning (《机器学习导论》), and has released Kaldi baseline systems such as THCHS30 and CNCeleb.
