A Real-Time Voice Activity Detection Based On Lightweight Neural

May 27, 2024
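  • Abstract
    Voice activity detection (VAD) is the task of detecting speech in an audio stream. It is challenging because real-world environments contain many unknown noise types and low signal-to-noise ratios. Neural-network-based VAD has recently reduced the resulting performance degradation to some extent. However, most existing work uses excessively large models and incorporates future context, while neglecting the model's operational efficiency and latency. This paper proposes MagicNet, a lightweight, real-time neural network that uses causal, depthwise-separable 1-D convolutions and a GRU. The proposed model does not rely on future features as input. It is compared with two state-of-the-art algorithms on synthesized in-domain and out-of-domain test datasets. The evaluation results show that MagicNet achieves better performance and robustness at a lower parameter cost.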
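  • Problem Addressed
    Designing a lightweight, real-time neural network for voice activity detection (MagicNet) that avoids the large model sizes and the dependence on future context found in existing neural VAD models.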
  • Key Idea
    MagicNet is a lightweight, real-time neural network that uses causal, depthwise-separable 1-D convolutions and a GRU to detect speech in an audio stream without relying on future features as input.
  • Other Highlights
    The proposed model is compared with two state-of-the-art algorithms on synthesized in-domain and out-of-domain test datasets, and it achieves better performance and robustness at a lower parameter cost. The experiments show that MagicNet is efficient and has low latency. The paper also provides open-source code for the model.
  • Related Work
    Recent studies in this field have used excessively large models and incorporated future context. MagicNet, by contrast, is lightweight and does not rely on future features. Related papers include 'A Lightweight and Accurate Voice Activity Detection Model for Wearable Devices Using Convolutional Neural Networks' and 'Real-Time Voice Activity Detection Using Deep Recurrent Neural Networks with Bidirectional Long Short-Term Memory'.
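The following minimal pure-Python sketch makes the key idea concrete. A causal, depthwise-separable 1-D convolution feeds a GRU, which emits one speech probability per feature frame. This is not MagicNet, and the summary does not give its layer configuration. The following are all assumptions for illustration:

- the layer sizes and the ReLU;
- the omitted biases;
- the untrained random weights;
- the hypothetical class names `CausalDepthwiseSeparableConv1d`, `GRUCell` and `TinyStreamingVAD`.

The sketch shows why such a model needs no future frames. The convolution buffers only the previous `kernel_size - 1` frames, and the GRU carries its hidden state forward one frame at a time.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CausalDepthwiseSeparableConv1d:
    """Per-channel (depthwise) temporal conv followed by a 1x1 (pointwise) conv.

    Only the current frame and the previous kernel_size - 1 frames are used,
    so the layer never looks ahead (causal) and can run frame by frame.
    """

    def __init__(self, in_ch, out_ch, kernel_size, rng):
        self.k = kernel_size
        self.depthwise = rand_matrix(in_ch, kernel_size, rng)  # one kernel per channel
        self.pointwise = rand_matrix(out_ch, in_ch, rng)       # 1x1 conv = channel mixing
        # Zero left-padding: frames before the stream starts are treated as zeros.
        self.history = [[0.0] * in_ch for _ in range(kernel_size - 1)]

    def num_params(self):
        # in_ch*k + out_ch*in_ch weights, versus in_ch*out_ch*k for a standard conv.
        return sum(map(len, self.depthwise)) + sum(map(len, self.pointwise))

    def step(self, frame):
        window = self.history + [frame]  # k frames, oldest first
        self.history = window[1:]        # keep the last k - 1 frames for the next call
        depth = [
            sum(self.depthwise[c][t] * window[t][c] for t in range(self.k))
            for c in range(len(frame))
        ]
        return [max(0.0, y) for y in matvec(self.pointwise, depth)]  # ReLU (assumed)


class GRUCell:
    """One common GRU formulation; biases omitted for brevity."""

    def __init__(self, in_dim, hid_dim, rng):
        self.Wz, self.Uz = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wh, self.Uh = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.h = [0.0] * hid_dim  # hidden state carried across frames

    def step(self, x):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, self.h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, self.h))]
        rh = [ri * hi for ri, hi in zip(r, self.h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Update gate z interpolates between the old state and the candidate state.
        self.h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, self.h, cand)]
        return self.h


class TinyStreamingVAD:
    """Hypothetical causal conv -> GRU -> sigmoid pipeline; untrained, illustration only."""

    def __init__(self, n_feats=8, n_ch=16, hidden=8, kernel_size=3, seed=0):
        rng = random.Random(seed)
        self.conv = CausalDepthwiseSeparableConv1d(n_feats, n_ch, kernel_size, rng)
        self.gru = GRUCell(n_ch, hidden, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def step(self, frame):
        """Consume one feature frame and return a speech probability for that frame."""
        h = self.gru.step(self.conv.step(frame))
        return sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)))
```

Calling `vad.step(frame)` once per incoming frame yields one probability per frame. A threshold such as 0.5 (an assumption) would turn it into a speech/non-speech decision. With the default sizes, the convolution has 8 x 3 + 8 x 16 = 152 weights, versus 8 x 16 x 3 = 384 for a standard 1-D convolution of the same shape. That is the kind of parameter saving depthwise-separable convolutions offer.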