DOST -- Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
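Problem addressed: This paper studies how label noise in multi-label classification tasks leads to violations of domain rules. It proposes a new paradigm, Domain Obedient Self-supervised Training (DOST), to mitigate the effect of noise and improve learning performance.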
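Key idea: DOST uses domain guidance to detect problematic annotations and to deter rule-violating predictions in a self-supervised manner. This makes deep learning models more compliant with domain rules, improves learning performance, and minimizes the impact of annotation noise. Compared with existing work in the field, the paper's distinguishing step is to take domain rules into account and build a new solution on them.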
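The detection half of this idea can be illustrated with a minimal sketch. It assumes that domain rules are pairwise label constraints of two hypothetical kinds: implication ("if label a is present, label b must be too") and mutual exclusion ("labels a and b never co-occur"). The summary does not say how DOST encodes its rules or what it does with flagged samples. For that reason the rule format and the names `violated_rules` and `flag_offending_annotations` are illustrative only, and the sketch stops at flagging.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule encoding (an assumption, not taken from the DOST paper):
#   ("implies", a, b):  if label a is present, label b must also be present
#   ("excludes", a, b): labels a and b must never be present together
Rule = Tuple[str, int, int]


def violated_rules(labels: Sequence[int], rules: Sequence[Rule]) -> List[Rule]:
    """Return the rules broken by one binary multi-label annotation vector."""
    broken: List[Rule] = []
    for kind, a, b in rules:
        if kind == "implies":
            bad = labels[a] == 1 and labels[b] == 0
        elif kind == "excludes":
            bad = labels[a] == 1 and labels[b] == 1
        else:
            raise ValueError(f"unknown rule kind: {kind!r}")
        if bad:
            broken.append((kind, a, b))
    return broken


def flag_offending_annotations(
    dataset: Sequence[Sequence[int]], rules: Sequence[Rule]
) -> List[int]:
    """Indices of samples whose annotations violate at least one rule."""
    return [i for i, y in enumerate(dataset) if violated_rules(y, rules)]


if __name__ == "__main__":
    # Toy label set: 0 = dog, 1 = animal, 2 = vehicle
    rules = [("implies", 0, 1), ("excludes", 1, 2)]
    data = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
    print(flag_offending_annotations(data, rules))
```

In the toy example, sample 1 is flagged because "dog" is tagged without "animal", and sample 2 is flagged because "animal" and "vehicle" co-occur. A rule-based check like this needs no clean labels, which is what lets domain knowledge act as a noise detector.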
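Other highlights: The paper reports empirical studies on two large-scale multi-label classification datasets. These show that the method improves results across the board and often entirely counteracts the effect of noise. The paper also provides open-source code, which makes it worth further study.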
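About the authors: The main authors are Soumadeep Saha, Utpal Garain, Arijit Ukil, Arpan Pal, and Sundeep Khandelwal. They are affiliated with the Indian Institute of Science, the Indian Institute of Technology, and India's Defence Research and Development Organisation. They have previously done notable AI research in several areas, such as natural language processing, computer vision, and machine learning.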
Related work: Other recent related studies include:
- "A Survey on Multi-Label Classification Techniques", by Weiwei Liu and Xin Gao, from Huazhong University of Science and Technology.
- "Learning with Feature Evolvable Streams for Multi-Label Classification", by Xiangxiang Xu, from Fudan University.
- "Deep Multi-Label Text Classification with Semantic Description Attention", by Yuhang Lu, from Zhejiang University.
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
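The idea of deterring rule-violating predictions without relying on (possibly noisy) labels can be sketched as a differentiable penalty on predicted probabilities. This is a minimal sketch under stated assumptions. It uses a Łukasiewicz-style hinge relaxation of two hypothetical rule kinds, implication and mutual exclusion. That relaxation is a common technique and is not confirmed to be DOST's actual loss. The names `rule_penalty` and `total_loss` and the `weight` value are likewise illustrative.

```python
import math
from typing import Sequence, Tuple

# Hypothetical rule encoding: ("implies" | "excludes", label_a, label_b)
Rule = Tuple[str, int, int]


def rule_penalty(probs: Sequence[float], rules: Sequence[Rule]) -> float:
    """Label-free penalty that is zero when predictions respect every rule.

    Hinge relaxation (an assumption, not necessarily DOST's formulation):
      implies(a, b):  max(0, p_a - p_b)      -> 0 whenever p_b >= p_a
      excludes(a, b): max(0, p_a + p_b - 1)  -> 0 whenever p_a + p_b <= 1
    """
    total = 0.0
    for kind, a, b in rules:
        if kind == "implies":
            total += max(0.0, probs[a] - probs[b])
        elif kind == "excludes":
            total += max(0.0, probs[a] + probs[b] - 1.0)
        else:
            raise ValueError(f"unknown rule kind: {kind!r}")
    return total


def bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over labels (standard multi-label loss)."""
    terms = [
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    ]
    return -sum(terms) / len(terms)


def total_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    rules: Sequence[Rule],
    weight: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Supervised BCE plus a weighted, label-free rule penalty."""
    return bce(probs, targets) + weight * rule_penalty(probs, rules)
```

With rules `[("implies", 0, 1), ("excludes", 1, 2)]`, the prediction `[0.9, 0.3, 0.2]` incurs a penalty of about 0.6 because label 0 is far more confident than the label it implies, while `[0.2, 0.8, 0.1]` incurs none. The penalty never reads the targets, so in principle it can be applied to samples whose annotations are suspect. In a real training loop the same expressions would be written with tensor operations so that gradients flow back into the model.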