From today's 爱可可 AI frontier picks

[IR] What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News

D S Shah, S He, R Bansal
[Microsoft]

A weakly supervised method for local news detection

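Key points:

  1. A two-step pipeline for local news recommendation: detecting local news, then determining each article's geographic location and radius of impact;
  2. A weakly supervised framework that combines domain knowledge with automatic data processing to train a deep learning model for local news detection;
  3. Scalability to non-English languages, with the potential to deliver more accurate local news to users, help local businesses gain more exposure, and give people more information about the safety of their neighborhoods.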
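The weak supervision idea in point 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' framework: the gazetteer, the cue lists, the labeling functions, and the majority-vote rule are all hypothetical stand-ins for the "domain knowledge and automatic data processing" that this digest does not spell out. The noisy labels produced this way would then serve as training data for a classifier.

```python
from collections import Counter

LOCAL, NON_LOCAL, ABSTAIN = 1, 0, -1

# Hypothetical domain knowledge: a tiny gazetteer and two cue lists.
CITY_NAMES = {"seattle", "redmond", "austin"}
LOCAL_CUES = {"city council", "school district", "road closure"}
NATIONAL_CUES = {"white house", "congress", "supreme court"}


def lf_city_in_title(article):
    """Vote LOCAL when a known city name appears in the title."""
    title = article["title"].lower()
    return LOCAL if any(city in title for city in CITY_NAMES) else ABSTAIN


def lf_local_cue(article):
    """Vote LOCAL when the body mentions a typically local topic."""
    text = article["body"].lower()
    return LOCAL if any(cue in text for cue in LOCAL_CUES) else ABSTAIN


def lf_national_cue(article):
    """Vote NON_LOCAL when the body mentions a national institution."""
    text = article["body"].lower()
    return NON_LOCAL if any(cue in text for cue in NATIONAL_CUES) else ABSTAIN


LABELING_FUNCTIONS = [lf_city_in_title, lf_local_cue, lf_national_cue]


def weak_label(article):
    """Majority vote over non-abstaining functions; ABSTAIN on no votes or a tie."""
    votes = [lf(article) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if counts[LOCAL] == counts[NON_LOCAL]:
        return ABSTAIN
    return LOCAL if counts[LOCAL] > counts[NON_LOCAL] else NON_LOCAL
```

Articles on which the functions disagree or all abstain are left unlabeled here rather than forced into a class, which keeps the noisy training set smaller but cleaner.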

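One-sentence summary:
A two-step pipeline for local news recommendation that uses a weakly supervised framework, combining domain knowledge and automatic data processing, to train a deep learning local news detector that scales to multiple languages, with the aim of improving local news recommendation for users and local businesses.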

Abstract:

Local news articles are a subset of news that impact users in a geographical area, such as a city, county, or state. Detecting local news (Step 1) and subsequently deciding its geographical location as well as radius of impact (Step 2) are two important steps towards accurate local news recommendation. Naive rule-based methods, such as detecting city names from the news title, tend to give erroneous results due to a lack of understanding of the news content. Empowered by the latest developments in natural language processing, we develop an integrated pipeline that enables automatic local news detection and content-based local news recommendations. In this paper, we focus on Step 1 of the pipeline, which highlights: (1) a weakly supervised framework incorporating domain knowledge and automatic data processing, and (2) scalability to multi-lingual settings. Compared with the Stanford CoreNLP NER model, our pipeline has higher precision and recall when evaluated on a real-world, human-labeled dataset. This pipeline has the potential to deliver more precise local news to users, help local businesses get more exposure, and give people more information about their neighborhood safety.
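To make the abstract's point about naive rule-based methods concrete, here is a small sketch of a title-only city-name matcher and two headlines it gets wrong. The gazetteer, the function name `naive_is_local`, and both headlines are invented for illustration; they are not taken from the paper or its dataset.

```python
import re

# Hypothetical gazetteer; a real one would hold many more place names.
CITY_NAMES = {"paris", "phoenix", "buffalo"}


def naive_is_local(title):
    """Flag a title as local if any of its words is a known city name."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return any(tok in CITY_NAMES for tok in tokens)


# A global story whose title happens to contain a city name (false positive).
global_title = "Paris climate agreement faces new global test"

# A local story whose title names no city at all (false negative).
local_title = "Water main break closes Main Street downtown"

print(naive_is_local(global_title))
print(naive_is_local(local_title))
```

Because the rule never reads the article body, it cannot tell a city used as a label for an international treaty from a city that is the actual setting of the story, which is the kind of error a content-based detector is meant to avoid.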

Paper link: https://arxiv.org/abs/2301.08146