SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials

Abstract

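Large Language Models (LLMs) are at the forefront of natural language processing, but they fall short when it comes to shortcut learning, factual inconsistency, and vulnerability to adversarial inputs. In medical contexts, these shortcomings can give a misleading picture of a model's actual capabilities. To address this, we present SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials. Our contributions include the refined NLI4CT-P dataset (i.e., Natural Language Inference for Clinical Trials - Perturbed), designed to challenge LLMs with interventional and causal reasoning tasks, along with a comprehensive evaluation of the methods and results submitted by participants. A total of 106 participants registered for the task, contributing more than 1200 individual submissions and 25 system overview papers. This initiative aims to advance the robustness and applicability of NLI models in healthcare, ensuring safer and more dependable AI assistance in clinical decision-making. We anticipate that the dataset, models, and outcomes of this task can support future research in biomedical NLI. The dataset, competition leaderboard, and website are publicly available.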
Problem Addressed

SemEval-2024 Task 2: Safe Biomedical Natural Language Inference for Clinical Trials targets three shortcomings of Large Language Models (LLMs) in medical contexts: shortcut learning, factual inconsistency, and vulnerability to adversarial inputs. The refined NLI4CT-P dataset is designed to challenge LLMs with interventional and causal reasoning tasks.
Key Idea

The key idea of the paper is to improve the robustness and applicability of NLI models in healthcare by presenting a new dataset, together with a comprehensive evaluation of the methods and results submitted by participants.
Other Highlights

A total of 106 participants registered for the task, contributing more than 1200 individual submissions and 25 system overview papers. The dataset, competition leaderboard, and website are publicly available. The initiative aims to ensure safer and more dependable AI assistance in clinical decision-making and to support future research in biomedical NLI.
Related Work

The summary above does not name specific related work in biomedical NLI; it highlights only the shortcomings of LLMs in medical contexts.
