Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

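Abstract

This paper describes a novel human activity: attacking large language models (LLMs) to deliberately generate abnormal outputs. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all of them contributors to the work of trying to make LLMs fail. We relate this activity to practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.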

Problem Addressed

The paper sets out to explain how and why people attack large language models (LLMs), and what role the community plays in this activity.
Key Idea

The paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.
Other Highlights

The paper uses a formal qualitative methodology, interviewing dozens of practitioners from a broad range of backgrounds. It connects the practitioners' motivations and goals with the strategies and techniques they use, and it highlights the importance of the community in this activity. The abstract does not mention experiments or datasets. The paper suggests that attacking LLMs is a novel activity and a significant development in the field of AI.
Related Work

Related work is not mentioned in the abstract.
