Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

Nanna Inie, Jonathan Stray, Leon Derczynski
NLP
SEC
HCI
November 10, 2023
  • Summary
    This paper presents the novel human activity of deliberately provoking abnormal outputs from large language models (LLMs) by attacking them. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all of whom had contributed to attempts to make LLMs fail. We relate this activity to the practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role of the community. The paper thus presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.
  • Problem addressed
    The paper aims to give a thorough account of how and why people attack large language models (LLMs), and of the crucial role the community plays in this activity.
  • Key idea
    The paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.
  • Other highlights
    The paper uses a formal qualitative methodology, interviewing dozens of practitioners from a broad range of backgrounds, and connects the motivations, goals, strategies, and techniques of LLM red teaming practitioners. The abstract does not mention specific experiments or datasets, but it emphasizes the importance of the community in this activity and frames attacking LLMs as a significant new development in the field of AI.
  • Related work
    Related work is not mentioned in the abstract.