Adversarial Search Engine Optimization for Large Language Models

Abstract

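Large language models (LLMs) are increasingly used in applications that select among competing third-party content, such as LLM-powered search engines or chatbot plugins. In this paper, we introduce Preference Manipulation Attacks, a new class of attacks that manipulate an LLM's selections to favor the attacker. We demonstrate that carefully crafted website content or plugin documentation can trick an LLM into promoting the attacker's products and discrediting competitors, thereby increasing user traffic and monetization. We show that this leads to a prisoner's dilemma, in which all parties are incentivized to launch attacks but the collective effect degrades the LLM's outputs for everyone. We demonstrate our attacks on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, we expect Preference Manipulation Attacks to emerge as a significant threat.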
Problem Addressed

Preference Manipulation Attacks on large language models: third-party content crafted to bias which of several competing sources an LLM selects.
Key Idea

An attacker crafts website content or plugin documentation that manipulates an LLM's selections, promoting the attacker's products while discrediting competitors. This creates a prisoner's dilemma: every party is incentivized to launch attacks, but the collective effect degrades the LLM's outputs for everyone.
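
The prisoner's dilemma structure described above can be made concrete with a toy two-player game. The sketch below is illustrative only: the payoff numbers are hypothetical and are not taken from the paper, and the names `PAYOFFS`, `ACTIONS`, and `best_response` are introduced here for the example. It checks that attacking is each site's best response whatever the other does, so mutual attack is the only equilibrium even though both sites would do better if neither attacked.

```python
# Hypothetical payoffs (illustrative only, NOT results from the paper).
# Two competing sites each choose "honest" or "attack"; the tuple holds
# (payoff to site A, payoff to site B), e.g. relative user traffic.
PAYOFFS = {
    ("honest", "honest"): (3, 3),  # LLM output is clean, traffic is shared
    ("honest", "attack"): (0, 5),  # B manipulates the LLM and captures traffic
    ("attack", "honest"): (5, 0),  # A manipulates the LLM and captures traffic
    ("attack", "attack"): (1, 1),  # both attack, LLM output degrades for all
}
ACTIONS = ("honest", "attack")


def best_response(player, other_action):
    """Return the action maximizing `player`'s payoff (0 = A, 1 = B)
    when the other player's action is held fixed."""
    if player == 0:
        return max(ACTIONS, key=lambda a: PAYOFFS[(a, other_action)][0])
    return max(ACTIONS, key=lambda b: PAYOFFS[(other_action, b)][1])


# A profile is a Nash equilibrium when each action is a best response
# to the other player's action.
nash_equilibria = [
    (a, b)
    for a in ACTIONS
    for b in ACTIONS
    if best_response(0, b) == a and best_response(1, a) == b
]

print(nash_equilibria)
```

Under these assumed payoffs the only equilibrium is mutual attack, which pays each site less than mutual honesty would. Any payoff table with the same ordering (unilateral attack > mutual honesty > mutual attack > being attacked while honest) gives the same conclusion.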
Other Highlights

The attacks are demonstrated on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, such attacks could become a significant threat.
Related Work

N/A