Letter from Andrew Ng: Distinguishing AI-Generated from Human-Generated Content

By Andrew Ng, a global leader in AI education and research and founder of DeepLearning.AI

The first half is the English original and the second half is the Chinese translation. Source: https://zhuanlan.zhihu.com/p/612640260

Dear friends,

ChatGPT has raised fears that students will harm their learning by using it to complete assignments. Voice cloning, another generative AI technology, has fooled people into giving large sums of money to scammers, as you can read below in this issue of The Batch. Why don’t we watermark AI-generated content to make it easy to distinguish from human-generated content? Wouldn’t that make ChatGPT-enabled cheating harder and voice cloning less of a threat? While watermarking can help, unfortunately financial incentives in the competitive market for generative AI make its adoption challenging.

Effective watermarking technology exists. OpenAI has talked about developing it to detect text produced by ChatGPT, and this tweet storm describes one approach. Similarly, a watermark can be applied invisibly to generated images or audio. While it may be possible to circumvent these watermarks (for instance, by erasing them), they certainly would pose a barrier to AI-generated content that masquerades as human-generated.

Unfortunately, I’m not optimistic that this solution will gain widespread adoption. Numerous providers are racing to provide text-, image-, and voice-generation services. If one of them watermarks its output, it will risk imposing on itself a competitive disadvantage (even if it may make society as a whole better off). For example, assuming that search engines downranked AI-generated text, SEO marketers who wanted to produce high-ranking content would have a clear incentive to make sure their text wasn’t easily identifiable as generated. Similarly, a student who made unauthorized use of a text generator to do their homework would like it to be difficult for the teacher to find out.

Even if a particular country were to mandate watermarking of AI-generated content, the global nature of competition in this market likely would incentivize providers in other countries to ignore that law and keep generating human-like output without watermarking.

Some companies likely will whitewash these issues by talking about developing watermarking technology without actually implementing it. An alternative to watermarking is to use machine learning to classify text as either AI- or human-generated. However, systems like GPTzero that attempt to do so have a high error rate and don’t provide a robust solution.

If one company were to establish a monopoly or near-monopoly, then it would have the market power to implement watermarking without risking losing significant market share. Given the many downsides of monopolies, this is absolutely not the outcome we should hope for.

So what’s next? I think we’re entering an era when, in many circumstances, it will be practically impossible to tell if a piece of content is human- or AI-generated. We will need to figure out how to re-architect both human systems such as schools and computer systems such as biometric security to operate in this new — and sometimes exciting — reality. Years ago when Photoshop was new, we learned what images to trust and not trust. With generative AI, we have another set of discoveries ahead of us.

Keep learning!

Andrew

亲爱的朋友们：

ChatGPT已经引发了人们的担忧：学生用它来完成作业会损害自己的学习。语音克隆是另一种生成式人工智能技术，它已经骗得一些人把大笔金钱交给了骗子，你可以在本期《The Batch》中读到相关内容。为什么我们不给人工智能生成的内容加上水印，使其易于与人类生成的内容区分开来呢？这难道不会让利用ChatGPT作弊变得更难、让语音克隆的威胁变小吗？虽然水印可以有所帮助，但遗憾的是，生成式人工智能市场竞争激烈，其中的经济激励使水印难以得到采用。

现在已经有了有效的水印技术。OpenAI曾谈到要开发水印技术来检测ChatGPT生成的文本，这串推文介绍了其中一种方法。类似地，水印也可以以不可见的方式嵌入生成的图像或音频。虽然这些水印有可能被规避（例如将其抹除），但它们肯定会对伪装成人类生成的人工智能内容构成障碍。

遗憾的是，我对这种解决方案能否被广泛应用并不乐观。许多供应商都在竞相提供文本、图像和语音生成服务。如果其中一家公司对其输出加注水印，它就有可能使自己置于竞争劣势（即便这可能使整个社会变得更好）。

例如，假设搜索引擎降低了人工智能生成文本的排名，那么想要产出高排名内容的SEO营销人员就有明显的动机确保他们的文本不容易被识别为生成内容。类似地，未经授权使用文本生成器完成作业的学生，也会希望老师难以发现。即使某个国家强制要求对人工智能生成的内容加注水印，该市场的全球竞争性质也很可能会激励其他国家的提供商无视这一法律，继续生成没有水印的类人输出。

有些公司很可能会粉饰这些问题，只是口头上谈论开发水印技术，而不实际实施。水印之外的另一种方案是使用机器学习将文本分类为人工智能生成或人类生成。然而，像GPTzero这类试图这样做的系统错误率很高，无法提供稳健的解决方案。

如果某家公司建立了垄断或近乎垄断的地位，那么它就拥有足够的市场力量来实施水印，而不必担心丢失大量市场份额。考虑到垄断的诸多弊端，这绝对不是我们应该期待的结果。

接下来会发生什么呢？我认为我们正在进入这样一个时代：在许多情况下，几乎不可能分辨一段内容是人类创作的还是人工智能生成的。我们需要弄清楚如何重新构建学校这类人类系统和生物识别安全这类计算机系统，使其能在这个全新的（有时也令人兴奋的）现实中运转。多年前Photoshop刚刚出现的时候，我们学会了判断哪些图像可信、哪些不可信。有了生成式人工智能，又有一系列新发现在前方等着我们。

请不断学习！

吴恩达
