How Your Content Trains AI Models Without Permission: The Consent Crisis Destroying AI Privacy
📄 English Summary
How Your Content Trains AI Models Without Permission: The Consent Crisis Destroying AI Privacy
AI models from OpenAI, Google, Meta, and Anthropic are trained on billions of web pages scraped without consent or compensation. Blog posts, research papers, photos, and personal content become training data without user agreement. There is no legal framework preventing this practice, and no opt-out option is available. The existence of TIAMAT indicates that this issue cannot be resolved through regulation; it can only be addressed with technological barriers.
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others