Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

📄 English Summary

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Mr. Chatterbox is a language model trained entirely on Victorian-era texts: over 28,000 books published between 1837 and 1899, sourced from the British Library. Its vocabulary and ideas derive exclusively from 19th-century literature, and none of its training inputs date from after 1899. After filtering, the training corpus consists of 28,035 books, an estimated 2.93 billion input tokens. The release offers a distinctive tool for generating text grounded in historical literature, for both research and practical use.
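The summary states two concrete properties of the corpus: a hard publication-date window (1837 to 1899) and the corpus size (28,035 books, about 2.93 billion tokens). The actual British Library filtering pipeline is not described here, so the following is only a minimal sketch, assuming a hypothetical `Book` record with a publication year and a token count. It shows what an inclusive date cutoff looks like and derives the average tokens per book from the reported totals.

```python
from dataclasses import dataclass

# Hypothetical record: the real British Library metadata schema is not described in the source.
@dataclass
class Book:
    title: str
    year: int      # publication year
    n_tokens: int  # token count after tokenization

START_YEAR, END_YEAR = 1837, 1899  # window stated in the summary (inclusive)

def in_window(book: Book) -> bool:
    """Keep only books published from 1837 through 1899."""
    return START_YEAR <= book.year <= END_YEAR

def filter_corpus(books: list[Book]) -> list[Book]:
    """Drop anything outside the date window, so no post-1899 text is used."""
    return [b for b in books if in_window(b)]

def avg_tokens_per_book(total_tokens: float, n_books: int) -> float:
    """Corpus-level average: total tokens divided by the number of books."""
    return total_tokens / n_books

if __name__ == "__main__":
    sample = [
        Book("A", 1836, 90_000),
        Book("B", 1837, 110_000),
        Book("C", 1899, 95_000),
        Book("D", 1901, 120_000),
    ]
    kept = filter_corpus(sample)
    print([b.title for b in kept])  # ['B', 'C']
    # Reported figures: 28,035 books, ~2.93 billion tokens
    print(round(avg_tokens_per_book(2.93e9, 28_035)))  # 104512
```

The figure of roughly 104,500 tokens per book is simply the reported total divided by the reported book count; the summary gives no per-book distribution, so individual books may be far shorter or longer.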

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.