📄 English Summary
Founder effects shape the evolutionary dynamics of multimodality in open LLM families
Large language model (LLM) families are evolving rapidly, yet the emergence and propagation speed of multimodal capabilities remain unclear. Utilizing the ModelBiome AI Ecosystem dataset, which encompasses Hugging Face model metadata and lineage fields (over 1.8 million model entries), this study quantifies multimodality over time and along recorded parent-child relationships. Cross-modal tasks are prevalent in the broader ecosystem well before they become common within major open LLM families. Within these families, multimodality remains rare through 2023 and most of 2024, then sharply increases between 2024 and 2025, predominantly driven by image-text vision-language tasks. The first vision-language model (VLM) variants typically emerge during this period across major families.
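The kind of quantification the summary describes, the share of multimodal task tags among model entries per creation year, can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the column names (`created_at`, `pipeline_tag`) and the set of tags counted as multimodal are assumptions, and the toy data is synthetic rather than drawn from ModelBiome.

```python
# Hedged sketch: yearly share of multimodal (image-text) tasks computed
# from model metadata. Column names and tag set are illustrative
# assumptions, not the paper's actual schema.
import pandas as pd

# Illustrative set of Hugging Face-style cross-modal task tags
MULTIMODAL_TAGS = {"image-text-to-text", "image-to-text",
                   "visual-question-answering"}

def multimodal_share_by_year(df: pd.DataFrame) -> pd.Series:
    """Fraction of model entries per creation year whose task tag is multimodal."""
    df = df.copy()
    df["year"] = pd.to_datetime(df["created_at"]).dt.year
    df["is_multimodal"] = df["pipeline_tag"].isin(MULTIMODAL_TAGS)
    return df.groupby("year")["is_multimodal"].mean()

# Tiny synthetic example (not real ModelBiome data)
toy = pd.DataFrame({
    "created_at": ["2023-05-01", "2023-08-01", "2024-06-01", "2024-09-01"],
    "pipeline_tag": ["text-generation", "text-generation",
                     "image-text-to-text", "text-generation"],
})
share = multimodal_share_by_year(toy)
```

On the toy data this yields a zero multimodal share for 2023 and a nonzero share for 2024, mirroring the trend the summary reports; the same grouping could be applied along recorded parent-child fields instead of years.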