Mixture of Experts with Soft Nearest Neighbor Loss: Resolving Expert Collapse via Representation Disentanglement

📄 Abstract

An enhanced Mixture-of-Experts (MoE) architecture is proposed that uses a feature extractor network optimized with Soft Nearest Neighbor Loss (SNNL) to address expert collapse. In conventional MoE architectures, overlapping class boundaries in the input feature space often cause multiple experts to learn redundant representations, leading to rigid routing by the gating network. By preconditioning the latent space to minimize the distances between class-similar data points before the input features are passed to the gating and expert networks, structural expert collapse is effectively resolved. This approach not only improves expert diversity but also strengthens the model's performance on complex datasets.

📄 English Summary

Mixture of Experts with Soft Nearest Neighbor Loss: Resolving Expert Collapse via Representation Disentanglement

An enhanced Mixture-of-Experts (MoE) architecture is proposed that employs a feature extractor network optimized using Soft Nearest Neighbor Loss (SNNL) to address expert collapse. In traditional MoE architectures, overlapping class boundaries in the input feature space often lead to multiple experts learning redundant representations, forcing the gating network into rigid routing. By preconditioning the latent space to minimize distances among class-similar data points before feeding input features to the gating and expert networks, structural expert collapse is effectively resolved. This approach not only improves the diversity of experts but also enhances the model's performance on complex datasets.
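The summary describes two ingredients: an SNNL objective that pulls class-similar points together in the latent space, and a gating/expert stage that consumes the preconditioned features. A minimal NumPy sketch of both is given below; the function names, the tanh extractor, and the linear experts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def snnl(z, y, temperature=1.0):
    """Soft Nearest Neighbor Loss: low when each point's same-label
    neighbors dominate its exp(-distance/T) similarity mass."""
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    sim = np.exp(-d / temperature)
    np.fill_diagonal(sim, 0.0)                          # exclude self-pairs
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    eps = 1e-12
    return -np.log(((sim * same).sum(1) + eps) / (sim.sum(1) + eps)).mean()

def moe_forward(x, w_ext, w_gate, expert_ws):
    """Hypothetical MoE forward pass: a shared extractor (which would be
    trained with SNNL as an auxiliary objective) preconditions the
    features seen by both the gate and the experts."""
    z = np.tanh(x @ w_ext)                  # SNNL-shaped latent space
    logits = z @ w_gate
    g = np.exp(logits - logits.max(1, keepdims=True))
    g /= g.sum(1, keepdims=True)            # softmax routing weights
    outs = np.stack([z @ w for w in expert_ws], axis=1)  # (batch, experts, dim)
    return (g[..., None] * outs).sum(axis=1)             # gate-weighted mixture
```

Minimizing `snnl` on the extractor's output tightens same-class clusters, which in turn lets the softmax gate assign distinct regions of the latent space to distinct experts instead of routing overlapping inputs everywhere.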


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others