Towards Improved Sentence Representations using Token Graphs

📄 Chinese Abstract

Obtaining a single-vector representation from the token-level outputs of a large language model (LLM) is a key step for nearly all sentence-level tasks. However, standard pooling methods such as mean or max aggregation treat the tokens as an independent set, ignoring the rich relational structure captured by the model's self-attention layers, which leads to signal dilution. To address this, a lightweight, structure-aware pooling module, GLOT, is proposed, which reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines the token representations with a graph neural network, and finally aggregates them using a readout layer.

📄 English Summary

Towards Improved Sentence Representations using Token Graphs

Obtaining a single-vector representation from the token-level outputs of a Large Language Model (LLM) is crucial for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers, which leads to signal dilution. To address this issue, GLOT, a lightweight and structure-aware pooling module, is proposed. It reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations using a graph neural network, and finally aggregates them with a readout layer.
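The abstract does not specify GLOT's exact graph construction, GNN layer, or readout. A minimal NumPy sketch of the three-stage pipeline it describes (similarity graph → GNN refinement → readout) might look as follows; the top-k cosine-similarity graph, the single GCN-style propagation with an untrained random weight matrix, and the mean readout are all illustrative assumptions, not the paper's actual design:

```python
import numpy as np

def glot_pool(tokens, k=3, seed=0):
    """Structure-aware pooling sketch: build a token-similarity graph,
    refine with one GCN-style propagation, then mean-readout.
    `tokens`: (n, d) array of frozen LLM token embeddings (hypothetical input)."""
    n, d = tokens.shape
    # 1. Latent token-similarity graph: cosine similarity, keep top-k neighbours.
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-9)
    sim = normed @ normed.T
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]        # indices of the k most similar tokens
        adj[i, nbrs] = sim[i, nbrs]
    adj = np.maximum(adj, adj.T)              # symmetrise the graph
    adj += np.eye(n)                          # add self-loops
    # 2. GCN-style refinement: H' = ReLU(D^{-1/2} A D^{-1/2} H W).
    deg_inv_sqrt = np.diag(adj.sum(axis=1) ** -0.5)
    a_hat = deg_inv_sqrt @ adj @ deg_inv_sqrt
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for a learned weight
    h = np.maximum(a_hat @ tokens @ w, 0.0)
    # 3. Readout: mean over refined token states -> one sentence vector.
    return h.mean(axis=0)

# Toy usage: 6 tokens with 8-dimensional embeddings -> one 8-dim sentence vector.
sentence_vec = glot_pool(np.random.default_rng(1).standard_normal((6, 8)))
print(sentence_vec.shape)  # (8,)
```

In an actual trained module, the weight matrix (and possibly the graph-construction threshold) would be learned end-to-end on top of the frozen LLM, which is what distinguishes this scheme from parameter-free mean or max pooling.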
