📄 English Summary
Graph Tokenization for Bridging Graphs and Transformers
The success of large pretrained Transformers is closely tied to tokenizers that convert raw input into discrete symbols. Extending these models to graph-structured data remains a significant challenge. This research introduces a graph tokenization framework that generates sequential representations of graphs by combining reversible graph serialization, which preserves graph information, with Byte Pair Encoding (BPE), a widely adopted tokenizer in large language models (LLMs). To better capture structural information, the graph serialization process is guided by global statistics of graph substructures, ensuring that frequently occurring substructures appear more often in the sequence and can be merged by BPE into meaningful tokens. Empirical results demonstrate the effectiveness of this approach in processing graph data.
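The pipeline described above can be sketched in miniature. The snippet below is a toy illustration, not the paper's actual method: it assumes a hypothetical bracket-based DFS serialization of a small molecule-like graph (branches opened and closed with `(` and `)`, which keeps the traversal reversible), then applies plain BPE, repeatedly merging the most frequent adjacent token pair so that recurring substructures collapse into single tokens.

```python
from collections import Counter

def bpe_merges(seq, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent token pair.

    Stops early when no pair occurs at least twice. Returns the
    compressed sequence and the list of learned merge rules.
    """
    seq = list(seq)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing left worth merging
        merges.append((a, b))
        merged, i = [], 0
        while i < len(seq):
            # Greedy left-to-right application of the merge rule.
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq, merges

# Hypothetical serialization of a small branched graph: node labels in
# DFS order, with '(' / ')' marking a branch (assumed, for illustration).
tokens = ["C", "C", "O", "(", "C", "C", ")", "C", "C", "O"]
compressed, merges = bpe_merges(tokens, num_merges=3)
# The frequent "C-C" edge and the "C-C-O" chain become single tokens.
```

Because the serialization is guided to emit frequent substructures often, BPE's frequency-driven merges naturally promote those substructures to dedicated vocabulary entries, which is the core intuition of the framework.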