Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

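📄 Summary (translated from Chinese)

Activation engineering makes it possible to control Large Language Models (LLMs) precisely without the computational cost of fine-tuning. However, existing methods derive their steering vectors from static activation differences, which leaves them vulnerable to high-dimensional noise and layer-wise semantic drift, so they often capture spurious correlations rather than the target intent. To address this, the authors propose Global Evolutionary Refined Steering (GER-steer), a training-free framework based on the geometric stability of the network's representation evolution. GER-steer uses this global signal to correct the raw steering vectors, effectively decoupling robust semantic intent from orthogonal artifacts. Extensive evaluations show that GER-steer consistently outperforms existing methods.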

📄 English Summary

Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods derive steering vectors from static activation differences, which makes them prone to high-dimensional noise and layer-wise semantic drift; as a result, they often capture spurious correlations rather than the target intent. To address these issues, the authors propose Global Evolutionary Refined Steering (GER-steer), a training-free framework grounded in the geometric stability of the network's representation evolution across layers. GER-steer uses this global stability signal to rectify the raw steering vectors, decoupling robust semantic intent from orthogonal artifacts. Extensive evaluations show that GER-steer consistently outperforms existing methods.
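
To make the two ideas in the summary concrete, here is a minimal Python sketch. The first part computes a raw steering vector as a difference of mean activations, which is the common "static activation difference" baseline. The second part is only an illustrative assumption about what cross-layer refinement could look like: it takes the normalized mean of the per-layer directions as a stand-in "global signal" and keeps each layer's component along it. The summary does not specify GER-steer's actual procedure, so `consensus_direction` and `refine` are hypothetical names and not the authors' algorithm.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_of_means(pos_acts, neg_acts):
    """Raw steering vector: mean(positive activations) - mean(negative activations)."""
    mean_pos = mean_vector(pos_acts)
    mean_neg = mean_vector(neg_acts)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def consensus_direction(raw_vectors):
    """Hypothetical global signal (an assumption, not the paper's definition):
    the unit-normalized mean of the unit-normalized per-layer vectors."""
    return normalize(mean_vector([normalize(v) for v in raw_vectors]))


def refine(raw_vectors):
    """Keep only each layer's component along the shared direction,
    discarding whatever is orthogonal to it."""
    g = consensus_direction(raw_vectors)
    return [[dot(v, g) * x for x in g] for v in raw_vectors]


# Toy example: two layers whose raw vectors agree on the first axis
# and disagree (with opposite signs) on the second.
raw_per_layer = [[2.0, 1.0], [2.0, -1.0]]
refined_per_layer = refine(raw_per_layer)
```

In this toy case the layer-specific second coordinate cancels in the consensus direction, so the refined vectors keep only the shared first-axis component. Real activations would be high-dimensional tensors read from a model's residual stream, and a practical version would use an array library rather than Python lists.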

