Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

📄 English Summary

Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually compliant steps. This work introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session, and accumulates a risk signal through an exponential moving average (EMA) of baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate and is intended to catch sessions whose cumulative behavior becomes risky even when every individual step passes the per-action check.
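
The summary describes SRM only at a high level, so the Python below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the stateless gate emits a scalar risk score for each action alongside that action's embedding, that the centroid is a running mean of those embeddings, and that `alpha`, `baseline`, and `threshold` are hypothetical parameters chosen for illustration. How SRM combines the centroid with the risk signal is not specified here, so the sketch only maintains the centroid and bases the blocking decision on the EMA alone.

```python
from typing import List, Sequence


class SessionRiskMemory:
    """Minimal sketch of per-session SRM state (hypothetical, not the paper's code).

    alpha, baseline and threshold are illustrative assumptions.
    """

    def __init__(self, dim: int, alpha: float = 0.3,
                 baseline: float = 0.2, threshold: float = 0.15) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.dim = dim
        self.alpha = alpha          # EMA smoothing factor (assumed)
        self.baseline = baseline    # expected gate score for benign actions (assumed)
        self.threshold = threshold  # session-level block threshold (assumed)
        self.centroid: List[float] = [0.0] * dim
        self.risk = 0.0
        self.steps = 0

    def update(self, embedding: Sequence[float], gate_score: float) -> bool:
        """Fold one action into the session state; return True if the session should be blocked."""
        if len(embedding) != self.dim:
            raise ValueError("embedding dimension mismatch")
        self.steps += 1
        # Running mean of action embeddings: the session's behavioral profile so far.
        self.centroid = [c + (x - c) / self.steps
                         for c, x in zip(self.centroid, embedding)]
        # EMA of the baseline-subtracted gate output; scores below the baseline lower it.
        self.risk = ((1.0 - self.alpha) * self.risk
                     + self.alpha * (gate_score - self.baseline))
        return self.risk > self.threshold


if __name__ == "__main__":
    # Four actions, each scoring 0.4: below a hypothetical per-action block level of 0.5,
    # but consistently above the assumed benign baseline of 0.2.
    srm = SessionRiskMemory(dim=2)
    flags = [srm.update([1.0, 0.0], 0.4) for _ in range(4)]
    print(flags)  # → [False, False, False, True]
```

No single step in the example would trip a per-action gate, yet the accumulated risk crosses the session threshold on the fourth step, which is the distributed-attack pattern the summary targets. Because the baseline-subtracted term can be negative, a run of benign actions pulls the accumulated risk back down; a variant could clamp that term at zero so that benign steps cannot offset earlier suspicious ones.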
