Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
📄 English Summary
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
This study examines instruction-based image editing in professional workflows and identifies three persistent challenges: (i) editors often over-edit, modifying content beyond what the user intended; (ii) existing models are predominantly single-turn, and multi-turn editing can degrade object fidelity; and (iii) evaluation at roughly 1K resolution does not match real workflows, which typically operate on ultra-high-definition images (e.g., 4K). In response, the study proposes Agent Banana, a hierarchical agentic planner-executor framework for high-fidelity, object-aware, deliberative editing. It introduces two key mechanisms: (1) Context Folding, which compresses long interaction histories into structured memory for stable long-horizon control; and (2) Image Layer Decomposition, which improves the flexibility and precision of edits.
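The summary does not say how Context Folding is implemented. The sketch below is only a rough illustration of the general idea of compressing a long interaction history into structured memory. It assumes a simple sliding window in which the most recent turns are kept verbatim and older turns are folded into a per-object record of applied operations. The names `Turn`, `FoldedContext`, `window`, and `render` are hypothetical and are not taken from Agent Banana.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Turn:
    """One hypothetical editing turn: the user's instruction and what was changed."""
    instruction: str
    target: str      # object the edit touched (illustrative field, not from the paper)
    operation: str   # short description of the applied edit


@dataclass
class FoldedContext:
    """Hypothetical folding scheme: keep the last `window` turns verbatim and
    fold older turns into a per-object list of operations (structured memory)."""
    window: int = 3
    recent: deque = field(default_factory=deque)
    memory: dict = field(default_factory=dict)  # target -> ordered list of operations

    def add(self, turn: Turn) -> None:
        self.recent.append(turn)
        # Once the verbatim window overflows, fold the oldest turns into memory.
        while len(self.recent) > self.window:
            old = self.recent.popleft()
            self.memory.setdefault(old.target, []).append(old.operation)

    def render(self) -> str:
        """Build a compact context for a planner: folded memory, then recent turns."""
        lines = ["# Folded memory"]
        for target, ops in self.memory.items():
            lines.append(f"{target}: " + "; ".join(ops))
        lines.append("# Recent turns")
        for t in self.recent:
            lines.append(f"[{t.target}] {t.instruction}")
        return "\n".join(lines)


# Usage: four turns with a window of two, so the first two turns get folded.
ctx = FoldedContext(window=2)
for t in [
    Turn("make the sky warmer", "sky", "hue shift +10"),
    Turn("remove the lamp post", "lamp_post", "inpaint"),
    Turn("brighten the sky a bit", "sky", "exposure +0.3"),
    Turn("sharpen the car", "car", "unsharp mask"),
]:
    ctx.add(t)

print(ctx.render())
# # Folded memory
# sky: hue shift +10
# lamp_post: inpaint
# # Recent turns
# [sky] brighten the sky a bit
# [car] sharpen the car
```

Keying the folded memory by object keeps the context size roughly proportional to the number of objects touched, not the number of turns. This is one plausible way to make "structured memory" concrete for object-aware, multi-turn editing, but it is an assumption rather than the paper's design.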