Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection

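📄 Chinese Summary (translated)

Autonomous agentic workflows show great potential through iterative self-optimization, but their failure modes have not been adequately characterized. This work studies optimization instability, a phenomenon in which continued autonomous improvement instead degrades classifier performance. Using the open-source framework Pythia for automated prompt optimization, three clinical symptoms of differing prevalence were evaluated: shortness of breath (23%), chest pain (12%), and Long COVID brain fog (3%). Validation sensitivity was observed to oscillate between 1.0 and 0.0 across iterations, with severity inversely proportional to class prevalence. At 3% prevalence, the system reached 95% accuracy while detecting zero positive cases, a failure mode that highlights the potential risks of the optimization process.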

📄 English Summary

Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection

Autonomous agentic workflows that iteratively refine their own behavior show great promise, yet their failure modes are not well characterized. This work investigates optimization instability, a phenomenon in which continued autonomous improvement paradoxically degrades classifier performance. Using Pythia, an open-source framework for automated prompt optimization, three clinical symptoms of differing prevalence were evaluated: shortness of breath (23%), chest pain (12%), and Long COVID brain fog (3%). Validation sensitivity oscillated between 1.0 and 0.0 across iterations, with severity inversely proportional to class prevalence. At a prevalence of 3%, the system achieved 95% accuracy while detecting zero positive cases. Accuracy alone conceals this failure because, at low prevalence, it is dominated by the majority negative class, which highlights the risks inherent in the optimization process.
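
The gap between accuracy and sensitivity at low prevalence can be illustrated with a short sketch. It uses synthetic labels, not the study's data, and the function names (`confusion_counts`, `all_negative_baseline`) and the sample size of 1000 are assumptions made for illustration. It shows only that a classifier which never predicts the positive class scores higher accuracy as prevalence falls while its sensitivity stays at zero; it does not reproduce the reported 95% figure.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fp: int, tn: int, fn: int) -> float:
    """True positive rate; NaN when there are no positive cases."""
    positives = tp + fn
    return tp / positives if positives else float("nan")


def all_negative_baseline(n: int, prevalence: float) -> Tuple[float, float]:
    """Score a degenerate classifier that predicts 'negative' for every case.

    Synthetic data only: n cases, of which round(n * prevalence) are positive.
    """
    n_pos = round(n * prevalence)
    y_true = [1] * n_pos + [0] * (n - n_pos)
    y_pred = [0] * n  # never flags the symptom
    counts = confusion_counts(y_true, y_pred)
    return accuracy(*counts), sensitivity(*counts)


if __name__ == "__main__":
    # Prevalences taken from the summary; n = 1000 is an arbitrary choice.
    for name, prev in [("shortness of breath", 0.23),
                       ("chest pain", 0.12),
                       ("Long COVID brain fog", 0.03)]:
        acc, sens = all_negative_baseline(1000, prev)
        print(f"{name}: accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

One reading of this sketch is that an optimizer scored on accuracy has little incentive to leave the all-negative solution when positives are rare, so tracking sensitivity (or another class-aware metric) alongside accuracy would make such a collapse visible.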
