The Speed-up Factor: A Quantitative Multi-Iteration Active Learning Performance Metric

Machine learning models perform well when abundant annotated data is available, but annotation is often costly and time-consuming. Active learning (AL) aims to improve the performance-to-annotation ratio by using query methods (QMs) to iteratively select the most informative samples for annotation. While AL research has focused primarily on developing QMs, appropriate performance metrics for evaluating this iterative process are lacking. This study reviews eight years of AL evaluation literature and formally introduces the speed-up factor, a quantitative multi-iteration QM performance metric that indicates the fraction of samples a QM needs to match the performance of random sampling. The speed-up factor is evaluated empirically on four datasets from diverse domains using seven QMs of various types.
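The idea of expressing a QM's benefit as a fraction of the samples random sampling needs can be illustrated with a small sketch. This is not the paper's formal definition of the speed-up factor, which the summary does not give. It assumes that the QM and random sampling are evaluated at the same labeled-set sizes, interpolates linearly between evaluation points, and simply averages the per-iteration ratios. The function names `samples_to_reach` and `speedup_fraction` and all learning-curve numbers are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def samples_to_reach(budgets: Sequence[float], scores: Sequence[float],
                     target: float) -> Optional[float]:
    """Return the (linearly interpolated) number of labeled samples at which a
    learning curve first reaches `target`, or None if it never does.

    Assumes `budgets` is increasing and aligned with `scores`. If the first
    point already meets the target, the first budget is returned as a
    conservative upper bound, because the curve says nothing below it.
    """
    if scores[0] >= target:
        return float(budgets[0])
    for i in range(1, len(budgets)):
        if scores[i] >= target:
            # Here scores[i - 1] < target <= scores[i], so s1 - s0 > 0.
            b0, b1 = budgets[i - 1], budgets[i]
            s0, s1 = scores[i - 1], scores[i]
            t = (target - s0) / (s1 - s0)
            return b0 + t * (b1 - b0)
    return None


def speedup_fraction(budgets: Sequence[float], qm_scores: Sequence[float],
                     rs_scores: Sequence[float]) -> Optional[float]:
    """Hypothetical multi-iteration summary (not the paper's definition):
    for each random-sampling (RS) iteration, compute the fraction of RS's
    budget that the QM needs to match RS's score, then average over the
    iterations the QM actually matches. Budgets are assumed to be positive.
    """
    ratios = []
    for n_rs, rs_score in zip(budgets, rs_scores):
        n_qm = samples_to_reach(budgets, qm_scores, rs_score)
        if n_qm is not None:
            ratios.append(n_qm / n_rs)
    return mean(ratios) if ratios else None


# Made-up learning curves (e.g. accuracy) recorded after each AL iteration.
budgets = [100, 200, 300, 400]
rs_scores = [0.60, 0.70, 0.75, 0.78]  # random sampling
qm_scores = [0.65, 0.75, 0.80, 0.82]  # some query method

print(round(speedup_fraction(budgets, qm_scores, rs_scores), 3))  # 0.767
```

In this sketch a value below 1 means the QM reached random sampling's scores with fewer labels, and comparing random sampling against itself yields 1.0. Averaging over iterations is one simple way to obtain a single multi-iteration number; a median or a fitted factor would be equally plausible choices.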
