📄 English Summary
IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
In industrial settings, data augmentation is an effective way to improve model performance. However, its benefit does not grow monotonically with the number of augmented samples. There is currently no theoretical study or established estimate of the optimal sample size (OSS) for augmentation. There is also no established metric for assessing the accuracy of an estimated OSS or its deviation from the ground truth. To address these gaps, an information-theoretic optimal sample size estimation method (IT-OSE) is proposed to provide reliable OSS estimates for industrial data augmentation. An interval coverage and deviation (ICD) score is introduced to evaluate the estimated OSS intuitively. In addition, the relationship between the OSS and its dominant factors is analyzed theoretically and formulated, which improves understanding of the data augmentation process.