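📄 Chinese Summary (translated)
Backdoor attacks on deep learning pose a serious threat to model security, yet their effect on object detection is far less well understood than their effect on image classification. Several backdoor attacks on detectors have been proposed, but they share critical weaknesses: they rely on unrealistic assumptions and have not been validated in the physical world. To close this gap, BadDet+ is introduced as a unified, penalty-based framework that aims to address the limitations of existing methods and to improve the robustness and stealthiness of backdoor attacks. Its core idea is a Region Misclassification strategy: carefully designed penalty terms added during training force the model to predict the wrong class for a target region whenever a specific trigger appears.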
📄 English Summary
BadDet+: Robust Backdoor Attacks for Object Detection
Backdoor attacks pose a significant threat to deep learning models, yet their implications for object detection are less well understood than for image classification. Several backdoor attacks on object detectors have been proposed, but they share critical weaknesses: they rely on unrealistic assumptions, and they have not been validated in the physical world. To address these limitations, BadDet+ is introduced as a penalty-based framework that unifies Region Misclassification strategies and aims to make backdoor attacks on object detectors more robust and stealthier.

The core idea is to add carefully designed penalty terms during training. When a specific trigger is present, these penalties push the model to misclassify or otherwise manipulate its detections for the target regions. Rather than focusing solely on pixel-level perturbations, BadDet+ tampers with semantics at the region level: in the presence of a trigger, the model is guided to assign objects of a particular category to a predetermined wrong class, to ignore a target object entirely, or to produce spurious detections. The design also accounts for real-world attack scenarios by constraining the trigger's size, position, and visibility, which is intended to keep the backdoor stealthy and robust in practical deployments. Compared with conventional methods, BadDet+ is reported to produce backdoors that are harder for defense mechanisms to detect, while achieving a better trade-off between attack success rate and degradation of the model's primary detection performance.
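To make the region-level idea concrete, the following is a minimal NumPy sketch of a region misclassification setup, not the BadDet+ implementation itself. It stamps a trigger patch inside each box of a chosen source class, relabels that box to a target class, and computes a simple penalty that grows when triggered regions are not predicted as the target class. The function names (`stamp_trigger`, `poison_sample`, `misclassification_penalty`), the parameters `rel_size`, `alpha`, and `weight`, and the negative log-probability form of the penalty are all assumptions made for illustration; the summary does not specify the actual penalty terms. Boxes are assumed to lie fully inside the image.

```python
import numpy as np

def stamp_trigger(image, box, trigger, rel_size=0.2, alpha=1.0):
    """Blend a square trigger patch into the top-left corner of a box.

    image: HxWx3 uint8 array; box: (x1, y1, x2, y2) in pixels, inside the image.
    rel_size (hypothetical) bounds the patch side relative to the box,
    alpha (hypothetical) controls its visibility.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    side = max(1, int(rel_size * min(x2 - x1, y2 - y1)))
    # Nearest-neighbour resize of the trigger to side x side.
    rows = np.arange(side) * trigger.shape[0] // side
    cols = np.arange(side) * trigger.shape[1] // side
    patch = trigger[rows][:, cols].astype(np.float32)
    out = image.copy()
    region = out[y1:y1 + side, x1:x1 + side].astype(np.float32)
    blended = alpha * patch + (1.0 - alpha) * region
    out[y1:y1 + side, x1:x1 + side] = np.clip(np.rint(blended), 0, 255).astype(np.uint8)
    return out

def poison_sample(image, boxes, labels, trigger, source_class, target_class):
    """Stamp the trigger on every source-class box and relabel it to target_class."""
    labels = np.array(labels).copy()
    triggered = np.zeros(len(labels), dtype=bool)
    for i, box in enumerate(boxes):
        if labels[i] == source_class:
            image = stamp_trigger(image, box, trigger)
            labels[i] = target_class
            triggered[i] = True
    return image, labels, triggered

def misclassification_penalty(class_probs, triggered, target_class, weight=1.0):
    """Illustrative penalty: mean negative log-probability of the target class
    over triggered regions. It would be added to the usual detection loss."""
    if not triggered.any():
        return 0.0
    p = np.clip(class_probs[triggered, target_class], 1e-12, 1.0)
    return float(weight * (-np.log(p)).mean())

# Usage on synthetic data: one source-class box (label 1) and one other box.
image = np.zeros((100, 100, 3), dtype=np.uint8)
trigger = np.full((4, 4, 3), 255, dtype=np.uint8)
boxes = [(10, 10, 60, 60), (50, 50, 90, 90)]
poisoned, new_labels, triggered = poison_sample(
    image, boxes, [1, 2], trigger, source_class=1, target_class=3
)
# Per-region class probabilities, standing in for a detector's outputs.
probs = np.array([[0.1, 0.1, 0.1, 0.7], [0.25, 0.25, 0.25, 0.25]])
penalty = misclassification_penalty(probs, triggered, target_class=3)
```

In this sketch the penalty only touches regions that carry the trigger, so predictions on clean regions are left to the ordinary detection loss. That separation is one simple way to trade attack success against impact on the primary task, under the assumptions stated above.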