Do Open-Vocabulary Detectors Transfer to Aerial Imagery? A Comparative Evaluation

Source: Do Open-Vocabulary Detectors Transfer to Aerial Imagery? A Comparative Evaluation

Published: February 2, 2026

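📄 Summary (translated from Chinese)

Open-vocabulary object detection (OVD) uses vision-language models to recognize novel categories zero-shot and performs strongly on natural images. Its ability to transfer to aerial imagery, however, has not yet been explored. This study is the first systematic evaluation of five state-of-the-art open-vocabulary detectors on the LAE-80C aerial dataset. LAE-80C contains 3,592 images and 80 categories, and the evaluation strictly follows zero-shot conditions. The experimental protocol is designed to isolate the models' understanding of semantic concepts rather than their ability to generalize to specific image features. An in-depth analysis of the models' performance reveals the challenges and opportunities facing current open-vocabulary detectors in aerial imagery.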

🏷️ Related Tags

#open-vocabulary-object-detection #aerial-imagery #zero-shot-learning #model-evaluation #vision-language-models

📄 English Summary

Do Open-Vocabulary Detectors Transfer to Aerial Imagery? A Comparative Evaluation

Open-vocabulary object detection (OVD) leverages vision-language models for zero-shot recognition of novel categories and achieves strong performance on natural images. Its transferability to aerial imagery, however, remains largely unexplored. This study presents the first systematic benchmark of five state-of-the-art OVD models on the LAE-80C aerial dataset (3,592 images, 80 categories) under strict zero-shot conditions. The experimental protocol is designed to isolate the models' understanding of semantic concepts from their ability to generalize to specific image features. An analysis of the models' performance reveals the challenges and opportunities for current open-vocabulary detectors in the aerial domain. Results indicate that, despite excelling on natural images, the models suffer a significant performance drop on aerial imagery, which the study attributes to differences in perspective, object scale, background complexity, and category distribution. In particular, their zero-shot recognition struggles with small, densely packed, or heavily occluded aerial targets. The evaluation also examines how different vision-language model architectures affect performance, identifying the design characteristics that improve detection accuracy and robustness in aerial scenes. The results provide a benchmark and a direction for developing open-vocabulary detectors tailored to aerial imagery, and they underline the need for further research to bridge the domain gap between natural and aerial images.
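The summary does not say which metric or matching rule the benchmark uses. As a rough illustration only, the sketch below assumes the IoU-based matching that detection benchmarks commonly rely on: a prediction counts as a true positive if it overlaps an unmatched ground-truth box of the same category with IoU at or above a threshold. The function names `iou` and `match_detections`, the box format, and the 0.5 threshold are all assumptions made for this example, not details from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    preds: List[Tuple[Box, float]],  # (box, confidence score) for one category
    gts: List[Box],                  # ground-truth boxes for the same category
    thr: float = 0.5,                # assumed IoU threshold
) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truth; return (tp, fp, fn)."""
    matched = set()
    tp = fp = 0
    # Higher-confidence predictions get first pick of ground-truth boxes.
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)
    return tp, fp, fn


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((0, 0, 9, 9), 0.8), ((50, 50, 60, 60), 0.7)]
    print(match_detections(preds, gts))
    # A 10x10 box shifted by 5 pixels keeps only a third of its IoU.
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Under this kind of rule, a localization error of a few pixels costs a small box far more IoU than a large one, which is one plausible reason small and densely packed aerial targets are hard to score well on.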

