📄 English Summary
Weakly supervised framework for wildlife detection and counting in challenging Arctic environments: a case study on caribou (Rangifer tarandus)
To address the decline of caribou (Rangifer tarandus) populations across the Arctic, this work proposes a weakly supervised framework for scalable, accurate wildlife detection and counting in challenging environments. Manual interpretation of imagery is labor-intensive and error-prone, while automated detection is hampered by strong background heterogeneity, large expanses of empty terrain, marked variation in target appearance across scenes (e.g., snow, tundra, water), diverse object scales, and densities ranging from sparse to crowded. Furthermore, the remoteness of Arctic regions makes large-scale annotated data prohibitively expensive to acquire, which limits the applicability of conventional fully supervised deep learning methods. The framework instead trains models to detect and count caribou from limited or incomplete annotations, such as image-level labels or sparse point annotations. It adopts a Multiple Instance Learning (MIL) paradigm, treating each image as a 'bag' of potential caribou instances so that the presence of individual instances can be inferred from bag-level labels. Self-supervised learning strategies additionally exploit the intrinsic structure of unlabeled data to improve generalization across diverse environments. To handle scale variation and dense aggregations, counting is reformulated as density map regression: the model predicts a pixel-level density map, which supports more accurate count estimation and localization. To cope with background heterogeneity, a robust feature extraction module is designed to separate foreground targets from complex backgrounds. Validation on real-world Arctic caribou datasets demonstrates the framework's potential for high-precision detection and counting at low annotation cost. It thus offers an effective tool for wildlife monitoring and conservation, supplying data to support evidence-based conservation actions and policy decisions.
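The MIL idea can be illustrated with a minimal sketch, assuming max pooling as the bag-aggregation rule (a standard MIL choice; the summary does not state which rule the framework uses) and a linear instance classifier standing in for the framework's deep network. The helper names `train_mil`, `instance_probs`, and `make_bag`, the 2-D patch features, and the synthetic data are all hypothetical. Training sees only image-level labels, yet the learned classifier scores individual patches.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mil(bags, labels, lr=0.1, epochs=200):
    """Train a linear instance classifier from bag (image-level) labels only.

    bags:   list of arrays, one per image, each of shape (n_instances, n_features)
    labels: 1 if the image contains at least one caribou, else 0
    """
    w, b = np.zeros(bags[0].shape[1]), 0.0
    for _ in range(epochs):
        for x, y in zip(bags, labels):
            z = x @ w + b            # per-instance (per-patch) logits
            k = int(np.argmax(z))    # max pooling: the most caribou-like instance speaks for the bag
            q = sigmoid(z[k])        # bag-level probability
            g = q - y                # d(BCE)/d(z_k); all other instances get zero (sub)gradient
            w -= lr * g * x[k]
            b -= lr * g
    return w, b

def instance_probs(x, w, b):
    """Per-instance caribou probabilities, usable for localization after training."""
    return sigmoid(x @ w + b)

# Hypothetical synthetic data: background patches cluster near (-1, -1),
# caribou patches near (2, 2); each bag is one image tiled into 8 patches.
rng = np.random.default_rng(0)

def make_bag(has_caribou, n=8):
    x = rng.normal(-1.0, 0.5, size=(n, 2))
    if has_caribou:
        x[rng.integers(n)] = rng.normal(2.0, 0.5, size=2)
    return x

labels = [i % 2 for i in range(40)]
bags = [make_bag(y) for y in labels]
w, b = train_mil(bags, labels)
print(instance_probs(np.array([[2.0, 2.0], [-1.0, -1.0]]), w, b))
```

Under max pooling only the highest-scoring instance in each bag receives a gradient, which is how a positive image label ends up teaching the model which patch holds the animal. Softer aggregators such as noisy-OR spread the gradient over several instances, which can matter when one image contains many animals.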
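The density-map formulation can be sketched as follows, assuming the common practice of turning each point annotation into a normalized Gaussian kernel. The kernel width `sigma=4.0`, the function names, and the pixel-wise MSE loss are illustrative assumptions, not details taken from the framework. Because every kernel sums to one, summing the map (or any window of it) yields a count, which is what lets a density map serve both counting and coarse localization.

```python
import numpy as np

def density_map_from_points(points, shape, sigma=4.0):
    """Convert sparse point annotations (row, col) into a target density map.

    Each point becomes a Gaussian normalized over the image grid, so every
    annotated animal contributes exactly 1 to the map's sum, even near borders.
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()
    return density

def count_from_density(density, box=None):
    """Estimated count = sum of the density map, optionally inside a box (r0, r1, c0, c1)."""
    if box is not None:
        r0, r1, c0, c1 = box
        density = density[r0:r1, c0:c1]
    return float(density.sum())

def density_loss(pred, target):
    """Pixel-wise mean squared error, a common training objective for density regression."""
    return float(np.mean((pred - target) ** 2))

# Hypothetical 64x64 tile with three annotated caribou, one on the image border.
points = [(10, 12), (20, 48), (0, 63)]
target = density_map_from_points(points, (64, 64))
print(round(count_from_density(target), 6))               # → 3.0
print(count_from_density(target, (0, 32, 0, 32)))         # count in the top-left quadrant
```

Normalizing each kernel over the image grid, rather than relying on a blur filter's boundary handling, keeps an animal at the tile edge from being partially lost from the count.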
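The summary does not name the self-supervised objective. One widely used option is a contrastive InfoNCE loss, sketched below under that assumption: two augmented views of the same unlabeled tile should map to nearby embeddings, while views of different tiles are pushed apart. The function `info_nce`, the temperature `tau=0.1`, and the random arrays standing in for a feature extractor's output are hypothetical.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE-style contrastive loss (one direction, cross-view negatives only).

    z1, z2: (N, d) embeddings of two augmented views of the same N unlabeled tiles.
    Row i of z1 and row i of z2 form the positive pair; all other rows act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # cross-entropy with diagonal targets

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))                  # hypothetical embeddings of view 1
z2 = z1 + 0.05 * rng.normal(size=(8, 16))      # view 2: a small perturbation of the same tiles
print(info_nce(z1, z2))                        # low: positive pairs are aligned
print(info_nce(z1, np.roll(z2, 1, axis=0)))    # high: pairs deliberately mismatched
```

Minimizing this loss on unlabeled imagery pulls the embeddings of each tile's two views together, which is one way the intrinsic structure of unlabeled data can shape the feature extractor before or alongside weakly supervised training.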