Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
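📄 Chinese Summary (translated)
The growing prevalence of chronic diseases such as obesity and diabetes highlights the need for accurate monitoring of food intake. Although AI-driven dietary assessment has made notable progress in recent years, the ill-posed problem of recovering size (portion) information from monocular images to accurately answer “how much did you eat?” remains a pressing challenge. Some existing 3D reconstruction methods perform well at geometric reconstruction but do not effectively address the recovery of real-world scale. This study aims to address that limitation by developing a novel method that reconstructs 3D food models with real-world scale from monocular images. The core idea is to combine deep learning with geometric constraints to overcome the scale ambiguity inherent in monocular vision. Specifically, a multi-task learning framework is introduced that contains a depth estimation network for predicting depth maps and a size regression network for predicting an object size reference.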
📄 English Summary
Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
The increasing prevalence of diet-related chronic diseases, such as obesity and diabetes, underscores the need for accurate food intake monitoring. While AI-driven dietary assessment has advanced significantly in recent years, recovering size (portion) information from monocular images to answer “how much was consumed?” is an ill-posed problem and remains a pressing challenge. Existing 3D reconstruction methods achieve impressive geometric fidelity but often fall short on real-world scale recovery. This research aims to overcome that limitation with a novel approach that reconstructs real-scale 3D food models from monocular images. The core idea is to integrate deep learning with geometric constraints to resolve the scale ambiguity inherent in monocular vision. Specifically, a multi-task learning framework is introduced, comprising a depth estimation network that predicts depth maps and a size regression network that predicts object size references. The depth maps and size references are then fed into an optimization process that generates 3D mesh models with accurate physical dimensions. To improve reconstruction accuracy, food shape constraints based on prior knowledge and textural information are also used. Extensive experiments on a diverse dataset of food images demonstrate the method's superior performance in recovering the true dimensions and shapes of food items, laying the groundwork for more accurate food portion estimation. Compared with methods that focus solely on geometric reconstruction, this approach shows significant improvements in metric scale accuracy, addressing the difficulty of obtaining actual size information from monocular images.
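To make the scale-ambiguity point concrete, the following is a minimal sketch of how a predicted size reference could turn a unitless reconstruction into a metric one. It is not the paper's optimization procedure, which the summary does not specify. It assumes, purely for illustration, that the size reference is the object's largest bounding-box side in centimeters, and every name in it (`largest_extent`, `rescale_to_metric`, `mesh_volume`, the toy tetrahedron) is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def largest_extent(vertices: Sequence[Point]) -> float:
    """Largest axis-aligned bounding-box side of the vertex set."""
    return max(
        max(v[axis] for v in vertices) - min(v[axis] for v in vertices)
        for axis in range(3)
    )


def rescale_to_metric(
    vertices: Sequence[Point], reference_size_cm: float
) -> Tuple[List[Point], float]:
    """Scale a unitless mesh so its largest extent equals the size reference.

    Assumption (not from the source): the reference is the largest
    bounding-box side, expressed in centimeters.
    """
    extent = largest_extent(vertices)
    if extent <= 0.0:
        raise ValueError("degenerate mesh: zero extent")
    scale = reference_size_cm / extent
    scaled = [(x * scale, y * scale, z * scale) for x, y, z in vertices]
    return scaled, scale


def mesh_volume(vertices: Sequence[Point], faces: Sequence[Face]) -> float:
    """Volume of a closed, consistently oriented triangle mesh.

    Sums the signed volumes of tetrahedra formed by each face and the origin.
    """
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c)
        total += (
            ax * (by * cz - bz * cy)
            + ay * (bz * cx - bx * cz)
            + az * (bx * cy - by * cx)
        )
    return abs(total) / 6.0


# Toy stand-in for a scale-ambiguous reconstruction: a unit tetrahedron.
TETRA_VERTICES: List[Point] = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
# Faces wound so that normals point outward.
TETRA_FACES: List[Face] = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

if __name__ == "__main__":
    metric_vertices, scale = rescale_to_metric(TETRA_VERTICES, 12.0)
    volume_cm3 = mesh_volume(metric_vertices, TETRA_FACES)
    print(f"scale factor: {scale:.1f}, volume: {volume_cm3:.1f} cm^3")
```

Because volume grows with the cube of the scale factor, a 10% error in the size reference becomes roughly a 33% error in the estimated volume, which is why metric scale matters so much for portion estimation.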