Dual-Masked Autoencoding for Learning Representations from Incomplete Electronic Health Record Data
📄 Summary
Learning from electronic health record (EHR) time series is challenging due to irregular sampling, heterogeneous missingness, and the resulting sparsity of observations. Previous self-supervised methods either impute the data before learning, represent missingness through a dedicated input signal, or focus solely on imputation, all of which limit their ability to efficiently learn representations that support downstream clinical tasks. The Augmented-Intrinsic Dual-Masked Autoencoder (AID-MAE) is proposed to learn directly from incomplete time series: an intrinsic missing mask represents naturally missing values, while an augmented mask hides a subset of the observed values for reconstruction during training. AID-MAE processes only the observed, unmasked values, thereby improving learning efficiency on incomplete data.
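The dual-masking step described above can be sketched in a few lines. The following is a minimal NumPy illustration, assuming NaN encodes natural missingness in a `(time, feature)` window; the function name, the zero-fill convention, and the masking ratio are illustrative assumptions, not details from the paper:

```python
import numpy as np

def dual_mask(x, aug_ratio=0.3, rng=None):
    """Illustrative dual masking for one (T, D) window with NaN = missing.

    Returns:
      intrinsic - True where a value was naturally observed,
      augmented - True where an observed value is hidden as a
                  reconstruction target during training,
      x_in      - encoder input with missing/hidden entries zero-filled
                  (an assumed convention for this sketch).
    """
    rng = rng or np.random.default_rng()
    intrinsic = ~np.isnan(x)                          # natural missingness mask
    # Sample the augmented mask only over observed entries.
    augmented = intrinsic & (rng.random(x.shape) < aug_ratio)
    visible = intrinsic & ~augmented                  # what the encoder sees
    x_in = np.where(visible, x, 0.0)
    return intrinsic, augmented, x_in

# Example: 4 time steps, 2 features, with natural missingness.
x = np.array([[1.0, np.nan],
              [2.0, 0.5],
              [np.nan, 0.7],
              [3.0, np.nan]])
intrinsic, augmented, x_in = dual_mask(x, aug_ratio=0.5,
                                       rng=np.random.default_rng(0))
# The reconstruction loss would be computed only where `augmented` is True.
```

Note that the augmented mask is drawn only from observed positions, so naturally missing values are never used as reconstruction targets.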
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.