Dual-Masked Autoencoding for Learning Representations from Incomplete Electronic Health Record Data
📄 Summary
Learning from electronic health record (EHR) time series is challenging due to irregular sampling, heterogeneous missingness, and the resulting sparsity of observations. Previous self-supervised methods either impute the data before learning, represent missingness through a dedicated input signal, or focus solely on imputation, all of which limit their ability to efficiently learn representations that support downstream clinical tasks. The Augmented-Intrinsic Dual-Masked Autoencoder (AID-MAE) is proposed to learn directly from incomplete time series: an intrinsic missing mask represents naturally missing values, while an augmented mask hides a subset of the observed values for reconstruction during training. AID-MAE processes only the observed, unmasked values, thereby improving learning efficiency on incomplete data.
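The dual-masking step described above can be sketched in a few lines. The following is a minimal NumPy illustration, assuming NaN encodes natural missingness in a `(time, feature)` window; the function name, the zero-fill convention, and the masking ratio are illustrative assumptions, not details from the paper:

```python
import numpy as np

def dual_mask(x, aug_ratio=0.3, rng=None):
    """Illustrative dual masking for one (T, D) window with NaN = missing.

    Returns:
      intrinsic - True where a value was naturally observed,
      augmented - True where an observed value is hidden as a
                  reconstruction target during training,
      x_in      - encoder input with missing/hidden entries zero-filled
                  (an assumed convention for this sketch).
    """
    rng = rng or np.random.default_rng()
    intrinsic = ~np.isnan(x)                          # natural missingness mask
    # Sample the augmented mask only over observed entries.
    augmented = intrinsic & (rng.random(x.shape) < aug_ratio)
    visible = intrinsic & ~augmented                  # what the encoder sees
    x_in = np.where(visible, x, 0.0)
    return intrinsic, augmented, x_in

# Example: 4 time steps, 2 features, with natural missingness.
x = np.array([[1.0, np.nan],
              [2.0, 0.5],
              [np.nan, 0.7],
              [3.0, np.nan]])
intrinsic, augmented, x_in = dual_mask(x, aug_ratio=0.5,
                                       rng=np.random.default_rng(0))
# The reconstruction loss would be computed only where `augmented` is True.
```

Note that the augmented mask is drawn only from observed positions, so naturally missing values are never used as reconstruction targets.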
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.