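📄 Chinese Summary (translated)
DMCD (DataMap Causal Discovery) is a two-stage causal discovery framework that combines LLM-based semantic drafting from variable metadata with statistical validation on observational data. In the first stage, a large language model proposes a sparse draft directed acyclic graph (DAG) that acts as a semantic prior over the space of possible causal structures. In the second stage, the draft is audited and refined through conditional independence tests, and any detected discrepancies guide targeted edge revisions. The method was evaluated on three metadata-rich real-world benchmark datasets covering industrial engineering, environmental monitoring, and IT systems analysis. On these datasets, DMCD achieved causal discovery performance competitive with or better than a range of other methods.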
📄 English Summary
DMCD: Semantic-Statistical Framework for Causal Discovery
DMCD (DataMap Causal Discovery) is a two-phase causal discovery framework that integrates LLM-based semantic drafting from variable metadata with statistical validation on observational data. In Phase I, a large language model proposes a sparse draft directed acyclic graph (DAG), serving as a semantically informed prior over the space of possible causal structures. In Phase II, this draft is audited and refined through conditional independence testing, with detected discrepancies guiding targeted edge revisions. The approach is evaluated on three metadata-rich real-world benchmarks spanning industrial engineering, environmental monitoring, and IT systems analysis. Across these datasets, DMCD achieves competitive or leading performance against various causal discovery methods.
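The Phase II idea of auditing a draft graph with conditional independence tests can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say which test DMCD uses, so a Gaussian partial-correlation (Fisher z) test is swapped in here, the LLM step is replaced by a hand-written draft, and the names `fisher_z_pvalue`, `audit_draft`, and the toy variables are all hypothetical.

```python
import math
import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value for H0: column i is independent of column j given
    the columns in `cond`, assuming roughly linear-Gaussian data."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    # Partial correlation of the first two variables given the rest.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def audit_draft(data, names, draft_edges, alpha=0.01):
    """Compare a draft DAG with the data (illustrative audit rule only).

    Returns (unsupported, missing):
      unsupported: draft edges u -> v where u and v look independent given
                   the other draft parents of v.
      missing:     non-adjacent pairs that still look dependent given the
                   union of their draft parents.
    """
    col = {name: k for k, name in enumerate(names)}
    parents = {name: set() for name in names}
    for u, v in draft_edges:
        parents[v].add(u)

    unsupported = []
    for u, v in draft_edges:
        cond = [col[p] for p in parents[v] - {u}]
        if fisher_z_pvalue(data, col[u], col[v], cond) > alpha:
            unsupported.append((u, v))

    adjacent = {frozenset(edge) for edge in draft_edges}
    missing = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            u, v = names[a], names[b]
            if frozenset((u, v)) in adjacent:
                continue
            cond = [col[p] for p in (parents[u] | parents[v]) - {u, v}]
            if fisher_z_pvalue(data, col[u], col[v], cond) <= alpha:
                missing.append((u, v))
    return unsupported, missing


# Toy data with true structure x -> y -> z, plus an unrelated variable w.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
w = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.8 * y + rng.normal(size=n)
data = np.column_stack([x, y, z, w])
names = ["x", "y", "z", "w"]

# Hand-written stand-in for a Phase I draft: it omits y -> z and adds w -> y.
draft = [("x", "y"), ("x", "z"), ("w", "y")]
unsupported, missing = audit_draft(data, names, draft)
print("unsupported draft edges:", unsupported)
print("candidate missing edges:", missing)
```

In this sketch the two returned lists play the role of the "detected discrepancies": a revision step would consider dropping edges in `unsupported` and adding or reorienting edges among the pairs in `missing`. The audit rule is deliberately simple; for example, it cannot tell that `x -> z` should be routed through `y` until the `y`/`z` pair has been flagged and the draft revised.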