📄 English Summary
Hybrid Neuro-Symbolic Fraud Detection: Guiding Neural Networks with Domain Rules
Adding simple domain rules to the loss function initially looked like a large improvement for fraud detection on highly imbalanced data. After a threshold bug was fixed and the experiment was rerun across five random seeds, most of that initial "huge win" disappeared. The key takeaway is that in rare-event problems like fraud, the way success is measured (the choice of threshold, seed, and metric) can mislead more than the model itself. The rule does nudge the ranking slightly, as shown by consistent ROC-AUC improvements, but the practical gains are small and fragile. The write-up walks through the bug, the variance across seeds, and the lessons learned.
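To make the idea concrete, here is a minimal sketch of what "adding a domain rule to the loss" could look like, together with a helper for reporting a metric across seeds. The summary does not specify the actual rule, its weight, or the loss formulation, so everything here is an assumption for illustration: `rule_fired` is a hypothetical 0/1 indicator that some domain rule flagged a transaction, `lam` is a hypothetical penalty weight, and the penalty form (punishing low fraud scores on rule-flagged transactions) is one plausible choice, not the author's method.

```python
import math
import statistics


def bce(y_true, p, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / len(y_true)


def rule_penalty(rule_fired, p):
    """Mean penalty for giving a low fraud score where the (hypothetical) rule fired."""
    return sum(r * (1.0 - q) for r, q in zip(rule_fired, p)) / len(p)


def hybrid_loss(y_true, p, rule_fired, lam=0.1):
    """Data loss plus a weighted rule term; lam=0 recovers plain cross-entropy."""
    return bce(y_true, p) + lam * rule_penalty(rule_fired, p)


def summarize(scores):
    """Mean and sample standard deviation of one metric collected across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In a setup like the one described, `summarize` would be called once for the baseline and once for the rule-guided model on the per-seed values of each metric. If the gap between the two means is comparable to the standard deviation across seeds, the improvement is the kind of small, fragile gain the summary warns about.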