A Lightweight LLM Framework for Disaster Humanitarian Information Classification

📄 Summary

Timely classification of humanitarian information from social media is crucial for effective disaster response. However, deploying large language models (LLMs) in resource-constrained emergency settings presents significant challenges. This study develops a lightweight, cost-effective framework for disaster tweet classification using parameter-efficient fine-tuning. A unified experimental corpus is constructed by integrating and normalizing the HumAID dataset, which contains 76,484 tweets across 19 disaster events, into a dual-task benchmark for humanitarian information categorization and event type identification. Systematic evaluation of prompting strategies, LoRA fine-tuning, and retrieval-augmented generation (RAG) on Llama 3.1 8B demonstrates the effectiveness of the proposed framework.
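As a rough illustration of how retrieval-augmented prompting for the dual-task setup might look, the sketch below picks the most similar labeled tweets and places them in a prompt that asks for both a humanitarian category and an event type. It is not the authors' implementation: the label lists, the example tweets, the token-overlap retriever, and the names `Example`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the resulting prompt would still need to be sent to a model such as Llama 3.1 8B.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical label sets: the summary does not list the actual categories.
HUMANITARIAN_LABELS = [
    "requests_or_urgent_needs",
    "infrastructure_and_utility_damage",
    "not_humanitarian",
]
EVENT_TYPES = ["earthquake", "flood", "hurricane", "wildfire"]


@dataclass(frozen=True)
class Example:
    """A labeled tweet carrying both task labels."""
    text: str
    category: str
    event_type: str


# Invented tweets, used only to make the sketch runnable.
POOL = [
    Example("Bridge collapsed after the earthquake, roads are blocked",
            "infrastructure_and_utility_damage", "earthquake"),
    Example("We need drinking water and food after the flood",
            "requests_or_urgent_needs", "flood"),
    Example("Beautiful sunset over the hills tonight",
            "not_humanitarian", "wildfire"),
]


def tokens(text: str) -> set[str]:
    """Lowercased word set with simple punctuation stripped."""
    return {w.strip(".,!?#@").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a stand-in for a real embedding retriever."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool examples most similar to the query tweet."""
    q = tokens(query)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex.text)), reverse=True)
    return ranked[:k]


def build_prompt(tweet: str, pool: list[Example], k: int = 2) -> str:
    """Assemble a few-shot prompt covering both classification tasks."""
    lines = [
        "Classify the tweet.",
        "Humanitarian category options: " + ", ".join(HUMANITARIAN_LABELS),
        "Event type options: " + ", ".join(EVENT_TYPES),
        "",
    ]
    for ex in retrieve(tweet, pool, k):
        lines += [
            f"Tweet: {ex.text}",
            f"Category: {ex.category}",
            f"Event type: {ex.event_type}",
            "",
        ]
    lines += [f"Tweet: {tweet}", "Category:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Flood victims need food and water urgently", POOL))
```

In a real system the overlap score would likely be replaced by dense embeddings, and the same prompt format could be reused to build training pairs for LoRA fine-tuning; both of those choices are assumptions here rather than details from the summary.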
