GhazalBench: Usage-Grounded Evaluation of Large Language Models on Classical Persian Poetry

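📄 Chinese Summary (translated)

Persian poetry plays an important role in Iranian cultural practice: verses by canonical poets such as Hafez are often quoted, paraphrased, or completed from partial cues. Supporting such interaction requires language models not only to understand a poem's meaning but also to command its culturally entrenched surface forms. GhazalBench is a benchmark that evaluates how well large language models (LLMs) interact with classical Persian poetry under usage-grounded conditions. It assesses two complementary abilities: generating faithful prose paraphrases of couplets, and accessing canonical verses under varying semantic and formal cues. Across a range of proprietary and open-weight multilingual LLMs, a consistent dissociation is observed in model outputs.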

📄 English Summary

GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals

Persian poetry plays a significant role in Iranian cultural practice: verses from canonical poets such as Hafez are frequently quoted, paraphrased, or completed from partial cues. Supporting such interactions requires language models to engage not only with poetic meaning but also with culturally entrenched surface forms. GhazalBench is a benchmark for evaluating how large language models (LLMs) interact with Persian ghazals under usage-grounded conditions. It assesses two complementary abilities: generating faithful prose paraphrases of couplets, and accessing canonical verses under varying semantic and formal cues. A consistent dissociation is observed across several proprietary and open-weight multilingual LLMs.
