GhazalBench: Usage-Grounded Evaluation of Large Language Models on Persian Ghazals

Source: GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals

Published: March 12, 2026

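📄 Summary (translated from Chinese)

Persian poetry plays an important role in Iranian cultural practice: verses by classical poets such as Hafez are often quoted, paraphrased, or completed from partial cues. Supporting this kind of interaction requires language models not only to understand a poem's meaning but also to command its culturally entrenched surface form. GhazalBench is a benchmark that evaluates how well large language models (LLMs) interact with Persian ghazals under usage-grounded conditions. It assesses two complementary abilities: generating faithful prose paraphrases of couplets, and accessing canonical verses under varying semantic and formal cues. Across several proprietary and open-weight multilingual LLMs, a consistent dissociation is observed in the models' outputs.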

🏷️ Related Tags

#PersianPoetry #LargeLanguageModels #GhazalBench #CulturalInteraction #EvaluationBenchmark

📄 English Summary

GhazalBench: Usage-Grounded Evaluation of LLMs on Persian Ghazals

Persian poetry plays a significant role in Iranian cultural practice: verses from canonical poets such as Hafez are frequently quoted, paraphrased, or completed from partial cues. Supporting such interactions requires language models to engage not only with poetic meaning but also with culturally entrenched surface forms. GhazalBench is a benchmark for evaluating how large language models (LLMs) interact with Persian ghazals under usage-grounded conditions. It assesses two complementary abilities: generating faithful prose paraphrases of couplets, and accessing canonical verses under varying semantic and formal cues. A consistent dissociation is observed across several proprietary and open-weight multilingual LLMs.

