LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs

📄 Summary

Post-training quantization (PTQ) is a widely adopted method for reducing the memory and computational costs of large language models (LLMs). Recent research shows that applying invertible transformations to activations can significantly improve quantization robustness by mitigating activation outliers. However, existing methods are largely limited to rotations or Hadamard transformations. Moreover, most studies focus on conventional quantization schemes, while modern hardware increasingly supports the microscaling (MX) data format. Attempts to combine transformation-based methods with MX quantization have suffered severe performance degradation, leading prior work to place restrictive assumptions on the transformations. Taking a complementary perspective, this work proposes a novel learnable affine transformation method that improves the performance and robustness of microscaling quantization.
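
To make the mechanism concrete, the sketch below illustrates the core idea on toy data: folding an invertible transform T into the activations and its inverse into the weights leaves the full-precision product exactly unchanged, while spreading activation outliers across channels before blockwise MX-style quantization. This is a minimal illustration, not the paper's implementation: the block size, 4-bit integer elements with a shared power-of-two scale, and the random orthogonal matrix standing in for a learned affine transform are all assumptions made here.

```python
# Minimal sketch (NOT the paper's method). Assumes: MX-style quantization
# = blocks of 32 values sharing one power-of-two scale, 4-bit integer
# elements; a random orthogonal T stands in for a learned affine transform.
import numpy as np

BLOCK = 32   # shared-scale granularity (MX block size assumed here)
BITS = 4     # element bit-width

def mx_quantize(x: np.ndarray) -> np.ndarray:
    """Fake-quantize x blockwise: one power-of-two scale per BLOCK values."""
    flat = x.reshape(-1, BLOCK)
    max_abs = np.abs(flat).max(axis=1, keepdims=True) + 1e-12
    qmax = 2 ** (BITS - 1) - 1                       # 7 for 4-bit signed
    scale = 2.0 ** np.ceil(np.log2(max_abs / qmax))  # shared power-of-two scale
    q = np.clip(np.round(flat / scale), -qmax, qmax)
    return (q * scale).reshape(x.shape)

rng = np.random.default_rng(0)
d = 128
# Activations with a few strong outlier channels, as commonly seen in LLMs.
x = rng.normal(size=(64, d))
x[:, :4] *= 30.0
w = rng.normal(size=(d, d)) / np.sqrt(d)

# Invertible transform T. Folding T into x and T^{-1} into w keeps the
# full-precision output exactly unchanged: (x T)(T^{-1} w) = x w.
T, _ = np.linalg.qr(rng.normal(size=(d, d)))
ref = x @ w

err_plain = np.abs(mx_quantize(x) @ mx_quantize(w) - ref).mean()
err_trans = np.abs(mx_quantize(x @ T) @ mx_quantize(np.linalg.inv(T) @ w) - ref).mean()
print(f"MX quant error, no transform:   {err_plain:.4f}")
print(f"MX quant error, with transform: {err_trans:.4f}")  # typically lower
```

In the paper's setting the transform would be learned end to end rather than drawn at random; the sketch only shows why spreading outliers before blockwise quantization can reduce MX quantization error.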
