OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization

📄 Abstract

Generating high-performance CUDA kernels remains challenging because it requires navigating a combinatorial space of low-level transformations under noisy and expensive hardware feedback. While large language models can synthesize functionally correct CUDA code, achieving competitive performance requires systematic exploration and verification of optimization choices. OptiML is an end-to-end framework that maps either natural-language intent or input CUDA code to performance-optimized CUDA kernels by formulating kernel optimization as search under verification. OptiML consists of two decoupled stages. When the input is natural language, a Mixture-of-Thoughts generator (OptiML-G) acts as a proposal policy over kernel implementation strategies.

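The abstract's framing of kernel optimization as "search under verification" can be illustrated with a minimal sketch. This is a hypothetical illustration, not OptiML's actual procedure: `candidates`, `verify`, and `benchmark` stand in for LLM-proposed kernel implementations, a correctness check against a reference, and a hardware timing harness.

```python
def search_under_verification(candidates, verify, benchmark, budget=16):
    """Hypothetical sketch (not OptiML's actual algorithm): keep the
    fastest candidate kernel that passes a correctness check, within
    a fixed evaluation budget."""
    best, best_time = None, float("inf")
    for cand in candidates[:budget]:
        if not verify(cand):     # reject functionally incorrect proposals
            continue
        t = benchmark(cand)      # noisy, expensive hardware feedback
        if t < best_time:
            best, best_time = cand, t
    return best, best_time
```

In a real system, the proposal policy (here a static list) would be the generator stage, `verify` would compare kernel outputs against a reference implementation, and `benchmark` would time repeated launches on the target GPU.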
