Moonshot AI's Attention Residuals Challenge a Core Assumption of Modern LLMs
Mar 21, 2026 · AI Research · Noah Ogbi
Moonshot AI's Kimi team proposes replacing the transformer's residual connections with a lightweight attention mechanism over prior layer outputs. The result: equivalent training performance with 1.25× less compute, with the gains confirmed across model sizes. It is the cleanest architectural challenge to a foundational LLM assumption in years.
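To make the contrast concrete, here is a minimal NumPy sketch of the idea. A standard transformer block adds only the immediately preceding hidden state back in (`x + sublayer(x)`); an "attention residual," as described above, instead lets each token compute softmax weights over *all* prior layer outputs and mix them. The projection matrices, function names, and shapes below are illustrative assumptions, not Moonshot's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_residual(x, sublayer_out):
    """Conventional transformer residual: output = x + sublayer(x)."""
    return x + sublayer_out

def attention_residual(history, sublayer_out, w_q, w_k):
    """Hypothetical attention residual (illustrative, not the paper's exact method).

    Instead of adding back only the previous hidden state, each token
    attends over the outputs of ALL prior layers and adds the mixture.

    history:      list of (seq, d) arrays, outputs of layers 0..L-1
    sublayer_out: (seq, d) output of the current sublayer
    w_q, w_k:     (d, d_k) learned projections (assumed for this sketch)
    """
    H = np.stack(history, axis=0)                 # (L, seq, d)
    q = sublayer_out @ w_q                        # (seq, d_k)
    k = H @ w_k                                   # (L, seq, d_k)
    # Per-token score for each prior layer, scaled dot-product style.
    scores = np.einsum("sd,lsd->sl", q, k) / np.sqrt(q.shape[-1])  # (seq, L)
    weights = softmax(scores, axis=-1)            # mixing weights over layers
    mixed = np.einsum("sl,lsd->sd", weights, H)   # (seq, d)
    return mixed + sublayer_out

# Usage: three layers of history, one current sublayer output.
rng = np.random.default_rng(0)
seq, d, d_k = 5, 8, 4
history = [rng.normal(size=(seq, d)) for _ in range(3)]
current = rng.normal(size=(seq, d))
w_q, w_k = rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k))
out = attention_residual(history, current, w_q, w_k)
print(out.shape)  # (5, 8)
```

The appeal is that the mixing weights are learned and per-token, so the network can route information directly from early layers to late ones rather than forcing everything through one additive stream.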