Moonshot AI's Attention Residuals Challenge a Core Assumption of Modern LLMs
Mar 21, 2026 · AI Research · Noah Ogbi
Moonshot AI's Kimi team proposes replacing the transformer's residual connections with a lightweight attention mechanism over prior layer outputs. The result: equivalent training performance with 1.25× less compute, with the gains confirmed across model sizes. It is the cleanest architectural challenge to a foundational LLM assumption in years.
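To make the contrast concrete, here is a minimal NumPy sketch of the idea. A standard transformer block adds only the immediately preceding hidden state back in (`x + sublayer(x)`); an "attention residual," as described above, instead lets each token compute softmax weights over *all* prior layer outputs and mix them. The projection matrices, function names, and shapes below are illustrative assumptions, not Moonshot's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_residual(x, sublayer_out):
    """Conventional transformer residual: output = x + sublayer(x)."""
    return x + sublayer_out

def attention_residual(history, sublayer_out, w_q, w_k):
    """Hypothetical attention residual (illustrative, not the paper's exact method).

    Instead of adding back only the previous hidden state, each token
    attends over the outputs of ALL prior layers and adds the mixture.

    history:      list of (seq, d) arrays, outputs of layers 0..L-1
    sublayer_out: (seq, d) output of the current sublayer
    w_q, w_k:     (d, d_k) learned projections (assumed for this sketch)
    """
    H = np.stack(history, axis=0)                 # (L, seq, d)
    q = sublayer_out @ w_q                        # (seq, d_k)
    k = H @ w_k                                   # (L, seq, d_k)
    # Per-token score for each prior layer, scaled dot-product style.
    scores = np.einsum("sd,lsd->sl", q, k) / np.sqrt(q.shape[-1])  # (seq, L)
    weights = softmax(scores, axis=-1)            # mixing weights over layers
    mixed = np.einsum("sl,lsd->sd", weights, H)   # (seq, d)
    return mixed + sublayer_out

# Usage: three layers of history, one current sublayer output.
rng = np.random.default_rng(0)
seq, d, d_k = 5, 8, 4
history = [rng.normal(size=(seq, d)) for _ in range(3)]
current = rng.normal(size=(seq, d))
w_q, w_k = rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k))
out = attention_residual(history, current, w_q, w_k)
print(out.shape)  # (5, 8)
```

The appeal is that the mixing weights are learned and per-token, so the network can route information directly from early layers to late ones rather than forcing everything through one additive stream.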