This folder contains annotated reading material for the lab page.
Follow one hidden tensor through the Q/K/V projections and the head reshape, writing down its shape after every step. Tracking shapes this way prevents most attention shape bugs.
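The trace described above can be sketched in NumPy as follows. This is a minimal illustration, not the lab's reference code: the sizes (`batch`, `seq_len`, `d_model`, `n_heads`), the random weights, and the helper name `split_heads` are all assumptions made for the example, and biases, masking, and the output projection are left out.

```python
import numpy as np

# Hypothetical sizes, chosen only to make every shape easy to tell apart.
batch, seq_len, d_model, n_heads = 2, 5, 16, 4
d_head = d_model // n_heads  # 16 / 4 = 4 features per head

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, seq_len, d_model))  # hidden tensor: (2, 5, 16)

# One projection matrix each for Q, K, and V (random stand-ins, no bias).
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))


def split_heads(t):
    """(batch, seq_len, d_model) -> (batch, n_heads, seq_len, d_head)."""
    b, s, _ = t.shape
    # First split the feature axis into (n_heads, d_head),
    # then move the head axis in front of the sequence axis.
    return t.reshape(b, s, n_heads, d_head).transpose(0, 2, 1, 3)


q = split_heads(x @ w_q)  # (2, 4, 5, 4)
k = split_heads(x @ w_k)  # (2, 4, 5, 4)
v = split_heads(x @ w_v)  # (2, 4, 5, 4)

# Scaled dot-product scores: one (seq_len, seq_len) matrix per head.
scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # (2, 4, 5, 5)

# Numerically stable softmax over the key axis.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ v  # (2, 4, 5, 4)

# Undo the head split: heads go back next to d_head, then merge.
merged = out.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)  # (2, 5, 16)

for name, t in [("x", x), ("q", q), ("scores", scores), ("out", out), ("merged", merged)]:
    print(f"{name:7s}{t.shape}")
```

One shape bug this sketch makes visible: reshaping the projected tensor straight to `(batch, n_heads, seq_len, d_head)` without the transpose yields the right shape but mixes features from different tokens into each head. The split has to be reshape first, then transpose.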
Running code is optional. By the end you should be able to explain the mechanism, the relevant state and shapes, the common misunderstanding, and the interview answer.