|
9-branch dissection results:
- zero_target crashes (-9.1%): aux must output non-zero
- constant_input neutral (+0.0%): needs at least depth info
- time_only works (+1.0%): h_l not needed, just depth index
- shuffled/fresh_random work (+1.3-1.4%): no semantic content needed
- prefit60_trainable ≈ random_trainable: prefit adds nothing
- All frozen branches crash: trainability is essential
Mechanism: depth-aware trainable auxiliary perturbation that diversifies
block-local updates. Not semantic credit, not pure trainability.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|