| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-03 17:27:08 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-03 17:27:08 -0500 |
| commit | 52b421fde3faa673e7007a456846f8195cb45942 (patch) | |
| tree | d60eba25ed0ca0880e3753def92bf7cd742205b2 /experiments/cifar_resmlp.py | |
| parent | 58e859d83c77002d22571003075150d7e20d18a4 (diff) | |
Fix CNN compute_bp_grads: remove inter-layer detach so gradients flow to all layers
The old code detached hidden states between layers, which disconnected layers 0-2
from the loss (gradient = None, treated as 0). The fix keeps the forward graph connected.
BP CNN Gamma per-layer now: [0.985, 0.990, 0.987, 0.967] (was [0, 0, 0, 0.967])
However, gradient norms are ~1e-17, which is a genuine numerical precision issue with the CNN architecture.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
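
The failure mode described in the commit message can be reproduced with a toy example. The sketch below is not the code from `experiments/cifar_resmlp.py` and does not use PyTorch: it is a framework-free, scalar reverse-mode autodiff written only to illustrate why an inter-layer detach leaves every layer except the last without a gradient. The `Value` class, the `per_layer_grads` helper, the four scalar "layers", and their weights are all hypothetical names and values chosen for illustration.

```python
class Value:
    """Scalar node in a tiny reverse-mode autodiff graph (toy, not PyTorch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = None        # stays None if backward never reaches this node
        self.parents = parents  # tuple of (parent_node, local_derivative) pairs

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def detach(self):
        # Same value, but with no link back to the nodes that produced it.
        return Value(self.data)

    def backward(self):
        # Topologically order the nodes reachable from self, then apply the chain rule.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad = (parent.grad or 0.0) + local * node.grad


def per_layer_grads(detach_between_layers):
    """Return d(loss)/d(weight) for four stacked scalar layers (hypothetical model)."""
    weights = [Value(w) for w in (0.5, -1.5, 2.0, 3.0)]
    h = Value(1.0)  # input
    for i, w in enumerate(weights):
        if detach_between_layers and i > 0:
            h = h.detach()  # cuts the graph: earlier weights become unreachable
        h = w * h
    loss = h * h
    loss.backward()
    return [w.grad for w in weights]


print(per_layer_grads(detach_between_layers=True))   # [None, None, None, 13.5]
print(per_layer_grads(detach_between_layers=False))  # [81.0, -27.0, 20.25, 13.5]
```

In this toy, only the last layer receives a gradient when the hidden state is detached, and its value is the same in both runs. That pattern is consistent with the per-layer Gamma values reported above, where only the last entry was nonzero before the fix.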
Diffstat (limited to 'experiments/cifar_resmlp.py')
0 files changed, 0 insertions, 0 deletions
