path: root/experiments/cifar_resmlp.py
Age | Commit message | Author
2026-04-22 | Add vanilla FA (Lillicrap 2016) implementation + full experiment suite | YurenHao0426
PAPER-CHANGING FINDING: FA is dramatically different from DFA on the same architecture. FA has genuine deep credit quality where DFA has none.

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support.
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form <f_l, a_l>.

Core results (A-H, 100ep, 3 seeds, d=256 terminal-LN ResMLP):
- FA main audit:     0.401 ± 0.009 (DFA: 0.306 ± 0.008), +9.5 pp
- FA vs frozen:      +5.2 pp ABOVE baseline (DFA: 4.3 pp below)
- FA deep cos:       +0.33 (DFA: ~0, degenerate)
- FA ||h_L||:        ~10^5 (DFA: ~5×10^8), more than 3 orders of magnitude less growth
- FA ||g_L||:        ~10^-6, meaningful (DFA: ~10^-10, at the floor)
- Mode 1(b) fires:   NO for FA; YES for DFA
- FA+pen lam=1e-2:   0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
- FA+pen lam=1e-4:   0.377 ± 0.006 (DFA+pen lam=1e-4: 0.360)
  At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02.
- FA random-target:  acc 0.12 (chance), ||h_L|| = 1.3e5 (DFA: 1.7e8)
- FA early (5 ep):   deep cos already +0.32 (DFA ep1: -0.008)

Extension results (d=512 depth sweep, 100ep, seed 42):

  L    FA acc   FA deep cos   DFA acc   DFA deep cos
  2    0.350    +0.96         n/a       n/a
  4    0.424    +0.29         n/a       n/a
  6    0.401    +0.16         n/a       n/a
  8    0.409    +0.11         0.306     -0.0001
  12   0.404    +0.09         0.309     -0.0001

FA deep cos is positive at EVERY depth; DFA is ~0 wherever it was measured. FA accuracy exceeds DFA by roughly 10 pp (+10.3 pp at L=8, +9.5 pp at L=12).

This is the strongest empirical support for the Mode 2 → Mode 1 hypothesis: same local loss, same architecture, same optimizer; only the credit signal differs. FA's sequential propagation produces much better per-layer credit (cos +0.33 vs ~0), which prevents the catastrophic activation growth that DFA exhibits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
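The structural difference between the two credit signals can be sketched in a few lines. This is a minimal, hypothetical illustration, not the repo's train_fa(): the helper names (dfa_credit, fa_credit, B_top) are invented here, plain Python lists stand in for tensors so the sketch is self-contained, and the way the FA chain is seeded at the top layer (a_top = B_top^T e_T) is an ASSUMPTION because the log does not say. In DFA each a_l depends only on e_T and its own B_l; in FA each a_l depends on every feedback matrix above it.

```python
import random


def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of M (list of rows)."""
    return [list(col) for col in zip(*M)]


def rand_matrix(rows, cols, rng, scale=1.0):
    """Fixed random feedback matrix with i.i.d. Gaussian entries."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def dfa_credit(e_T, B_dfa):
    """DFA: every layer gets its own direct projection of the output error,
    a_l = B_l^T e_T, where each B_l has shape (c, d)."""
    return [matvec(transpose(B), e_T) for B in B_dfa]


def fa_credit(e_T, B_top, B_fa):
    """FA-style chain: seed the top layer with a_top = B_top^T e_T
    (ASSUMPTION: the seed is not specified in the log), then walk downward
    with a_l = B_l @ a_{l+1}, each B_l a fixed random d x d matrix.
    Activation derivatives are omitted, matching the log's notation."""
    a = matvec(transpose(B_top), e_T)
    signals = [a]
    for B in reversed(B_fa):  # B_fa[l] maps a_{l+1} -> a_l
        a = matvec(B, a)
        signals.append(a)
    signals.reverse()  # signals[0] is the lowest layer, signals[-1] the top
    return signals


def local_loss(f_l, a_l):
    """Shared local loss <f_l, a_l>; its gradient w.r.t. f_l is exactly a_l."""
    return sum(f * a for f, a in zip(f_l, a_l))


if __name__ == "__main__":
    rng = random.Random(0)
    d, c, L = 4, 3, 3  # toy sizes: hidden width, classes, number of blocks
    e_T = [0.3, -0.5, 0.2]  # toy output-error vector
    dfa = dfa_credit(e_T, [rand_matrix(c, d, rng) for _ in range(L)])
    fa = fa_credit(e_T, rand_matrix(c, d, rng),
                   [rand_matrix(d, d, rng, scale=d ** -0.5) for _ in range(L - 1)])
    print([len(a) for a in dfa], [len(a) for a in fa])
```

Because both methods feed the same <f_l, a_l> loss, swapping dfa_credit for fa_credit changes only the credit signal, which is the controlled comparison the commit describes.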
2026-04-08 | Round 38: add --penalty_lam flag to cifar_resmlp.py for Mode 2 cross-method test | YurenHao0426
Patches:
- main(): add --penalty_lam (separate from CB's bridge temperature args.lam)
- train_dfa block update (line 195): add penalty_lam * (f_l**2).sum(-1).mean()
- train_state_bridge block update (line 326): same penalty
- train_credit_bridge block update (line 533): same penalty

Codex round 38 GO STAGE: keep the penalty separate from CB's lam, apply it to blocks only, and sanity-check that hidden_norms remain nontrivial (i.e., the penalty is not silencing the blocks).

The 2-epoch smoke test (results/round38_smoke_sbcb_pen) passes the silencing check: SB ||h_L|| = 229, CB ||h_L|| = 1258, both nontrivial. Deep cosines are positive across all layers for SB ([0.28, 0.25, 0.23]) and rising across layers for CB ([0.04, 0.08, 0.13, 0.15]).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
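The penalty term penalty_lam * (f_l**2).sum(-1).mean() is a batch-mean squared L2 norm of each block's output. The sketch below mirrors that expression with plain Python lists so it runs without a tensor library; penalty_grad and block_objective are HYPOTHETICAL helpers, and the batch-mean reduction of the local loss <f_l, a_l> inside block_objective is an assumption, not necessarily what cifar_resmlp.py does.

```python
def penalty(f_batch, penalty_lam):
    """penalty_lam * (f_l**2).sum(-1).mean(), with f_batch as a list of rows
    (one row per example, one column per feature)."""
    per_example = [sum(x * x for x in row) for row in f_batch]  # .sum(-1)
    return penalty_lam * sum(per_example) / len(per_example)   # .mean()


def penalty_grad(f_batch, penalty_lam):
    """Analytic gradient of the penalty: 2 * penalty_lam * f / batch_size."""
    n = len(f_batch)
    return [[2.0 * penalty_lam * x / n for x in row] for row in f_batch]


def block_objective(f_batch, a_batch, penalty_lam):
    """HYPOTHETICAL per-block objective: batch-mean local loss <f_l, a_l>
    plus the activation penalty. The repo's exact reduction may differ."""
    inner = [sum(f * a for f, a in zip(fr, ar)) for fr, ar in zip(f_batch, a_batch)]
    return sum(inner) / len(inner) + penalty(f_batch, penalty_lam)
```

The penalty's gradient pulls every block output toward zero in proportion to its size, which is why the commit checks that hidden_norms stay nontrivial: too large a penalty_lam could silence the blocks instead of just curbing growth.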
2026-04-08 | Round 35: SB and CB also show data-agnostic Mode 1 growth on random targets | YurenHao0426
- experiments/cifar_resmlp.py: add --methods filter and --random_targets flag; extend compute_diagnostics to log hidden_norms_per_layer and bp_grad_norms_per_layer
- paper/main.tex §3 ¶1: broaden the random-target finding to all 3 fixed-feedback methods (DFA: ||h_L|| = 14510, SB: ||h_L|| = 6225, CB: ||h_L|| = 19974 at ep 3, all at chance accuracy)
- paper/main.tex Appendix J: extended with a cross-method smoke-test table

This generalizes the §3 mechanism story from "DFA-specific" to "all 3 audited fixed-feedback local-credit methods". Combined with rounds 32-34, the proximate cause of Mode 1(a) is now well-localized. It:
- does not require a residual skip (round 33, H2 walkback)
- does not require a task signal (round 34, random targets, DFA)
- is not DFA-specific (round 35, random targets, SB+CB)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
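The random-target control works because labels drawn independently of the inputs carry no task signal, so any activation growth that still appears cannot be driven by learning the task. A minimal sketch of that control and of a per-layer hidden-norm diagnostic follows; random_targets, hidden_norms_per_layer, and accuracy here are HYPOTHETICAL stand-ins for what the --random_targets flag and compute_diagnostics might do, and taking the batch-mean L2 norm as ||h_l|| is an assumption.

```python
import math
import random


def random_targets(n, num_classes, seed=0):
    """Labels drawn uniformly at random, independent of the inputs."""
    rng = random.Random(seed)
    return [rng.randrange(num_classes) for _ in range(n)]


def hidden_norms_per_layer(hiddens):
    """hiddens[l] is a batch (list of vectors) at layer l.
    Returns the batch-mean L2 norm at each layer (ASSUMED definition)."""
    return [sum(math.sqrt(sum(x * x for x in h)) for h in batch) / len(batch)
            for batch in hiddens]


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

With random labels over 10 classes, any predictor's expected accuracy is 1/10, so a run sitting at chance while ||h_L|| still climbs points to growth that is independent of the data.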
2026-03-23 | Initial implementation: all models, methods, toy and CIFAR experiments | YurenHao0426
Debug phase. Toy LQ experiments (3 seeds) complete with terminal gradient matching. Credit bridge matches state bridge on linear system (~0.94 cosine). CIFAR experiments in progress.