From 05c935ab03ee0bdb8597d19466192dfb92ee889d Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 22 Apr 2026 23:46:33 -0500
Subject: Add vanilla FA (Lillicrap 2016) implementation + full experiment suite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PAPER-CHANGING FINDING: FA is dramatically different from DFA on the same
architecture. FA has genuine deep credit quality where DFA has none.

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form .

Core results (A-H, 100ep 3-seed d=256 terminal-LN ResMLP):
  FA main audit:     0.401 ± 0.009 (DFA: 0.306 ± 0.008) +9.5 pp
  FA vs frozen:      +5.2 pp ABOVE baseline (DFA: -4.3 pp below)
  FA deep cos:       +0.33 (DFA: ~0 degenerate)
  FA ||h_L||:        ~10^5 (DFA: ~5×10^8) 3 OOM less growth
  FA ||g_L||:        ~10^-6 meaningful (DFA: ~10^-10 floor)
  Mode 1(b) fires:   NO for FA; YES for DFA
  FA+pen lam=1e-2:   0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
  FA+pen lam=1e-4:   0.377 ± 0.006 (DFA+pen lam=1e-4: 0.360)
    At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02
  FA random-target:  acc 0.12 (chance), h_L=1.3e5 (DFA: 1.7e8)
  FA early 5ep:      deep cos already +0.32 (DFA ep1: -0.008)

Extension results (d=512 depth sweep, 100ep, s42):
  L=2:  FA 0.350, cos +0.96 (DFA: n/a)
  L=4:  FA 0.424, cos +0.29 (DFA: n/a)
  L=6:  FA 0.401, cos +0.16 (DFA: n/a)
  L=8:  FA 0.409, cos +0.11 (DFA: 0.306, cos -0.0001)
  L=12: FA 0.404, cos +0.09 (DFA: 0.309, cos -0.0001)

FA deep cos is positive at EVERY depth; DFA is ~0 everywhere.
FA accuracy exceeds DFA by 5-10 pp at L=8 and L=12.

This is the strongest empirical support for the Mode 2 → Mode 1 hypothesis:
same local loss, same architecture, same optimizer; only the credit signal
differs.
FA's sequential propagation produces much better per-layer credit
(cos +0.33 vs ~0), which prevents the catastrophic activation growth that
DFA exhibits.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 results/fa_extension_experiments.log | 159 +++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 results/fa_extension_experiments.log

(limited to 'results/fa_extension_experiments.log')

diff --git a/results/fa_extension_experiments.log b/results/fa_extension_experiments.log
new file mode 100644
index 0000000..ffc2805
--- /dev/null
+++ b/results/fa_extension_experiments.log
@@ -0,0 +1,159 @@
+==========================================
+FA EXTENSION EXPERIMENTS (I-M)
+==========================================
+Start: Wed Apr 22 10:26:41 PM CDT 2026
+
+=== I: FA+pen lam=1e-4 (30ep, 3 seeds) ===
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0346, train=0.2557, test=0.2909
+ [FA] Epoch 10: loss=1.8700, train=0.3346, test=0.3635
+ [FA] Epoch 20: loss=1.8495, train=0.3436, test=0.3682
+ [FA] Epoch 30: loss=1.8430, train=0.3521, test=0.3759
+ Final test acc: 0.3759
+
+============================================================
+Seed 123
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0259, train=0.2600, test=0.3099
+ [FA] Epoch 10: loss=1.8666, train=0.3358, test=0.3532
+ [FA] Epoch 20: loss=1.8505, train=0.3472, test=0.3685
+ [FA] Epoch 30: loss=1.8391, train=0.3530, test=0.3725
+ Final test acc: 0.3725
+
+============================================================
+Seed 456
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0371, train=0.2562, test=0.2999
+ [FA] Epoch 10: loss=1.8573, train=0.3373, test=0.3567
+ [FA] Epoch 20: loss=1.8335, train=0.3500, test=0.3831
+ [FA] Epoch 30: loss=1.8207, train=0.3574, test=0.3837
+ Final test acc: 0.3837
+
+All results saved to results/fa_penalty_lam1e-4_30ep/results_cifar10.json
+
+=== L: FA d=512 depth sweep (100ep, s42) ===
+ L=2
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0612, train=0.2476, test=0.3028
+ [FA] Epoch 10: loss=1.8290, train=0.3435, test=0.3705
+ [FA] Epoch 20: loss=1.8102, train=0.3489, test=0.3634
+ [FA] Epoch 30: loss=1.7963, train=0.3546, test=0.3398
+ [FA] Epoch 40: loss=1.7775, train=0.3605, test=0.3497
+ [FA] Epoch 50: loss=1.7610, train=0.3685, test=0.3288
+ [FA] Epoch 60: loss=1.7592, train=0.3704, test=0.3376
+ [FA] Epoch 70: loss=1.7588, train=0.3747, test=0.3421
+ [FA] Epoch 80: loss=1.7564, train=0.3751, test=0.3497
+ [FA] Epoch 90: loss=1.7543, train=0.3769, test=0.3472
+ [FA] Epoch 100: loss=1.7527, train=0.3768, test=0.3495
+ Final test acc: 0.3495
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+ L=4
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0301, train=0.2531, test=0.2917
+ [FA] Epoch 10: loss=1.8487, train=0.3366, test=0.3541
+ [FA] Epoch 20: loss=1.7864, train=0.3609, test=0.3908
+ [FA] Epoch 30: loss=1.7510, train=0.3724, test=0.3990
+ [FA] Epoch 40: loss=1.7387, train=0.3767, test=0.3946
+ [FA] Epoch 50: loss=1.7209, train=0.3875, test=0.4165
+ [FA] Epoch 60: loss=1.7052, train=0.3913, test=0.4173
+ [FA] Epoch 70: loss=1.6945, train=0.3963, test=0.4137
+ [FA] Epoch 80: loss=1.6868, train=0.4018, test=0.4219
+ [FA] Epoch 90: loss=1.6830, train=0.4009, test=0.4250
+ [FA] Epoch 100: loss=1.6781, train=0.4021, test=0.4244
+ Final test acc: 0.4244
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+ L=6
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0375, train=0.2474, test=0.2938
+ [FA] Epoch 10: loss=1.8616, train=0.3294, test=0.3541
+ [FA] Epoch 20: loss=1.8289, train=0.3459, test=0.3711
+ [FA] Epoch 30: loss=1.7992, train=0.3579, test=0.3857
+ [FA] Epoch 40: loss=1.7837, train=0.3631, test=0.3942
+ [FA] Epoch 50: loss=1.7699, train=0.3710, test=0.3921
+ [FA] Epoch 60: loss=1.7550, train=0.3741, test=0.3975
+ [FA] Epoch 70: loss=1.7439, train=0.3770, test=0.4058
+ [FA] Epoch 80: loss=1.7413, train=0.3796, test=0.4014
+ [FA] Epoch 90: loss=1.7382, train=0.3791, test=0.4008
+ [FA] Epoch 100: loss=1.7363, train=0.3785, test=0.4014
+ Final test acc: 0.4014
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+ L=8
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0431, train=0.2481, test=0.2960
+ [FA] Epoch 10: loss=1.8619, train=0.3303, test=0.3574
+ [FA] Epoch 20: loss=1.8163, train=0.3500, test=0.3617
+ [FA] Epoch 30: loss=1.7889, train=0.3612, test=0.3795
+ [FA] Epoch 40: loss=1.7651, train=0.3681, test=0.3955
+ [FA] Epoch 50: loss=1.7509, train=0.3738, test=0.4002
+ [FA] Epoch 60: loss=1.7385, train=0.3783, test=0.4060
+ [FA] Epoch 70: loss=1.7297, train=0.3819, test=0.4046
+ [FA] Epoch 80: loss=1.7255, train=0.3861, test=0.4064
+ [FA] Epoch 90: loss=1.7214, train=0.3872, test=0.4076
+ [FA] Epoch 100: loss=1.7181, train=0.3879, test=0.4094
+ Final test acc: 0.4094
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+ L=12
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+ [FA] Epoch 1: loss=2.0427, train=0.2406, test=0.2963
+ [FA] Epoch 10: loss=1.8510, train=0.3333, test=0.3712
+ [FA] Epoch 20: loss=1.8069, train=0.3520, test=0.3747
+ [FA] Epoch 30: loss=1.7837, train=0.3593, test=0.3827
+ [FA] Epoch 40: loss=1.7677, train=0.3684, test=0.4088
+ [FA] Epoch 50: loss=1.7521, train=0.3730, test=0.3905
+ [FA] Epoch 60: loss=1.7441, train=0.3767, test=0.4042
+ [FA] Epoch 70: loss=1.7356, train=0.3807, test=0.4046
+ [FA] Epoch 80: loss=1.7307, train=0.3824, test=0.4037
+ [FA] Epoch 90: loss=1.7268, train=0.3839, test=0.4014
+ [FA] Epoch 100: loss=1.7295, train=0.3834, test=0.4035
+ Final test acc: 0.4035
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+
+==========================================
+FA EXTENSION EXPERIMENTS DONE
+End: Wed Apr 22 11:13:24 PM CDT 2026
+==========================================
-- 
cgit v1.2.3
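
Note on the two credit rules named in the commit message: FA propagates credit layer by layer (a_l = B_l @ a_{l+1}), while DFA projects the output error straight to each layer (a_l = B_l^T @ e_T). The sketch below is a minimal, stdlib-only illustration of that structural difference and is not the code in experiments/cifar_resmlp.py. Everything in it is an assumption made for illustration: the function names, the toy sizes, the 1/sqrt(fan-in) scaling, the omission of any activation-derivative factor, and in particular the `B_top` matrix that seeds the FA recursion at the top layer, since the commit message gives only the hidden-layer recursion.

```python
import random

def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng):
    """Fixed random feedback matrix, scaled by 1/sqrt(cols) (assumed scaling)."""
    scale = cols ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def fa_credit(e_T, B_top, B_hidden):
    """Sequential FA credit: a_l = B_l @ a_{l+1}, returned as [a_1, ..., a_L].

    B_top (d x C) lifts the output error into width d to seed a_L; this
    seeding step is hypothetical. B_hidden holds B_1 .. B_{L-1}, each d x d.
    """
    a = matvec(B_top, e_T)            # a_L
    credits = [a]
    for B in reversed(B_hidden):      # B_{L-1}, ..., B_1
        a = matvec(B, a)              # a_l = B_l @ a_{l+1}
        credits.append(a)
    credits.reverse()
    return credits

def dfa_credit(e_T, B_direct):
    """DFA credit: a_l = B_l^T @ e_T, with each B_l stored as C x d."""
    return [matvec([list(col) for col in zip(*B)], e_T) for B in B_direct]

# Toy sizes, far smaller than the d=256 / d=512 runs in the log.
rng = random.Random(0)
d, C, L = 8, 10, 4
e_T = [rng.gauss(0.0, 1.0) for _ in range(C)]
B_top = rand_matrix(d, C, rng)
B_hidden = [rand_matrix(d, d, rng) for _ in range(L - 1)]
B_direct = [rand_matrix(C, d, rng) for _ in range(L)]

fa = fa_credit(e_T, B_top, B_hidden)
dfa = dfa_credit(e_T, B_direct)
print(len(fa), len(dfa), len(fa[0]), len(dfa[0]))
```

In this sketch the FA credit at layer 1 depends on every feedback matrix above it, whereas each DFA credit depends only on its own matrix and e_T. That is the "only the credit signal differs" contrast the commit message draws; the sketch says nothing about why the measured cosine values differ.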