From 05c935ab03ee0bdb8597d19466192dfb92ee889d Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Wed, 22 Apr 2026 23:46:33 -0500
Subject: Add vanilla FA (Lillicrap 2016) implementation + full experiment
 suite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PAPER-CHANGING FINDING: FA behaves very differently from DFA on the
same architecture. FA delivers informative credit to deep layers (deep
cos +0.33), where DFA's deep credit is degenerate (~0).

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form <f_l, a_l>.
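The two credit rules above can be contrasted in a small NumPy sketch. It is not the actual train_fa() code. The helper names (dfa_credits, fa_credits, local_loss), the toy sizes, and the 1/sqrt(d) scaling are all hypothetical. How the FA chain is seeded at the top block is also an assumption: here it is a_top = B_top^T @ e_T. The real implementation may also apply residual or activation-derivative terms that the one-line formula omits.

```python
import numpy as np

def dfa_credits(e_T, B_list):
    """DFA: every block projects the output error directly.
    B_list[l] has shape (C, d), so a_l = B_l^T @ e_T has shape (d,)."""
    return [B.T @ e_T for B in B_list]

def fa_credits(e_T, B_top, B_list):
    """FA as described above: a_l = B_l @ a_{l+1} with d x d random B_l.
    ASSUMPTION: the chain is seeded by a_top = B_top^T @ e_T, where B_top
    is a fixed random (C, d) matrix; train_fa() may seed it differently."""
    a = B_top.T @ e_T                # credit for the top block
    credits = [a]
    for B in reversed(B_list):       # walk down the stack one block at a time
        a = B @ a                    # a_l = B_l @ a_{l+1}
        credits.append(a)
    return credits[::-1]             # index 0 = shallowest block

def local_loss(f_l, a_l):
    """Local loss <f_l, a_l>; a_l is held constant, so d(loss)/d(f_l) = a_l."""
    return float(np.dot(f_l, a_l))

# Hypothetical toy sizes: C classes, width d, L blocks.
rng = np.random.default_rng(0)
C, d, L = 10, 256, 8
e_T = rng.standard_normal(C)         # stand-in for the output error e_T
dfa = dfa_credits(e_T, [rng.standard_normal((C, d)) / np.sqrt(d) for _ in range(L)])
B_top = rng.standard_normal((C, d)) / np.sqrt(d)
B_fa = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L - 1)]
fa = fa_credits(e_T, B_top, B_fa)
```

Under DFA, each block's credit depends only on e_T and that block's own B_l. Under FA, a shallow block's credit is the top credit pushed through a chain of random d x d matrices. That chaining is the difference between the two rules; the local loss is the same.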

Core results (A-H, 100ep 3-seed d=256 terminal-LN ResMLP):

  FA main audit:    0.401 ± 0.009 (DFA: 0.306 ± 0.008)  +9.5 pp
  FA vs frozen:     +5.2 pp ABOVE baseline (DFA: 4.3 pp BELOW)
  FA deep cos:      +0.33 (DFA: ~0 degenerate)
  FA ||h_L||:       ~10^5 (DFA: ~5×10^8)  ~3-4 OOM smaller
  FA ||g_L||:       ~10^-6 meaningful (DFA: ~10^-10 floor)
  Mode 1(b) fires:  NO for FA; YES for DFA

  FA+pen lam=1e-2:  0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
  FA+pen lam=1e-4:  0.377 ± 0.006 [30ep, exp I] (DFA+pen lam=1e-4: 0.360)
    At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02

  FA random-target: acc 0.12 (near chance), ||h_L||=1.3e5 (DFA: 1.7e8)
  FA early 5ep:     deep cos already +0.32 (DFA ep1: -0.008)
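The FA+pen lam=1e-4 entry can be checked against the per-seed final accuracies in experiment I of the log below. This check assumes the ± is a sample standard deviation (n - 1), because that is what reproduces 0.006; a population std would give 0.005.

```python
import statistics

# Final test accuracies for FA+pen lam=1e-4, copied from experiment I in
# results/fa_extension_experiments.log (30ep; seeds 42, 123, 456).
finals = {42: 0.3759, 123: 0.3725, 456: 0.3837}
accs = list(finals.values())

mean = statistics.mean(accs)
sd = statistics.stdev(accs)       # sample std, divides by n - 1
pop_sd = statistics.pstdev(accs)  # population std, divides by n
print(f"{mean:.3f} ± {sd:.3f}")   # → 0.377 ± 0.006
```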

Extension results (d=512 depth sweep, 100ep, s42):
  L=2:  FA 0.350, cos +0.96  (DFA: n/a)
  L=4:  FA 0.424, cos +0.29  (DFA: n/a)
  L=6:  FA 0.401, cos +0.16  (DFA: n/a)
  L=8:  FA 0.409, cos +0.11  (DFA: 0.306, cos -0.0001)
  L=12: FA 0.404, cos +0.09  (DFA: 0.309, cos -0.0001)

FA deep cos is positive at EVERY depth; DFA is ~0 everywhere.
FA accuracy exceeds DFA by ~10 pp at both L=8 (+10.3) and L=12 (+9.5).
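The depth-sweep gaps can be recomputed from the per-depth final accuracies. The FA values come from block L of the log below. No DFA runs appear in this log, so the DFA values are the ones quoted in the extension table above.

```python
# Final test accuracies at d=512, 100ep, seed 42.
fa_final = {2: 0.3495, 4: 0.4244, 6: 0.4014, 8: 0.4094, 12: 0.4035}
dfa_final = {8: 0.306, 12: 0.309}   # quoted in the table; not in this log

# FA - DFA gap in percentage points, only where both were run.
gaps_pp = {depth: round(100 * (fa_final[depth] - dfa_final[depth]), 2)
           for depth in dfa_final}
best_depth = max(fa_final, key=fa_final.get)
print(gaps_pp)     # → {8: 10.34, 12: 9.45}
print(best_depth)  # → 4
```

Rounded to one decimal, the gaps are 10.3 and 9.5 pp. The best FA depth in this single-seed sweep is L=4.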

This is the strongest empirical support so far for the Mode 2 → Mode 1
hypothesis: same local loss, same architecture, same optimizer; only
the credit signal differs. FA's sequential propagation produces much
better per-layer credit (deep cos +0.33 vs ~0), consistent with it
preventing the catastrophic activation growth that DFA exhibits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 results/fa_extension_experiments.log | 159 +++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 results/fa_extension_experiments.log


diff --git a/results/fa_extension_experiments.log b/results/fa_extension_experiments.log
new file mode 100644
index 0000000..ffc2805
--- /dev/null
+++ b/results/fa_extension_experiments.log
@@ -0,0 +1,159 @@
+==========================================
+FA EXTENSION EXPERIMENTS (I-M)
+==========================================
+Start: Wed Apr 22 10:26:41 PM CDT 2026
+
+=== I: FA+pen lam=1e-4 (30ep, 3 seeds) ===
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0346, train=0.2557, test=0.2909
+  [FA] Epoch 10: loss=1.8700, train=0.3346, test=0.3635
+  [FA] Epoch 20: loss=1.8495, train=0.3436, test=0.3682
+  [FA] Epoch 30: loss=1.8430, train=0.3521, test=0.3759
+  Final test acc: 0.3759
+
+============================================================
+Seed 123
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0259, train=0.2600, test=0.3099
+  [FA] Epoch 10: loss=1.8666, train=0.3358, test=0.3532
+  [FA] Epoch 20: loss=1.8505, train=0.3472, test=0.3685
+  [FA] Epoch 30: loss=1.8391, train=0.3530, test=0.3725
+  Final test acc: 0.3725
+
+============================================================
+Seed 456
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0371, train=0.2562, test=0.2999
+  [FA] Epoch 10: loss=1.8573, train=0.3373, test=0.3567
+  [FA] Epoch 20: loss=1.8335, train=0.3500, test=0.3831
+  [FA] Epoch 30: loss=1.8207, train=0.3574, test=0.3837
+  Final test acc: 0.3837
+
+All results saved to results/fa_penalty_lam1e-4_30ep/results_cifar10.json
+
+=== L: FA d=512 depth sweep (100ep, s42) ===
+  L=2
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0612, train=0.2476, test=0.3028
+  [FA] Epoch 10: loss=1.8290, train=0.3435, test=0.3705
+  [FA] Epoch 20: loss=1.8102, train=0.3489, test=0.3634
+  [FA] Epoch 30: loss=1.7963, train=0.3546, test=0.3398
+  [FA] Epoch 40: loss=1.7775, train=0.3605, test=0.3497
+  [FA] Epoch 50: loss=1.7610, train=0.3685, test=0.3288
+  [FA] Epoch 60: loss=1.7592, train=0.3704, test=0.3376
+  [FA] Epoch 70: loss=1.7588, train=0.3747, test=0.3421
+  [FA] Epoch 80: loss=1.7564, train=0.3751, test=0.3497
+  [FA] Epoch 90: loss=1.7543, train=0.3769, test=0.3472
+  [FA] Epoch 100: loss=1.7527, train=0.3768, test=0.3495
+  Final test acc: 0.3495
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+  L=4
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0301, train=0.2531, test=0.2917
+  [FA] Epoch 10: loss=1.8487, train=0.3366, test=0.3541
+  [FA] Epoch 20: loss=1.7864, train=0.3609, test=0.3908
+  [FA] Epoch 30: loss=1.7510, train=0.3724, test=0.3990
+  [FA] Epoch 40: loss=1.7387, train=0.3767, test=0.3946
+  [FA] Epoch 50: loss=1.7209, train=0.3875, test=0.4165
+  [FA] Epoch 60: loss=1.7052, train=0.3913, test=0.4173
+  [FA] Epoch 70: loss=1.6945, train=0.3963, test=0.4137
+  [FA] Epoch 80: loss=1.6868, train=0.4018, test=0.4219
+  [FA] Epoch 90: loss=1.6830, train=0.4009, test=0.4250
+  [FA] Epoch 100: loss=1.6781, train=0.4021, test=0.4244
+  Final test acc: 0.4244
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+  L=6
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0375, train=0.2474, test=0.2938
+  [FA] Epoch 10: loss=1.8616, train=0.3294, test=0.3541
+  [FA] Epoch 20: loss=1.8289, train=0.3459, test=0.3711
+  [FA] Epoch 30: loss=1.7992, train=0.3579, test=0.3857
+  [FA] Epoch 40: loss=1.7837, train=0.3631, test=0.3942
+  [FA] Epoch 50: loss=1.7699, train=0.3710, test=0.3921
+  [FA] Epoch 60: loss=1.7550, train=0.3741, test=0.3975
+  [FA] Epoch 70: loss=1.7439, train=0.3770, test=0.4058
+  [FA] Epoch 80: loss=1.7413, train=0.3796, test=0.4014
+  [FA] Epoch 90: loss=1.7382, train=0.3791, test=0.4008
+  [FA] Epoch 100: loss=1.7363, train=0.3785, test=0.4014
+  Final test acc: 0.4014
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+  L=8
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0431, train=0.2481, test=0.2960
+  [FA] Epoch 10: loss=1.8619, train=0.3303, test=0.3574
+  [FA] Epoch 20: loss=1.8163, train=0.3500, test=0.3617
+  [FA] Epoch 30: loss=1.7889, train=0.3612, test=0.3795
+  [FA] Epoch 40: loss=1.7651, train=0.3681, test=0.3955
+  [FA] Epoch 50: loss=1.7509, train=0.3738, test=0.4002
+  [FA] Epoch 60: loss=1.7385, train=0.3783, test=0.4060
+  [FA] Epoch 70: loss=1.7297, train=0.3819, test=0.4046
+  [FA] Epoch 80: loss=1.7255, train=0.3861, test=0.4064
+  [FA] Epoch 90: loss=1.7214, train=0.3872, test=0.4076
+  [FA] Epoch 100: loss=1.7181, train=0.3879, test=0.4094
+  Final test acc: 0.4094
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+  L=12
+Using device: cuda:0
+
+============================================================
+Seed 42
+============================================================
+
+--- FA ---
+  [FA] Epoch 1: loss=2.0427, train=0.2406, test=0.2963
+  [FA] Epoch 10: loss=1.8510, train=0.3333, test=0.3712
+  [FA] Epoch 20: loss=1.8069, train=0.3520, test=0.3747
+  [FA] Epoch 30: loss=1.7837, train=0.3593, test=0.3827
+  [FA] Epoch 40: loss=1.7677, train=0.3684, test=0.4088
+  [FA] Epoch 50: loss=1.7521, train=0.3730, test=0.3905
+  [FA] Epoch 60: loss=1.7441, train=0.3767, test=0.4042
+  [FA] Epoch 70: loss=1.7356, train=0.3807, test=0.4046
+  [FA] Epoch 80: loss=1.7307, train=0.3824, test=0.4037
+  [FA] Epoch 90: loss=1.7268, train=0.3839, test=0.4014
+  [FA] Epoch 100: loss=1.7295, train=0.3834, test=0.4035
+  Final test acc: 0.4035
+
+All results saved to results/fa_depth_scan_d512/results_cifar10.json
+
+==========================================
+FA EXTENSION EXPERIMENTS DONE
+End: Wed Apr 22 11:13:24 PM CDT 2026
+==========================================