faeval.git/experiments/cifar_resmlp.py, branch master

BP+EP audit for d=512 L=2 qualifying seeds + CIFAR-100 support

2026-04-26T14:31:30+00:00

Results for qualifying seeds (1, 2, 5) on d=512 L=2:
  BP s1: 0.606, s2: 0.608, s5: 0.607 (all above frozen 0.349)
  FA s1: 0.347, s2: 0.346, s5: 0.341 (all below frozen, cos +0.47-0.49)
  DFA s1: 0.298, s2: 0.297, s5: 0.296 (all below frozen, cos +0.18-0.21)
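
The gaps to the frozen baseline can be re-derived from the per-seed numbers
above. A minimal sketch, assuming the gap is the seed mean minus the frozen
accuracy (0.349); the names FROZEN, acc, and gap_pp are illustrative and not
taken from the repo:

```python
from statistics import mean

FROZEN = 0.349  # frozen-blocks baseline accuracy quoted above

# Per-seed accuracies copied from the lines above (seeds 1, 2, 5).
acc = {
    "BP": [0.606, 0.608, 0.607],
    "FA": [0.347, 0.346, 0.341],
    "DFA": [0.298, 0.297, 0.296],
}

def gap_pp(values, baseline=FROZEN):
    """Mean accuracy minus the baseline, in percentage points."""
    return 100.0 * (mean(values) - baseline)

summary = {name: (mean(v), gap_pp(v)) for name, v in acc.items()}
for name, (m, gap) in summary.items():
    print(f"{name}: mean {m:.3f}, {gap:+.1f} pp vs frozen")
```

Only BP comes out with a positive gap; FA sits just under the baseline and DFA
well under it, matching the "above/below frozen" notes on each line.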

EP run did not save results (likely an architecture compatibility issue at d=512 L=2).

Also: added CIFAR-100 dataset support to both cifar_resmlp.py and
resmlp_frozen_blocks_baseline.py for the harder-task scan.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add vanilla FA (Lillicrap 2016) implementation + full experiment suite

2026-04-23T04:46:33+00:00

PAPER-CHANGING FINDING: FA behaves very differently from DFA on the
same architecture. FA has positive deep-layer credit alignment (cos +0.33)
where DFA's is ~0.

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form.
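
The two credit rules above can be sketched side by side. This is a NumPy
illustration, not the train_fa() code: the top-layer projection B_top, the
matrix scaling, and the toy sizes are assumptions made for the example.

```python
import numpy as np

def fa_credits(e_T, B_top, B_hidden):
    """Sequential FA credit: a_L = B_top @ e_T, then a_l = B_l @ a_{l+1}.

    B_top (d x n_out) is a hypothetical projection from the output error to
    the top hidden layer; B_hidden holds the d x d matrices B_1 .. B_{L-1}.
    """
    a = B_top @ e_T
    credits = [a]
    for B in reversed(B_hidden):      # walk from layer L-1 down to layer 1
        a = B @ a
        credits.append(a)
    credits.reverse()                 # credits[0] is the shallowest layer
    return credits

def dfa_credits(e_T, B_direct):
    """Direct DFA credit: a_l = B_l^T @ e_T, independently for each layer."""
    return [B.T @ e_T for B in B_direct]

# Toy sizes for illustration only (not the d=256 / d=512 runs).
rng = np.random.default_rng(0)
d, n_out, L = 8, 10, 4
e_T = rng.standard_normal(n_out)
B_top = rng.standard_normal((d, n_out)) / np.sqrt(n_out)
B_hidden = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L - 1)]
B_direct = [rng.standard_normal((n_out, d)) / np.sqrt(n_out) for _ in range(L)]

fa = fa_credits(e_T, B_top, B_hidden)
dfa = dfa_credits(e_T, B_direct)
```

In the FA path each layer's credit depends on the layer above it, while every
DFA credit is a single projection of the same output error.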

Core results (A-H, 100ep 3-seed d=256 terminal-LN ResMLP):

  FA main audit:    0.401 ± 0.009 (DFA: 0.306 ± 0.008)  +9.5 pp
  FA vs frozen:     +5.2 pp ABOVE baseline (DFA: 4.3 pp BELOW)
  FA deep cos:      +0.33 (DFA: ~0 degenerate)
  FA ||h_L||:       ~10^5 (DFA: ~5×10^8)  3-4 OOM less growth
  FA ||g_L||:       ~10^-6 meaningful (DFA: ~10^-10 floor)
  Mode 1(b) fires:  NO for FA; YES for DFA

  FA+pen lam=1e-2:  0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
  FA+pen lam=1e-4:  0.377 ± 0.006 (DFA+pen lam=1e-4: 0.360)
    At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02

  FA random-target: acc 0.12 (chance), h_L=1.3e5 (DFA: 1.7e8)
  FA early 5ep:     deep cos already +0.32 (DFA ep1: -0.008)

Extension results (d=512 depth sweep, 100ep, s42):
  L=2:  FA 0.350, cos +0.96  (DFA: n/a)
  L=4:  FA 0.424, cos +0.29  (DFA: n/a)
  L=6:  FA 0.401, cos +0.16  (DFA: n/a)
  L=8:  FA 0.409, cos +0.11  (DFA: 0.306, cos -0.0001)
  L=12: FA 0.404, cos +0.09  (DFA: 0.309, cos -0.0001)

FA deep cos is positive at EVERY depth; DFA is ~0 at both depths with DFA data (L=8, L=12).
FA accuracy exceeds DFA by about 10 pp (10.3 pp at L=8, 9.5 pp at L=12).

This is the strongest empirical support for the Mode 2 → Mode 1
hypothesis: same local loss, same architecture, same optimizer —
only the credit signal differs. FA's sequential propagation produces
much better per-layer credit (cos +0.33 vs ~0), which prevents the
catastrophic activation growth that DFA exhibits.
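
For reference, a per-layer cosine of the kind quoted above could be computed
as below. This assumes "cos" means the cosine between each layer's feedback
credit and the true BP gradient at that layer; per_layer_cosine and its
arguments are hypothetical names, not the compute_diagnostics API.

```python
import numpy as np

def per_layer_cosine(credits, bp_grads, eps=1e-12):
    """Cosine similarity between credit and BP gradient, one value per layer.

    Each entry is flattened first, so (batch, d) arrays are compared as a
    single long vector. eps guards against division by zero.
    """
    cosines = []
    for a, g in zip(credits, bp_grads):
        a = np.ravel(a)
        g = np.ravel(g)
        denom = np.linalg.norm(a) * np.linalg.norm(g) + eps
        cosines.append(float(a @ g / denom))
    return cosines

# Sanity check on hand-built vectors: aligned, orthogonal, opposed.
g = np.array([1.0, 0.0])
cos_demo = per_layer_cosine(
    [np.array([2.0, 0.0]), np.array([0.0, 3.0]), np.array([-1.0, 0.0])],
    [g, g, g],
)
```

Values near +1 mean the credit points along the BP gradient; values near 0
mean it carries no usable direction.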

Co-Authored-By: Claude Opus 4.6 (1M context)

Round 38: add --penalty_lam flag to cifar_resmlp.py for Mode 2 cross-method test

2026-04-08T11:37:23+00:00

Patches:
- main(): add --penalty_lam (separate from CB's bridge temperature args.lam)
- train_dfa block update (line 195): add penalty_lam * (f_l**2).sum(-1).mean()
- train_state_bridge block update (line 326): same penalty
- train_credit_bridge block update (line 533): same penalty
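
A minimal sketch of the flag and penalty term described above, written with
NumPy so it runs standalone. Assumptions: f_l is a (batch, d) block output,
the defaults shown are placeholders, and build_parser / activation_penalty
are illustrative names rather than functions in cifar_resmlp.py.

```python
import argparse
import numpy as np

def build_parser():
    p = argparse.ArgumentParser()
    # CB bridge temperature; default here is a placeholder.
    p.add_argument("--lam", type=float, default=1.0)
    # Activation-penalty strength, kept separate from --lam.
    p.add_argument("--penalty_lam", type=float, default=0.0)
    return p

def activation_penalty(f_l, penalty_lam):
    """penalty_lam * mean over the batch of the squared L2 norm of f_l."""
    return penalty_lam * (f_l ** 2).sum(-1).mean()

args = build_parser().parse_args(["--penalty_lam", "1e-2"])
f_l = np.ones((4, 3))                      # toy (batch=4, d=3) block output
penalty = activation_penalty(f_l, args.penalty_lam)
```

With penalty_lam left at 0.0 the term vanishes, so existing runs are
unaffected unless the flag is passed.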

Codex round 38 GO STAGE: keep penalty separate from CB lam, blocks-only,
sanity-check that hidden_norms remain nontrivial (not silencing the blocks).

2-epoch smoke (results/round38_smoke_sbcb_pen) passes the silencing check:
SB ||h_L||=229, CB ||h_L||=1258, both nontrivial. Deep cosines positive across
all layers for SB ([0.28, 0.25, 0.23]) and rising for CB ([0.04, 0.08, 0.13, 0.15]).

Co-Authored-By: Claude Opus 4.6 (1M context)

Round 35: SB and CB also show data-agnostic Mode 1 growth on random targets

2026-04-08T10:57:53+00:00

- experiments/cifar_resmlp.py: add --methods filter and --random_targets flag;
  extend compute_diagnostics to log hidden_norms_per_layer and bp_grad_norms_per_layer
- paper/main.tex §3 ¶1: broaden random-target finding to all 3 fixed-feedback methods
  (DFA: ||h_L||=14510, SB: ||h_L||=6225, CB: ||h_L||=19974 at ep 3, all at chance acc)
- paper/main.tex Appendix J: extended with cross-method smoke-test table
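
The two diagnostics pieces above might look roughly like this. Both helpers
are hypothetical: the sketch assumes ||h_l|| is the batch-mean L2 norm of a
(batch, d) activation and that random targets are uniform random class
labels, neither of which is confirmed by this log.

```python
import numpy as np

def hidden_norms_per_layer(hiddens):
    """Batch-mean L2 norm of each layer's (batch, d) activations."""
    return [float(np.linalg.norm(h, axis=-1).mean()) for h in hiddens]

def randomize_targets(labels, num_classes, rng):
    """Replace labels with uniform random classes, removing task signal."""
    return rng.integers(0, num_classes, size=labels.shape)

# Toy check: two layers of constant activations with d=4.
hiddens = [np.ones((2, 4)), 2.0 * np.ones((2, 4))]
norms = hidden_norms_per_layer(hiddens)

rng = np.random.default_rng(0)
labels = np.arange(10)
random_labels = randomize_targets(labels, num_classes=10, rng=rng)
```

Growth of these norms under random labels is what separates a data-agnostic
effect from one driven by task signal.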

This generalizes the §3 mechanism story from 'DFA-specific' to 'all 3 audited
fixed-feedback local-credit methods'. Combined with rounds 32-34, the proximate
cause of Mode 1(a) is now well-localized:
  - Does not require residual skip (round 33 H2 walkback)
  - Does not require task signal (round 34 random targets, DFA)
  - Not DFA-specific (round 35 random targets, SB+CB)

Co-Authored-By: Claude Opus 4.6 (1M context)

Initial implementation: all models, methods, toy and CIFAR experiments

2026-03-23T23:21:26+00:00

Debug phase. Toy LQ experiments (3 seeds) complete with terminal gradient matching.
Credit bridge matches state bridge on linear system (~0.94 cosine).
CIFAR experiments in progress.