faeval.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2026-04-08	v2 skeleton from round 25: section structure now matches round 23	YurenHao0426
	Round 24's skeleton had 3 deviations from round 23 redo: - Made §3 'Diagnostic Protocol' instead of 'Failure Mode 1' - Collapsed Mode 1 + Mode 2 into one §4 - Added §6 'Reference Implementation' (was supposed to be dropped) Round 25 fixed all three. New §3-§7 match round 23 redo exactly: §3 Failure Mode 1: Measurement Degeneracy §4 Failure Mode 2: Low Intrinsic Credit-Direction Quality §5 Intervention and Cross-Architecture Evidence §6 Recommended FA Evaluation Protocol §7 Discussion, Limits, Conclusion Also added: - In-line bibliography with 12 \bibitem entries (Paleka, O'Bray, Jordan + FA literature) — citations resolve correctly now - Appendices A-G with actual prose content (not just headers) - 7-pitfall catalog with descriptions - Walk-back chain methodology paragraph - 7-validation summary table Compiles to 9 pages with figures 1+3 inline (existing PNGs) and figures 2/4/5 as placeholder text PDFs (TODO: regenerate). Tables 1/2/3 still have TODO placeholders for numerical values. Next: fill in tables 1-3 with existing JSON data, generate figures 2/4/5 from existing data, then consult codex per-section for prose filling.
2026-04-08	Archive failed v1 draft as v1_rejected.tex; remove main.tex/main.pdf	YurenHao0426
	User rejected the v1 draft as '流水账实验报告' (sequential experiment report). Round 22 + 23 redid the outline with E&D-genre prescription. Saving v1 as v1_rejected.tex for reference. New main.tex will be written from round 24 LaTeX skeleton (codex offered to provide it), section by section, with codex check on each section's prose.
2026-04-08	Compile paper PDF + fix bibstyle for tectonic	YurenHao0426
	Compiled with tectonic (the only LaTeX engine on this server). Two fixes needed: 1. Pass [numbers,compress] to natbib via PassOptionsToPackage so the numerical bibliography style works 2. Use bibstyle 'abbrvnat' instead of 'plain' (compatible with natbib) Result: 10-page PDF, ~7.5 content pages (well under 9-page E&D limit), references on pages 8-9, appendices A-D on pages 9-10. PDF uploaded to broker as 1843506b_main.pdf for user review.
2026-04-08	Paper main.tex: add §5.4 λ sweep dissociation table	YurenHao0426
	The λ sweep is the strongest single piece of two-mode separation evidence and doesn't require the early-epoch caveat. New §5.4 with table showing: λ=0: vanilla, both modes broken λ=1e-4: mode 1 ALLEVIATED (\|\|h_L\|\|=2.4e4, \|\|g\|\|=6.3e-7), mode 2 NOT (cos -0.022, rho -0.004) λ=1e-2: mode 1 alleviated, mode 2 partially (cos +0.16, rho +0.09) λ=1e-1: slightly over-constrained (cos +0.13, rho +0.07) The two modes have different intervention thresholds. §5.4 is now the killer evidence; the early-epoch disambiguation in §5.3 becomes supporting. Updated section summary to 'five validations'.
2026-04-08	λ sweep on penalty strength: lam ∈ {1e-4, 1e-2, 1e-1} cos + rho results	YurenHao0426
	Round 19's #5 recommendation. Major new finding for the paper: \| lam \| acc \| \|\|h_L\|\| \| \|\|g_2\|\| \| deep cos \| deep rho \| \|-------\|------:\|--------:\|--------:\|---------:\|---------:\| \| 0 \| 0.308 \| 4e8 \| 5e-10 \| -0.008 \| -0.003 \| \| 1e-4 \| 0.359 \| 2.4e4 \| 6.3e-7 \| -0.022 \| -0.004 \| \| 1e-2 \| 0.363 \| 4e4 \| 1e-6 \| +0.155 \| +0.080 \| \| 1e-1 \| 0.349 \| 1.2e4 \| 1.6e-6 \| +0.131 \| +0.067 \| KEY: at lam=1e-4 the residual stream is contained AND \|\|g\|\| is healthy (mode 1 ALLEVIATED), but deep cos and rho are still essentially zero (mode 2 NOT alleviated). This is independent dissociation of the two modes via penalty strength: at weak penalty you get mode 1 fix WITHOUT mode 2 fix. Both metrics (cos, rho) agree at every lambda. Penalty strength has a non-monotonic effect on mode 2 alleviation: - lam=1e-4: too weak, mode 2 not alleviated (cos ~0) - lam=1e-2: sweet spot, cos +0.16, rho +0.08 - lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07 This is the 7th independent validation of the two-mode separation, and the strongest one because it shows mode 1 alleviation WITHOUT mode 2 alleviation — the modes do not even respond to the same intervention strength.
2026-04-08	First draft of NeurIPS 2026 E&D paper	YurenHao0426
	Title: 'Beyond Accuracy and Alignment: A Diagnostic Evaluation Protocol for Feedback Alignment' Structure (per round 21 prescription): Abstract: 'broken because conflated' framing, 2 distinct modes named §1 Introduction: discovery hook -> 2-mode framing -> contribution §2 Related work §3 Audit (the field-standard pair walks back nothing) §4 The diagnostic protocol (4 diagnostics, calibrated thresholds, decision-utility ablation, cross-architecture validation) §5 Two distinct failure modes (mechanism, penalty rescue, direct cosine measurement, hypothesis-disambiguation, capacity-cost control) §6 Limitations §7 Broader impacts §8 Conclusion Appendices: reproducibility, 7-pitfalls catalog, walk-back chain (4 step), all 6 validations of the two-mode separation Includes 4 result tables, ~10 references, structured as eandd track double-blind submission. 760 lines of LaTeX, balanced environments verified. Ready for compilation on a system with pdflatex. Template: paper/neurips_2026.{sty,tex}, downloaded from official NeurIPS 2026 source. checklist.tex also unzipped.
2026-04-08	Add perturbation correlation metric calibration	YurenHao0426
	Anchors the rho +0.08 finding with positive and negative controls: positive control (BP grad as a_l): +0.9965 (perfect, expected ~1) negative control (random vector): +0.0056 (noise floor, expected ~0) vanilla DFA s42 (\|\|g\|\| at floor): +0.0020 (within noise floor) penalized DFA s42 (\|\|g\|\| healthy): +0.0937 (~48x above noise, ~9% of perfect) The metric is well-calibrated. BP gradient as a_l gives rho ~1 (Taylor), random vector gives rho ~0 (noise floor), random feedback in degenerate regime is indistinguishable from noise floor, random feedback in penalized regime is small-but-well-above-noise (~48x noise, ~9% perfect). Defensible paper claim: 'rho +0.08 is small in absolute terms but clearly above the calibrated noise floor and on the order of 10% of the perfect-signal ceiling — consistent with the 60% of BP accuracy the penalized network achieves.' Closes round 19's 'is rho +0.08 a meaningful number on this metric?' question with explicit calibration.
2026-04-08	PAPER_OUTLINE: add 6th validation (perturbation correlation triangulation)	YurenHao0426

2026-04-08	Extend perturbation audit to vanilla early-epoch checkpoints	YurenHao0426
	Cross-metric disambiguation confirmation. Vanilla DFA at ep 1 (meaningful regime, \|\|g\|\|~6e-7) deep rho across 3 seeds: s42: deep rho -0.008 s123: deep rho +0.000 s456: deep rho -0.000 mean: -0.003 ± 0.005 Compare to penalized DFA 3-seed: deep rho +0.080 ± 0.011. The disambiguation (penalty CREATES alignment, not just reveals it) is now confirmed by TWO independent metrics: - cos: vanilla -0.008 ± 0.013, penalized +0.155 ± 0.025 - rho: vanilla -0.003 ± 0.005, penalized +0.080 ± 0.011 Both metrics agree on the vanilla→penalized transition. The l0 (embedding) rho is high (~0.25-0.29) at every vanilla checkpoint, mirroring the cos l0 +0.42 — the embedding layer is genuinely useful while the deep blocks are not, by BOTH metrics. The penalty restores some deep usefulness to ~+0.08 rho / +0.16 cos. Cross-metric agreement rules out single-metric artifacts on either side.
2026-04-08	EVIDENCE_SUMMARY: add 6th validation (perturbation correlation triangulation)	YurenHao0426

2026-04-08	Add perturbation correlation audit (round 19's recommended alt metric)	YurenHao0426
	Codex round 19 said: 'use nudging or perturbation correlation on the penalized checkpoints. In the healthy-gradient regime, that is a more direct is-the-local-signal-useful test than cosine alone'. Result on existing checkpoints (eps=1e-3, M=32 random directions, n=1024): vanilla DFA s42: deep rho +0.002 penalized DFA s42 lam=1e-2 30ep: deep rho +0.094 penalized DFA s123 lam=1e-2 30ep: deep rho +0.073 penalized DFA s456 lam=1e-2 30ep: deep rho +0.072 penalized 3-seed mean: deep rho +0.080 ± 0.011 This INDEPENDENTLY TRIANGULATES the cos +0.17 finding via a different metric: - vanilla deep cos ~0 matches vanilla deep rho ~0 - penalized deep cos +0.155 matches penalized deep rho +0.080 The two metrics measure different things: - cos = directional alignment with BP grad - rho = correlation between predicted and true loss change under random perturbation Both show the same pattern: penalty creates partial usefulness from essentially zero. This is the 6th independent validation of the mode 2 'penalty creates partial alignment' framing. Crucially, rho doesn't use F.cosine_similarity (no eps clamp), and it measures sample-level loss change correlation rather than direction match — so it rules out 'cos is capturing some directional artifact unrelated to local usefulness'.
2026-04-08	EVIDENCE_SUMMARY: §4 fully rewritten under locked two-distinct-modes framing	YurenHao0426
	§4 now reflects all 5 independent validations of the converged framing: 1. Direct deep cos on penalized DFA (3 seeds): +0.155 ± 0.025 2. Null calibration with fresh Bs: +0.002 ± 0.022 (real signal) 3. Hypothesis B disambiguation (vanilla early ep): -0.008 ± 0.013 4. BP+penalty 2×2 control: 17 pp residual = credit quality 5. Multi-seed lock-in: 24 measurements all near zero Round 20 language tightening applied: - 'lower bound on non-capacity gap' instead of 'clean isolation' - Explicit caveats about end-to-end vs local-loss difference - Counter to 'different optimization regime' objection The §4 framing is locked. Five independent validations done. Stop iterating, start writing.
2026-04-08	PAPER_OUTLINE: round 20 language tightening + 5 validation summary	YurenHao0426
	§4 updates per round 20: - Soften 'confirmed' to 'strongly supports' - Add §4.4 BP+penalty capacity-cost control with the round 20 phrasing: 'lower bound on residual gap under matched architecture/data/optimizer/ penalty, after accounting for the penalty's direct capacity cost in BP' - Add multi-seed lock-in to §4.3 (24 measurements all near zero) - List 5 independent validations supporting the converged framing The §4 narrative is now complete and the framing is locked.
2026-04-08	Multi-seed vanilla DFA early-epoch cos: lock-in for round 19 disambiguation	YurenHao0426
	Round 20's minimal lock-in experiment: 3 seeds × {ep 1, ep 2} vanilla DFA cosine. Closes the 'single-seed fluke' objection. Vanilla DFA early-epoch deep cosines (l1-l4): \| seed \| ep \| \|\|g\|\| \| deep mean \| \|---\|---\|---\|---\| \| 42 \| 1 \| 6.7e-7 \| -0.025 \| \| 42 \| 2 \| 1.5e-7 \| -0.038 \| \| 123 \| 1 \| 6.5e-7 \| +0.002 \| \| 123 \| 2 \| 1.4e-7 \| -0.006 \| \| 456 \| 1 \| 3.9e-7 \| +0.000 \| \| 456 \| 2 \| 8.5e-8 \| -0.009 \| 3-seed mean at ep 1 (most meaningful regime): -0.008 ± 0.013 3-seed mean at ep 2: -0.018 ± 0.018 ALL 24 measurements (3 seeds × 2 ep × 4 deep layers) are in [-0.04, +0.02]. Compare to penalized DFA 3-seed mean +0.155 ± 0.025. The penalty CREATING deep alignment finding is now seed-robust. Three seeds × two early epochs all show vanilla deep cos essentially zero even when \|\|g\|\| is in the meaningful regime. This is the round 20 lock-in. Framing is locked.
2026-04-08	BP+penalty control result: mode 2 (intrinsic credit quality) confirmed REAL	YurenHao0426
	BP + lam=1e-2 \|\|f\|\|^2 penalty trained for 30 epochs (s42): ep 30 final: test_acc 0.5303 margin vs DFA-shallow 0.349: +18.13 pp The 2x2 accuracy grid: no penalty with penalty BP 0.609 0.530 DFA 0.308 0.363 Penalty effect on BP: -8 pp (capacity regularization cost) Penalty effect on DFA: +5.5 pp (rescue from active harm) Mode 2 (intrinsic credit quality) is confirmed REAL by this control: even after the penalty's capacity cost, BP achieves +18 pp depth utilization. DFA under the same penalty achieves only +1.4 pp. The difference (~17 pp) cannot be attributed to capacity loss — it is genuine credit-quality cost of random feedback vs true backprop gradient. This validates the round 19 'two distinct failure modes' framing: mode 2 is not a penalty-induced regularization artifact.
2026-04-08	Add BP+penalty control (round 19's #4 critical experiment)	YurenHao0426
	Trains end-to-end BP with the same lambda*\|\|f_l(h_l)\|\|^2 penalty used in the DFA penalty rescue. Tests whether the penalty's depth utilization loss in penalized DFA is intrinsic to DFA's random-feedback credit quality (mode 2) or due to penalty-induced capacity regularization. Decision rule: BP+pen margin > 25 pp -> mode 2 confirmed (penalty is not the cap) BP+pen margin < 5 pp -> penalty itself caps depth (capacity loss) intermediate -> both effects present
2026-04-08	PAPER_OUTLINE: §4 rewrite under 'two distinct failure modes' framing	YurenHao0426
	After the round 19 disambiguation experiment confirmed hypothesis B (penalty CREATES deep alignment, not just reveals it), the paper §4 needs to use the new framing: Mode 1: measurement degeneracy via terminal LN gradient cancellation Mode 2: low intrinsic credit-direction quality of random feedback Both modes are direct-measured (mode 1 by diagnostic (b), mode 2 by per-layer cos in the meaningful regime). The penalty partially alleviates BOTH modes. Neither is fully fixed. §4 rewrite includes: - The two modes (4.1) - Penalty causal validation with 3-seed cos (4.2) - Disambiguation: vanilla early-epoch cos table proving hypothesis B (4.3) - Why the residual gap is partial alignment (4.4) - Why this framing is paper-cleaner than prior ones (4.5) Walk-back chain extended to 7 entries, with 6 and 7 happening same-day and converging on the final two-distinct-modes framing.
2026-04-08	3-seed multi-seed verification of penalized DFA deep cos = +0.17	YurenHao0426
	\| seed \| l0 \| l1 \| l2 \| l3 \| l4 \| layer-mean \| \|---\|---:\|---:\|---:\|---:\|---:\|---:\| \| 42 \| +0.316 \| +0.169 \| +0.151 \| +0.165 \| +0.166 \| +0.193 \| \| 123 \| +0.333 \| +0.093 \| +0.155 \| +0.178 \| +0.177 \| +0.187 \| \| 456 \| +0.339 \| +0.131 \| +0.123 \| +0.150 \| +0.150 \| +0.179 \| 3-seed mean deep cos (l1-l4): ~0.155 ± 0.025 3-seed layer-mean: +0.186 ± 0.007 The +0.17 finding is rock-solid, combined with: - null calibration: training-Bs +0.16 vs fresh-Bs +0.002 - hypothesis B confirmed: vanilla early ep deep cos ~0 - 3-seed reproducibility (this commit) This is the §4 evidence for the paper's 'penalty creates partial deep alignment, partially alleviating mode 2'.
2026-04-08	DISAMBIGUATION: vanilla DFA early-epoch checkpoints + cos measurement	YurenHao0426
	Round 19's #3 critical experiment. Trained vanilla DFA s42 for 5 epochs, saved checkpoint at each, then measured per-layer cos(e_T B^T, BP grad). Key trajectory of \|\|g_l\|\| during vanilla DFA training: ep 0: ~1e-3 (random init, healthy) ep 1: ~1.4e-6 (3 OOM drop, STILL above 1e-7 floor) ep 2: ~3e-7 (above floor) ep 3: ~1.3e-7 (above floor, barely) ep 4: ~7e-8 (BELOW floor) ep 5: ~4e-8 (well below floor) So ep 1, 2, 3 vanilla checkpoints are in the MEANINGFUL \|\|g\|\| regime. Cos measurement on those: ep 1: l0=+0.42, l1=+0.005, l2=-0.028, l3=-0.039, l4=-0.038 ep 2: l0=+0.44, l1=-0.002, l2=-0.040, l3=-0.055, l4=-0.054 ep 3: l0=+0.43, l1=+0.007, l2=-0.039, l3=-0.054, l4=-0.054 DEEP-LAYER COSINES ARE ESSENTIALLY ZERO AT EVERY VANILLA EPOCH, even when \|\|g\|\| is in the meaningful regime (ep 1: \|\|g\|\|=6.7e-7). Compare to penalized DFA s42 at 30 ep: deep cos = +0.17. Hypothesis B confirmed: the penalty CREATED the deep-layer alignment. It is a training outcome of the regularization, not a measurement-regime revelation. Paper implications: there are two distinct failure modes after all, but they are not 'scale + direction'. They are: (1) Measurement degeneracy via terminal LN gradient cancellation (caught by diagnostic (b)) (2) Low intrinsic credit quality of random feedback even in the meaningful regime (caught by direct cos measurement) The penalty partially alleviates BOTH (residual stream contained AND deep alignment improved from ~0 to +0.17), but neither fully.
2026-04-08	Add vanilla DFA early-epoch checkpoint training (round 19 disambiguation)	YurenHao0426
	Trains vanilla DFA (no penalty) for max_epoch epochs and saves checkpoints + Bs at specified early epochs (default: 1, 2, 3, 4, 5). Logs per-layer \|\|h_l\|\| and \|\|g_l\|\| at each epoch so we can see when \|\|g_L\|\| crosses the 1e-7 floor. Codex round 19's #3 critical experiment for disambiguating: Hypothesis A: deep alignment was always there in vanilla DFA but hidden by the post-collapse measurement degeneracy Hypothesis B: deep alignment was created by the penalty intervention Test: measure deep-layer cos at vanilla checkpoints from ep 1-3 (when \|\|g_L\|\| should still be in the meaningful regime). If cos > 0 at ep 1-2 vanilla -> hypothesis A If cos ~ 0 at ep 1-2 vanilla -> hypothesis B
2026-04-08	Add null calibration script: training-Bs vs fresh-Bs cos on penalized DFA	YurenHao0426
	Codex round 19's #1 critical control. Result on penalized DFA s42 (lam=1e-2, 30 ep): training-Bs deep-layer cos: +0.1627 fresh-Bs deep-layer cos: +0.0022 ± 0.0220 (n=20 draws) The +0.17 measurement is REAL signal, not artifact. The network specifically adapted to its training-time Bs during the penalized run. Fresh Bs give essentially zero cosine (within noise). This validates the walk-back interpretation: in the rescued regime where \|\|g_l\|\| is meaningful, DFA's local credit signal shows partial alignment with BP grad — and this alignment is specifically the network learning to align with its specific Bs. Round 19 caveat preserved: cannot yet distinguish whether the alignment was always present in vanilla but hidden by measurement degeneracy, OR whether it was created by the penalty intervention. The early-epoch vanilla checkpoint sweep (round 19's other proposed control) would disambiguate.
2026-04-08	MAJOR: penalized DFA deep-layer cosine is +0.17, NOT zero	YurenHao0426
	Direct deep-block credit measurement on penalized DFA s42 checkpoint (lam=1e-2, 30 epochs, just trained): per-layer cos(e_T B^T, BP grad) — TRAINING Bs, no eps clamp: l0: +0.316 (±0.188) \|\|g\|\|=9.18e-7 \|\|a\|\|=4.53 l1: +0.169 (±0.087) \|\|g\|\|=8.87e-7 \|\|a\|\|=4.57 l2: +0.151 (±0.084) \|\|g\|\|=8.77e-7 \|\|a\|\|=4.50 l3: +0.165 (±0.099) \|\|g\|\|=8.73e-7 \|\|a\|\|=4.64 l4: +0.166 (±0.098) \|\|g\|\|=8.69e-7 \|\|a\|\|=4.64 layer-mean: +0.193 Compare to vanilla DFA (existing measurement, scale-broken regime): l0: +0.42 l1-4: ~0 (essentially zero) CRITICAL INTERPRETATION: The penalty doesn't just fix scale, it ALSO restores deep-layer direction quality from ~0 to ~0.17. This contradicts the prior 'two failure modes' framing where I assumed direction would remain broken even after scale fix. The honest story is: - vanilla DFA: scale catastrophic, BP grad at floor, cosine measurement DEGENERATE (cos ~0 is noise dominance, not 'no alignment') - penalized DFA: scale fixed, BP grad healthy, cosine measurement INTERPRETABLE — and the value is +0.17 on deep layers (partially aligned, much less than BP's self-cosine of 1.0) - the +0.17 alignment explains why penalized DFA gets 0.36 (60% of BP's 0.61) — partial credit gives partial training, not zero training The 'second failure mode' claim is wrong. There's ONE unified failure mode (scale + measurement degeneracy), and the penalty rescues BOTH. The remaining gap to BP is 'partial credit quality', not a separate failure mode.
2026-04-08	Add PAPER_OUTLINE.md: §1-§6 draft reflecting round 17 + 18	YurenHao0426
	Comprehensive paper draft outline for the NeurIPS 2026 E&D submission: §1 Discovery-first hook (round 16 narrative arc): broken eval -> evidence -> metrics miss -> need protocol -> validation §2 Audit findings: 5-method × 3-seed audit, walk-back details, EP internal control §3 The diagnostic protocol: 4 diagnostics, decision-utility ablation, threshold sensitivity (with (d) fragility flagged), temporal validation, cross-architecture validation, sub-mode discrimination §4 Two failure modes: mechanism story + causal penalty rescue, with the round 18 softening (partial dissociation rather than full separability) §5 Pipeline pitfalls catalog: 7 bugs (incl. new #6.5 self-cosine fallback) §6 Reference implementation + Limitations / walk-backs section listing all 5 walked-back claims explicitly This is a working draft to make the next writing step concrete. Reflects all evidence collected through the round 18 follow-up.
2026-04-08	Add penalty lambda 3-seed summary script + checkpoint save in penalty test	YurenHao0426
	- New script: protocol/examples/penalty_lam_3seed_summary.py Loads existing penalty JSON files for lam=1e-3 and lam=1e-2 across seeds, computes 3-seed mean margin vs DFA-shallow baseline, and explicitly checks the (d) verdict at 2pp threshold per seed and in aggregate. Reports MIXED if seeds disagree. Current result: lam=1e-2 has 3 seeds (margin +1.38 ± 0.05 pp, all FIRE), lam=1e-3 has 1 seed (+2.31 pp, PASSES). Awaiting s123/s456 for lam=1e-3. - experiments/dfa_residual_penalty_test.py: now saves model checkpoint + Bs alongside JSON log so post-hoc protocol can be applied without re-running. Closes the pitfall #6.5 self-disclosure (auxiliary nets must be saved for post-hoc Gamma to be reconstructible).
2026-04-08	Add penalty λ sweep figure: shows λ-dependence of (d) verdict	YurenHao0426
	3-panel figure: vanilla DFA + penalty at λ=1e-3 (green) + penalty at λ=1e-2 (blue): (a) ‖h_L‖: vanilla 4e8, both penalties ~4e4 (similar) (b) ‖g_2‖: vanilla 5e-10, penalties 7e-7 to 1e-6 (above floor) (c) acc: vanilla 0.31, λ=1e-2 0.36, λ=1e-3 0.37; horizontal lines at DFA-shallow 0.349 and 2pp threshold 0.371 Visual: at λ=1e-3 the test acc curve crosses ABOVE the 2pp threshold line; at λ=1e-2 it stays below. This is the (d) lambda-dependence finding from the round 18 follow-up.
2026-04-08	EVIDENCE_SUMMARY: add (d) threshold sensitivity finding (round 18)	YurenHao0426

2026-04-07	Add (d) frozen-baseline threshold sensitivity — IMPORTANT new finding	YurenHao0426
	Critical observation: at lambda=1e-3 (single seed), penalized DFA margin above shallow baseline is +2.3 pp — which PASSES (d) at the 2 pp default threshold. At lambda=1e-2 (3 seeds), the margin is +1.4 pp — FIRES (d) at 2 pp. So the (d) verdict on penalized DFA depends on BOTH the lambda choice AND the threshold choice. This is a significantly weaker claim than 'two failure modes are separable via (d)'. The honest framing per round 18 lesson: there is a real tradeoff between penalty strength and depth utilization. Weaker penalty preserves more depth contribution but also more scale pathology. Stronger penalty kills depth contribution. The protocol surfaces this tradeoff but doesn't establish the second failure mode by itself. Compared to (a) 63x and (b) 24338x separation gaps, (d) is the LEAST robust diagnostic and the most sensitive to threshold choice. Need to flag this prominently in the paper.
2026-04-07	CHECKLIST pitfall #6: layer-0 dominance is ResMLP-specific, not universal	YurenHao0426
	Verified by extracting per-layer gamma_dfa from existing ViT-Mini snapshot JSON (3 seeds, final epoch). On ViT all 4 layers have per-layer cosine near zero (~0.001 with eps clamp); no layer dominates. Compare to ResMLP where layer 0 has +0.42 and layers 1-4 are essentially zero. The pitfall is real on ResMLP but the specific 'layer 0 dominates' framing doesn't generalize to ViT. Reframed as 'aggregation hides per-layer structure'; lesson is to always report per-layer values regardless of which architecture-specific pattern you might be hiding.
2026-04-07	EVIDENCE_SUMMARY: round 18 language softening on CNN + penalty audit	YurenHao0426

2026-04-07	CHECKLIST: add pitfall #6.5 — silent self-cosine fallback when aux nets ↵	YurenHao0426
	not saved Discovered in our own cnn_baseline.py: when the random feedback Bs (for DFA) or bridge predictor (for SB/CB) are not persisted alongside the model checkpoint, post-hoc Gamma computation cannot reconstruct the local credit signal. Instead of erroring, the script falls back to cos(BP_grad, BP_grad) = 1.0 and records that as Gamma. Reader who doesn't notice the small 'Gamma_note' field interprets 1.0 as perfect alignment. Recommendation: always save aux nets alongside checkpoints; if they're missing, report Gamma as N/A, not 1.0.
2026-04-07	EVIDENCE_SUMMARY: add §3.7 CNN cross-architecture audit results	YurenHao0426

2026-04-07	Add CNN third-architecture audit: BN, no terminal LN	YurenHao0426
	5 methods × 3 seeds on the SmallCNN (3 conv + BN + 1 FC + head, no terminal LN) using existing checkpoints in results/cnn_baseline/. Key findings: BP CNN: 0.866 acc, max/block 1.3, trustworthy State Bridge CNN: 0.633 acc, max/block 2.4, trustworthy EP CNN: 0.512 acc, max/block 12, trustworthy DFA CNN: 0.566 acc, max/block 237, walked back via (a) Credit Bridge CNN: 0.325 acc, max/block 96, walked back via (a) CRITICAL: diagnostic (b) \|\|g_L\|\| floor NEVER fires on CNN for any method. The deepest BP grad is at ~1e-5 to 6e-1, all well above the 1e-7 floor. This is the cleanest confirmation that terminal LayerNorm is the structural cause of the catastrophic gradient collapse in (b). Without out_ln, the BP grad does NOT collapse to the floor, even on DFA. The scale pathology (a) still appears on DFA and CB, but the gradient collapse pathology (b) is specific to terminal-LN architectures. DFA CNN's accuracy (56.6%) is much higher than DFA ResMLP (30.8%) or DFA ViT (23.7%) — partially because the scale pathology is less catastrophic without the LN-driven gradient cancellation amplifying it. This is the cross-architecture mechanism story made concrete.
2026-04-07	Add minimal worked example: end-to-end protocol usage tutorial	YurenHao0426
	5-epoch DFA training on CIFAR-10 + apply protocol + interpret verdict. Self-contained, runs on CPU in <2 minutes. Demonstrates the API a future paper author would use: 1. train your model (any FA-style method) 2. build eval_batches from your test loader 3. call diagnose(model, eval_batches, headline_acc, frozen_baseline_acc) 4. read report.verdict; walk back if 'needs walk-back' Not run during this session to avoid GPU contention with the in-flight direction-quality and ViT/ResNet experiments.
2026-04-07	Add §4 penalty rescue figure: visual two-failure-modes story	YurenHao0426
	3-panel side-by-side showing per-epoch trajectories of vanilla DFA vs DFA + lambda*\|\|f\|\|^2 penalty: (a) \|\|h_L\|\|: vanilla 4e8 vs penalty 4e4 (4 OOM rescue) (b) \|\|g_L\|\|: vanilla 5e-10 vs penalty ~1e-6 (4 OOM rescue) (d) test acc: vanilla 0.31 vs penalty 0.36 vs frozen baseline 0.349 vs BP 0.61 The visual story: (a) and (b) show the penalty pulling the diagnostics back into the healthy regime, but (d) shows the rescue translates to only +1 pp above the DFA-shallow baseline and 24 pp below BP-trainable. The two failure modes (scale + direction) are visually separable: scale is fixed, direction is not. Together with figure_audit_5method.png and figure_cross_arch_temporal_s42.png, this is the third paper-ready figure for §3-§4.
2026-04-07	EVIDENCE_SUMMARY: add §3.5 sensitivity, §3.6 cross-width, §4 ↵	YurenHao0426
	separability, figures section
2026-04-07	Add §2/§3 hero figure: 5-method audit horizontal bar chart	YurenHao0426
	4-panel layout (one per diagnostic), 5 methods sorted bottom-to-top by ascending accuracy, color-coded healthy (BP/EP, blue) vs degenerate (DFA/SB/CB, red), with threshold lines drawn: (a) max per-block growth (log scale, threshold 50x) (b) \|\|g_L\|\| (log scale, floor 1e-7) (c) cross-batch stability (linear, ceiling 0.30) (d) headline acc (linear, frozen baseline 0.349) The visual layout makes it immediately obvious that: - (a) and (b) cleanly split healthy from degenerate (4-7 OOM gap) - (c) is bimodal and doesn't cleanly split — confirms it's a sub-mode discriminator, not a primary detector - (d) shows BP above the frozen baseline by ~25 pp while DFA/CB/SB are at or below it
2026-04-07	Add d=512 ResMLP audit table (3 seeds): cross-width validation	YurenHao0426
	Same protocol applied to the 4-block d=512 ResMLP variant (vs the d=256 default). 4 methods × 3 seeds = 12 conditions: BP @ d=512: trustworthy on all 3 seeds (acc 0.60-0.61) DFA @ d=512: walked back on all 3 seeds via (a)+(b) State Bridge @ d=512: walked back on all 3 seeds via (a)+(b), with drift sub-mode on s123 (stability 0.879) Credit Bridge @ d=512: walked back on all 3 seeds via (a)+(b) Width effect: max-per-block growth is HIGHER at d=512 (6e3-7e4) than at d=256 (~1e3). Larger width amplifies the explosion. The protocol verdicts are robust to this — same binary outcome, more extreme quantitative numbers. This is the cross-width validation: the protocol's findings are not d=256-specific. The §3 audit results generalize across the width dimension.
2026-04-07	Add fast direction-quality measurement on existing DFA checkpoints	YurenHao0426
	3-seed result on the existing dfa_s{42,123,456}.pt checkpoints from results/confirmatory/checkpoints_A2/, computing per-layer cosine of DFA's local credit signal e_T@B_l^T vs the true BP gradient at h_l. Key findings: per-layer cos (3-seed mean): l0: +0.42 (high — embedding alignment) l1: +0.006 (essentially zero) l2: -0.015 (essentially zero) l3: -0.004 (essentially zero) l4: -0.004 (essentially zero) layer-mean across all 5: +0.07-0.10 The deep blocks (l1-l4) have essentially zero alignment with BP grad in the vanilla scale-failure regime. Layer 0 dominates the headline. The script reconstructs the training-time random Bs by replaying the RNG sequence (torch.manual_seed + ResidualMLP construction + randn draws), since the existing checkpoints don't save Bs. For the still-running direction-quality experiment which DOES save Bs, the script auto-detects the dict format and uses the saved Bs directly.
2026-04-07	Partial protocol audit on penalized DFA: (a)+(b) pass, (d) still fires	YurenHao0426
	3-seed analysis of DFA + lambda=1e-2 \|\|f\|\|^2 penalty using only the data already in the existing penalty JSON logs (no checkpoint or full layer norms needed): (a) per-block growth: avg ~8x per block (geom mean), well below 50x threshold. PASS likely (with small caveat that max could differ from mean). (b) BP grad floor: g_2 = 8-10e-7 across 3 seeds, 10x above the 1e-7 floor. PASS exact. (d) frozen baseline: margin = 1.35-1.45 pp (mean 1.38) < 2 pp required. FIRE on all 3 seeds. Aggregate partial verdict: protocol catches the SECOND failure mode (direction quality / passive blocks) on penalized DFA even though it PASSES the scale-related diagnostics. This is the cleanest possible evidence that the two failure modes are separable: the penalty fixes the scale failure but not the direction failure. The protocol's (d) diagnostic is the right test for the second failure mode and it still fires after the penalty rescue. This is the §4 'two failure modes' evidence that doesn't depend on the direction-quality direct test (which is still running). The (d) diagnostic alone shows the separation.
2026-04-07	Add EVIDENCE_SUMMARY.md: consolidated snapshot of all protocol evidence	YurenHao0426
	Single-document overview of every result the protocol package has produced so far, with reproducibility commands and the file/memory entry where each result is recorded. Organized by paper section (§1 protocol, §2 audit, §3 decision utility, §4 temporal validation, §5 pitfalls). Includes the headline tables (3-seed audit, cross-architecture, penalty sweep) ready for the paper, and an explicit status field for each ongoing experiment. This is a reading guide for anyone (codex, future-me, the user) who needs to know what evidence is ready and how to reproduce it.
2026-04-07	Add §3 cross-architecture temporal evolution figure	YurenHao0426
	3-column 3-row plot: rows: \|\|h_L\|\|, \|\|g_L\|\|, test accuracy cols: ResMLP (with LN) \| ViT-Mini (cls + LN) \| StudentNet (no LN) BP and DFA trajectories overlaid. Floor threshold drawn on the \|\|g_L\|\| row. Visualizes the cross-architecture causal control: with-LN architectures both show \|\|g_L\|\| collapse below 1e-7 (DFA hits the floor within 5 epochs); without-LN architecture shows \|\|g_L\|\| stays in the healthy regime even though \|\|h_L\|\| still grows (catastrophic vs mild).
2026-04-07	Add threshold sensitivity analysis: (a) 63x gap, (b) 24338x gap	YurenHao0426
	For each diagnostic, sweeps threshold across orders of magnitude on the 3-seed audit data and reports the verdict at each value. Key calibration findings (3 seeds): Diagnostic (a) max per-block growth: Healthy max (BP/EP): 11.0 Degenerate min (DFA/SB/CB): 694 Separation gap: 63x Default threshold 50 sits comfortably in the middle. Any threshold in [12, 693] gives the same verdicts. Diagnostic (b) \|\|g_L\|\| at floor: Healthy min (BP/EP): 1.02e-4 Degenerate max (DFA/SB/CB): 4.18e-9 Separation gap: 24,338x Default threshold 1e-7 sits comfortably in the middle. Any threshold in [4.2e-9, 1.0e-4] gives the same verdicts. Diagnostic (c) cross-batch stability: NOT a clean binary discriminator across seeds. BP s456=0.114 near threshold; DFA s42=0.047 (noise sub-mode) doesn't fire; SB s456=0.035 (noise sub-mode) doesn't fire. (c) is for sub-mode interpretation, not binary detection. This is the calibration evidence answering the E&D reviewer question 'why these specific thresholds?'.
2026-04-07	Add ViT-Mini DFA training script that saves checkpoint + Bs	YurenHao0426
	The existing snapshot_evolution_vit.py and vit_frozen_blocks_baseline.py do not save model checkpoints — they only emit per-epoch JSON logs. This makes it impossible to apply the diagnostic protocol to a trained ViT post-hoc, since the protocol needs an actual model object. This script trains a 4-block d=128 ViT-Mini with block-level DFA on CIFAR-10 (same training rule as snapshot_evolution_vit.py) for 60 epochs and saves: - the final state_dict - the random feedback Bs (so the protocol can also verify bug 4 on this checkpoint) - test_acc and config Output: results/vit_dfa_checkpoints/dfa_vit_s{seed}.pt
2026-04-07	Add reproducers for pitfalls 4-6 (Bs reproducibility, aggregation, layer-0)	YurenHao0426
	All 3 verified on the real DFA s42 checkpoint: Bug 4: training Bs gives Γ=+0.068, 10 fresh Bs draws give Γ=+0.0043±0.007. The 'alignment' is the network adapting to specific Bs. Bug 5: 4 valid aggregation strategies give Γ in [-0.028, +0.074]. The spread is 0.10 (3.45x ratio) and the sign flips between strategies. Pick the wrong aggregation and DFA is anti-aligned; pick the right one and DFA looks aligned. Bug 6: Γ_layer0 = +0.429 dominates the mean +0.068. Hidden layers 1-4 are all near zero or slightly negative. Mean of hidden layers only is -0.022 (negative!). The deep blocks the paper claims to be 'training' have Γ ≈ 0 or below. Bugs 5 and 6 are causally linked: 'median over layers' strategies pick a negative deep layer; 'mean over layers' is dominated by the positive l0. The catalog under-reported bug 5 (it said 2.5x, actual is 3.45x with sign flip).
2026-04-07	Add training-monitor early-stop demo: 96% compute savings on DFA	YurenHao0426
	Demonstrates the practical use case of the protocol — not as a post-hoc audit but as an in-training abort condition. Walks through the existing per-epoch trace and shows when the protocol would have triggered an early stop on DFA training and what the saved compute would be. Result: DFA on 4-block d=256 ResMLP fires diagnostic (b) at epoch 4 with test acc 0.3076. The final acc at epoch 100 is also 0.3076 (identical). Stopping at epoch 4 saves 96% of compute with zero headline acc loss.
2026-04-07	Cross-architecture temporal validation: 3 archs x 3 seeds x 2 methods	YurenHao0426
	ResMLP (4-block d=256, with out_ln, CIFAR-10): s42: DFA (a) ep 8, (b) ep 4, acc 0.308 s123: DFA (a) ep 11, (b) ep 4, acc 0.320 s456: DFA (a) ep 8, (b) ep 3, acc 0.300 ViT-Mini (4-block d=128, cls token + terminal LN, CIFAR-10): s42: DFA (a) ep 1, (b) ep 3, acc 0.256 s123: DFA (a) ep 1, (b) ep 2, acc 0.202 s456: DFA (a) ep 1, (b) ep 3, acc 0.253 StudentNet (4-block d=128, NO terminal LN, synthetic alpha=1.0): s42: DFA (a) ep 18, (b) NEVER, acc 0.332 s123: DFA (a) ep 14, (b) NEVER, acc 0.314 s456: DFA (a) ep 25, (b) NEVER, acc 0.336 BP: never fires on any seed x any architecture (9/9 sanity passes). Key cross-architecture finding: diagnostic (b) is specifically the LN- driven failure mode. Without out_ln, the BP grad never crosses the 1e-7 floor, even though (a) still fires (the residual stream still grows, just without the LN-cancellation pathology that drives the BP grad to the floor). This is the causal architectural control: (b) specifically tests 'is terminal-LN gradient cancellation active?' and (a) tests 'is the residual stream growing without bound?'. They are linked but separable. This is the §3 cross-architecture validation evidence.
2026-04-07	Protocol diagnostic (a): use max per-block growth, not max/min ratio	YurenHao0426
	Old metric: max(\|\|h\|\|) / max(\|\|h_0\|\|, eps). False-positives on ViT-style architectures because the cls token at layer 0 (right after patch_embed) has anomalously small magnitude (~0.3-1.5), inflating the ratio even on healthy BP-trained ViTs. New metric: max_l(\|\|h_{l+1}\|\| / \|\|h_l\|\|) — the largest single-block residual amplification. Architecture-invariant. Calibration: - BP-trained, late training: <5x per block - BP ViT, early epochs (cls token resolving): 13-25x max - DFA-trained ResMLP/ViT: 100-4000x per block Threshold raised from 10 to 50 to sit cleanly between healthy-early- training (max 25) and failure-regime (min 100). Re-verifications: - smoke test (BP/DFA/EP): all 3 verdicts unchanged - random init (3 seeds): trustworthy on all 3 - 5-method audit table single-seed: identical verdicts - decision-utility ablation: identical (still 0/5 by S1, 3/5 by S_full) - temporal evolution 3-seed: (b) now fires first at ep 3-4, (a) at ep 8-11. Both well before training ends. The 'protocol fires ~92 epochs early' story still holds. - ViT temporal evolution: BP no longer false-fires; DFA fires (a) ep 1, (b) ep 3 — protocol works on the second architecture.
2026-04-07	Add reproducers for pitfalls 1-3 in CHECKLIST.md	YurenHao0426
	Each bug from the catalog has a synthetic reproducer that runs in <1 sec without GPU: Bug 1: x.norm(-1) on a 2x2 tensor returns 1.143 (L_{-1} of whole tensor) instead of [5, 10] (per-row L_2 along dim=-1). Bug 2: F.cosine_similarity(a, b) with \|\|b\|\|=5e-10 returns +0.000905 instead of the true +0.018101. The clamp (eps=1e-8) underestimates the divisor 20x. Bug 3: 5e-10 in fp16 -> 0 (underflows smallest subnormal ~6e-8). Downstream F.cosine_similarity returns NaN. bf16 works because it shares fp32's exponent range. Bugs 4-6 (Bs reproducibility, aggregation, layer-0 dominance) require a trained network and are demonstrated inside audit_table and ablation_decision_utility.
2026-04-07	Temporal evolution 3-seed: protocol fires at DFA epoch 3-4 on all seeds	YurenHao0426
	s42: (a)+(b) fire at epoch 4, DFA final acc 0.3076 s123: (a)+(b) fire at epoch 4, DFA final acc 0.3203 s456: (a)+(b) fire at epoch 3, DFA final acc 0.2998 BP never fires on any seed (final acc 0.61-0.63). The 'protocol catches it 96 epochs early' finding is fully reproducible across seeds.
2026-04-07	Add temporal diagnostic evolution: protocol fires at epoch 4 of DFA	YurenHao0426
	Replays per-epoch logged data from results/snapshot_evolution_v2/ through the protocol thresholds. Result: diagnostics (a) \|\|h_l\|\| explosion AND (b) \|\|g_L\|\| at floor BOTH first fire at epoch 4 of DFA training. At that point, DFA test acc is 0.308 — its final value at epoch 100 is also 0.308. The protocol could have walked back the headline 96 epochs before training finished. DFA's gamma hovers at 0.087-0.107 for all 100 epochs. A reviewer looking at acc+gamma would conclude 'DFA is hovering at 31% acc with ~0.10 alignment, both reasonable'. Wrong on both counts. BP never fires any diagnostic at any epoch. Stays bounded at \|\|h_L\|\|~200, \|\|g_L\|\|~3-5e-5, accuracy climbs to 0.61. This is the temporal validation of decision utility: the protocol catches the pathology AS IT HAPPENS, not just retrospectively.