Age | Commit message | Author
2026-04-08 | §3 fix: correctly distinguish DFA/SB/CB local credit vectors | YurenHao0426
Previous §3 ¶1 wrote the local loss as -<f_l, B_l^T e_T> as if it applied to all three of DFA, SB, and CB. That is only DFA's form; SB and CB use learned bridge networks to derive credit:
- DFA: a_l = B_l^T e_T (fixed random projection)
- State Bridge: a_l = gradient of CE(head(LN(G_psi(h_l, t_l, s))), y), where G_psi is a learned state predictor of h_L
- Credit Bridge: a_l = gradient of learned value net V(h_l, t_l, s)

The fix writes the shared local loss form -<f_l, a_l> and defines a_l for each method in-line. This also serves as the first definition of SB and CB in the paper (previously they were named in Table 1 without being defined).

Main content still ends at p9 (now slightly before the bottom margin); references span p9-p10 but are not counted against the 9-page content budget. Total 17 pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
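The shared local-loss form described in this commit can be written compactly as below. This is a notational sketch only: the symbol \mathcal{L}_l^{\mathrm{local}} and the choice to take both bridge gradients with respect to h_l are assumptions, since the commit message says "gradient of" without naming the variable.

```latex
\mathcal{L}_l^{\mathrm{local}} = -\langle f_l,\, a_l \rangle,
\qquad
a_l =
\begin{cases}
B_l^{\top} e_T
  & \text{DFA (fixed random projection)} \\[4pt]
\nabla_{h_l}\, \mathrm{CE}\!\big(\mathrm{head}(\mathrm{LN}(G_\psi(h_l, t_l, s))),\, y\big)
  & \text{State Bridge (learned state predictor } G_\psi \text{ of } h_L\text{)} \\[4pt]
\nabla_{h_l}\, V(h_l, t_l, s)
  & \text{Credit Bridge (learned value net } V\text{)}
\end{cases}
```

Only the first case matches the earlier -<f_l, B_l^T e_T> wording, which is why that wording could not stand for all three methods.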
2026-04-08 | Polish: fix §3 typo 'to an meaningful' -> 'to a meaningful' | YurenHao0426
2026-04-08 | OPTION F polish pass: update abstract, §1 ¶2, §6 ¶3, §7 ¶2 with round 38-40 findings | YurenHao0426
Codex round 40 sequencing: A then F. The OPTION F (polish) pass found 4 real issues:
1. Abstract: added a narrow conditional-positive finding sentence about SB+penalty beating the shallow baseline, plus the cos-vs-acc dissociation
2. §1 ¶2: replaced 'deep cosine can improve to about +0.16' (DFA-specific) with a fuller cross-method statement mentioning SB +0.32, CB +0.68, and the cos-vs-acc dissociation
3. §6 ¶3 (protocol asymmetry): added one sentence noting that the cross-method cos-vs-acc dissociation REINFORCES the necessity of keeping all four diagnostics separate
4. §7 ¶2 (limits): upgraded 'terminal-LN interpretation is observational rather than causal identification' to 'established causally on the audited residual ResMLP via the matched same-backbone no-terminal-LN control but not proven to extend beyond that architecture family'; this reflects the round 36 wording upgrade based on existing April 7 no_outln data

All four changes are prose-level updates driven by data that was already in the paper. Main content still fits at 9 pages exactly (E&D limit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 40 §4 update: cos-vs-acc 3-part proposition (Observation / Inference / Mechanism hypothesis) | YurenHao0426
Codex round 40 recommended turning the Mode 2 dissociation from an empirical curiosity into a methodological claim via a 3-part proposition:
1. Observation: CB+pen and DFA+pen reach the same acc despite a 4x deep-cos gap; SB+pen has the best acc with intermediate cos
2. Inference: layerwise BP-cosine is NECESSARY to rule out grossly wrong credit signals but NOT SUFFICIENT to certify usable credit for depth
3. Mechanism hypothesis: usefulness depends on whether local updates induce coordinated forward-state change across blocks, not just angular agreement with BP

Method framing (codex-approved safer versions):
- CB = 'gradient-direction surrogate' (high angular agreement, low functional credit)
- SB = 'state-level downstream teaching signal' (lower angular agreement, higher functional credit)
- Explicitly framed as HYPOTHESIS, not theorem

Main content still 9 pages exactly (within E&D limit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 38 CB+penalty multi-seed: confirm clean cos-vs-acc dissociation across 3 fixed-feedback methods | YurenHao0426
CB+penalty 3-seed (4-block d=256, 30ep, lam=1e-2):
- acc 0.360±0.003 (same as DFA+pen, 9pp below SB+pen)
- ||h_L||=5680±178, ||g_L||=1.9e-5 (HEALTHY)
- layer-0 cos +0.652±0.005
- deep cos +0.679±0.008 (4x DFA+pen, 2x SB+pen)
- deep rho +0.464±0.025 (6x DFA+pen)

Final 3-method rescue comparison:
  DFA+pen: acc 0.363, deep cos 0.155, deep rho 0.080
  SB+pen:  acc 0.453, deep cos 0.322, deep rho 0.402
  CB+pen:  acc 0.360, deep cos 0.679, deep rho 0.464

Clean cos-vs-acc dissociation:
- CB has 4x higher deep cos than DFA but the SAME accuracy
- SB has intermediate deep cos but the HIGHEST accuracy
- Alignment to the BP gradient is NECESSARY but NOT SUFFICIENT for usable credit

Paper updates:
- §4 ¶4: now includes all 3 methods with numbers, adds 'cos is necessary but not sufficient' framing based on the 3-method dissociation
- Appendix K: adds CB+pen 3-seed rows + vanilla CB baseline for comparison
- Main content still 9 pages exactly (within E&D limit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Appendix G + J: 3-seed depth scan L=4 validation + SB/CB random_targets 100ep convergence | YurenHao0426
- Appendix G: depth scan L=4 now 3-seed (s42, s123, s456). 3-seed DFA layer-0 cos +0.412±0.011, deep cos -0.0004±0.0008, CB deep cos +0.039±0.010. Indistinguishable from the single-seed row shown in the table.
- Appendix J: full 100-epoch random_targets trajectory for SB and CB:
    SB: ||h_L||=3.6e5, ||g_L||=4e-8 (at floor), acc 0.100 (chance)
    CB: ||h_L||=1.38e8, ||g_L||=0 (collapsed), acc 0.085 (chance)
  Both reach Mode 1 (a)+(b) at 100ep, consistent with DFA's 1.67e8 / 8e-12.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Appendix J: add EP random_targets full 100ep convergence ||h_L||=2085, acc=0.081 | YurenHao0426
2026-04-08 | Round 38 paper update: §4 + §5 + new Appendix K with SB+penalty 3-seed result | YurenHao0426
CODEX ROUND 39 VERDICT: PAPER-CHANGING for SB; wait for CB multi-seed for CB claims.

Round 38 3-seed SB+penalty (4-block d=256, 30ep, lam=1e-2):
- acc 0.453±0.003 (BEATS shallow baseline 0.349 by +10.4pp; FIRST non-BP method to do so)
- ||h_L||=302±8 (contained, not silenced)
- ||g_L||=1.8e-4 (HEALTHY)
- deep cos +0.322±0.007 (2x DFA+pen +0.155)
- deep rho +0.402±0.015 (5x DFA+pen +0.080)

Penalty rescue magnitudes (method-dependent):
- DFA: +5.5 pp (0.306 -> 0.363)
- SB: +24 pp (0.213 -> 0.453)
- CB: +15 pp (single seed, multi-seed in flight)
- BP: -8 pp (capacity cost, 0.609 -> 0.530)

Paper updates:
- §4 ¶4 NEW: Mode 2 has method-dependent severity within the fixed-feedback family; SB+penalty is the first audited non-BP method to substantively use deep blocks via intervention; deep cos doesn't predict acc across methods (methodological obs)
- §5 ¶3 EXTENDED: BP+penalty -> 3x penalty control (BP, DFA, SB) with all margins vs frozen-blocks baseline; BP-to-SB gap only 7.7 pp vs BP-to-DFA gap 17 pp
- Appendix K NEW: full SB+penalty 3-seed table with vanilla SB and DFA+pen comparison

Main content stays at 9 pages exactly (within E&D limit). Total 16 pages. CB multi-seed (s123, s456) launched in parallel (PIDs 576938, 576939); claims deferred until those land.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Appendix H: H2 ablation now 3-seed (s42, s123, s456); multi-seed validates falsification | YurenHao0426
3-seed mean: ||h_L||=8.2e7, ||g_L||=1.9e-10
Per-seed: ||h_L|| in {1.06e8, 3.15e7, 1.09e8}, ||g_L|| in {1.08, 2.94, 1.77}e-10

All are deeply below the (b) floor, and all confirm that Mode 1 (a)+(b) fire on no-residual ResMLP+terminal-LN. The multi-seed H2 falsification of 'residual skip causes Mode 1' is now robust.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 38: add --penalty_lam flag to cifar_resmlp.py for Mode 2 cross-method test | YurenHao0426
Patches:
- main(): add --penalty_lam (separate from CB's bridge temperature args.lam)
- train_dfa block update (line 195): add penalty_lam * (f_l**2).sum(-1).mean()
- train_state_bridge block update (line 326): same penalty
- train_credit_bridge block update (line 533): same penalty

Codex round 38 GO STAGE: keep the penalty separate from CB lam, apply it to blocks only, and sanity-check that hidden_norms remain nontrivial (not silencing the blocks). The 2-epoch smoke (results/round38_smoke_sbcb_pen) passes the silencing check: SB ||h_L||=229, CB ||h_L||=1258, both nontrivial. Deep cosines are positive across all layers for SB ([0.28, 0.25, 0.23]) and rising for CB ([0.04, 0.08, 0.13, 0.15]).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
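The penalty term named in the patch list can be illustrated with a minimal, dependency-free sketch. It mirrors the expression penalty_lam * (f_l**2).sum(-1).mean() on a plain list-of-rows batch rather than a tensor; the function names `block_penalty` and `penalized_local_loss` are hypothetical and are not taken from cifar_resmlp.py.

```python
def block_penalty(f_l, penalty_lam):
    """Return penalty_lam times the batch mean of each row's squared L2 norm.

    Pure-Python analogue of penalty_lam * (f_l**2).sum(-1).mean() for a
    2-D batch given as a list of rows (batch x features).
    """
    per_sample = [sum(x * x for x in row) for row in f_l]  # (f_l**2).sum(-1)
    return penalty_lam * sum(per_sample) / len(per_sample)  # .mean(), then scale


def penalized_local_loss(local_loss, f_l, penalty_lam=0.0):
    """Add the block-output penalty to an already-computed local loss value."""
    return local_loss + block_penalty(f_l, penalty_lam)


if __name__ == "__main__":
    f_l = [[1.0, 2.0], [3.0, 4.0]]  # toy block outputs, batch of 2
    # Row norms squared are 5 and 25, so the mean is 15 before scaling.
    print(block_penalty(f_l, penalty_lam=1e-2))
```

With penalty_lam left at 0.0 the local loss is unchanged, which is consistent with keeping the flag separate from CB's own lam argument.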
2026-04-08 | Fix bibtex citation key: refinetti2023align -> refinetti2023aligning (matches bibitem) | YurenHao0426
2026-04-08 | Fix §3 vanilla DFA comparison: use 100ep audit value 0.306±0.006 (matches table), not 30ep 0.308±0.014 | YurenHao0426
2026-04-08 | Fix precision: ||g_L|| = 7.2e-4 (mean), not 7.4e-4 (rounding) | YurenHao0426
2026-04-08 | Round 37 OPTION C: §3 compression (preserve causal structure, drop per-round narration) | YurenHao0426
Codex round 37 verdict: 'Page budget is the bottleneck, not mechanism uncertainty.' Mode 1 is mechanism-complete after rounds 32-36. Strict compression rule: 'one claim sentence per falsified alternative, one for the positive mechanism, everything numeric goes to appendix tables.'

§3 rewrite (4 -> 3 paragraphs):
1. Phenomenon class: 6-line geometric argument inlined as one sentence; LN Jacobian derivation for (b); empirical anchors for vanilla DFA.
2. Falsification chain: 4 alternative attributions, each in one sentence: not residual-skip-driven (App H), not task-signal-driven (App I), not DFA-specific (App I), not shared by EP.
3. Positive necessity for (b): same-backbone no_outln control with full numbers; cross-architecture support; temporal early-fire result.

Result: main content 9 -> 8 pages (1 page of slack restored). Total 15 -> 14 pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 36: upgrade (b) wording + add EP random-target neg control to §3 | YurenHao0426
Two changes from round 36:
1. §3 paragraph 3: replace 'observational association' with a full causal claim based on existing April 7 no-out_ln data (3 seeds, ResMLP-d256 with terminal LN removed, residual skip kept): ||h_L||=1.21e7 (Mode 1 (a) still fires) but ||g_L||=7.4e-4 (HEALTHY, ~10000x above floor; (b) eliminated). Final acc 0.327±0.013 is indistinguishable from vanilla DFA's 0.308±0.014. Wording upgraded to 'terminal LayerNorm is necessary for Mode 1(b) in the audited residual ResMLP and ViT-Mini setting'.
2. §3 paragraph after random-target ablation: add the EP-under-random-targets smoke result (||h_L||=586 at ep 5 vs DFA's 14510 at ep 3, 25x gap). The random-target assay now cleanly separates fixed-feedback methods (explode) from EP (bounded). Cross-method negative control complete.

Other changes:
- experiments/ep_baseline.py: add --random_targets flag + train_ep parameter
- v2.5 paper compiles to 15 pages, main content 1-9 (right at E&D limit)

Combined picture (rounds 32-36):
- Mode 1 (a) localized to 'fixed-feedback local-credit objectives without scale control on architectures absorbing scale at output'. Falsified: residual skip (round 33), task signal (round 34), DFA-specific (round 35). EP is the working negative control (round 36).
- Mode 1 (b) localized to terminal LayerNorm via the 1/||h|| Jacobian. Causally established by April 7 no_outln 3-seed data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
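The "1/||h|| Jacobian" argument for Mode 1 (b) can be checked numerically with a small sketch. It assumes an affine-free LayerNorm and a fixed toy upstream gradient, neither of which comes from the repository; the point is only that scaling the input by c shrinks the gradient reaching the input by roughly 1/c.

```python
import math


def layernorm_backward(h, upstream, eps=1e-5):
    """Gradient of <upstream, LN(h)> w.r.t. h for an affine-free LayerNorm."""
    n = len(h)
    mean = sum(h) / n
    var = sum((x - mean) ** 2 for x in h) / n
    sigma = math.sqrt(var + eps)
    y = [(x - mean) / sigma for x in h]  # normalized activations
    u_mean = sum(upstream) / n
    uy_mean = sum(u * yi for u, yi in zip(upstream, y)) / n
    # Standard LayerNorm input gradient: (u - mean(u) - y * mean(u * y)) / sigma
    return [(u - u_mean - yi * uy_mean) / sigma for u, yi in zip(upstream, y)]


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    h = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]          # toy hidden state (assumption)
    upstream = [1.0, -0.5, 0.25, 2.0, -1.0, 0.1]  # toy dL/dLN(h) (assumption)
    base = l2_norm(layernorm_backward(h, upstream))
    for scale in (1.0, 1e2, 1e4, 1e6):
        scaled = [scale * x for x in h]
        g = l2_norm(layernorm_backward(scaled, upstream))
        # The last column stays near 1 when ||g|| falls like 1/scale.
        print(f"scale={scale:.0e}  ||g||={g:.3e}  ||g||*scale/base={g * scale / base:.4f}")
```

Under these assumptions a hidden norm of order 1e7 to 1e8 behind a terminal LayerNorm would suppress the gradient by a comparable factor, which is the qualitative pattern the (b) diagnostic is meant to catch.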
2026-04-08 | Add 100ep trajectory confirmations to Appendix I (H2) and Appendix J (random targets) | YurenHao0426
H2 100ep: ||h_L||=1.06e8, ||g_L||=1.09e-10 (below (b) floor)
Random-target DFA 100ep: ||h_L||=1.67e8, ||g_L||=8e-12 (worse than vanilla)

Both fully confirm the smoke-test trends at converged training horizons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 35: SB and CB also show data-agnostic Mode 1 growth on random targets | YurenHao0426
- experiments/cifar_resmlp.py: add --methods filter and --random_targets flag; extend compute_diagnostics to log hidden_norms_per_layer and bp_grad_norms_per_layer
- paper/main.tex §3 ¶1: broaden random-target finding to all 3 fixed-feedback methods (DFA: ||h_L||=14510, SB: ||h_L||=6225, CB: ||h_L||=19974 at ep 3, all at chance acc)
- paper/main.tex Appendix J: extended with cross-method smoke-test table

This generalizes the §3 mechanism story from 'DFA-specific' to 'all 3 audited fixed-feedback local-credit methods'. Combined with rounds 32-34, the proximate cause of Mode 1 (a) is now well-localized:
- Does not require residual skip (round 33 H2 walkback)
- Does not require task signal (round 34 random targets, DFA)
- Is not DFA-specific (round 35 random targets, SB+CB)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 34 random-target ablation: Mode 1 fires under random labels too | YurenHao0426
Codex round 34 picked OPTION A (i.i.d. random class targets per minibatch) over the analytic-only OPTION D as the most discriminating test of 'is (a) intrinsic to DFA update geometry or task-driven?'. The smoke test result is unambiguous:
  ep 0: ||h_L||=8.9    ||g_L||=9.8e-4
  ep 1: ||h_L||=1616   ||g_L||=5.1e-6
  ep 2: ||h_L||=9768   ||g_L||=8.5e-7
  ep 3: ||h_L||=14510  ||g_L||=5.6e-7  (test acc still at chance ~0.07)

Three orders of magnitude growth in ||h_L|| and three orders of magnitude collapse in ||g_L|| over the same 3 epochs, with NO task signal whatsoever: DFA's local-loss geometry is the proximate driver, not data adaptation.

- experiments/snapshot_evolution_residual_explosion.py: add --random_targets and --skip_bp flags
- paper/main.tex §3 ¶1: replace 'no explicit scale constraint' framing with codex round 34's 6-line geometric argument and the random-target empirical falsifier
- paper/main.tex Appendix J: full smoke-test table + interpretation
- v2.3: 14 pages total, main content still 8 pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 32+33 H2 ablation: add no_residual_add flag; falsify residual-as-cause hypothesis | YurenHao0426
- models/residual_mlp.py: add residual_add and w2_std flags (default unchanged)
- experiments/snapshot_evolution_residual_explosion.py: add --no_residual_add and --w2_std CLI flags
- paper/main.tex §3 ¶3: add 1-sentence reference to no-residual control showing Mode 1 still fires
- paper/main.tex Appendix I: full smoke-test table + interpretation
- v2.2 main content stays at 8 pages (within 9-page E&D budget); 13 pages total

Smoke test (3 ep, w2_std=0.5, seed 42):
- DFA no-residual: ||h_L|| 4.69 -> 22050, ||g|| 1.6e-7 (Mode 1 (a) fires; (b) at floor)
- BP no-residual: acc only 0.16 at ep 3 (architecture is partially degenerate)
- Conclusion: residual skip is NOT necessary for Mode 1; the proximate trigger is more general
- Codex round 33 verdict: WALK BACK H2; demote 100ep run to confirmatory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Add depth-sweep evidence to §5 + Appendix H from existing d=512 L=2,4,6,8,12 data | YurenHao0426
The cifar_depth_scan_s42 results were already on disk but not surfaced in the paper. Across L in {2,4,6,8,12} on the d=512 ResMLP, DFA's layer-0 cosine stays in [+0.39,+0.40] and its mean deep cosine stays within [-0.005,+0.000], while BP retains a deep cosine of +0.94 even at L=12. This rules out the 'too deep to receive useful credit' explanation: making the network shallower does not reach the deep blocks any better.
- §5 paragraph 4: one-sentence depth-invariance summary citing the new appendix
- New Appendix H: Depth-Sweep Layerwise Profiles, with full table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 31: fill §6 Recommended Protocol + §7 Discussion prose; v2 content-complete | YurenHao0426
Six paragraphs total via codex round 31:
- §6 §6.1: measurement-validity-first ordering with 63x and 24338x calibration gaps
- §6 §6.2: minimal four-check protocol; decision-utility 0/5 vs 3/5 walk-back
- §6 §6.3: conservative asymmetry (BP/EP preserved, DFA/SB/CB walked back)
- §7 §7.1: scope claim: evaluation failure, not algorithmic impossibility
- §7 §7.2: limits: CIFAR-10 only, observational LN interpretation, lower-bound BP+penalty control
- §7 §7.3: lesson: decompose the evaluation question; position vs Jordan/O'Bray/Paleka

Compiles to 12 pages (main content 1-8, refs+appendices 8-12), within the E&D 9-page main budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 | Round 30: fill in §5 Intervention and Cross-Arch prose (4 paragraphs) via codex | YurenHao0426
2026-04-08 | Round 29: fill in §4 Failure Mode 2 prose (3 paragraphs) via codex | YurenHao0426
2026-04-08 | Round 28: fill in §3 Failure Mode 1 prose (4 paragraphs) via codex | YurenHao0426
2026-04-08 | Round 27: fill in §2 Audit prose (4 paragraphs) via codex | YurenHao0426
Codex round 27 produced 4 substantive paragraphs for §2, replacing thin placeholders. Each paragraph follows round 23's prescription:
- P1: canonical setting (4-block d=256, AdamW, 100 ep, 3 seeds) + table/figure references
- P2: under field-standard reporting, all 5 methods look fine
- P3: EP internal comparison: same trustworthy measurement regime, BUT EP depth contribution is also marginally negative (-3.3 pp vs frozen baseline). Honest about EP being trustworthy-measurement but neutral-depth-contribution (per round 27 prompt's caveat).
- P4: frozen-baseline comparison gives the walk-back: BP +26.6 pp, DFA -4.3 pp, SB -14.4 pp, CB -6.0 pp. The diagnostic split lines up with the acc split.

Compiles cleanly. Next: §3 Failure Mode 1 prose via round 28.
2026-04-08 | Round 26: fill in §1 Introduction prose (3 paragraphs) via codex | YurenHao0426
Codex round 26 produced 3 substantive paragraphs for §1, replacing the 3 thin placeholder sentences. Each paragraph follows round 23's prescription:
- P1: claim sentence + numerical evidence (DFA 0.306 < frozen 0.349; layer-0 +0.42 vs deep ~0; ||g_L|| ~ 5e-10 < eps clamp 1e-8) + closing 'measurement regime must be valid'
- P2: 5-method audit shows the two modes; intervention dissociation (lambda=1e-4 alleviates Mode 1 but not Mode 2; vanilla ep 1 has meaningful ||g|| but deep cos still ~0) + closing
- P3: methodological contribution framing + cite Paleka, O'Bray, Jordan + closing roadmap

Compiles cleanly. The PDF still has §2-§7 with topic sentences only (TODO next via per-section codex rounds).
2026-04-08 | Fill in tables 1-3 + generate figures 2/4/5 from existing data | YurenHao0426
Tables filled with real values:
- Table 1: 5-method audit (3-seed mean ± std for acc, headline Γ, verdict)
- Table 2: 4-condition mode 2 validation (cos and ρ values from existing checkpoint measurements)
- Table 3: protocol thresholds (50×, 1e-7, 0.30, 2pp)

Figures generated from existing data:
- fig2_decision_utility.pdf: 5×7 verdict heatmap from results/protocol_audit/ablation_decision_utility.json
- fig4_penalty_rescue.pdf: 3-panel (trajectory + cos/ρ bars + 2×2 acc) from snapshot_evolution_v2 + dfa_residual_penalty + bp_with_penalty
- fig5_cross_arch_summary.pdf: 5×4 BP/DFA verdict matrix across architectures

Compiles to 8 pages with all tables/figures rendered. The §1-§7 main body still has only paragraph topic sentences (TODO: per-section prose filling via codex). Figure numbering is wrong (codex put figures in section order, not numerical order; needs fixing).
2026-04-08 | v2 skeleton from round 25: section structure now matches round 23 | YurenHao0426
Round 24's skeleton had 3 deviations from the round 23 redo:
- Made §3 'Diagnostic Protocol' instead of 'Failure Mode 1'
- Collapsed Mode 1 + Mode 2 into one §4
- Added §6 'Reference Implementation' (was supposed to be dropped)

Round 25 fixed all three. New §3-§7 match the round 23 redo exactly:
  §3 Failure Mode 1: Measurement Degeneracy
  §4 Failure Mode 2: Low Intrinsic Credit-Direction Quality
  §5 Intervention and Cross-Architecture Evidence
  §6 Recommended FA Evaluation Protocol
  §7 Discussion, Limits, Conclusion

Also added:
- In-line bibliography with 12 \bibitem entries (Paleka, O'Bray, Jordan + FA literature); citations resolve correctly now
- Appendices A-G with actual prose content (not just headers)
- 7-pitfall catalog with descriptions
- Walk-back chain methodology paragraph
- 7-validation summary table

Compiles to 9 pages with figures 1+3 inline (existing PNGs) and figures 2/4/5 as placeholder text PDFs (TODO: regenerate). Tables 1/2/3 still have TODO placeholders for numerical values.

Next: fill in tables 1-3 with existing JSON data, generate figures 2/4/5 from existing data, then consult codex per-section for prose filling.
2026-04-08 | Archive failed v1 draft as v1_rejected.tex; remove main.tex/main.pdf | YurenHao0426
User rejected the v1 draft as a 'sequential experiment report' (original wording: 流水账实验报告). Rounds 22 + 23 redid the outline with an E&D-genre prescription. Saving v1 as v1_rejected.tex for reference. The new main.tex will be written from the round 24 LaTeX skeleton (codex offered to provide it), section by section, with a codex check on each section's prose.
2026-04-08 | Compile paper PDF + fix bibstyle for tectonic | YurenHao0426
Compiled with tectonic (the only LaTeX engine on this server). Two fixes needed:
1. Pass [numbers,compress] to natbib via PassOptionsToPackage so the numerical bibliography style works
2. Use bibstyle 'abbrvnat' instead of 'plain' (compatible with natbib)

Result: 10-page PDF, ~7.5 content pages (well under the 9-page E&D limit), references on pages 8-9, appendices A-D on pages 9-10. PDF uploaded to broker as 1843506b_main.pdf for user review.
2026-04-08 | Paper main.tex: add §5.4 λ sweep dissociation table | YurenHao0426
The λ sweep is the strongest single piece of two-mode separation evidence and doesn't require the early-epoch caveat. New §5.4 with a table showing:
- λ=0: vanilla, both modes broken
- λ=1e-4: mode 1 ALLEVIATED (||h_L||=2.4e4, ||g||=6.3e-7), mode 2 NOT (cos -0.022, rho -0.004)
- λ=1e-2: mode 1 alleviated, mode 2 partially (cos +0.16, rho +0.09)
- λ=1e-1: slightly over-constrained (cos +0.13, rho +0.07)

The two modes have different intervention thresholds. §5.4 is now the killer evidence; the early-epoch disambiguation in §5.3 becomes supporting. Updated section summary to 'five validations'.
2026-04-08 | λ sweep on penalty strength: lam ∈ {1e-4, 1e-2, 1e-1} cos + rho results | YurenHao0426
Round 19's #5 recommendation. Major new finding for the paper:

| lam   | acc   | ||h_L|| | ||g_2|| | deep cos | deep rho |
|-------|------:|--------:|--------:|---------:|---------:|
| 0     | 0.308 | 4e8     | 5e-10   | -0.008   | -0.003   |
| 1e-4  | 0.359 | 2.4e4   | 6.3e-7  | -0.022   | -0.004   |
| 1e-2  | 0.363 | 4e4     | 1e-6    | +0.155   | +0.080   |
| 1e-1  | 0.349 | 1.2e4   | 1.6e-6  | +0.131   | +0.067   |

KEY: at lam=1e-4 the residual stream is contained AND ||g|| is healthy (mode 1 ALLEVIATED), but deep cos and rho are still essentially zero (mode 2 NOT alleviated). This is an independent dissociation of the two modes via penalty strength: at weak penalty you get the mode 1 fix WITHOUT the mode 2 fix. Both metrics (cos, rho) agree at every lambda.

Penalty strength has a non-monotonic effect on mode 2 alleviation:
- lam=1e-4: too weak, mode 2 not alleviated (cos ~0)
- lam=1e-2: sweet spot, cos +0.16, rho +0.08
- lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07

This is the 7th independent validation of the two-mode separation, and the strongest one because it shows mode 1 alleviation WITHOUT mode 2 alleviation: the modes do not even respond to the same intervention strength.
2026-04-08 | First draft of NeurIPS 2026 E&D paper | YurenHao0426
Title: 'Beyond Accuracy and Alignment: A Diagnostic Evaluation Protocol for Feedback Alignment'

Structure (per round 21 prescription):
- Abstract: 'broken because conflated' framing, 2 distinct modes named
- §1 Introduction: discovery hook -> 2-mode framing -> contribution
- §2 Related work
- §3 Audit (the field-standard pair walks back nothing)
- §4 The diagnostic protocol (4 diagnostics, calibrated thresholds, decision-utility ablation, cross-architecture validation)
- §5 Two distinct failure modes (mechanism, penalty rescue, direct cosine measurement, hypothesis-disambiguation, capacity-cost control)
- §6 Limitations
- §7 Broader impacts
- §8 Conclusion
- Appendices: reproducibility, 7-pitfalls catalog, walk-back chain (4 step), all 6 validations of the two-mode separation

Includes 4 result tables and ~10 references, structured as an eandd track double-blind submission. 760 lines of LaTeX, balanced environments verified. Ready for compilation on a system with pdflatex.

Template: paper/neurips_2026.{sty,tex}, downloaded from the official NeurIPS 2026 source. checklist.tex also unzipped.
2026-04-08 | Add perturbation correlation metric calibration | YurenHao0426
Anchors the rho +0.08 finding with positive and negative controls:
  positive control (BP grad as a_l):       +0.9965 (perfect, expected ~1)
  negative control (random vector):        +0.0056 (noise floor, expected ~0)
  vanilla DFA s42 (||g|| at floor):        +0.0020 (within noise floor)
  penalized DFA s42 (||g|| healthy):       +0.0937 (~48x above noise, ~9% of perfect)

The metric is well-calibrated. BP gradient as a_l gives rho ~1 (Taylor), a random vector gives rho ~0 (noise floor), random feedback in the degenerate regime is indistinguishable from the noise floor, and random feedback in the penalized regime is small but well above noise (~48x noise, ~9% of perfect).

Defensible paper claim: 'rho +0.08 is small in absolute terms but clearly above the calibrated noise floor and on the order of 10% of the perfect-signal ceiling, consistent with the 60% of BP accuracy the penalized network achieves.'

Closes round 19's 'is rho +0.08 a meaningful number on this metric?' question with explicit calibration.
2026-04-08 | PAPER_OUTLINE: add 6th validation (perturbation correlation triangulation) | YurenHao0426
2026-04-08 | Extend perturbation audit to vanilla early-epoch checkpoints | YurenHao0426
Cross-metric disambiguation confirmation. Vanilla DFA at ep 1 (meaningful regime, ||g||~6e-7) deep rho across 3 seeds:
  s42:  deep rho -0.008
  s123: deep rho +0.000
  s456: deep rho -0.000
  mean: -0.003 ± 0.005

Compare to penalized DFA 3-seed: deep rho +0.080 ± 0.011.

The disambiguation (penalty CREATES alignment, not just reveals it) is now confirmed by TWO independent metrics:
- cos: vanilla -0.008 ± 0.013, penalized +0.155 ± 0.025
- rho: vanilla -0.003 ± 0.005, penalized +0.080 ± 0.011

Both metrics agree on the vanilla→penalized transition. The l0 (embedding) rho is high (~0.25-0.29) at every vanilla checkpoint, mirroring the cos l0 +0.42: the embedding layer is genuinely useful while the deep blocks are not, by BOTH metrics. The penalty restores some deep usefulness, to about +0.08 rho / +0.16 cos. Cross-metric agreement rules out single-metric artifacts on either side.
2026-04-08 | EVIDENCE_SUMMARY: add 6th validation (perturbation correlation triangulation) | YurenHao0426
2026-04-08 | Add perturbation correlation audit (round 19's recommended alt metric) | YurenHao0426
Codex round 19 said: 'use nudging or perturbation correlation on the penalized checkpoints. In the healthy-gradient regime, that is a more direct is-the-local-signal-useful test than cosine alone'.

Result on existing checkpoints (eps=1e-3, M=32 random directions, n=1024):
  vanilla DFA s42:                   deep rho +0.002
  penalized DFA s42 lam=1e-2 30ep:   deep rho +0.094
  penalized DFA s123 lam=1e-2 30ep:  deep rho +0.073
  penalized DFA s456 lam=1e-2 30ep:  deep rho +0.072
  penalized 3-seed mean:             deep rho +0.080 ± 0.011

This INDEPENDENTLY TRIANGULATES the cos +0.17 finding via a different metric:
- vanilla deep cos ~0 matches vanilla deep rho ~0
- penalized deep cos +0.155 matches penalized deep rho +0.080

The two metrics measure different things:
- cos = directional alignment with BP grad
- rho = correlation between predicted and true loss change under random perturbation

Both show the same pattern: the penalty creates partial usefulness from essentially zero. This is the 6th independent validation of the mode 2 'penalty creates partial alignment' framing.

Crucially, rho doesn't use F.cosine_similarity (no eps clamp), and it measures sample-level loss change correlation rather than direction match, so it rules out 'cos is capturing some directional artifact unrelated to local usefulness'.
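A minimal sketch of a perturbation correlation of this kind is given below. It is an illustration under stated assumptions, not the repository's audit code: the loss is a hypothetical quadratic stand-in, a single hidden vector replaces the n=1024 samples, and the function names are invented for the example. Only eps=1e-3 and the 32 random directions follow the commit message.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def perturbation_rho(loss_fn, h, a, eps=1e-3, num_dirs=32, rng=None):
    """Correlate the predicted loss change <a, eps*u> with the true change
    loss_fn(h + eps*u) - loss_fn(h) over random Gaussian directions u."""
    if rng is None:
        rng = random.Random(0)
    base = loss_fn(h)
    predicted, actual = [], []
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in h]
        predicted.append(eps * sum(ai * ui for ai, ui in zip(a, u)))
        actual.append(loss_fn([hi + eps * ui for hi, ui in zip(h, u)]) - base)
    return pearson(predicted, actual)


def toy_loss(h):
    """Hypothetical stand-in loss 0.5 * ||h||^2, whose exact gradient is h."""
    return 0.5 * sum(x * x for x in h)


if __name__ == "__main__":
    rng = random.Random(42)
    h = [rng.gauss(0.0, 1.0) for _ in range(64)]
    random_a = [rng.gauss(0.0, 1.0) for _ in range(64)]
    print("exact gradient as a_l:", perturbation_rho(toy_loss, h, h))
    print("random vector as a_l: ", perturbation_rho(toy_loss, h, random_a))
```

With the exact gradient as the credit vector the first-order Taylor term dominates, so rho should come out very close to 1; a random vector carries no such guarantee and, with only 32 directions, gives a noisy value scattered around 0.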
2026-04-08 | EVIDENCE_SUMMARY: §4 fully rewritten under locked two-distinct-modes framing | YurenHao0426
§4 now reflects all 5 independent validations of the converged framing:
1. Direct deep cos on penalized DFA (3 seeds): +0.155 ± 0.025
2. Null calibration with fresh Bs: +0.002 ± 0.022 (real signal)
3. Hypothesis B disambiguation (vanilla early ep): -0.008 ± 0.013
4. BP+penalty 2×2 control: 17 pp residual = credit quality
5. Multi-seed lock-in: 24 measurements all near zero

Round 20 language tightening applied:
- 'lower bound on non-capacity gap' instead of 'clean isolation'
- Explicit caveats about end-to-end vs local-loss difference
- Counter to 'different optimization regime' objection

The §4 framing is locked. Five independent validations done. Stop iterating, start writing.
2026-04-08 | PAPER_OUTLINE: round 20 language tightening + 5 validation summary | YurenHao0426
§4 updates per round 20:
- Soften 'confirmed' to 'strongly supports'
- Add §4.4 BP+penalty capacity-cost control with the round 20 phrasing: 'lower bound on residual gap under matched architecture/data/optimizer/penalty, after accounting for the penalty's direct capacity cost in BP'
- Add multi-seed lock-in to §4.3 (24 measurements all near zero)
- List 5 independent validations supporting the converged framing

The §4 narrative is now complete and the framing is locked.
2026-04-08 | Multi-seed vanilla DFA early-epoch cos: lock-in for round 19 disambiguation | YurenHao0426
Round 20's minimal lock-in experiment: 3 seeds × {ep 1, ep 2} vanilla DFA cosine. Closes the 'single-seed fluke' objection.

Vanilla DFA early-epoch deep cosines (l1-l4):

| seed | ep | ||g|| | deep mean |
|---|---|---|---|
| 42 | 1 | 6.7e-7 | -0.025 |
| 42 | 2 | 1.5e-7 | -0.038 |
| 123 | 1 | 6.5e-7 | +0.002 |
| 123 | 2 | 1.4e-7 | -0.006 |
| 456 | 1 | 3.9e-7 | +0.000 |
| 456 | 2 | 8.5e-8 | -0.009 |

3-seed mean at ep 1 (most meaningful regime): -0.008 ± 0.013
3-seed mean at ep 2: -0.018 ± 0.018

ALL 24 measurements (3 seeds × 2 ep × 4 deep layers) are in [-0.04, +0.02]. Compare to the penalized DFA 3-seed mean of +0.155 ± 0.025.

The 'penalty CREATES deep alignment' finding is now seed-robust. Three seeds × two early epochs all show vanilla deep cos essentially zero even when ||g|| is in the meaningful regime. This is the round 20 lock-in. Framing is locked.
2026-04-08 | BP+penalty control result: mode 2 (intrinsic credit quality) confirmed REAL | YurenHao0426
BP + lam=1e-2 ||f||^2 penalty trained for 30 epochs (s42):
  ep 30 final: test_acc 0.5303
  margin vs DFA-shallow 0.349: +18.13 pp

The 2x2 accuracy grid:

|     | no penalty | with penalty |
|-----|-----------:|-------------:|
| BP  | 0.609      | 0.530        |
| DFA | 0.308      | 0.363        |

Penalty effect on BP: -8 pp (capacity regularization cost)
Penalty effect on DFA: +5.5 pp (rescue from active harm)

Mode 2 (intrinsic credit quality) is confirmed REAL by this control: even after the penalty's capacity cost, BP achieves +18 pp depth utilization. DFA under the same penalty achieves only +1.4 pp. The difference (~17 pp) cannot be attributed to capacity loss; it is a genuine credit-quality cost of random feedback vs the true backprop gradient.

This validates the round 19 'two distinct failure modes' framing: mode 2 is not a penalty-induced regularization artifact.
2026-04-08 | Add BP+penalty control (round 19's #4 critical experiment) | YurenHao0426
Trains end-to-end BP with the same lambda*||f_l(h_l)||^2 penalty used in the DFA penalty rescue. Tests whether the depth utilization loss in penalized DFA is intrinsic to DFA's random-feedback credit quality (mode 2) or due to penalty-induced capacity regularization.

Decision rule:
- BP+pen margin > 25 pp -> mode 2 confirmed (penalty is not the cap)
- BP+pen margin < 5 pp -> penalty itself caps depth (capacity loss)
- intermediate -> both effects present
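The three-way decision rule can be stated as a small function. This is a sketch: the function name and the returned strings are illustrative, and treating margins of exactly 25 pp or exactly 5 pp as 'intermediate' is an assumption, since the commit message gives only strict inequalities.

```python
def interpret_bp_penalty_margin(margin_pp, confirm_above=25.0, capacity_below=5.0):
    """Map the BP+penalty margin (percentage points over the shallow baseline)
    to the outcome named in the decision rule."""
    if margin_pp > confirm_above:
        return "mode 2 confirmed (penalty is not the cap)"
    if margin_pp < capacity_below:
        return "penalty itself caps depth (capacity loss)"
    # Boundary values (exactly 25 or 5 pp) fall through to here by assumption.
    return "both effects present"


if __name__ == "__main__":
    for margin in (30.0, 12.0, 2.0):  # illustrative margins, not measured values
        print(f"{margin:5.1f} pp -> {interpret_bp_penalty_margin(margin)}")
```

Keeping the two thresholds as keyword arguments makes it easy to see how sensitive the verdict is to where the 25 pp and 5 pp cut-offs are placed.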
2026-04-08 | PAPER_OUTLINE: §4 rewrite under 'two distinct failure modes' framing | YurenHao0426
After the round 19 disambiguation experiment confirmed hypothesis B (penalty CREATES deep alignment, not just reveals it), the paper §4 needs to use the new framing:
  Mode 1: measurement degeneracy via terminal LN gradient cancellation
  Mode 2: low intrinsic credit-direction quality of random feedback

Both modes are direct-measured (mode 1 by diagnostic (b), mode 2 by per-layer cos in the meaningful regime). The penalty partially alleviates BOTH modes. Neither is fully fixed.

§4 rewrite includes:
- The two modes (4.1)
- Penalty causal validation with 3-seed cos (4.2)
- Disambiguation: vanilla early-epoch cos table proving hypothesis B (4.3)
- Why the residual gap is partial alignment (4.4)
- Why this framing is paper-cleaner than prior ones (4.5)

Walk-back chain extended to 7 entries, with 6 and 7 happening same-day and converging on the final two-distinct-modes framing.
2026-04-08 3-seed multi-seed verification of penalized DFA deep cos = +0.17 YurenHao0426
| seed | l0 | l1 | l2 | l3 | l4 | layer-mean |
|---|---:|---:|---:|---:|---:|---:|
| 42 | +0.316 | +0.169 | +0.151 | +0.165 | +0.166 | +0.193 |
| 123 | +0.333 | +0.093 | +0.155 | +0.178 | +0.177 | +0.187 |
| 456 | +0.339 | +0.131 | +0.123 | +0.150 | +0.150 | +0.179 |

3-seed mean deep cos (l1-l4): ~0.155 ± 0.025
3-seed layer-mean: +0.186 ± 0.007

The +0.17 finding is rock-solid, combined with:
- null calibration: training-Bs +0.16 vs fresh-Bs +0.002
- hypothesis B confirmed: vanilla early ep deep cos ~0
- 3-seed reproducibility (this commit)

This is the §4 evidence for the paper's 'penalty creates partial deep alignment, partially alleviating mode 2'.
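The summary statistics can be recomputed from the table with the standard library. The sketch below assumes the ± values are sample standard deviations and that the deep-layer figure pools all twelve l1-l4 entries across seeds; neither convention is stated in the commit, so the recomputed numbers should be read as approximate matches to the quoted ones.

```python
from statistics import mean, stdev

# Per-layer cosines copied from the table (columns l0..l4), keyed by seed.
cos = {
    42: [0.316, 0.169, 0.151, 0.165, 0.166],
    123: [0.333, 0.093, 0.155, 0.178, 0.177],
    456: [0.339, 0.131, 0.123, 0.150, 0.150],
}

# One layer-mean per seed, then mean and spread across the three seeds.
layer_means = [mean(row) for row in cos.values()]

# Deep layers l1..l4, pooled over seeds (assumed pooling convention).
deep = [c for row in cos.values() for c in row[1:]]

print(f"layer-mean: {mean(layer_means):+.3f} +/- {stdev(layer_means):.3f}")
print(f"deep cos:   {mean(deep):+.3f} +/- {stdev(deep):.3f}")
```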
2026-04-08 DISAMBIGUATION: vanilla DFA early-epoch checkpoints + cos measurement YurenHao0426
Round 19's #3 critical experiment. Trained vanilla DFA s42 for 5 epochs, saved a checkpoint at each, then measured per-layer cos(e_T B^T, BP grad).

Key trajectory of ||g_l|| during vanilla DFA training:

  ep 0: ~1e-3   (random init, healthy)
  ep 1: ~1.4e-6 (3 orders-of-magnitude drop, STILL above 1e-7 floor)
  ep 2: ~3e-7   (above floor)
  ep 3: ~1.3e-7 (above floor, barely)
  ep 4: ~7e-8   (BELOW floor)
  ep 5: ~4e-8   (well below floor)

So ep 1, 2, 3 vanilla checkpoints are in the MEANINGFUL ||g|| regime. Cos measurement on those:

  ep 1: l0=+0.42, l1=+0.005, l2=-0.028, l3=-0.039, l4=-0.038
  ep 2: l0=+0.44, l1=-0.002, l2=-0.040, l3=-0.055, l4=-0.054
  ep 3: l0=+0.43, l1=+0.007, l2=-0.039, l3=-0.054, l4=-0.054

DEEP-LAYER COSINES ARE ESSENTIALLY ZERO AT EVERY VANILLA EPOCH, even when ||g|| is in the meaningful regime (ep 1: ||g||=6.7e-7).

Compare to penalized DFA s42 at 30 ep: deep cos = +0.17.

Hypothesis B confirmed: the penalty CREATED the deep-layer alignment. It is a training outcome of the regularization, not a measurement-regime revelation.

Paper implications: there are two distinct failure modes after all, but they are not 'scale + direction'. They are:
  (1) Measurement degeneracy via terminal LN gradient cancellation (caught by diagnostic (b))
  (2) Low intrinsic credit quality of random feedback even in the meaningful regime (caught by direct cos measurement)

The penalty partially alleviates BOTH (residual stream contained AND deep alignment improved from ~0 to +0.17), but neither fully.
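Which checkpoints count as "meaningful" follows directly from comparing each epoch's ||g_l|| against the 1e-7 floor. A minimal sketch, using the approximate norms quoted in the trajectory (the function name and dictionary are illustrative, not the repository's logging format):

```python
FLOOR = 1e-7  # ||g|| floor quoted in this commit

# Approximate ||g_l|| per epoch, copied from the trajectory in this message.
grad_norm = {0: 1e-3, 1: 1.4e-6, 2: 3e-7, 3: 1.3e-7, 4: 7e-8, 5: 4e-8}


def meaningful_epochs(norms, floor=FLOOR):
    """Return the epochs whose gradient norm is still above the floor."""
    return [ep for ep, g in sorted(norms.items()) if g > floor]


print(meaningful_epochs(grad_norm))
```

Epoch 0 is the untrained initialization, so the trained checkpoints that pass the check are epochs 1 to 3.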
2026-04-08 Add vanilla DFA early-epoch checkpoint training (round 19 disambiguation) YurenHao0426
Trains vanilla DFA (no penalty) for max_epoch epochs and saves checkpoints + Bs at specified early epochs (default: 1, 2, 3, 4, 5). Logs per-layer ||h_l|| and ||g_l|| at each epoch so we can see when ||g_L|| crosses the 1e-7 floor.

Codex round 19's #3 critical experiment for disambiguating:

  Hypothesis A: deep alignment was always there in vanilla DFA but hidden by the post-collapse measurement degeneracy
  Hypothesis B: deep alignment was created by the penalty intervention

Test: measure deep-layer cos at vanilla checkpoints from ep 1-3 (when ||g_L|| should still be in the meaningful regime).

  If cos > 0 at ep 1-2 vanilla -> hypothesis A
  If cos ~ 0 at ep 1-2 vanilla -> hypothesis B
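The test can be sketched as a function over early-epoch deep-layer cosines. The commit does not say how far from zero counts as "cos > 0", so the tolerance `tol` below is a hypothetical choice, as are the function name and the example values.

```python
from statistics import mean


def pick_hypothesis(deep_cos_by_epoch, tol=0.05):
    """Return "A" if deep-layer cosine is clearly positive at every early
    epoch, else "B".

    deep_cos_by_epoch maps epoch -> list of deep-layer cosines (l1..l4).
    tol is a hypothetical cutoff for "cos ~ 0"; the commit does not fix one.
    """
    per_epoch = [mean(values) for values in deep_cos_by_epoch.values()]
    return "A" if min(per_epoch) > tol else "B"


# Illustrative inputs only, not measured values.
print(pick_hypothesis({1: [0.20, 0.18, 0.21, 0.19], 2: [0.22, 0.17, 0.20, 0.18]}))
print(pick_hypothesis({1: [0.01, -0.02, 0.00, -0.01], 2: [0.00, -0.03, 0.01, -0.02]}))
```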
2026-04-08 Add null calibration script: training-Bs vs fresh-Bs cos on penalized DFA YurenHao0426
Codex round 19's #1 critical control. Result on penalized DFA s42 (lam=1e-2, 30 ep):

  training-Bs deep-layer cos: +0.1627
  fresh-Bs deep-layer cos:    +0.0022 ± 0.0220 (n=20 draws)

The +0.17 measurement is REAL signal, not artifact. The network specifically adapted to its training-time Bs during the penalized run. Fresh Bs give essentially zero cosine (within noise).

This validates the walk-back interpretation: in the rescued regime where ||g_l|| is meaningful, DFA's local credit signal shows partial alignment with BP grad, and this alignment is specifically the network learning to align with its specific Bs.

Round 19 caveat preserved: cannot yet distinguish whether the alignment was always present in vanilla but hidden by measurement degeneracy, OR whether it was created by the penalty intervention. The early-epoch vanilla checkpoint sweep (round 19's other proposed control) would disambiguate.
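One way to read the calibration result is as a distance from the fresh-Bs null, in units of the null's spread. The sketch below assumes the ± 0.0220 is a standard deviation over the 20 draws rather than a standard error; the commit does not say which.

```python
train_cos = 0.1627   # training-Bs deep-layer cosine
null_mean = 0.0022   # fresh-Bs deep-layer cosine, mean over n=20 draws
null_std = 0.0220    # fresh-Bs spread (assumed to be a standard deviation)

# Distance of the training-Bs cosine from the null, in null standard deviations.
z = (train_cos - null_mean) / null_std
print(f"training-Bs cosine sits {z:.1f} null standard deviations above the fresh-Bs mean")
```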
2026-04-08 MAJOR: penalized DFA deep-layer cosine is +0.17, NOT zero YurenHao0426
Direct deep-block credit measurement on penalized DFA s42 checkpoint (lam=1e-2, 30 epochs, just trained).

Per-layer cos(e_T B^T, BP grad), TRAINING Bs, no eps clamp:

  l0: +0.316 (±0.188)  ||g||=9.18e-7  ||a||=4.53
  l1: +0.169 (±0.087)  ||g||=8.87e-7  ||a||=4.57
  l2: +0.151 (±0.084)  ||g||=8.77e-7  ||a||=4.50
  l3: +0.165 (±0.099)  ||g||=8.73e-7  ||a||=4.64
  l4: +0.166 (±0.098)  ||g||=8.69e-7  ||a||=4.64
  layer-mean: +0.193

Compare to vanilla DFA (existing measurement, scale-broken regime):

  l0:   +0.42
  l1-4: ~0 (essentially zero)

CRITICAL INTERPRETATION: The penalty doesn't just fix scale, it ALSO restores deep-layer direction quality from ~0 to ~0.17. This contradicts the prior 'two failure modes' framing where I assumed direction would remain broken even after scale fix.

The honest story is:
- vanilla DFA: scale catastrophic, BP grad at floor, cosine measurement DEGENERATE (cos ~0 is noise dominance, not 'no alignment')
- penalized DFA: scale fixed, BP grad healthy, cosine measurement INTERPRETABLE, and the value is +0.17 on deep layers (partially aligned, much less than BP's self-cosine of 1.0)
- the +0.17 alignment explains why penalized DFA gets 0.36 (60% of BP's 0.61): partial credit gives partial training, not zero training

The 'second failure mode' claim is wrong. There's ONE unified failure mode (scale + measurement degeneracy), and the penalty rescues BOTH. The remaining gap to BP is 'partial credit quality', not a separate failure mode.
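The "no eps clamp" detail matters because the BP gradient norms here are of order 1e-7 to 1e-6. A toy sketch of why: cosine is scale-invariant when computed without a clamp, but a lower bound on the norms that exceeds ||g|| shrinks the reported value. The function, the toy vectors, and the clamp value 1e-6 are all illustrative assumptions; this is not the repository's measurement code.

```python
import math


def cosine(a, g, eps=0.0):
    """Cosine between a credit vector a and a gradient g.

    eps=0.0 means no clamp. A positive eps clamps each norm from below,
    which shrinks the result whenever a norm is smaller than eps.
    With eps=0.0 a zero-norm input raises ZeroDivisionError.
    """
    dot = sum(x * y for x, y in zip(a, g))
    norm_a = max(math.sqrt(sum(x * x for x in a)), eps)
    norm_g = max(math.sqrt(sum(y * y for y in g)), eps)
    return dot / (norm_a * norm_g)


a = [3.0, 4.0, 0.0]    # toy credit vector, ||a|| = 5
g = [3e-7, 0.0, 4e-7]  # toy gradient with a very small norm, ||g|| = 5e-7

print(cosine(a, g))            # unclamped
print(cosine(a, g, eps=1e-6))  # hypothetical clamp larger than ||g||
```

With these toy inputs the unclamped value is 0.36 and the clamped value is half of that, because the clamp replaces ||g|| = 5e-7 with 1e-6.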
2026-04-08 Add PAPER_OUTLINE.md: §1-§6 draft reflecting round 17 + 18 YurenHao0426
Comprehensive paper draft outline for the NeurIPS 2026 E&D submission:

  §1 Discovery-first hook (round 16 narrative arc): broken eval -> evidence -> metrics miss -> need protocol -> validation
  §2 Audit findings: 5-method × 3-seed audit, walk-back details, EP internal control
  §3 The diagnostic protocol: 4 diagnostics, decision-utility ablation, threshold sensitivity (with (d) fragility flagged), temporal validation, cross-architecture validation, sub-mode discrimination
  §4 Two failure modes: mechanism story + causal penalty rescue, with the round 18 softening (partial dissociation rather than full separability)
  §5 Pipeline pitfalls catalog: 7 bugs (incl. new #6.5 self-cosine fallback)
  §6 Reference implementation + Limitations / walk-backs section listing all 5 walked-back claims explicitly

This is a working draft to make the next writing step concrete. Reflects all evidence collected through the round 18 follow-up.