summaryrefslogtreecommitdiff
path: root/results/vanilla_dfa_early_ckpts
AgeCommit message (Collapse)Author
2026-04-08paper v2.30: fix layer-0 cosine numbers + add per-seed appendix MYurenHao0426
Found a numerical error in §4 ¶3: the layer-0 vanilla DFA cosines were listed as +0.42, +0.45, +0.39 across seeds 42/123/456 but the actual re-measurement on the saved early-epoch checkpoints gives +0.421, +0.436, +0.418 (the s456 value was off by 0.03). The deep-mean numbers in Table 2 (-0.008 ± 0.013) were already correct. Changes: - §4 ¶3: layer-0 trio updated to +0.42, +0.44, +0.42 across seeds and cite now points to a new per-seed appendix. - New Appendix M (Layer-0 Dominance): 6-row table of per-seed per-layer cosines on vanilla DFA early checkpoints (3 seeds × ep 1, 2), with per-layer ||g||. Documents the layer-0 dominance pattern that drives the headline aggregate Γ on these checkpoints. - results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json: machine- readable dump of all 6 measurements for future audit. - §7 compressed (~30 words trimmed across the closing paragraph) and Figure 3 width 0.92 → 0.82 to keep main content at exactly 9 pages after the appendix addition. Verified: 9 pages main + refs on p10, 18 total, 0 overfull boxes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08Multi-seed vanilla DFA early-epoch cos: lock-in for round 19 disambiguationYurenHao0426
Round 20's minimal lock-in experiment: 3 seeds × {ep 1, ep 2} vanilla DFA cosine. Closes the 'single-seed fluke' objection. Vanilla DFA early-epoch deep cosines (l1-l4): | seed | ep | ||g|| | deep mean | |---|---|---|---| | 42 | 1 | 6.7e-7 | -0.025 | | 42 | 2 | 1.5e-7 | -0.038 | | 123 | 1 | 6.5e-7 | +0.002 | | 123 | 2 | 1.4e-7 | -0.006 | | 456 | 1 | 3.9e-7 | +0.000 | | 456 | 2 | 8.5e-8 | -0.009 | 3-seed mean at ep 1 (most meaningful regime): -0.008 ± 0.013 3-seed mean at ep 2: -0.018 ± 0.018 ALL 24 measurements (3 seeds × 2 ep × 4 deep layers) are in [-0.04, +0.02]. Compare to penalized DFA 3-seed mean +0.155 ± 0.025. The penalty CREATING deep alignment finding is now seed-robust. Three seeds × two early epochs all show vanilla deep cos essentially zero even when ||g|| is in the meaningful regime. This is the round 20 lock-in. Framing is locked.
2026-04-08DISAMBIGUATION: vanilla DFA early-epoch checkpoints + cos measurementYurenHao0426
Round 19's #3 critical experiment. Trained vanilla DFA s42 for 5 epochs, saved checkpoint at each, then measured per-layer cos(e_T B^T, BP grad). Key trajectory of ||g_l|| during vanilla DFA training: ep 0: ~1e-3 (random init, healthy) ep 1: ~1.4e-6 (3 OOM drop, STILL above 1e-7 floor) ep 2: ~3e-7 (above floor) ep 3: ~1.3e-7 (above floor, barely) ep 4: ~7e-8 (BELOW floor) ep 5: ~4e-8 (well below floor) So ep 1, 2, 3 vanilla checkpoints are in the MEANINGFUL ||g|| regime. Cos measurement on those: ep 1: l0=+0.42, l1=+0.005, l2=-0.028, l3=-0.039, l4=-0.038 ep 2: l0=+0.44, l1=-0.002, l2=-0.040, l3=-0.055, l4=-0.054 ep 3: l0=+0.43, l1=+0.007, l2=-0.039, l3=-0.054, l4=-0.054 DEEP-LAYER COSINES ARE ESSENTIALLY ZERO AT EVERY VANILLA EPOCH, even when ||g|| is in the meaningful regime (ep 1: ||g||=6.7e-7). Compare to penalized DFA s42 at 30 ep: deep cos = +0.17. Hypothesis B confirmed: the penalty CREATED the deep-layer alignment. It is a training outcome of the regularization, not a measurement-regime revelation. Paper implications: there are two distinct failure modes after all, but they are not 'scale + direction'. They are: (1) Measurement degeneracy via terminal LN gradient cancellation (caught by diagnostic (b)) (2) Low intrinsic credit quality of random feedback even in the meaningful regime (caught by direct cos measurement) The penalty partially alleviates BOTH (residual stream contained AND deep alignment improved from ~0 to +0.17), but neither fully.