| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:09:03 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:09:03 -0500 |
| commit | 7fbbe2c18a08f0a6314dfe22dc8790462252050a | |
| tree | 9cd07701e936973ff66a56cb1352dc2568c7cea6 /results/confirmatory/clean_sparsity/synth_dfa_s456_a0.0_L8.json | |
| parent | e53327ac6d7d5be097c3de434caa700c52c598e9 | |
Add reproducers for pitfalls 4-6 (Bs reproducibility, aggregation, layer-0)
All 3 verified on the real DFA s42 checkpoint:
Bug 4: training Bs gives Γ=+0.068, 10 fresh Bs draws give Γ=+0.0043±0.007.
The 'alignment' is the network adapting to specific Bs.
Bug 5: 4 valid aggregation strategies give Γ in [-0.028, +0.074]. The
spread is 0.10 (3.45x ratio) and **the sign flips** between
strategies. Pick the wrong aggregation and DFA is anti-aligned;
pick the right one and DFA looks aligned.
Bug 6: Γ_layer0 = +0.429 dominates the mean +0.068. Hidden layers 1-4 are
all near zero or slightly negative. Mean of hidden layers only is
-0.022 (negative!). The deep blocks the paper claims to be
'training' have Γ ≈ 0 or below.
Bugs 5 and 6 are causally linked: 'median over layers' strategies land on a
negative deep layer; 'mean over layers' is dominated by the positive layer 0.
The catalog under-reported bug 5 (it said 2.5x, actual is 3.45x with sign
flip).
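
The link between bugs 5 and 6 can be checked with a few lines of arithmetic. The sketch below is illustrative only, not the reproducer added by this commit. It uses the reported Γ_layer0 = +0.429 and four HYPOTHETICAL hidden-layer values chosen only so that their mean matches the reported -0.022; the individual per-layer values are not given in this message. The four aggregations shown are assumptions too and are not necessarily the four strategies used for the [-0.028, +0.074] range.

```python
from statistics import mean, median

# Layer 0 is the value reported in the commit message.
# The four hidden-layer values are HYPOTHETICAL: picked so that their mean
# is -0.022 (as reported), not read from the checkpoint.
gamma_per_layer = [0.429, -0.010, -0.030, 0.002, -0.050]


def aggregate(values):
    """Return several plausible ways to collapse per-layer Gamma to one number."""
    hidden = values[1:]  # drop layer 0
    return {
        "mean_all": mean(values),
        "median_all": median(values),
        "mean_hidden": mean(hidden),
        "median_hidden": median(hidden),
    }


results = aggregate(gamma_per_layer)
for name, value in results.items():
    print(f"{name:>14}: {value:+.3f}")

# The sign depends on the aggregation: only the mean that includes layer 0
# is positive for these inputs.
signs = {name: value > 0 for name, value in results.items()}
print(signs)
```

With these inputs the mean over all layers comes out at about +0.068, matching the reported figure, while the median over all layers and both hidden-only aggregates are negative. One large positive layer is enough to flip the sign of the headline number.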
Diffstat (limited to 'results/confirmatory/clean_sparsity/synth_dfa_s456_a0.0_L8.json')
0 files changed, 0 insertions, 0 deletions
