faeval.git

author	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-07 23:09:03 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-07 23:09:03 -0500
commit	7fbbe2c18a08f0a6314dfe22dc8790462252050a (patch)
tree	9cd07701e936973ff66a56cb1352dc2568c7cea6 /results/confirmatory/clean_sparsity/synth_dfa_s456_a0.0_L8.json
parent	e53327ac6d7d5be097c3de434caa700c52c598e9 (diff)

Add reproducers for pitfalls 4-6 (Bs reproducibility, aggregation, layer-0)

All 3 verified on the real DFA s42 checkpoint:

Bug 4: training Bs gives Γ=+0.068, 10 fresh Bs draws give Γ=+0.0043±0.007. The 'alignment' is the network adapting to specific Bs.

Bug 5: 4 valid aggregation strategies give Γ in [-0.028, +0.074]. The spread is 0.10 (3.45x ratio) and **the sign flips** between strategies. Pick the wrong aggregation and DFA is anti-aligned; pick the right one and DFA looks aligned.

Bug 6: Γ_layer0 = +0.429 dominates the mean +0.068. Hidden layers 1-4 are all near zero or slightly negative. Mean of hidden layers only is -0.022 (negative!). The deep blocks the paper claims to be 'training' have Γ ≈ 0 or below.

Bugs 5 and 6 are causally linked: 'median over layers' strategies pick a negative deep layer; 'mean over layers' is dominated by the positive layer 0. The catalog under-reported bug 5 (it said 2.5x, actual is 3.45x with sign flip).
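
The arithmetic behind bugs 5 and 6 can be checked from the quoted figures alone. The sketch below is not one of the reproducers added by this commit; it assumes the reported mean is an unweighted average over five layers (layer 0 plus hidden layers 1-4), and all variable names are illustrative.

```python
# Figures quoted in the commit message; nothing here is loaded from the checkpoint.
gamma_layer0 = 0.429        # Γ for layer 0
gamma_hidden_mean = -0.022  # mean Γ over hidden layers 1-4
n_hidden = 4                # assumption: exactly four hidden layers

# Assumption: "mean over layers" is an unweighted average over all five layers.
gamma_mean_all = (gamma_layer0 + n_hidden * gamma_hidden_mean) / (n_hidden + 1)
print(f"mean over all layers: {gamma_mean_all:+.3f}")

# Bug 5: range of Γ across the four aggregation strategies.
gamma_min, gamma_max = -0.028, 0.074
spread = gamma_max - gamma_min
sign_flips = gamma_min < 0 < gamma_max
print(f"spread: {spread:.2f}, sign flips: {sign_flips}")
```

Under that assumption the positive layer-0 term alone is enough to turn a negative hidden-layer average into the reported positive overall mean.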

Diffstat (limited to 'results/confirmatory/clean_sparsity/synth_dfa_s456_a0.0_L8.json')

0 files changed, 0 insertions, 0 deletions

