| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 03:22:36 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 03:22:36 -0500 |
| commit | 25ee60c8277ba82b9cb6471471b1a727e0712ea7 (patch) | |
| tree | 9e8b9a050e05f8bb43d5b437157fd6c77c27de34 /experiments/minimal_scaffold_replication.py | |
| parent | 24dea480b61aefd2cd74dbe4b341256ce50412fa (diff) | |
λ sweep on penalty strength: lam ∈ {1e-4, 1e-2, 1e-1}, with cos + rho results
Round 19's #5 recommendation. Major new finding for the paper:
| lam | acc | ||h_L|| | ||g_2|| | deep cos | deep rho |
|-------|------:|--------:|--------:|---------:|---------:|
| 0 | 0.308 | 4e8 | 5e-10 | -0.008 | -0.003 |
| 1e-4 | 0.359 | 2.4e4 | 6.3e-7 | -0.022 | -0.004 |
| 1e-2 | 0.363 | 4e4 | 1e-6 | +0.155 | +0.080 |
| 1e-1 | 0.349 | 1.2e4 | 1.6e-6 | +0.131 | +0.067 |
KEY: at lam=1e-4 the residual stream is contained (||h_L|| drops from
4e8 to 2.4e4) AND ||g_2|| is healthy (6.3e-7, up from 5e-10), so mode 1
is ALLEVIATED, but deep cos and rho are still essentially zero (mode 2
NOT alleviated). This is an independent dissociation of the two modes
via penalty strength: a weak penalty gives the mode 1 fix WITHOUT the
mode 2 fix.
Both metrics (cos, rho) agree at every lambda. Penalty strength has a
non-monotonic effect on mode 2 alleviation:
- lam=1e-4: too weak, mode 2 not alleviated (cos ~0)
- lam=1e-2: sweet spot, cos +0.16, rho +0.08
- lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07
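The reading of the sweep above can be checked mechanically. The sketch below copies the rows from the table in this message and classifies each lam by which mode is alleviated. The three thresholds (`H_NORM_MAX`, `GRAD_NORM_MIN`, `ALIGN_MIN`) are hypothetical cutoffs chosen only for illustration. They are not taken from experiments/minimal_scaffold_replication.py.

```python
# Sweep results copied from the table in this commit message.
# Columns: lam, acc, ||h_L||, ||g_2||, deep cos, deep rho
SWEEP = [
    (0.0,  0.308, 4e8,   5e-10,  -0.008, -0.003),
    (1e-4, 0.359, 2.4e4, 6.3e-7, -0.022, -0.004),
    (1e-2, 0.363, 4e4,   1e-6,   +0.155, +0.080),
    (1e-1, 0.349, 1.2e4, 1.6e-6, +0.131, +0.067),
]

# ASSUMPTION: illustrative thresholds, not values used by the experiment.
H_NORM_MAX = 1e6      # residual stream counts as "contained" below this
GRAD_NORM_MIN = 1e-8  # gradient norm counts as "healthy" above this
ALIGN_MIN = 0.05      # deep cos and rho count as "nonzero" above this


def classify(row):
    """Return (mode1_alleviated, mode2_alleviated) for one sweep row."""
    _lam, _acc, h_norm, g_norm, cos, rho = row
    mode1 = h_norm < H_NORM_MAX and g_norm > GRAD_NORM_MIN
    mode2 = cos > ALIGN_MIN and rho > ALIGN_MIN
    return mode1, mode2


status = {row[0]: classify(row) for row in SWEEP}
for lam, (mode1, mode2) in status.items():
    print(f"lam={lam:g}: mode 1 alleviated={mode1}, mode 2 alleviated={mode2}")

# Non-monotonicity: deep cos peaks at an interior lam, not at the largest one.
best_lam = max(SWEEP, key=lambda row: row[4])[0]
largest_lam = max(row[0] for row in SWEEP)
print(f"deep cos peaks at lam={best_lam:g}; largest lam is {largest_lam:g}")
```

Under these assumed cutoffs, lam=1e-4 is the only row where mode 1 is flagged as alleviated and mode 2 is not, which is the dissociation described above.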
This is the 7th independent validation of the two-mode separation, and
the strongest one, because it shows mode 1 alleviation WITHOUT mode 2
alleviation: the modes do not even respond to the same intervention
strength.
Diffstat (limited to 'experiments/minimal_scaffold_replication.py')
0 files changed, 0 insertions, 0 deletions
