| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:59:35 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:59:35 -0500 |
| commit | 503bb06ed9ab00cfa72d2d9a532b0caf10cb83b3 (patch) | |
| tree | 719fbb2e237a98d8985f54350a8d081072957993 /results/confirmatory/persample/dfa_s789.csv | |
| parent | ab1b783c7a4f3d586d082ba142d7c046453a310c (diff) | |
Add (d) frozen-baseline threshold sensitivity — IMPORTANT new finding
Critical observation: at lambda=1e-3 (single seed), the penalized DFA
margin above the shallow baseline is +2.3 pp, which PASSES (d) at the
2 pp default threshold. At lambda=1e-2 (3 seeds), the margin is +1.4 pp,
which FIRES (d) at 2 pp.
So the (d) verdict on penalized DFA depends on BOTH the lambda choice
AND the threshold choice. This is a significantly weaker claim than
'two failure modes are separable via (d)'.
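
The threshold dependence described above can be sketched as a small sweep. This is a minimal illustration, not the protocol's actual implementation: the names `MARGINS_PP`, `d_fires`, and `sweep` are hypothetical, and the rule "(d) fires when the margin is strictly below the threshold" is an assumption inferred from the two verdicts quoted in this message (how the real check treats a margin exactly equal to the threshold is not stated). Only the two margins and the 2 pp default come from the text.

```python
# Margins (percentage points) of penalized DFA above the shallow baseline,
# copied from the commit message: lambda -> margin.
MARGINS_PP = {1e-3: 2.3, 1e-2: 1.4}


def d_fires(margin_pp: float, threshold_pp: float = 2.0) -> bool:
    """Assumed rule: diagnostic (d) fires when the margin is below the threshold."""
    return margin_pp < threshold_pp


def sweep(margins: dict, thresholds: list) -> dict:
    """Return {(lambda, threshold): fires} for every combination."""
    return {
        (lam, thr): d_fires(margin, thr)
        for lam, margin in margins.items()
        for thr in thresholds
    }


if __name__ == "__main__":
    # The threshold grid is illustrative; only 2.0 pp is the stated default.
    results = sweep(MARGINS_PP, [1.0, 1.5, 2.0, 2.5])
    for (lam, thr), fires in sorted(results.items()):
        verdict = "FIRES" if fires else "passes"
        print(f"lambda={lam:g}  threshold={thr:.1f} pp  (d) {verdict}")
```

Under this assumed rule, the verdicts at the two lambda values differ only for thresholds between the two margins (above 1.4 pp and at most 2.3 pp), which is the sense in which the (d) verdict depends on both choices.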
The honest framing, per the round 18 lesson: there is a real tradeoff
between penalty strength and depth utilization. A weaker penalty preserves
more depth contribution but also more scale pathology. A stronger penalty
kills depth contribution. The protocol surfaces this tradeoff but doesn't
establish the second failure mode by itself.
Compared to the (a) 63x and (b) 24338x separation gaps, (d) is the LEAST
robust diagnostic and the most sensitive to threshold choice. This needs
to be flagged prominently in the paper.
Diffstat (limited to 'results/confirmatory/persample/dfa_s789.csv')
0 files changed, 0 insertions, 0 deletions
