author YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:59:35 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:59:35 -0500
commit 503bb06ed9ab00cfa72d2d9a532b0caf10cb83b3
tree 719fbb2e237a98d8985f54350a8d081072957993 /results/cnn_baseline/ep_s456.json
parent ab1b783c7a4f3d586d082ba142d7c046453a310c
Add (d) frozen-baseline threshold sensitivity — IMPORTANT new finding
Critical observation: at lambda=1e-3 (single seed), the penalized DFA margin above the shallow baseline is +2.3 pp, which PASSES (d) at the default 2 pp threshold. At lambda=1e-2 (3 seeds), the margin is +1.4 pp, which FIRES (d) at 2 pp. So the (d) verdict on penalized DFA depends on BOTH the lambda choice AND the threshold choice.

This is a significantly weaker claim than 'two failure modes are separable via (d)'. The honest framing, per the round 18 lesson: there is a real tradeoff between penalty strength and depth utilization. A weaker penalty preserves more depth contribution but also more scale pathology; a stronger penalty kills depth contribution. The protocol surfaces this tradeoff but does not by itself establish the second failure mode.

Compared to the separation gaps for (a), 63x, and (b), 24338x, (d) is the LEAST robust diagnostic and the most sensitive to threshold choice. This needs to be flagged prominently in the paper.
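The threshold dependence described above can be illustrated with a small sweep. This is only a sketch under stated assumptions: it takes the two margins quoted in this message (+2.3 pp and +1.4 pp) and assumes that (d) fires when the margin over the shallow baseline is strictly below the threshold, which matches the two verdicts reported here but is not confirmed as the protocol's exact rule. The names `d_fires` and `sweep` and the threshold grid are hypothetical.

```python
# Margins over the shallow baseline, in percentage points, as quoted in the commit message.
margins_pp = {
    "lambda=1e-3 (1 seed)": 2.3,
    "lambda=1e-2 (3 seeds)": 1.4,
}


def d_fires(margin_pp, threshold_pp=2.0):
    """Assumed rule (hypothetical): (d) fires when the margin is strictly below the threshold."""
    return margin_pp < threshold_pp


def sweep(margins, thresholds):
    """Return {setting: {threshold: fired?}} for every margin/threshold pair."""
    return {
        name: {t: d_fires(m, t) for t in thresholds}
        for name, m in margins.items()
    }


# Illustrative threshold grid around the 2 pp default; the grid itself is an assumption.
thresholds = [1.0, 1.5, 2.0, 2.5]
verdicts = sweep(margins_pp, thresholds)

for name, row in verdicts.items():
    labels = {t: ("FIRES" if fired else "passes") for t, fired in row.items()}
    print(name, labels)
```

Under this assumed rule, the two lambda settings get different verdicts only at the 1.5 pp and 2.0 pp grid points; at 1.0 pp both pass and at 2.5 pp both fire, which is the sensitivity the message asks to flag.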
Diffstat (limited to 'results/cnn_baseline/ep_s456.json')
0 files changed, 0 insertions, 0 deletions