path: root/models/value_net.py
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 08:22:04 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 08:22:04 -0500
commit: 7e01fbc0ce871857c1e1879ed0d3559e8bfae7c7
tree: 4f0da6c6362b8ebe8109fe9a40ed28e2d7759595 /models/value_net.py
parent: 825d973428450cb24d8cccc8c2604235ef974b7c
Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion

Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
2. Held-out evaluation (measured generalization failure, not exploitability)
3. Gradient clamping

With strict same-batch evaluation:
- Oracle BP: dL_same = -0.406 (strongest descent)
- Vec_M4: dL_same = -0.135
- ScalarCB: dL_same = -0.025
- DFA: dL_same = -0.003

Same-batch loss decrease is MONOTONIC with credit quality.
But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
The bottleneck is batch-level generalization, not surrogate exploitability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
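
The distinction between same-batch and held-out evaluation can be illustrated with a minimal sketch. This is not the repository's code: the linear model, the helper names (`mse`, `mse_grad`, `same_batch_linesearch`, `make_batch`), the step-size grid, and the batch sizes are all assumptions made for illustration. It picks the step size that most reduces the loss on the batch that produced the update direction, then reports the loss change at that step on a separate held-out batch.

```python
import random


def predict(w, x):
    """Linear model output w . x (a toy stand-in for a real network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, batch):
    """Mean squared error of the linear model on a batch of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in batch) / len(batch)


def mse_grad(w, batch):
    """Exact gradient of mse with respect to w on the given batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def same_batch_linesearch(w, direction, batch, step_sizes):
    """Try each step along -direction and return (best_step, loss_change),
    with the loss change measured on the same batch used for the search."""
    base = mse(w, batch)
    best_step, best_delta = None, float("inf")
    for s in step_sizes:
        stepped = [wi - s * di for wi, di in zip(w, direction)]
        delta = mse(stepped, batch) - base
        if delta < best_delta:
            best_step, best_delta = s, delta
    return best_step, best_delta


def make_batch(rng, true_w, n, noise=0.5):
    """Sample n noisy (x, y) pairs from a hypothetical ground-truth model."""
    batch = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        batch.append((x, predict(true_w, x) + rng.gauss(0.0, noise)))
    return batch


rng = random.Random(0)
true_w = [rng.gauss(0.0, 1.0) for _ in range(8)]
train_batch = make_batch(rng, true_w, n=4)      # small batch: easy to overfit
heldout_batch = make_batch(rng, true_w, n=64)   # never used to pick the step

w = [0.0] * 8
steps = [1e-3, 1e-2, 1e-1, 1.0]                 # assumed step-size grid
g = mse_grad(w, train_batch)                    # exact-gradient direction

step, dL_same = same_batch_linesearch(w, g, train_batch, steps)
w_new = [wi - step * gi for wi, gi in zip(w, g)]
dL_heldout = mse(w_new, heldout_batch) - mse(w, heldout_batch)

print(f"step={step}  dL_same={dL_same:.4f}  dL_heldout={dL_heldout:.4f}")
```

In this sketch `dL_same` measures how exploitable the direction is on the data that produced it, while `dL_heldout` additionally depends on how well a single batch generalizes. The two quantities can disagree in sign, which is the kind of gap the commit message attributes to batch-level generalization.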
Diffstat (limited to 'models/value_net.py')
0 files changed, 0 insertions, 0 deletions