faeval.git

author	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-25 08:22:04 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-25 08:22:04 -0500
commit	7e01fbc0ce871857c1e1879ed0d3559e8bfae7c7 (patch)
tree	4f0da6c6362b8ebe8109fe9a40ed28e2d7759595 /models/value_net.py
parent	825d973428450cb24d8cccc8c2604235ef974b7c (diff)

Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion

Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
2. Held-out evaluation (measured generalization failure, not exploitability)
3. Gradient clamping

With strict same-batch evaluation:
- Oracle BP: dL_same = -0.406 (strongest descent)
- Vec_M4: dL_same = -0.135
- ScalarCB: dL_same = -0.025
- DFA: dL_same = -0.003

Same-batch loss decrease is MONOTONIC with credit quality.
But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
The bottleneck is batch-level generalization, not surrogate exploitability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
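
The distinction between same-batch and held-out evaluation can be illustrated with a minimal sketch. This is not the repository's code: the linear model, the exact MSE gradient used as the update direction, and every name here (`same_batch_linesearch`, `loss_delta`, `make_batch`) are assumptions made for illustration. The credit methods named above (Oracle BP, Vec_M4, ScalarCB, DFA) are not implemented. No normalization or clamping is applied to the direction, on the assumption that this is what "strict" refers to.

```python
import random

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, batch):
    return sum((predict(w, x) - y) ** 2 for x, y in batch) / len(batch)

def mse_grad(w, batch):
    # Exact gradient of the MSE; stands in for whatever credit signal is tested.
    g = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(batch)
    return g

def loss_delta(w, direction, step, batch):
    # Change in loss on `batch` after stepping against `direction`.
    w_new = [wi - step * di for wi, di in zip(w, direction)]
    return mse(w_new, batch) - mse(w, batch)

def same_batch_linesearch(w, direction, batch, steps=(1e-3, 1e-2, 1e-1)):
    # Hypothetical helper: try each step size on the SAME batch that produced
    # `direction` and return the (step, delta) pair with the largest decrease.
    candidates = ((s, loss_delta(w, direction, s, batch)) for s in steps)
    return min(candidates, key=lambda pair: pair[1])

def make_batch(rng, true_w, n):
    # Synthetic regression data: inputs in [-1, 1], targets with small noise.
    batch = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        batch.append((x, predict(true_w, x) + rng.gauss(0.0, 0.1)))
    return batch

rng = random.Random(0)
true_w = [1.0, -2.0, 0.5]
train_batch = make_batch(rng, true_w, 8)
heldout_batch = make_batch(rng, true_w, 8)

w = [0.0, 0.0, 0.0]
direction = mse_grad(w, train_batch)

# dL_same: can the direction be exploited on the batch it came from?
step, dl_same = same_batch_linesearch(w, direction, train_batch)
# dL_heldout: does the same update also help on a batch it never saw?
dl_heldout = loss_delta(w, direction, step, heldout_batch)

print(f"step={step} dL_same={dl_same:.4f} dL_heldout={dl_heldout:.4f}")
```

In this sketch `dl_same` measures whether a direction can be exploited at all, while `dl_heldout` measures whether the resulting update generalizes. The message above attributes Phase 6A's conclusion partly to reading the second quantity as if it were the first.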

Diffstat (limited to 'models/value_net.py')

0 files changed, 0 insertions, 0 deletions

