Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion

Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
2. Held-out evaluation (measured generalization failure, not exploitability)
3. Gradient clamping

With strict same-batch evaluation:
- Oracle BP: dL_same = -0.406 (strongest descent)
- Vec_M4: dL_same = -0.135
- ScalarCB: dL_same = -0.025
- DFA: dL_same = -0.003

Same-batch loss decrease is MONOTONIC with credit quality.
But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
The bottleneck is batch-level generalization, not surrogate exploitability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 08:22:04 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 08:22:04 -0500
commit: 7e01fbc0ce871857c1e1879ed0d3559e8bfae7c7 (patch)
tree: 4f0da6c6362b8ebe8109fe9a40ed28e2d7759595 /report_explore
parent: 825d973428450cb24d8cccc8c2604235ef974b7c (diff)
1 files changed, 44 insertions, 0 deletions
diff --git a/report_explore/MEMO_6.5A_samebatch_linesearch.md b/report_explore/MEMO_6.5A_samebatch_linesearch.md
new file mode 100644
index 0000000..733db12
--- /dev/null
+++ b/report_explore/MEMO_6.5A_samebatch_linesearch.md
@@ -0,0 +1,44 @@
+# Phase 6.5A Memo: Same-Batch Linesearch
+
+**Date**: 2026-03-25
+
+## Question
+Under strict same-batch evaluation, does better credit produce better loss decrease?
+
+## Answer: YES.
+
+Phase 6A's conclusion was wrong due to protocol confounds. With same-batch evaluation:
+
+### All blocks, normalized credit (closest to Phase 6A protocol):
+
+| Method | Best eta | dL_same (same batch) | dL_held (held-out batch) |
+|--------|----------|----------------------|--------------------------|
+| DFA | 1e-2 | -0.003 | +0.004 |
+| ScalarCB | 3e-3 | -0.025 | +0.027 |
+| Vec_M4 | 3e-3 | **-0.135** | +0.045 |
+| Oracle BP | 1e-2 | **-0.406** | +0.094 |
+
+**Same-batch loss decreases monotonically with credit quality**: Oracle BP > Vec_M4 > ScalarCB > DFA.
+
+### But held-out loss increases for every method (DFA only marginally, at +0.004).
+
+This is **Case D**: the local surrogate correctly exploits credit to decrease training loss, but the update overfits to the batch. The better the credit, the more effective the overfitting.
+
+### Key confounds in Phase 6A:
+1. **Normalization**: Phase 6A always normalized credit, which amplified DFA's weak signals to the same magnitude as Vec's strong signals, erasing the natural ordering
+2. **Held-out evaluation**: Phase 6A evaluated on held-out batches, showing the generalization failure rather than the exploitability success
+3. **Gradient clamping**: Phase 6A clamped gradients to [-1, 1], further distorting the relationship
+
+### Raw vs Normalized (all blocks):
+| Method | raw dL_same (best) | norm dL_same (best) |
+|--------|--------------------|---------------------|
+| Vec_M4 | -0.005 | -0.135 |
+| Oracle | -0.003 | -0.406 |
+
+Raw credit produces tiny updates because BP gradients have RMS ≈ 4e-5 (0.00004). Normalization rescales all methods to a comparable magnitude, which yields much larger same-batch decreases but also the held-out overfitting reported above.
+
+## Revised Diagnosis
+
+The bottleneck is NOT "local surrogate cannot exploit good credit" (Case B from Phase 6A). It IS:
+- **Generalization/overfitting**: local surrogate with good credit decreases train loss but increases held-out loss
+- **Implication**: the project should focus on **regularizing local updates** rather than on replacing the surrogate entirely
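
The same-batch linesearch protocol in the memo can be illustrated with a minimal sketch. It assumes a toy linear-regression loss in place of the real model. The names `mse`, `same_batch_linesearch`, and `rms_normalize`, the eta grid, and the synthetic data are all hypothetical and are not taken from the project code. For each candidate step size, the sketch applies the update once and records the loss change on the batch that produced the update direction (`dL_same`) and on a separate held-out batch (`dL_held`).

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of a linear model (toy stand-in for the real loss)."""
    r = X @ w - y
    return float(np.mean(r ** 2))

def rms_normalize(g, eps=1e-12):
    """Rescale a credit/gradient vector to unit RMS (assumed normalization)."""
    return g / (np.sqrt(np.mean(g ** 2)) + eps)

def same_batch_linesearch(w, direction, batch, held, etas, loss_fn=mse):
    """Evaluate w - eta * direction for each eta.

    The best eta is chosen by the loss change on the SAME batch.
    Returns (best_eta, dL_same, dL_held); deltas are relative to the loss at w.
    """
    Xb, yb = batch
    Xh, yh = held
    base_same = loss_fn(w, Xb, yb)
    base_held = loss_fn(w, Xh, yh)
    best = None
    for eta in etas:
        w_new = w - eta * direction
        dL_same = loss_fn(w_new, Xb, yb) - base_same
        dL_held = loss_fn(w_new, Xh, yh) - base_held
        if best is None or dL_same < best[1]:
            best = (eta, dL_same, dL_held)
    return best

# Synthetic data: a small noisy training batch and a larger held-out batch.
rng = np.random.default_rng(0)
d = 20
w_true = rng.normal(size=d)
Xb = rng.normal(size=(8, d))
yb = Xb @ w_true + rng.normal(size=8)
Xh = rng.normal(size=(256, d))
yh = Xh @ w_true + rng.normal(size=256)

w0 = np.zeros(d)
etas = [1e-3, 3e-3, 1e-2, 3e-2]  # assumed grid, not the memo's actual grid

# "Oracle" direction: exact gradient of the batch loss, normalized to unit RMS.
grad = 2.0 / len(yb) * Xb.T @ (Xb @ w0 - yb)
oracle = same_batch_linesearch(w0, rms_normalize(grad), (Xb, yb), (Xh, yh), etas)

# Weak-credit stand-in: a random direction, normalized the same way.
random_dir = rms_normalize(rng.normal(size=d))
weak = same_batch_linesearch(w0, random_dir, (Xb, yb), (Xh, yh), etas)

print("oracle (eta, dL_same, dL_held):", oracle)
print("random (eta, dL_same, dL_held):", weak)
```

Selecting eta by `dL_same` while reporting `dL_held` alongside it keeps the two questions separate: whether the update direction is exploitable on its own batch, and whether that gain transfers to unseen data.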