faeval.git

author	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 19:57:52 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 19:57:52 -0500
commit	1b085a17237dfa8a1df49c0005ba26d7ba41ebaa (patch)
tree	5a1693d2592d291e7ade843449e02d1e42014fe5 /report_explore/MEMO_6_exploitability.md
parent	9ebaa25377996c8ad437856d68f515b6d0d64a36 (diff)

paper v2.34: §4 ¶4 mechanism hypothesis adds Mode 2 → Mode 1 causal chain

User flagged that Mode 1 may itself be a downstream consequence of Mode 2
(rather than a parallel failure mode), and asked for this to be added to
the §4 mechanism hypothesis.

The causal chain:
1. Local credit signal a_l has poor functional usefulness (Mode 2)
2. Optimizer cannot drive useful per-block forward-state change
3. The only easy way to increase <f_l, a_l> is to inflate ‖f_l‖ along the
   cheap random direction set by a_l (Mode 1(a) growth)
4. Inflated residual stream → terminal LN gradient cancellation
   (Mode 1(b) collapse)
5. Per-block penalty caps ‖f_l‖, breaking the chain at step 3 without
   fixing the underlying credit quality
   → Explains why penalty alleviates Mode 1 fully but Mode 2 only partially

This is more parsimonious than "two parallel failure modes" and is
consistent with the observed asymmetry that the penalty rescues Mode 1
without fully fixing Mode 2 (deep cos +0.151 vs BP's ≈1.0).

§4 ¶4 mechanism hypothesis section now contains:
- Original CB/SB descriptions (gradient-direction surrogate vs state-level
  teaching signal)
- NEW: Mode 2 → Mode 1 downstream-symptom hypothesis with the explicit
  causal chain
- Hypothesis caveat (we have measured angle-to-accuracy + functional
  proxies but not full forward-state-change content)

Page-budget compensation:
- §4 ¶4 setup recital compressed (combined SB/CB/DFA into one sentence)
- §4 ¶4 functional measurements paragraph compressed (used
  \emph{Nudging:} / \emph{Training-loss trajectory:} structure)
- §7 ¶1 closing compressed (merged the redundant no-terminal-LN ablation +
  BatchNorm CNN sentences)

Page layout: §1-§7 still 9 pages exactly (§7 ends p9 line 358, refs p10
line 359). Total 19 pages (was 18): one extra appendix page from the v2.33
per-seed nudging values, unchanged here. 9-page main content budget
preserved.

This responds to user message: "The main logical risk is that Mode 1 may
be a downstream consequence of Mode 2 (rather than a parallel failure
mode)... Suggested priorities: move the nudging test into the main text,
and present the Mode 2→Mode 1 causal chain as the mechanism hypothesis"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
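
Steps 3 and 4 of the causal chain can be illustrated with a toy numerical sketch. This is not code from the repository or the paper. The vectors `a`, `f`, and `g` are hypothetical stand-ins for the credit signal a_l, the block output f_l, and an upstream gradient at the terminal LN, and the LayerNorm is assumed to have no affine parameters. The sketch shows only two generic facts: scaling f_l raises <f_l, a_l> without changing its direction, and the gradient passed back through a LayerNorm shrinks roughly in proportion to the scale of its input. Whether this is the exact cancellation mechanism meant in the paper is an assumption.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def scale(u, c):
    return [c * ui for ui in u]

def ln_input_grad(x, g, eps=1e-5):
    """Gradient of sum(g * LN(x)) w.r.t. x, for LayerNorm without affine parameters."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    sigma = math.sqrt(var + eps)
    y = [(xi - mu) / sigma for xi in x]
    g_mean = sum(g) / n
    gy_mean = dot(g, y) / n
    # Standard LayerNorm backward pass: note the overall 1/sigma factor.
    return [(gi - g_mean - yi * gy_mean) / sigma for gi, yi in zip(g, y)]

random.seed(0)
d = 64
a = [random.gauss(0, 1) for _ in range(d)]        # hypothetical credit signal a_l
f = [ai + 0.1 * random.gauss(0, 1) for ai in a]   # hypothetical block output f_l, aligned with a_l
g = [random.gauss(0, 1) for _ in range(d)]        # hypothetical upstream gradient at the terminal LN

for c in (1.0, 10.0, 100.0):
    fc = scale(f, c)  # inflate the norm of f_l by a factor c
    print(f"scale={c:6.1f}  <f,a>={dot(fc, a):10.2f}  "
          f"cos={cosine(fc, a):.4f}  |LN grad|={norm(ln_input_grad(fc, g)):.5f}")
```

In this toy setting the inner product grows linearly with the scale while the cosine stays fixed, and the norm of the gradient through the LayerNorm falls by about the same factor. A cap on ‖f_l‖, as in step 5, bounds the scale factor but leaves the direction of a_l untouched.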

Diffstat (limited to 'report_explore/MEMO_6_exploitability.md')

0 files changed, 0 insertions, 0 deletions

