paper v2.37: §7 add 'Open questions and concrete next experiments'

§7 currently has only the Scope/limits/recommendation paragraph. Adding a second paragraph that explicitly flags the Mode 2 → Mode 1 hypothesis status as an open question and proposes two concrete falsification tests, plus a wider-scope replication path.

The new paragraph:

1. Acknowledges the Mode 2 → Mode 1 causal reading is a hypothesis, not a theorem, and that the parallel-failure reading is also formally consistent with the data.
2. Proposes a *direct* test: measure per-block forward-state-change content along the training trajectory and check whether per-block loss decrease tracks per-block credit usefulness more tightly than per-block cosine.
3. Proposes a *falsification* test for the downstream-of-Mode-2 reading: substitute the random B_l with a high-quality credit signal (sparse, learned, or weight-transport-restored à la Akrout 2019) at fixed ‖f_l‖ and check whether Mode 1 activation growth still appears. If yes, Mode 1 is NOT downstream of Mode 2.
4. Notes the wider-scope replication path: CIFAR-100, Tiny-ImageNet, architectures outside ResMLP/ViT-Mini, with a pointer to Appendix A as the structured configuration entry point.

This explicitly answers the reviewer question "what would falsify your hypothesis?" without overclaiming. It positions the paper as honest about open questions and points at concrete next steps.

Page count: 20 (unchanged); the paragraph fit within the existing slack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 20:51:04 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 20:51:04 -0500
commit: 5995929511404ba3e0b8b4f1bfef69dbf291c7a9 (patch)
tree: 78ed5677f26799533511ff69eb0262dc52ca579d /paper
parent: 29c2396ee6480e94d4543cb603587a4cc7b640cd (diff)
2 files changed, 2 insertions, 0 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index b65701f..30f1465 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
diff --git a/paper/main.tex b/paper/main.tex
index 4650898..a63d0b5 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -197,6 +197,8 @@ Diag. & Measurement & Default threshold & Role \\
 
 \paragraph{Scope, limits, and reporting recommendation.} \looseness=-2 Our claim is about evidence, not impossibility: we show that current FA evaluation practice can misread what happened, not that FA cannot work in deep networks. DFA, SB, and CB all pass status-quo reporting (Table~\ref{tab:main_audit}) but fail the protocol's deep checks, and the Figure~\ref{fig:penalty_rescue} penalty partially rescues credit signal rather than validating headlines. Our strongest claim is scoped to $d{=}256/512$ pre-LayerNorm ResMLPs and ViT-Mini, where both Mode~1 diagnostics fire; the no-terminal-LN ResMLP ablation establishes terminal LayerNorm as causally necessary for diagnostic~(b) on residual ResMLP and (with the BatchNorm CNN) shows that activation growth can persist without gradient-floor collapse; the dataset is CIFAR-10; and the BP-plus-penalty comparison is a lower bound, not a full decomposition. In the evaluation-methodology line of \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}, FA papers should report BP-reference validity, layerwise credit quality, and a frozen-blocks depth-utilization baseline as separate axes, not a single headline.
 
+\paragraph{Open questions and concrete next experiments.} The mechanism story in Section~\ref{sec:mode2} treats Mode~1 as a plausible downstream symptom of Mode~2 rather than a parallel, independently destructive failure, but the audit data is also formally consistent with a fully parallel reading. A direct test would measure per-block forward-state-change content along the training trajectory and check whether per-block decrease in test loss tracks per-block credit usefulness (e.g.\ nudging-test loss change) more tightly than it tracks per-block angular agreement with the BP gradient; a complementary test would substitute the random feedback $B_l$ with a high-quality credit signal (sparse, learned to predict the BP gradient, or weight-transport-restored \`a la \citet{akrout2019deep}) at fixed $\|f_l\|$ and check whether activation growth still appears, which would falsify the Mode~2~$\to$~Mode~1 reading by exhibiting Mode~1 in the absence of Mode~2. Beyond the mechanism question, a wider-scope replication would extend the same audit to additional datasets (CIFAR-100, Tiny-ImageNet) and architectures outside the residual ResMLP / ViT-Mini family, which would calibrate how broadly the protocol's binary detectors generalize past the audited regime; the protocol code in Appendix~\ref{app:reference_impl} is structured to make these extensions a configuration change rather than a new experimental design.
+
 \begin{thebibliography}{10}
 
 \bibitem[Paleka et~al.(2026)Paleka, Goel, Geiping, and Tramèr]{paleka2026pitfalls}
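
The *direct* test described in the commit message (does per-block loss decrease track credit usefulness more tightly than it tracks cosine?) could be scored along the following lines. This is a minimal sketch, not the protocol code from the paper's appendix. The function names `pearson` and `tracking_gap`, the `(checkpoints, blocks)` array layout, the use of Pearson correlation as the "tracks more tightly" measure, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D sequences; 0.0 if either is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def tracking_gap(loss_decrease, usefulness, cosine):
    """Per-block |corr(loss, usefulness)| - |corr(loss, cosine)|.

    All inputs are hypothetical arrays of shape (checkpoints, blocks):
      loss_decrease[t, l]  per-block decrease in test loss at checkpoint t
      usefulness[t, l]     per-block credit usefulness (e.g. nudging-test loss change)
      cosine[t, l]         per-block angular agreement with the BP gradient
    A positive entry means usefulness tracks loss decrease more tightly
    than cosine does for that block.
    """
    loss_decrease = np.asarray(loss_decrease, dtype=float)
    usefulness = np.asarray(usefulness, dtype=float)
    cosine = np.asarray(cosine, dtype=float)
    if not (loss_decrease.shape == usefulness.shape == cosine.shape):
        raise ValueError("all inputs must share shape (checkpoints, blocks)")
    n_blocks = loss_decrease.shape[1]
    gaps = np.empty(n_blocks)
    for l in range(n_blocks):
        r_useful = pearson(loss_decrease[:, l], usefulness[:, l])
        r_cosine = pearson(loss_decrease[:, l], cosine[:, l])
        gaps[l] = abs(r_useful) - abs(r_cosine)
    return gaps


# Synthetic illustration only (not data from the paper): loss decrease is
# constructed to follow usefulness, while cosine is independent noise.
rng = np.random.default_rng(0)
demo_usefulness = rng.normal(size=(50, 4))
demo_loss = demo_usefulness + 0.1 * rng.normal(size=(50, 4))
demo_cosine = rng.normal(size=(50, 4))
demo_gaps = tracking_gap(demo_loss, demo_usefulness, demo_cosine)
print(demo_gaps)
```

On real runs the three arrays would come from logged per-block measurements at each checkpoint. A rank correlation could be swapped in for `pearson` if the relationship is expected to be monotone rather than linear.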