author YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 16:21:16 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 16:21:16 -0500
commit 8fda0e0042d04343e0f4e6cdc04ba5e927f69064
tree c1b40f99ee466493ac8c93eb75327cc5a9c62209
parent 9343b29f358cb963dd224d9524e7fd55e1a8b05b
§6 polish: clarify Figure 5 (decision_utility) is in Appendix D, not main text
After moving Figure 5 from §6 to Appendix D in v2.23, §6 ¶2 still said 'Figure~\ref{fig:decision_utility} makes the decision value explicit', which would render as 'Figure 5 makes...' even though Figure 5 is now in the appendix. A reader on p8 looking for Figure 5 nearby would not find it. Added an explicit '(Appendix~\ref{app:all_validations})' parenthetical right after the figure ref so the reader knows where to look.

Audit of all other figure refs (Figures 1-4 in main text):
- fig:audit_hero (Figure 1, §2) → refs in §1/§2 main text ✓
- fig:temporal_cross_arch (Figure 2, §5) → refs in §3/§5 ✓
- fig:penalty_rescue (Figure 3, §5) → refs in §4/§5/§7 ✓
- fig:cross_arch_summary (Figure 4, §5) → refs in §5/§6 ✓

All clean. Main content still 9 pages.
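The figure-reference audit described above can be partly automated. The sketch below is an illustration under stated assumptions, not a tool from this repository: it assumes the appendix starts at `\appendix`, that figure labels use the `fig:` prefix, and that main-text refs use the `Figure~\ref{...}` phrasing. The function name `find_appendix_figure_refs` is hypothetical.

```python
import re


def find_appendix_figure_refs(tex: str) -> list[tuple[str, int]]:
    """Return (label, char_offset) for each main-text Figure~\\ref{fig:...}
    that points at an appendix figure but is not immediately followed by an
    '(Appendix~\\ref{...})' pointer. Assumes \\appendix splits the document."""
    split = tex.find("\\appendix")
    if split < 0:
        return []  # no appendix, so no figure can be "far away" in it
    # Figure labels defined after \appendix (assumes the fig: prefix convention).
    appendix_labels = set(re.findall(r"\\label\{(fig:[^}]+)\}", tex[split:]))
    issues = []
    # Only scan refs that occur in the main text, i.e. before \appendix.
    for m in re.finditer(r"Figure~\\ref\{(fig:[^}]+)\}", tex[:split]):
        label = m.group(1)
        if label not in appendix_labels:
            continue  # figure lives in the main text; nothing to flag
        tail = tex[m.end():m.end() + 40]
        if not re.match(r"\s*\(Appendix~\\ref\{", tail):
            issues.append((label, m.start()))
    return issues
```

Running it over the pre-fix `main.tex` would be expected to flag the §6 ¶2 reference to `fig:decision_utility`; after this commit's parenthetical, that ref no longer matches.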
-rw-r--r-- paper/main.pdf bin 496433 -> 496499 bytes
-rw-r--r-- paper/main.tex 2
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index af404f1..e681ca9 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
Binary files differ
diff --git a/paper/main.tex b/paper/main.tex
index 1e561c4..edf3717 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -173,7 +173,7 @@ Diag. & Measurement & Default threshold & Role \\
\end{tabular}}
\end{table}
-The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely.
+The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} (Appendix~\ref{app:all_validations}) makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely.
The protocol is conservative in a specific sense: it preserves BP and EP as evidence-bearing controls and walks back only claims that fail measurement-validity or depth-utilization checks. Diagnostics (a) and (b) have sharp empirical calibration gaps in the audited regime (Appendix~\ref{app:threshold_sweep}), diagnostic (c) is a sub-mode discriminator computed as the mean pairwise cosine of the per-batch-averaged BP-grad direction at the chosen layer across $K{\geq}8$ disjoint $128$-sample minibatches (high values, $0.5$--$0.99$, indicate drift-dominated reference vectors; healthy per-sample credit gives $0.05$--$0.18$), and diagnostic (d) uses a deliberately weak $2$pp margin as a context check rather than a theorem about useful depth. The Section~\ref{sec:mode2} cross-method cosine-versus-accuracy dissociation reinforces the necessity of keeping all four diagnostics separate: Credit Bridge, State Bridge, and DFA differ by more than $4\times$ in deep-layer alignment under the same penalty rescue without tracking final accuracy in the same direction, so aligning an alternative credit rule with the BP gradient is not a substitute for checking depth utilization against a matched shallow baseline.
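Diagnostics (c) and (d), as described in the paragraph above, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the input layout (an `(N, D)` array of flattened per-sample BP gradients at the chosen layer), the function names `batch_direction_coherence` and `passes_frozen_baseline`, and the use of `>=` for the $2$pp margin are all assumptions.

```python
import itertools

import numpy as np


def batch_direction_coherence(per_sample_grads: np.ndarray,
                              batch_size: int = 128,
                              num_batches: int = 8) -> float:
    """Diagnostic (c) sketch: mean pairwise cosine between the per-batch-averaged
    BP-gradient directions of K disjoint minibatches at one layer.

    per_sample_grads: hypothetical (N, D) array of flattened per-sample gradients,
    with N >= batch_size * num_batches.
    """
    if num_batches < 2:
        raise ValueError("need at least two batches to form a pair")
    needed = batch_size * num_batches
    if per_sample_grads.shape[0] < needed:
        raise ValueError(f"need {needed} samples, got {per_sample_grads.shape[0]}")
    # Consecutive rows form disjoint minibatches; shuffle first if rows are ordered.
    batches = per_sample_grads[:needed].reshape(num_batches, batch_size, -1)
    means = batches.mean(axis=1)  # (K, D) per-batch-averaged gradients
    norms = np.linalg.norm(means, axis=1, keepdims=True)
    units = means / np.maximum(norms, 1e-12)  # guard against zero-norm means
    cosines = [float(units[i] @ units[j])
               for i, j in itertools.combinations(range(num_batches), 2)]
    return float(np.mean(cosines))


def passes_frozen_baseline(trained_acc: float, frozen_acc: float,
                           margin_pp: float = 2.0) -> bool:
    """Diagnostic (d) sketch: the trained model must beat the architecture-matched
    frozen-blocks baseline by at least margin_pp percentage points.
    Accuracies are in percent; '>=' (not '>') is an assumption."""
    return trained_acc - frozen_acc >= margin_pp
```

On synthetic gradients, adding a shared drift vector to every sample pushes the coherence toward $1$, while isotropic noise gives values near $0$. The audited ranges quoted above ($0.5$--$0.99$ for drift-dominated references, $0.05$--$0.18$ for healthy per-sample credit) come from real networks and are not reproduced by this toy.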