| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 16:21:16 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 16:21:16 -0500 |
| commit | 8fda0e0042d04343e0f4e6cdc04ba5e927f69064 (patch) | |
| tree | c1b40f99ee466493ac8c93eb75327cc5a9c62209 /paper | |
| parent | 9343b29f358cb963dd224d9524e7fd55e1a8b05b (diff) | |
§6 polish: clarify Figure 5 (decision_utility) is in Appendix D, not main text

After moving Figure 5 from §6 to Appendix D in v2.23, §6 ¶2 still said
'Figure~\ref{fig:decision_utility} makes the decision value explicit'
which would render as 'Figure 5 makes...' but Figure 5 is now in the
appendix. Reader on p8 looking for Figure 5 nearby would not find it.

Added explicit '(Appendix~\ref{app:all_validations})' parenthetical
right after the figure ref so the reader knows where to look.

Audit of all other figure refs (Figures 1-4 in main text):
- fig:audit_hero (Figure 1, §2) → refs in §1/§2 main text ✓
- fig:temporal_cross_arch (Figure 2, §5) → refs in §3/§5 ✓
- fig:penalty_rescue (Figure 3, §5) → refs in §4/§5/§7 ✓
- fig:cross_arch_summary (Figure 4, §5) → refs in §5/§6 ✓

All clean. Main content still 9 pages.
Diffstat (limited to 'paper')
| -rw-r--r-- | paper/main.pdf | bin | 496433 -> 496499 bytes | |||
| -rw-r--r-- | paper/main.tex | 2 |
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index af404f1..e681ca9 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
Binary files differ
diff --git a/paper/main.tex b/paper/main.tex
index 1e561c4..edf3717 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -173,7 +173,7 @@ Diag. & Measurement & Default threshold & Role \\
 \end{tabular}}
 \end{table}
-The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely.
+The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} (Appendix~\ref{app:all_validations}) makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely.
 The protocol is conservative in a specific sense: it preserves BP and EP as evidence-bearing controls and walks back only claims that fail measurement-validity or depth-utilization checks. Diagnostics (a) and (b) have sharp empirical calibration gaps in the audited regime (Appendix~\ref{app:threshold_sweep}), diagnostic (c) is a sub-mode discriminator computed as the mean pairwise cosine of the per-batch-averaged BP-grad direction at the chosen layer across $K{\geq}8$ disjoint $128$-sample minibatches (high values, $0.5$--$0.99$, indicate drift-dominated reference vectors; healthy per-sample credit gives $0.05$--$0.18$), and diagnostic (d) uses a deliberately weak $2$pp margin as a context check rather than a theorem about useful depth. The Section~\ref{sec:mode2} cross-method cosine-versus-accuracy dissociation reinforces the necessity of keeping all four diagnostics separate: Credit Bridge, State Bridge, and DFA differ by more than $4\times$ in deep-layer alignment under the same penalty rescue without tracking final accuracy in the same direction, so aligning an alternative credit rule with the BP gradient is not a substitute for checking depth utilization against a matched shallow baseline.
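
Reviewer note on the context lines above: diagnostic (c) is described as the mean pairwise cosine of per-batch-averaged BP-gradient directions across $K{\geq}8$ disjoint minibatches. A minimal sketch of that computation follows. It is illustrative only and not taken from the repository: the names `batch_mean`, `cosine`, and `mean_pairwise_cosine`, the `min_batches` parameter, and the plain-list gradient representation are all assumptions, and extracting the per-sample gradients at the chosen layer is left out.

```python
import math
from itertools import combinations


def batch_mean(grads):
    """Average the per-sample gradient vectors of one minibatch."""
    n = len(grads)
    dim = len(grads[0])
    return [sum(g[i] for g in grads) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(batches, min_batches=8):
    """Mean cosine over all unordered pairs of per-batch mean gradients.

    `batches` is a list of minibatches; each minibatch is a list of
    per-sample gradient vectors (lists of floats) taken at one layer.
    `min_batches` mirrors the K >= 8 requirement quoted in the text
    (hypothetical parameter name).
    """
    if len(batches) < max(2, min_batches):
        raise ValueError("need at least max(2, min_batches) minibatches")
    means = [batch_mean(b) for b in batches]
    pairs = list(combinations(means, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Under the ranges quoted in the diff, a value of roughly $0.5$--$0.99$ from `mean_pairwise_cosine` would point to drift-dominated reference vectors, and roughly $0.05$--$0.18$ to healthy per-sample credit. The sketch itself does not apply any threshold.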
