1 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.tex b/paper/main.tex
index b40156e..6ba45ce 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -175,7 +175,7 @@ Diag. & Measurement & Default threshold & Role \\
 
 The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} (Appendix~\ref{app:all_validations}) makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely.
 
-The protocol is conservative in a specific sense: it preserves BP and EP as evidence-bearing controls and walks back only claims that fail measurement-validity or depth-utilization checks. Diagnostics (a) and (b) have sharp empirical calibration gaps in the audited regime (Appendix~\ref{app:threshold_sweep}), diagnostic (c) is a sub-mode discriminator computed as the mean pairwise cosine of the per-batch-averaged BP-grad direction at the chosen layer across $K{\geq}8$ disjoint $128$-sample minibatches (high values, $0.5$--$0.99$, indicate drift-dominated reference vectors; healthy per-sample credit gives $0.05$--$0.18$), and diagnostic (d) uses a deliberately weak $2$pp margin as a context check rather than a theorem about useful depth. The Section~\ref{sec:mode2} cross-method cosine-versus-accuracy dissociation reinforces the necessity of keeping all four diagnostics separate: Credit Bridge, State Bridge, and DFA differ by more than $4\times$ in deep-layer alignment under the same penalty rescue without tracking final accuracy in the same direction, so aligning an alternative credit rule with the BP gradient is not a substitute for checking depth utilization against a matched shallow baseline.
+The protocol is conservative in a specific sense: it preserves BP and EP as evidence-bearing controls and walks back only claims that fail measurement-validity or depth-utilization checks. Diagnostics (a) and (b) have sharp empirical calibration gaps in the audited regime (Appendix~\ref{app:threshold_sweep}), diagnostic (c) is a sub-mode discriminator computed as the mean pairwise cosine of the per-batch-averaged BP-grad direction at the chosen layer across $K{\geq}8$ disjoint $128$-sample minibatches (in our 5-method audit, healthy methods cluster near zero with all six BP/EP values in $[-0.04,+0.12]$, while drift-dominated cases reach high tails up to $+0.99$, and $5/9$ degenerate values exceed the $0.30$ default cutoff), and diagnostic (d) uses a deliberately weak $2$pp margin as a context check rather than a theorem about useful depth. The Section~\ref{sec:mode2} cross-method cosine-versus-accuracy dissociation reinforces the necessity of keeping all four diagnostics separate: Credit Bridge, State Bridge, and DFA differ by more than $4\times$ in deep-layer alignment under the same penalty rescue without tracking final accuracy in the same direction, so aligning an alternative credit rule with the BP gradient is not a substitute for checking depth utilization against a matched shallow baseline.
 
 \section{Discussion, Limits, Conclusion}
 \label{sec:discussion}