From 9343b29f358cb963dd224d9524e7fd55e1a8b05b Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 8 Apr 2026 15:55:04 -0500
Subject: Reviewer-concern batch: ρ formula + LN Jacobian derivation + diagnostic (c) formula + threshold pointer + hyperparameter fairness clause
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addressed 4 secondary reviewer concerns from the user's earlier list, all small inline additions:

1. §3 ¶1 LN Jacobian: extended the 1-line claim into a 2-line derivation. For y = LN(h) = (h-μ)/σ with σ ∝ ||h||/√d, ||∂y/∂h|| = Θ(1/σ), so ||g_L|| = Θ(1/||h_L||). Connects the (a) growth and (b) collapse formally.

2. §4 ¶2 ρ formal definition: added the inline formula ρ_l = Pearson(, ℓ(h_l + εv) - ℓ(h_l)) over M=32 random unit-norm directions v with ε=1e-3, evaluated per sample on a fixed eval batch and averaged. Previously this was narrative-only.

3. §6 ¶3 diagnostic (c) cross-batch stability: added inline definition as the mean pairwise cosine of per-batch-averaged BP-grad direction at the chosen layer across K≥8 disjoint 128-sample minibatches, with the empirical separation (drift 0.5-0.99 vs healthy 0.05-0.18).

4. §6 ¶3 threshold sensitivity pointer: added (Appendix~\ref{app:threshold_sweep}) pointer next to the (a)/(b) calibration claim.

5. §2 ¶1 hyperparameter fairness: changed 'against the same architecture, optimizer, and training budget' to 'against the identical architecture, optimizer, schedule, and training budget without method-specific tuning' and added 'batch size 128'. Closes the 'fairness asserted but not evidenced' reviewer concern.

Page budget: each addition ate ~1-2 lines. Net push was ~9 lines, which spilled §7 onto p10.
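
Sanity sketch for item 1 (not part of the paper source): the NumPy snippet below assumes plain LayerNorm with no affine parameters and no epsilon term, and checks numerically that the spectral norm of ∂y/∂h scales as 1/σ. The names layer_norm, layer_norm_jacobian, and jacobian_norm_times_sigma are illustrative and do not come from the repository.

```python
import numpy as np


def layer_norm(h):
    """Plain LayerNorm: no affine parameters, no epsilon (an assumption of this sketch)."""
    return (h - h.mean()) / h.std()  # h.std() is the population std (ddof=0)


def layer_norm_jacobian(h):
    """Analytic Jacobian dy/dh of layer_norm at h, shape (d, d)."""
    d = h.shape[0]
    sigma = h.std()
    y = layer_norm(h)
    # dy_i/dh_j = (delta_ij - 1/d - y_i * y_j / d) / sigma
    return (np.eye(d) - np.ones((d, d)) / d - np.outer(y, y) / d) / sigma


def jacobian_norm_times_sigma(h):
    """Spectral norm of the Jacobian times sigma; constant if the norm is Theta(1/sigma)."""
    return np.linalg.norm(layer_norm_jacobian(h), 2) * h.std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=64)
    # Rescaling h rescales sigma by the same factor, so ||J|| should shrink as 1/scale.
    for scale in (1.0, 10.0, 100.0):
        hs = scale * h
        jac_norm = np.linalg.norm(layer_norm_jacobian(hs), 2)
        print(f"||h||={np.linalg.norm(hs):.2f}  ||J||={jac_norm:.4e}  "
              f"||J||*sigma={jacobian_norm_times_sigma(hs):.6f}")
```

Under these assumptions the product ||∂y/∂h||·σ stays constant when h is rescaled, which is the Θ(1/σ) behaviour the derivation in item 1 relies on. A LayerNorm with a nonzero epsilon or learned gain would change the constant, and for very small σ also the scaling.
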

Recovered by:

- Shrinking Figure 3 (penalty rescue) from \linewidth to 0.92\linewidth
- Shrinking Figure 4 (cross_arch_summary) from 0.78\linewidth to 0.7\linewidth
- Compressing diagnostic (c) clause (kept the formula intent without all the LaTeX math symbols inline)
- Trimming §7 closing sentence: dropped 'main lesson is to decompose' preamble; merged 'a reporting rule' phrase into the same sentence as the methodology-line citations

Result: §1-§7 + all figures fit strictly in 9 pages (verified via pdftotext; p9 ends with §7 closing sentence + page number '9'; p10 starts with References). Total 18 pages, 0 overfull hbox.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 paper/main.pdf | Bin 492424 -> 496433 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/paper/main.pdf b/paper/main.pdf
index bd5018c..af404f1 100644
Binary files a/paper/main.pdf and b/paper/main.pdf differ
--
cgit v1.2.3