path: root/results/confirmatory/A1_naive_state_err.csv
author YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 15:55:04 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 15:55:04 -0500
commit 9343b29f358cb963dd224d9524e7fd55e1a8b05b
tree ad4c7f56e401180d051f9db7f97cb808080f38ba /results/confirmatory/A1_naive_state_err.csv
parent 8edd4505568ef327eb72be2c5c57d24439b36986
Reviewer-concern batch: ρ formula + LN Jacobian derivation + diagnostic (c) formula + threshold pointer + hyperparameter fairness clause
Addressed 5 secondary reviewer concerns from the user's earlier list, all small inline additions:

1. §3 ¶1 LN Jacobian: extended the 1-line claim into a 2-line derivation. For y = LN(h) = (h-μ)/σ with σ ∝ ||h||/√d, ||∂y/∂h|| = Θ(1/σ), so ||g_L|| = Θ(1/||h_L||). Connects the (a) growth and (b) collapse formally.

2. §4 ¶2 ρ formal definition: added the inline formula ρ_l = Pearson(<a_l, εv>, ℓ(h_l + εv) - ℓ(h_l)) over M=32 random unit-norm directions v with ε=1e-3, evaluated per sample on a fixed eval batch and averaged. Previously this was narrative-only.

3. §6 ¶3 diagnostic (c) cross-batch stability: added an inline definition as the mean pairwise cosine of the per-batch-averaged BP-grad direction at the chosen layer across K≥8 disjoint 128-sample minibatches, with the empirical separation (drift 0.5-0.99 vs healthy 0.05-0.18).

4. §6 ¶3 threshold sensitivity pointer: added an (Appendix~\ref{app:threshold_sweep}) pointer next to the (a)/(b) calibration claim.

5. §2 ¶1 hyperparameter fairness: changed 'against the same architecture, optimizer, and training budget' to 'against the identical architecture, optimizer, schedule, and training budget without method-specific tuning' and added 'batch size 128'. Closes the 'fairness asserted but not evidenced' reviewer concern.

Page budget: each addition ate ~1-2 lines. Net push was ~9 lines, which spilled §7 onto p10. Recovered by:

- Shrinking Figure 3 (penalty rescue) from \linewidth to 0.92\linewidth
- Shrinking Figure 4 (cross_arch_summary) from 0.78\linewidth to 0.7\linewidth
- Compressing the diagnostic (c) clause (kept the formula intent without all the LaTeX math symbols inline)
- Trimming the §7 closing sentence: dropped the 'main lesson is to decompose' preamble; merged the 'a reporting rule' phrase into the same sentence as the methodology-line citations

Result: §1-§7 + all figures fit strictly in 9 pages (verified via pdftotext; p9 ends with the §7 closing sentence + page number '9'; p10 starts with References). Total 18 pages, 0 overfull hbox.
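For reference, the quantities defined in items 1-3 can be sketched numerically. This is a minimal NumPy illustration, not the paper's code. The function names (`layernorm_jacobian`, `rho`, `cross_batch_stability`) and the array layouts are hypothetical. The sketch assumes LN without an epsilon term or affine parameters, treats `a_l` as a given per-sample vector, and reads "per-batch-averaged direction" as "average the gradients in a batch, then normalize".

```python
import numpy as np


def layernorm_jacobian(h):
    """Analytic Jacobian of y = (h - mean(h)) / std(h), no epsilon, no affine."""
    d = h.shape[0]
    sigma = h.std()
    y = (h - h.mean()) / sigma
    # dy_i/dh_j = (delta_ij - 1/d - y_i * y_j / d) / sigma
    return (np.eye(d) - np.ones((d, d)) / d - np.outer(y, y) / d) / sigma


def pearson(x, y):
    """Pearson correlation of two 1-D arrays."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def rho(loss_fn, H, A, M=32, eps=1e-3, seed=0):
    """Per-sample Pearson(<a, eps*v>, loss(h + eps*v) - loss(h)), averaged.

    H, A: arrays of shape (N, d); loss_fn maps a (d,) vector to a scalar.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for h, a in zip(H, A):
        V = rng.standard_normal((M, h.shape[0]))
        V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit-norm directions
        predicted = eps * (V @ a)
        base = loss_fn(h)
        actual = np.array([loss_fn(h + eps * v) - base for v in V])
        scores.append(pearson(predicted, actual))
    return float(np.mean(scores))


def cross_batch_stability(G):
    """Mean pairwise cosine of batch-averaged gradient directions.

    G: per-sample gradients of shape (K, B, d) for K disjoint minibatches.
    """
    mean_g = G.mean(axis=1)                                   # (K, d)
    U = mean_g / np.linalg.norm(mean_g, axis=1, keepdims=True)
    C = U @ U.T                                               # cosine matrix
    iu = np.triu_indices(len(U), k=1)                         # pairs with i < j
    return float(C[iu].mean())
```

Under these assumptions the spectral norm of `layernorm_jacobian(h)` equals 1/σ for d ≥ 3, so scaling `h` by a constant shrinks it by the same factor, which is the Θ(1/σ) behaviour in item 1. `rho` is 1 when `A` holds the exact gradient of a smooth loss and -1 when the sign is flipped.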
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'results/confirmatory/A1_naive_state_err.csv')
0 files changed, 0 insertions, 0 deletions