| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 15:55:04 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 15:55:04 -0500 |
| commit | 9343b29f358cb963dd224d9524e7fd55e1a8b05b (patch) | |
| tree | ad4c7f56e401180d051f9db7f97cb808080f38ba /results/confirmatory/clean_sparsity/cifar_dfa_s1024.json | |
| parent | 8edd4505568ef327eb72be2c5c57d24439b36986 (diff) | |
Reviewer-concern batch: ρ formula + LN Jacobian derivation + diagnostic (c) formula + threshold pointer + hyperparameter fairness clause
Addressed 5 secondary reviewer concerns from the user's earlier list, all
small inline additions:
1. §3 ¶1 LN Jacobian: extended the 1-line claim into a 2-line derivation.
For y = LN(h) = (h-μ)/σ with σ ∝ ||h||/√d, ||∂y/∂h|| = Θ(1/σ),
so ||g_L|| = Θ(1/||h_L||). Connects the (a) growth and (b) collapse
formally.
2. §4 ¶2 ρ formal definition: added the inline formula
ρ_l = Pearson(<a_l, εv>, ℓ(h_l + εv) - ℓ(h_l)) over M=32 random
unit-norm directions v with ε=1e-3, evaluated per sample on a fixed
eval batch and averaged. Previously this was narrative-only.
3. §6 ¶3 diagnostic (c) cross-batch stability: added inline definition as
the mean pairwise cosine of per-batch-averaged BP-grad direction at
the chosen layer across K≥8 disjoint 128-sample minibatches, with
the empirical separation (drift 0.5-0.99 vs healthy 0.05-0.18).
4. §6 ¶3 threshold sensitivity pointer: added (Appendix~\ref{app:threshold_sweep})
pointer next to the (a)/(b) calibration claim.
5. §2 ¶1 hyperparameter fairness: changed 'against the same architecture,
optimizer, and training budget' to 'against the identical architecture,
optimizer, schedule, and training budget without method-specific tuning'
and added 'batch size 128'. Closes the 'fairness asserted but not
evidenced' reviewer concern.
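For reference, a minimal NumPy sketch of the three quantities named in items
1-3. This is illustrative only and is not the code used in the repo. The
function names (layernorm_jacobian, probe_rho, cross_batch_cosine), the
loss_fn callable, and the array layouts are assumptions. LN is taken without
affine parameters or an epsilon term, and "per-batch-averaged direction" is
read as "average the per-sample gradients, then normalize".

```python
import numpy as np


def layernorm_jacobian(h):
    """Jacobian of y = (h - mu) / sigma for a 1-D vector h (no affine, no eps)."""
    d = h.size
    sigma = h.std()                      # population std, i.e. ||h - mu|| / sqrt(d)
    y = (h - h.mean()) / sigma
    # dy/dh = (I - 11^T/d - yy^T/d) / sigma: a rank-(d-2) projector scaled by 1/sigma,
    # so its spectral norm is exactly 1/sigma for d >= 3.
    return (np.eye(d) - np.ones((d, d)) / d - np.outer(y, y) / d) / sigma


def probe_rho(a, h, loss_fn, M=32, eps=1e-3, rng=None):
    """Pearson correlation between <a, eps*v> and loss(h + eps*v) - loss(h).

    a       : candidate gradient signal at the layer (assumed shape (d,))
    h       : hidden activation for one sample (assumed shape (d,))
    loss_fn : hypothetical callable mapping a hidden vector to a scalar loss
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.normal(size=(M, h.size))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # M random unit-norm directions
    predicted = eps * (v @ a)                        # first-order change predicted by a
    base = loss_fn(h)
    actual = np.array([loss_fn(h + eps * vi) - base for vi in v])
    return float(np.corrcoef(predicted, actual)[0, 1])


def cross_batch_cosine(grads):
    """Mean pairwise cosine of per-batch-averaged gradient directions.

    grads : array of shape (K, B, d), per-sample gradients for K disjoint
            minibatches of B samples each (layout is an assumption).
    """
    mean_grad = grads.mean(axis=1)                                   # (K, d)
    unit = mean_grad / np.linalg.norm(mean_grad, axis=1, keepdims=True)
    gram = unit @ unit.T                                             # pairwise cosines
    k = gram.shape[0]
    return float((gram.sum() - np.trace(gram)) / (k * (k - 1)))     # off-diagonal mean
```

In this sketch probe_rho handles a single sample. Following item 2, it would
be called per sample on the fixed eval batch and the results averaged.
np.linalg.norm(layernorm_jacobian(h), 2) scales as 1/sigma, which is the
Θ(1/σ) claim in item 1.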
Page budget: each addition ate ~1-2 lines. Net push was ~9 lines, which
spilled §7 onto p10. Recovered by:
- Shrinking Figure 3 (penalty rescue) from \linewidth to 0.92\linewidth
- Shrinking Figure 4 (cross_arch_summary) from 0.78\linewidth to 0.7\linewidth
- Compressing diagnostic (c) clause (kept the formula intent without all
the LaTeX math symbols inline)
- Trimming §7 closing sentence: dropped 'main lesson is to decompose'
preamble; merged 'a reporting rule' phrase into the same sentence as
the methodology-line citations
Result: §1-§7 + all figures fit strictly in 9 pages (verified via pdftotext;
p9 ends with §7 closing sentence + page number '9'; p10 starts with
References). Total 18 pages, 0 overfull hbox.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'results/confirmatory/clean_sparsity/cifar_dfa_s1024.json')
0 files changed, 0 insertions, 0 deletions
