diff options
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 14:29:42 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 14:29:42 -0500 |
| commit | 528a240f4d8288ad626781ac86e4f0c8941cfe76 (patch) | |
| tree | bb6a9f8eb0e7ba27ea578bc0ec5dbe32cd828fec | |
| parent | 3a35c6e0636aa78737847f92712b2dc36bad3618 (diff) | |
Tables 1/2/3: revert to tabular + wrap in resizebox (per user feedback)
Previous tabularx conversion caused visual problems:
- Table 1: 'lcccc' inside tabularx{\linewidth} did not stretch, leaving a
blank column on the right side of the page
- Table 2: 'p{0.18}LLL' forced originally-single-line cells to wrap into
multiple lines (e.g., 'Vanilla DFA, early epoch' and 'cos_deep=... rho_deep=...'
split across 2 lines each)
- Table 3: 'p{0.06}L p{0.16} p{0.22}' similarly compressed single-line rows
Fix: revert all 3 tables to original plain 'tabular' (Table 1 lcccc,
Table 2 lccc, Table 3 llll) and wrap each in \resizebox{\linewidth}{!}{...}.
This:
- Stretches Table 1 to full text width (no blank right column)
- Shrinks Table 2's originally-wide content uniformly so all rows stay
single-line
- Shrinks Table 3 similarly so (a)/(b)/(c)/(d) rows are single-line
Tables 4-9 (appendix) keep their tabularx treatment since they fit cleanly.
Result: 0 overfull hbox, main content still exactly 9 pages, first 4
references now fit on p9 as well.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| -rw-r--r-- | paper/main.pdf | bin | 484726 -> 484918 bytes | |||
| -rw-r--r-- | paper/main.tex | 15 |
2 files changed, 9 insertions, 6 deletions
diff --git a/paper/main.pdf b/paper/main.pdf Binary files differindex 26f73c0..70915bb 100644 --- a/paper/main.pdf +++ b/paper/main.pdf diff --git a/paper/main.tex b/paper/main.tex index a372af8..fa9c12b 100644 --- a/paper/main.tex +++ b/paper/main.tex @@ -48,7 +48,8 @@ We begin with the smallest setting in which all methods can be compared head-to- \small \caption{Main audit table for the 4-block $d{=}256$ pre-LayerNorm ResMLP on CIFAR-10. The row and column structure is fixed here; fill from the three-seed audit output.} \label{tab:main_audit} -\begin{tabularx}{\linewidth}{@{}lcccc@{}} +\resizebox{\linewidth}{!}{% +\begin{tabular}{lcccc} \toprule Method & Test acc. & Headline $\Gamma$ & Status-quo verdict & Protocol verdict \\ \midrule @@ -58,7 +59,7 @@ DFA & $0.306 \pm 0.006$ & $0.10$ & trustworthy & walked back \\ State Bridge & $0.205 \pm 0.032$ & $0.005$ & trustworthy & walked back \\ Credit Bridge & $0.289 \pm 0.026$ & $0.07$ & trustworthy & walked back \\ \bottomrule -\end{tabularx} +\end{tabular}} \end{table} By the field's usual criteria, the non-BP methods appear to train to nontrivial accuracy and report nonzero alignment. In Table~\ref{tab:main_audit}, DFA reaches $0.306 \pm 0.006$ test accuracy with headline $\Gamma{=}0.10$, State Bridge reaches $0.205 \pm 0.032$ with $\Gamma{=}0.005$, and Credit Bridge reaches $0.289 \pm 0.026$ with $\Gamma{=}0.07$; none of these rows looks like an obvious invalidation if one is reading the usual pair of final accuracy and aggregate alignment in the style of prior FA reporting \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct}. Even the absolute scale does not itself force a walk-back, because all three methods are plainly above chance and all three report positive headline alignment rather than a visibly broken or undefined quantity. That reading is exactly what the rest of the paper overturns. @@ -104,7 +105,8 @@ The penalty intervention first matters as a rescue of the measurement regime. Wh \small \caption{Two-mode validation table built around the intervention and disambiguation results.} \label{tab:mode_validation} -\begin{tabularx}{\linewidth}{@{}>{\raggedright\arraybackslash}p{0.18\linewidth}LLL@{}} +\resizebox{\linewidth}{!}{% +\begin{tabular}{lccc} \toprule Condition & Deep-layer alignment signal & Measurement regime & Interpretation \\ \midrule @@ -113,7 +115,7 @@ Vanilla DFA, converged & $\overline{\cos}_{deep}{=}{-}0.022$, $\overline{\rho}_{ Penalized DFA, $\lambda{=}10^{-2}$ & $\overline{\cos}_{deep}{=}+0.155{\pm}0.025$, $\overline{\rho}_{deep}{=}+0.080{\pm}0.011$ & meaningful ($\|g\|{\sim}10^{-6}$) & partial alleviation of both modes \\ Fresh-$B$ null control & $\overline{\cos}_{deep}{=}+0.002{\pm}0.022$ ($n{=}20$ draws) & meaningful & training-specific adaptation check \\ \bottomrule -\end{tabularx} +\end{tabular}} \end{table} Once the reference vector is meaningful again, the deep layers no longer sit exactly at null. At $\lambda{=}10^{-2}$, penalized DFA reaches a three-seed deep-layer mean cosine of $+0.155 \pm 0.025$ and deep perturbation correlation of $+0.080 \pm 0.011$, whereas vanilla DFA is essentially zero on both metrics in the deep blocks, consistent with prior concerns that alternative feedback can fail by supplying poor credit directions even before full collapse \citep{bartunov2018assessing,moskovitz2018feedback,crafton2019backpropagation,refinetti2023aligning}. The null calibration rules out the interpretation that this recovered signal is merely measurement noise: on the same penalized checkpoint, replacing the training-time feedback matrices with 20 fresh random $B_l$ draws gives a deep cosine of only $+0.002 \pm 0.022$, with per-layer standard deviations of $0.013$--$0.023$, all within noise of zero (Table~\ref{tab:mode_validation}). The $\lambda$ sweep sharpens the dissociation further: at $\lambda{=}10^{-4}$, Mode~1 is already alleviated, with $\|h_L\|{=}2.4\times 10^4$ and $\|g_L\|{=}6.3\times 10^{-7}$, but deep cosine remains $-0.022$, while at $\lambda{=}10^{-2}$ it rises to $+0.165$ and deep $\rho$ to $+0.091$ (Figure~\ref{fig:penalty_rescue}). The improvement is real, but it is only partial. @@ -153,7 +155,8 @@ The reporting protocol begins with measurement validity. Before any FA paper rep \small \caption{Protocol definition table. Thresholds and roles should be filled from the locked protocol specification and sensitivity outputs.} \label{tab:protocol_def} -\begin{tabularx}{\linewidth}{@{}>{\raggedright\arraybackslash}p{0.06\linewidth}L>{\raggedright\arraybackslash}p{0.16\linewidth}>{\raggedright\arraybackslash}p{0.22\linewidth}@{}} +\resizebox{\linewidth}{!}{% +\begin{tabular}{llll} \toprule Diag. & Measurement & Default threshold & Role \\ \midrule @@ -162,7 +165,7 @@ Diag. & Measurement & Default threshold & Role \\ (c) & Cross-batch direction stability of normalized BP gradients & $> 0.30$ & sub-mode discriminator \\ (d) & Frozen-blocks baseline margin for trained blocks over random blocks & $< 2$pp & depth-utilization check \\ \bottomrule -\end{tabularx} +\end{tabular}} \end{table} The point of the protocol is not to add plots; it is to prevent a specific class of false conclusions. For this paper, the minimal protocol is four checks: per-layer activation scale via max-per-block growth, deepest hidden BP gradient floor, meaningful-regime per-layer credit quality, and an architecture-matched frozen-blocks baseline (Table~\ref{tab:protocol_def}). The first two ask whether the reference quantity is still valid; the third asks whether, once validity is restored, the deep blocks receive useful directions; and the fourth asks whether the trained depth is doing better than a model whose residual blocks were never trained at all. Figure~\ref{fig:decision_utility} makes the decision value explicit: accuracy alone walks back $0/5$ audited methods, accuracy plus headline $\Gamma$ still walks back $0/5$, and the full protocol walks back $3/5$ by flagging DFA, State Bridge, and Credit Bridge, with diagnostics (a), (b), and (d) each independently sufficient for binary detection on those failures. On our audit, these checks catch failures that accuracy plus aggregate alignment miss completely. |
