From 68cfa13af2f026b7ff388aae4420eba0f0db804a Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 8 Apr 2026 05:18:38 -0500
Subject: Add depth-sweep evidence to §5 + Appendix H from existing d=512 L=2,4,6,8,12 data
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The cifar_depth_scan_s42 results were already on disk but had not been
surfaced in the paper. Across L in {2,4,6,8,12} on the d=512 ResMLP,
DFA's layer-0 cosine stays in [+0.39,+0.40] and its mean deep cosine
stays within [-0.005,+0.000], while BP retains a deep cosine of +0.94
even at L=12. This rules out the 'too deep to receive useful credit'
explanation: making the network shallower does not let DFA's signal
reach the deep blocks any better.

- §5 paragraph 4: one-sentence depth-invariance summary citing the new
  appendix
- New Appendix H: Depth-Sweep Layerwise Profiles, with full table

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 paper/main.pdf | Bin 448286 -> 452349 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/paper/main.pdf b/paper/main.pdf
index b13f9af..6cdc1d4 100644
Binary files a/paper/main.pdf and b/paper/main.pdf differ