From 9751e97dd190b8667c337215dcb70e0cab8f92ff Mon Sep 17 00:00:00 2001 From: YurenHao0426 Date: Sun, 26 Apr 2026 08:45:34 -0500 Subject: Find setting where both FA and DFA fail: d=512 L=2 ResMLP MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below the frozen-blocks baseline while reporting positive cosine and nontrivial accuracy — proving that the standard evaluation pair can simultaneously miss both FA and DFA on the same setting. Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs Frozen baseline (3-seed mean): 0.349 Qualifying seeds: seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484) seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472) seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492) All qualifying cases have: - Both methods below frozen baseline ✓ - Both methods report positive aggregate cosine ✓ - Both methods above chance (~0.10) ✓ - Standard reporting pair (acc + Γ) would NOT walk back either ✓ DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009). FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026). Also includes: - Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs) - resmlp_frozen_blocks_baseline.py patched with --num_blocks arg Co-Authored-By: Claude Opus 4.6 (1M context) --- results/frozen_d512_baselines.log | 111 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 results/frozen_d512_baselines.log (limited to 'results/frozen_d512_baselines.log') diff --git a/results/frozen_d512_baselines.log b/results/frozen_d512_baselines.log new file mode 100644 index 0000000..7a1a42d --- /dev/null +++ b/results/frozen_d512_baselines.log @@ -0,0 +1,111 @@ +=== FROZEN BASELINES d=512 === +Start: Sat Apr 25 10:42:45 PM CDT 2026 + d=512 L=4 s=42 (Sat Apr 25 10:42:45 PM CDT 2026) + DFA-shallow: 0.3458 + DFA-frozen: 0.3445 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=4 s=123 (Sat Apr 25 11:22:20 PM CDT 2026) + DFA-shallow: 0.3524 + DFA-frozen: 0.3506 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=4 s=456 (Sun Apr 26 12:01:58 AM CDT 2026) + DFA-shallow: 0.3516 + DFA-frozen: 0.3514 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=2 s=42 (Sun Apr 26 12:41:03 AM CDT 2026) + DFA-shallow: 0.3458 + DFA-frozen: 0.3452 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=2 s=123 (Sun Apr 26 01:20:51 AM CDT 2026) + DFA-shallow: 0.3524 + DFA-frozen: 0.3502 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=2 s=456 (Sun Apr 26 01:59:55 AM CDT 2026) + DFA-shallow: 0.3516 + DFA-frozen: 0.3514 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=8 s=42 (Sun Apr 26 02:39:45 AM CDT 2026) + DFA-shallow: 0.3458 + DFA-frozen: 0.3432 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=8 s=123 (Sun Apr 26 03:19:06 AM CDT 2026) + DFA-shallow: 0.3524 + DFA-frozen: 0.3505 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=8 s=456 (Sun Apr 26 03:58:23 AM CDT 2026) + DFA-shallow: 0.3516 + DFA-frozen: 0.3508 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=12 s=42 (Sun Apr 26 04:37:35 AM CDT 2026) + DFA-shallow: 0.3458 + DFA-frozen: 0.3435 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=12 s=123 (Sun Apr 26 05:17:07 AM CDT 2026) + DFA-shallow: 0.3524 + DFA-frozen: 0.3526 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) + d=512 L=12 s=456 (Sun Apr 26 05:56:51 AM CDT 2026) + DFA-shallow: 0.3516 + DFA-frozen: 0.3513 + +Compare to trainable 4-block ResMLP (3-seed): BP=0.6147 100ep / 0.585 30ep, DFA=0.306 100ep / 0.301 30ep + +Interpretation: + If DFA-frozen ≈ DFA-trainable: blocks are passengers, walk-back parallels ViT + If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT) +=== FROZEN BASELINES DONE (Sun Apr 26 06:36:08 AM CDT 2026) === -- cgit v1.2.3