Device: cuda:0, seed=42, epochs=100

=== BP shallow (ResMLP num_blocks=0), seed=42 ===
n_params: 789770 (789770 trainable)
[BP-shallow] ep 1: test_acc=0.3469
[BP-shallow] ep 10: test_acc=0.3740
[BP-shallow] ep 20: test_acc=0.3640
[BP-shallow] ep 30: test_acc=0.3514
[BP-shallow] ep 40: test_acc=0.3523
[BP-shallow] ep 50: test_acc=0.3618
[BP-shallow] ep 60: test_acc=0.3721
[BP-shallow] ep 70: test_acc=0.3734
[BP-shallow] ep 80: test_acc=0.3811
[BP-shallow] ep 90: test_acc=0.3883
[BP-shallow] ep 100: test_acc=0.3872
FINAL BP-shallow: 0.3872

=== BP frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=42 ===
n_params: 1318154 (789770 trainable)
[BP-frozen] ep 1: test_acc=0.3520
[BP-frozen] ep 10: test_acc=0.3668
[BP-frozen] ep 20: test_acc=0.3510
[BP-frozen] ep 30: test_acc=0.3530
[BP-frozen] ep 40: test_acc=0.3570
[BP-frozen] ep 50: test_acc=0.3626
[BP-frozen] ep 60: test_acc=0.3644
[BP-frozen] ep 70: test_acc=0.3766
[BP-frozen] ep 80: test_acc=0.3840
[BP-frozen] ep 90: test_acc=0.3847
[BP-frozen] ep 100: test_acc=0.3890
FINAL BP-frozen-blocks: 0.3890

=== DFA shallow (ResMLP num_blocks=0), seed=42 ===
n_params: 789770 (789770 trainable)
[DFA-shallow] ep 1: test_acc=0.3219
[DFA-shallow] ep 10: test_acc=0.3410
[DFA-shallow] ep 20: test_acc=0.3375
[DFA-shallow] ep 30: test_acc=0.3356
[DFA-shallow] ep 40: test_acc=0.3456
[DFA-shallow] ep 50: test_acc=0.3427
[DFA-shallow] ep 60: test_acc=0.3434
[DFA-shallow] ep 70: test_acc=0.3452
[DFA-shallow] ep 80: test_acc=0.3479
[DFA-shallow] ep 90: test_acc=0.3463
[DFA-shallow] ep 100: test_acc=0.3469
FINAL DFA-shallow: 0.3469

=== DFA frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=42 ===
n_params: 1318154 (789770 trainable)
[DFA-frozen] ep 1: test_acc=0.3255
[DFA-frozen] ep 10: test_acc=0.3376
[DFA-frozen] ep 20: test_acc=0.3414
[DFA-frozen] ep 30: test_acc=0.3434
[DFA-frozen] ep 40: test_acc=0.3422
[DFA-frozen] ep 50: test_acc=0.3399
[DFA-frozen] ep 60: test_acc=0.3422
[DFA-frozen] ep 70: test_acc=0.3493
[DFA-frozen] ep 80: test_acc=0.3474
[DFA-frozen] ep 90: test_acc=0.3448
[DFA-frozen] ep 100: test_acc=0.3460
FINAL DFA-frozen-blocks: 0.3460

=== ResMLP frozen/shallow baseline summary, seed=42 ===
BP-shallow:  0.3872
BP-frozen:   0.3890
DFA-shallow: 0.3469
DFA-frozen:  0.3460

Compare to trainable 4-block ResMLP (3-seed mean): BP=0.609, DFA=0.308

Interpretation:
  If DFA-frozen ≈ DFA-trainable (0.308): blocks are passengers, walk-back parallels ViT
  If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT)
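
The interpretation rule above can be applied to the logged numbers with a few lines of arithmetic. The following is a minimal sketch, not part of the original run: the function name `compare_frozen_to_trainable` and the tolerance `tol` are hypothetical, since the log does not say how close "≈" has to be. It also labels the case where the frozen run scores higher than the trainable one, which the two logged rules do not cover. Note that the frozen and shallow results come from a single seed (42), while the trainable reference is a 3-seed mean, so small gaps should be read cautiously.

```python
# Final test accuracies copied from the log above (seed=42).
final_acc = {
    "BP-shallow": 0.3872,
    "BP-frozen": 0.3890,
    "DFA-shallow": 0.3469,
    "DFA-frozen": 0.3460,
}

# Trainable 4-block ResMLP reference (3-seed mean), as quoted in the log.
trainable_acc = {"BP": 0.609, "DFA": 0.308}

# Parameter counts from the log: the four frozen blocks account for the difference.
total_params = 1318154
trainable_params = 789770
frozen_params = total_params - trainable_params


def compare_frozen_to_trainable(frozen: float, trainable: float, tol: float) -> str:
    """Label the frozen-vs-trainable gap.

    `tol` is an ASSUMED threshold for "approximately equal"; the log
    does not specify one, so the caller has to choose it.
    """
    gap = frozen - trainable
    if abs(gap) <= tol:
        return "approx_equal"        # log's first case: blocks are passengers
    if gap < 0:
        return "frozen_much_lower"   # log's second case: DFA trains the blocks
    return "frozen_higher"           # not covered by the log's two rules


# Gaps in accuracy points, rounded for display.
gaps = {
    "DFA frozen - trainable": round(final_acc["DFA-frozen"] - trainable_acc["DFA"], 4),
    "BP frozen - trainable": round(final_acc["BP-frozen"] - trainable_acc["BP"], 4),
    "BP frozen - shallow": round(final_acc["BP-frozen"] - final_acc["BP-shallow"], 4),
    "DFA frozen - shallow": round(final_acc["DFA-frozen"] - final_acc["DFA-shallow"], 4),
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.4f}")

# tol=0.02 is an illustrative choice only.
print(compare_frozen_to_trainable(final_acc["DFA-frozen"], trainable_acc["DFA"], tol=0.02))
```

The label depends on the chosen tolerance: with a looser `tol` (for example 0.05) the same DFA numbers fall into the "approx_equal" case, so the threshold should be set from the seed-to-seed spread of the trainable runs, which this log does not report.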