results/resmlp_frozen_blocks_s456.log



Device: cuda:0, seed=456, epochs=100

=== BP shallow (ResMLP num_blocks=0), seed=456 ===
  n_params: 789770 (789770 trainable)
  [BP-shallow] ep 1: test_acc=0.3545
  [BP-shallow] ep 10: test_acc=0.3636
  [BP-shallow] ep 20: test_acc=0.3572
  [BP-shallow] ep 30: test_acc=0.3514
  [BP-shallow] ep 40: test_acc=0.3629
  [BP-shallow] ep 50: test_acc=0.3623
  [BP-shallow] ep 60: test_acc=0.3711
  [BP-shallow] ep 70: test_acc=0.3766
  [BP-shallow] ep 80: test_acc=0.3875
  [BP-shallow] ep 90: test_acc=0.3875
  [BP-shallow] ep 100: test_acc=0.3876
FINAL BP-shallow: 0.3876

=== BP frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=456 ===
  n_params: 1318154 (789770 trainable)
  [BP-frozen] ep 1: test_acc=0.3593
  [BP-frozen] ep 10: test_acc=0.3696
  [BP-frozen] ep 20: test_acc=0.3515
  [BP-frozen] ep 30: test_acc=0.3541
  [BP-frozen] ep 40: test_acc=0.3574
  [BP-frozen] ep 50: test_acc=0.3567
  [BP-frozen] ep 60: test_acc=0.3724
  [BP-frozen] ep 70: test_acc=0.3777
  [BP-frozen] ep 80: test_acc=0.3861
  [BP-frozen] ep 90: test_acc=0.3894
  [BP-frozen] ep 100: test_acc=0.3881
FINAL BP-frozen-blocks: 0.3881
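
The two n_params lines above imply how much of the 4-block model is frozen. A minimal sketch of that arithmetic, using only the counts printed in this log (the variable names are illustrative, not from the training script):

```python
# Parameter counts copied from the n_params lines in this log.
total_frozen_model = 1318154   # 4-block ResMLP, blocks frozen
trainable = 789770             # equals the full size of the 0-block (shallow) model
num_blocks = 4

# Everything that is not trainable sits in the frozen blocks.
frozen = total_frozen_model - trainable
per_block, remainder = divmod(frozen, num_blocks)

print(f"frozen params:   {frozen}")
print(f"per block:       {per_block} (remainder {remainder})")
print(f"trainable share: {trainable / total_frozen_model:.3f}")
```

The trainable count matching the shallow model exactly is consistent with the frozen run training the same non-block layers as the shallow run, with the blocks acting as a fixed feature map in between.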

=== DFA shallow (ResMLP num_blocks=0), seed=456 ===
  n_params: 789770 (789770 trainable)
  [DFA-shallow] ep 1: test_acc=0.3246
  [DFA-shallow] ep 10: test_acc=0.3453
  [DFA-shallow] ep 20: test_acc=0.3426
  [DFA-shallow] ep 30: test_acc=0.3498
  [DFA-shallow] ep 40: test_acc=0.3431
  [DFA-shallow] ep 50: test_acc=0.3549
  [DFA-shallow] ep 60: test_acc=0.3494
  [DFA-shallow] ep 70: test_acc=0.3534
  [DFA-shallow] ep 80: test_acc=0.3494
  [DFA-shallow] ep 90: test_acc=0.3507
  [DFA-shallow] ep 100: test_acc=0.3519
FINAL DFA-shallow: 0.3519

=== DFA frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=456 ===
  n_params: 1318154 (789770 trainable)
  [DFA-frozen] ep 1: test_acc=0.3283
  [DFA-frozen] ep 10: test_acc=0.3427
  [DFA-frozen] ep 20: test_acc=0.3425
  [DFA-frozen] ep 30: test_acc=0.3481
  [DFA-frozen] ep 40: test_acc=0.3329
  [DFA-frozen] ep 50: test_acc=0.3425
  [DFA-frozen] ep 60: test_acc=0.3519
  [DFA-frozen] ep 70: test_acc=0.3556
  [DFA-frozen] ep 80: test_acc=0.3507
  [DFA-frozen] ep 90: test_acc=0.3508
  [DFA-frozen] ep 100: test_acc=0.3510
FINAL DFA-frozen-blocks: 0.3510

=== ResMLP frozen/shallow baseline summary, seed=456 ===
  BP-shallow:    0.3876
  BP-frozen:     0.3881
  DFA-shallow:   0.3519
  DFA-frozen:    0.3510
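
The summary above can be rebuilt from the FINAL lines when collecting several seeds. A minimal sketch, assuming only the `FINAL <name>: <acc>` line format seen in this log; `parse_finals` and `LOG_EXCERPT` are hypothetical names, not part of the training script:

```python
import re

# Excerpt copied from this log; in practice the text would be read from the log file.
LOG_EXCERPT = """\
FINAL BP-shallow: 0.3876
FINAL BP-frozen-blocks: 0.3881
FINAL DFA-shallow: 0.3519
FINAL DFA-frozen-blocks: 0.3510
"""

# One match per "FINAL <name>: <acc>" line.
FINAL_RE = re.compile(r"^FINAL\s+(\S+):\s+([0-9.]+)\s*$", re.MULTILINE)


def parse_finals(text):
    """Return {run_name: final_test_acc} for every FINAL line in text."""
    return {name: float(acc) for name, acc in FINAL_RE.findall(text)}


finals = parse_finals(LOG_EXCERPT)
for name, acc in finals.items():
    print(f"{name:<18} {acc:.4f}")
```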

Compare to trainable 4-block ResMLP (3-seed mean): BP=0.609, DFA=0.308

Interpretation:
  If DFA-frozen ≈ DFA-trainable (0.308): the blocks are passengers (training them adds nothing under DFA), and the walk-back parallels the ViT case
  If DFA-frozen << DFA-trainable: DFA on ResMLP actually trains the blocks (an interesting contrast with ViT)
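
The comparison described in the interpretation can be written out explicitly. This is a sketch only: the tolerance `TOL` for "≈" is a hypothetical choice, not a value from the log, and the frozen numbers come from a single seed (456) while the trainable numbers are 3-seed means, so the differences are indicative rather than conclusive.

```python
# Final accuracies from this log (seed 456) and the 3-seed means it quotes.
frozen = {"BP": 0.3881, "DFA": 0.3510}
trainable_mean = {"BP": 0.609, "DFA": 0.308}

TOL = 0.02  # hypothetical threshold for "approximately equal"; not from the log


def classify(frozen_acc, trainable_acc, tol=TOL):
    """Return (difference, label) for frozen minus trainable accuracy."""
    diff = frozen_acc - trainable_acc
    if abs(diff) <= tol:
        label = "frozen ≈ trainable"
    elif diff < 0:
        label = "frozen < trainable"
    else:
        label = "frozen > trainable"
    return diff, label


results = {m: classify(frozen[m], trainable_mean[m]) for m in frozen}
for method, (diff, label) in results.items():
    print(f"{method}: frozen - trainable = {diff:+.4f} ({label})")
```

With the logged values, the DFA difference is positive, a case the two branches above do not name explicitly.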