Device: cuda:0, seed=42, epochs=100
=== BP shallow (ResMLP num_blocks=0), seed=42 ===
n_params: 789770 (789770 trainable)
[BP-shallow] ep 1: test_acc=0.3469
[BP-shallow] ep 10: test_acc=0.3740
[BP-shallow] ep 20: test_acc=0.3640
[BP-shallow] ep 30: test_acc=0.3514
[BP-shallow] ep 40: test_acc=0.3523
[BP-shallow] ep 50: test_acc=0.3618
[BP-shallow] ep 60: test_acc=0.3721
[BP-shallow] ep 70: test_acc=0.3734
[BP-shallow] ep 80: test_acc=0.3811
[BP-shallow] ep 90: test_acc=0.3883
[BP-shallow] ep 100: test_acc=0.3872
FINAL BP-shallow: 0.3872
=== BP frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=42 ===
n_params: 1318154 (789770 trainable)
[BP-frozen] ep 1: test_acc=0.3520
[BP-frozen] ep 10: test_acc=0.3668
[BP-frozen] ep 20: test_acc=0.3510
[BP-frozen] ep 30: test_acc=0.3530
[BP-frozen] ep 40: test_acc=0.3570
[BP-frozen] ep 50: test_acc=0.3626
[BP-frozen] ep 60: test_acc=0.3644
[BP-frozen] ep 70: test_acc=0.3766
[BP-frozen] ep 80: test_acc=0.3840
[BP-frozen] ep 90: test_acc=0.3847
[BP-frozen] ep 100: test_acc=0.3890
FINAL BP-frozen-blocks: 0.3890
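The two n_params lines above differ by 1318154 - 789770 = 528384, which is 132096 per block for the 4 frozen blocks. The sketch below is a minimal check, under an assumed layout, that shows how these counts can arise. The names and the layout are hypothetical: a 3072-dim flattened input (for example 32x32x3 images), width 256, and 10 classes. The stem and head are an input projection, a LayerNorm, and a classifier; each block is a LayerNorm plus two width x width dense layers. Other layouts could give the same totals, so this only shows that the logged numbers are consistent with such a layout.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layernorm_params(dim):
    """Elementwise scale and shift of a LayerNorm."""
    return 2 * dim


def resmlp_param_counts(num_blocks, in_dim=3072, width=256, n_classes=10,
                        freeze_blocks=False):
    """Return (total, trainable) parameter counts for a HYPOTHETICAL ResMLP layout."""
    # Assumed stem and head: input projection + LayerNorm + classifier.
    stem_and_head = (linear_params(in_dim, width)
                     + layernorm_params(width)
                     + linear_params(width, n_classes))
    # Assumed residual block: LayerNorm + two width x width dense layers.
    per_block = layernorm_params(width) + 2 * linear_params(width, width)
    total = stem_and_head + num_blocks * per_block
    # Frozen blocks still count toward the total but are excluded from training.
    trainable = stem_and_head if freeze_blocks else total
    return total, trainable


print(resmlp_param_counts(0))                      # (789770, 789770)
print(resmlp_param_counts(4, freeze_blocks=True))  # (1318154, 789770)
```

Under this layout, freezing the blocks leaves exactly the shallow model's 789770 parameters trainable. This matches the "(789770 trainable)" in both frozen runs.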
=== DFA shallow (ResMLP num_blocks=0), seed=42 ===
n_params: 789770 (789770 trainable)
[DFA-shallow] ep 1: test_acc=0.3219
[DFA-shallow] ep 10: test_acc=0.3410
[DFA-shallow] ep 20: test_acc=0.3375
[DFA-shallow] ep 30: test_acc=0.3356
[DFA-shallow] ep 40: test_acc=0.3456
[DFA-shallow] ep 50: test_acc=0.3427
[DFA-shallow] ep 60: test_acc=0.3434
[DFA-shallow] ep 70: test_acc=0.3452
[DFA-shallow] ep 80: test_acc=0.3479
[DFA-shallow] ep 90: test_acc=0.3463
[DFA-shallow] ep 100: test_acc=0.3469
FINAL DFA-shallow: 0.3469
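The DFA runs differ from the BP runs in how the hidden-layer error is formed. The sketch below shows the standard direct feedback alignment rule on a toy one-hidden-layer ReLU MLP in pure Python. All names are hypothetical, and the script's own ResMLP (residual blocks, LayerNorm) is not reproduced here. BP sends the output error e back through the transpose of the forward weights W_out. DFA instead sends e through a fixed random matrix B that is never trained. With a single hidden layer, choosing B equal to the transpose of W_out makes the two rules coincide, which is a convenient sanity check.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def output_error(logits, label):
    """Cross-entropy gradient w.r.t. the logits: softmax(z) - onehot(label)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [ez / s - (1.0 if i == label else 0.0) for i, ez in enumerate(exps)]


def relu_grad(pre):
    return [1.0 if p > 0 else 0.0 for p in pre]


def hidden_delta_bp(W_out, e, pre):
    """BP: error travels back through the transpose of the forward weights."""
    back = matvec(transpose(W_out), e)
    return [b * g for b, g in zip(back, relu_grad(pre))]


def hidden_delta_dfa(B, e, pre):
    """DFA: error is projected by a fixed random feedback matrix B."""
    back = matvec(B, e)
    return [b * g for b, g in zip(back, relu_grad(pre))]


def outer_update(W, delta, x, lr):
    """In-place SGD step: W -= lr * outer(delta, x)."""
    for i, d in enumerate(delta):
        for j, xj in enumerate(x):
            W[i][j] -= lr * d * xj


if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_hid, n_out = 4, 3, 2
    W_in = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    W_out = [[rng.gauss(0, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
    B = [[rng.gauss(0, 0.5) for _ in range(n_out)] for _ in range(n_hid)]  # fixed
    x, label = [1.0, -0.5, 0.25, 2.0], 1
    pre = matvec(W_in, x)
    h = [max(0.0, p) for p in pre]
    e = output_error(matvec(W_out, h), label)
    outer_update(W_in, hidden_delta_dfa(B, e, pre), x, lr=0.1)  # DFA signal
    outer_update(W_out, e, h, lr=0.1)  # output layer uses the true error
```

The output layer receives its exact gradient under both rules. Only the layers below it see the random projection B e instead of the backpropagated error.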
=== DFA frozen-blocks (ResMLP num_blocks=4, blocks frozen), seed=42 ===
n_params: 1318154 (789770 trainable)
[DFA-frozen] ep 1: test_acc=0.3255
[DFA-frozen] ep 10: test_acc=0.3376
[DFA-frozen] ep 20: test_acc=0.3414
[DFA-frozen] ep 30: test_acc=0.3434
[DFA-frozen] ep 40: test_acc=0.3422
[DFA-frozen] ep 50: test_acc=0.3399
[DFA-frozen] ep 60: test_acc=0.3422
[DFA-frozen] ep 70: test_acc=0.3493
[DFA-frozen] ep 80: test_acc=0.3474
[DFA-frozen] ep 90: test_acc=0.3448
[DFA-frozen] ep 100: test_acc=0.3460
FINAL DFA-frozen-blocks: 0.3460
=== ResMLP frozen/shallow baseline summary, seed=42 ===
BP-shallow: 0.3872
BP-frozen: 0.3890
DFA-shallow: 0.3469
DFA-frozen: 0.3460
Compare to trainable 4-block ResMLP (3-seed mean): BP=0.609, DFA=0.308
Interpretation:
If DFA-frozen ≈ DFA-trainable (0.308): the blocks are passengers (training them under DFA adds nothing), and the walk-back parallels ViT
If DFA-frozen << DFA-trainable: ResMLP DFA actually trains the blocks (interesting contrast with ViT)
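The two interpretation rules above can be written as a small decision function. The sketch below does this, assuming a hypothetical tolerance `tol` for "≈". It also adds a fallback branch, because the rules as written do not cover the case where the frozen run beats the trainable one. With the summary values and tol = 0.01, DFA-frozen (0.3460) is about 0.038 above the DFA-trainable mean (0.308), so neither stated rule applies. By contrast, BP-frozen (0.3890) is 0.220 below BP-trainable (0.609). The frozen runs are single-seed (seed=42), while the trainable figures are 3-seed means, so small gaps should be read with care.

```python
def interpret(frozen_acc, trainable_acc, tol):
    """Apply the log's two rules; tol is a HYPOTHETICAL threshold for '≈'."""
    gap = trainable_acc - frozen_acc  # accuracy gained by training the blocks
    if abs(gap) <= tol:
        return "passengers"      # frozen ≈ trainable
    if gap > tol:
        return "blocks trained"  # frozen << trainable
    return "neither"             # frozen beats trainable: not covered by either rule


print(interpret(0.3460, 0.308, tol=0.01))  # DFA: neither
print(interpret(0.3890, 0.609, tol=0.01))  # BP: blocks trained
```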